Microsoft plans to bring ‘big data’ analysis by integrating Hadoop into Windows and SQL Server
This week Microsoft announced collaboration with Yahoo Hortonworks to develop a Apache Hadoop implementation for Windows Server, SQL Server 12, and Windows Azure platforms.
Hadoop is an open source platform for big data analysis and working with unstructured data. It also works on clusters of computers or servers. The preview of Hadoop-based service for Windows Azure will be available by the end of 2011, and the Server’s Hadoop implementation will work with the existing BI tools.
“Microsoft’s strategy is to groom Linux-friendlier Hadoop to fit snugly into Windows environments, thus giving organizations on-tap, seamless, and simultaneous access to both structured and unstructured data via familiar desktop apps, such as Excel, as well as BI tools such as Microsoft PowerPivot”, said Ted Samson, InfoWorld’s senior analyst.
Microsoft has developed an iterative MapReduce runtime for Windows Azure, code-named “Daytona”. Project Daytona is designed to support a wide class of data analytics and machine learning algorithms. It can scale out to hundreds of server cores for analysis of distributed data.
SoftElegance has started to explore this new data analysis and processing framework under the SoftElegance Research Lab initiative which officially will be started next month.
Project Daytona was developed as part of the eXtreme Computing Group’s Cloud Research Engagement Initiative. The project Daytona MapReduce Runtime for Windows Azure was developed for the organizations who have large and growing data collections and need tools to find signals in data and uncover insights.
The one of the latest and most promising IT trends today is Big Data analysis. Big Data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing.
SoftElegance is attended in tools and technologies to work with Big Data, such as Probase, Hadoop, R project, etc. Today we would like to introduce you R programming language, that provids a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and others.
R is a programming language and software environment for statistical computing and graphics. The R language has become a de facto standard among statisticians for developing statistical software, and is widely used for statistical software development and data analysis.