Project Daytona — a set of tools for working with Big Data, from Microsoft Research

Microsoft has developed an iterative MapReduce runtime for Windows Azure, code-named “Daytona”. Project Daytona is designed to support a wide class of data analytics and machine learning algorithms. It can scale out to hundreds of server cores for analysis of distributed data.

SoftElegance has started to explore this new data analysis and processing framework under the SoftElegance Research Lab initiative which officially will be started next month.

Project Daytona was developed as part of the eXtreme Computing Group’s Cloud Research Engagement Initiative. The project Daytona MapReduce Runtime for Windows Azure was developed for the organizations who have large and growing data collections and need tools to find signals in data and uncover insights.

There are a number of use cases for Project Daytona, such as for data analysis, machine learning, financial analysis, text processing, indexing, and search. Almost any application that involves data manipulation and analysis can take advantage of Project Daytona to scale out processing on Windows Azure.

If you are already interested you can download Project Daytona, already including examples ‘KMeansClustering’, ‘OutlierDetection’, and ‘WordCount’.

MapReduce

Daytona is iterative MapReduce runtime on Windows Azure platform. MapReduce is a software framework introduced to support distributed computing on large data sets on cluster computers. It was designed for processing huge datasets on certain kinds of distributable problems using a large number of computers, collectively referred to as a cluster (if all nodes use the same hardware) or as a grid (if the nodes use different hardware). Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured).

Daytona vs Hadoop, and other Microsoft’s alternatives

Definitely Daytona provides an alternative to Apache Hadoop. Together with Windows Azure Table Storage, which is similar to Google’s BigTable or Hadoop’s data store Apache HBase.

Also MS has recently released the second beta of its other Hadoop alternative LINQ to HPC, or Dryad. Instead of Daytona’s cloud algorithms and deployment, LINQ/Dryad enables developers to use Visual Studio to create analytics applications for big data sets.

Conclusion. Some words about Daytona

Roger Barga, an architect in the eXtreme Computing Group:

‘Daytona’ has a very simple, easy-to-use programming interface for developers to write machine-learning and data-analytics algorithms. They don’t have to know too much about distributed computing or how they’re going to spread the computation out, and they don’t need to know the specifics of Windows Azure.

SoftElegance will keep watching the ‘Daytona’ project and we’ll keep you informed.

Leave a Reply

Your email address will not be published. Required fields are marked *