Spark is a framework for data analytics on a distributed computing cluster. It offers in-memory computation, which makes data processing faster than with MapReduce. It can use the Hadoop Distributed File System (HDFS) and run on top of an existing Hadoop cluster. It also processes structured data in Hive, along with streaming data from sources such as HDFS, Flume, Kafka, and Twitter.
Apache Spark VS Apache Hadoop
SoftElegance sponsored the ITO & BPO Germany Forum and presented the “Big Data – The Future Of Software Development” session
On 23 April 2015, SoftElegance took part as a sponsor and speaker at the ITO & BPO Germany Forum, the only international industry event in Germany that focuses on onshoring and nearshoring services and shows opportunities to optimize existing models and solutions.
The speakers were Andrii Stolbov, CEO of SoftElegance, who founded the company in 1993 with a specialization in custom software development, mainly sophisticated business software, and Andrii Starzhinsky, VP of Marketing, who joined the company more than 5 years ago with the aim of providing innovative software development outsourcing services to German-speaking countries, the Netherlands, and the Scandinavian markets. Both speakers are currently researching Big Data challenges and the practical aspects of implementing the new technologies in custom and enterprise applications.
According to IDC, the big data market is estimated to reach up to 40 billion euros through 2018. By 2020 it is expected to create 4.4 million IT jobs worldwide. The volume of business data across all companies doubles every 15 months. Every day, 2.5 exabytes of information are generated, or 2.5 million terabytes.
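The figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming decimal (SI) units and a constant 15-month doubling period. The function names are illustrative and do not come from any library.

```python
# Assumption: decimal (SI) units, where 1 exabyte = 1,000,000 terabytes.
TERABYTES_PER_EXABYTE = 10**6


def exabytes_to_terabytes(exabytes: float) -> float:
    """Convert a volume in exabytes to terabytes (SI units)."""
    return exabytes * TERABYTES_PER_EXABYTE


def growth_factor(months: float, doubling_period_months: float = 15.0) -> float:
    """Return how many times a volume grows over `months`,
    assuming it doubles every `doubling_period_months` (an idealized,
    constant rate taken from the 15-month figure quoted above)."""
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Daily volume quoted above, expressed in terabytes.
    print(exabytes_to_terabytes(2.5))
    # Growth of business data volume over five years (60 months).
    print(growth_factor(60))
```

Under these assumptions, 2.5 exabytes converts to 2.5 million terabytes, matching the figure quoted, and five years of doubling every 15 months amounts to four doublings, or a 16-fold increase.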
Microsoft plans to bring ‘big data’ analysis by integrating Hadoop into Windows and SQL Server
This week Microsoft announced a collaboration with the Yahoo spin-off Hortonworks to develop an Apache Hadoop implementation for the Windows Server, SQL Server 2012, and Windows Azure platforms.
Hadoop is an open source platform for big data analysis and for working with unstructured data. It runs on clusters of computers or servers. A preview of the Hadoop-based service for Windows Azure will be available by the end of 2011, and the Windows Server Hadoop implementation will work with existing BI tools.
“Microsoft’s strategy is to groom Linux-friendlier Hadoop to fit snugly into Windows environments, thus giving organizations on-tap, seamless, and simultaneous access to both structured and unstructured data via familiar desktop apps, such as Excel, as well as BI tools such as Microsoft PowerPivot”, said Ted Samson, InfoWorld’s senior analyst.