Udendran-InspireComputerScience: May 2016

Spark provides the flexibility you need to succeed, focusing on simplifying your time to deployment, making your business users self-sufficient, and accelerating your time to value. I have discussed what spark as a service has to offer:

Introduction about Spark
Spark is an open source analytic engine which is used for rapid large-scale data processing in real time. Spark provides iterative and interactive processing .Spark can be regarded as alternative for mapreduce . Apache Spark can process data from data repositories, including the Hadoop Distributed File System (HDFS) and Amazon Simple Storage Service (S3). Spark can process even unstructured data too.The main advantage is that Spark can support in-memory processing and disk based processing .Spark can be incorporated with high level programming such as Java and other languages such as Scala,R which makes it popular among data scientists to make analytical applications.Spark make use of in-memory to its full advantage where recently read data are placed in-memory allowing faster query execution.

Spark vs Mapreduce

SQL queries are executed much faster in Spark than Mapreduce.
Spark runs on hundreds of nodes whereas Mapreduce can run on thousands onf nodes.
Mapreduce is ideal for batch processing ,Spark is ideal for real time processing since it uses in-memory storage and processing.

Spark SQL

Spark sql is a component in Spark big data framework which allows sql queries on data.It can query structured ,columnar,tabular data in Spark.It can be integrated with other hadoop database like Hive allowing interaction with hadoop HDFS.

Spark's MLlib

MLlib contains functionalities such as statistics, regression,classification ,filtering which are needed for machine learning and real time analysis.

Spark's Flagship

Spark flagship converts streaming of big data into data streams where analysing, process takes place with these data streams using Spark stream module.

Spark GraphX

This component provides processing on big data graph such as changing edges , vertices in a graph dataset.

Now let us see what factors contribute Spark as a service in other words why we can consider Spark as a service.

Spark-as-a-service
Spark is provided as a cloud based service by IBM,Databricks because of its advantages over other processing frameworks such as Mapreduce (which I discussed earlier). Companies like databricks allows you faster deployment of Sparks.Databricks eases the process such as cluster buliding and configuration. Process monitoring , resource monitoring are taken care by Databricks. IBM's Spark-as-a-service offers new API and tac and cognitikles unstructured data and follows IBM analytics on Spark on IBM bluemix.IBM DataCap Insight Cloud services delivers data science based on external data about events, people.The company said that services doesn't require deep knowledge of big data analysis.Whenever we come across IBM and cognite computing ,we think of WATSON but this IBM Datacap Insight provides a way to handle unstructured data (i.e. machine- unfriendly data).

Udendran-InspireComputerScience

Monday, 2 May 2016

Apache Spark as a service -The essentials and overview of services by IBM and Databricks

Translate

Blog Archive