Saturday, 17 September 2016


This post discusses Hadoop and Elasticsearch.
Let me begin with a brief introduction to both, since many newcomers may not be familiar with Elasticsearch.

Elasticsearch:
Elasticsearch is a great tool for document indexing and powerful full-text search. Its JSON-based query Domain Specific Language (DSL) is simple and powerful. Elastic’s ELK analytics stack is gaining momentum in web analytics use cases for these reasons:
  • It is very easy to get a toy instance of Elasticsearch running with a small sample dataset.
  • Application developers are more comfortable maintaining a second Elasticsearch instance than a completely new technology stack like Hadoop.
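
To give a feel for the JSON-based query DSL mentioned above, here is a minimal sketch that builds the body of a full-text `match` query as a plain Python dictionary. The helper name `match_query` and the field name `title` are made up for illustration; only the `query` / `match` / `size` structure comes from the Elasticsearch query DSL itself.

```python
import json


def match_query(field, text, size=10):
    """Build the JSON body of a full-text `match` query.

    `field` and `text` are placeholders chosen by the caller; nothing
    here talks to a real cluster.
    """
    return {
        "size": size,                       # maximum number of hits to return
        "query": {"match": {field: text}},  # analyzed full-text match on one field
    }


# Hypothetical example: search a "title" field for two terms.
body = match_query("title", "hadoop elasticsearch", size=5)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the request body to an index's `_search` endpoint; the index itself and its mapping are outside the scope of this sketch.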

What about Hadoop?
HDFS separates data from state in its node architecture: one overarching master node (the NameNode) manages state for the entire cluster, and several worker nodes (DataNodes) store only data. The data nodes execute commands from the master node, and every operation that changes the cluster state is recorded in a log file on disk. This allows a replica master to quickly recreate the state of the system without needing to talk to another master node during failover. This makes the system extremely fault tolerant and avoids the split-brain scenario, in which masters that must communicate with each other to restore state end up losing data.
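
The idea of rebuilding state from a log of operations can be illustrated with a toy model. This is only a sketch of the log-replay principle, not how HDFS is actually implemented: the `Master` class, the operation format, and the in-memory list standing in for a log file are all assumptions made for the example.

```python
import json


class Master:
    """Toy master node: holds cluster state in memory and logs every change."""

    def __init__(self, log=None):
        self.files = {}  # path -> ids of the data nodes that hold the file
        self.log = []    # append-only operation log (stand-in for a file on disk)
        # A replica starts from an existing log and replays it entry by entry.
        for line in log or []:
            self._apply(json.loads(line))
            self.log.append(line)

    def _apply(self, op):
        """Apply one logged operation to the in-memory state."""
        if op["type"] == "create":
            self.files[op["path"]] = op["nodes"]
        elif op["type"] == "delete":
            self.files.pop(op["path"], None)

    def execute(self, op):
        """Record the operation first, then apply it."""
        self.log.append(json.dumps(op))
        self._apply(op)


primary = Master()
primary.execute({"type": "create", "path": "/logs/a.txt", "nodes": [1, 2, 3]})
primary.execute({"type": "create", "path": "/logs/b.txt", "nodes": [2, 3, 4]})
primary.execute({"type": "delete", "path": "/logs/a.txt"})

# The replica never talks to the primary; it only reads the log.
replica = Master(log=primary.log)
print(replica.files == primary.files)
```

Because the replica needs nothing but the log, it can take over without coordinating with any other master, which is the property the paragraph above relies on.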


Implementing a Hadoop instance as the backbone of an analytics system has a steep learning curve, but it’s well worth the effort. In the end, you’ll be much better off for its rock-solid data ingestion and broad compatibility with a number of third-party analytics tools, including Elasticsearch. Using Elasticsearch with Hadoop offers several advantages, namely:

  • Speedy Search with Big Data Analytics.
  • Seamlessly Move Data between Elasticsearch and Hadoop.
  • Visualize HDFS Data in Real-Time with Kibana.
  • Sub-Second Search Queries and Analytics on Hadoop Data.
  • Enhanced Security, Including Basic HTTP Authentication.
  • Works with Any Flavor of Hadoop Distribution.
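
As a rough illustration of two items in the list above (moving data into Elasticsearch and basic HTTP authentication), the sketch below prepares a request for Elasticsearch's `_bulk` endpoint using only the Python standard library. The helper names, the index name `weblogs`, the sample records, and the credentials are all hypothetical. The action line assumes a recent Elasticsearch version; older releases also expected a `_type` entry. In practice a connector would handle these details for you.

```python
import base64
import json


def bulk_index_payload(index, docs):
    """Build a newline-delimited JSON body for the `_bulk` endpoint.

    Each document becomes two lines: an action line naming the target
    index, followed by the document source itself.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def basic_auth_header(user, password):
    """Build an HTTP Basic `Authorization` header from a user/password pair."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical records, e.g. rows read from a file stored in HDFS.
records = [
    {"host": "web-1", "status": 200},
    {"host": "web-2", "status": 500},
]

payload = bulk_index_payload("weblogs", records)
headers = basic_auth_header("user", "pass")
headers["Content-Type"] = "application/x-ndjson"
print(payload, end="")
```

The sketch stops short of sending anything over the network; the payload and headers are what an HTTP client would pass along in a POST request.
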
Hadoop also has a broad ecosystem of tools that support bulk uploading and ingestion of data, along with SQL engines that provide the full querying power you expect from a standard database. On the other hand, it can be argued that standing up Hadoop, ZooKeeper, and a Kafka ingestion agent requires as much domain-specific knowledge as Elasticsearch. Thus, the raw power and stability of Hadoop come at the price of heavy setup and maintenance costs.