What is Hadoop? What is Big Data?

Hadoop is Apache’s open source framework for processing large data sets in a distributed environment. It uses a cluster model, with up to thousands of machines, to process data of large volume and variety at high velocity. Before looking at Hadoop itself, we need to know where it came from.

 

What are the origins of Hadoop?

 

[Figure: hadoop_origin]

 

As the picture above shows, Google released white papers on GFS, MapReduce and BigTable.

Google File System (GFS) is a distributed file system built to store the data behind Google’s processing needs, and that data is processed with Google’s MapReduce model. Apache Hadoop follows the same architecture: HDFS (Hadoop Distributed File System) stores large volumes of data (Big Data), and Hadoop’s own MapReduce implementation processes it.
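To make the MapReduce idea concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use the Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

The point of the split is that each map call only needs one line and each reduce call only needs one key, so the work can be spread over many machines.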

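Google’s GFS and Apache’s Hadoop share the same master/slave architecture: a master node keeps the file system metadata (the NameNode in HDFS) and slave nodes hold the data (the DataNodes in HDFS). In GFS the data is divided into chunks, whereas in HDFS it is divided into blocks.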
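As a rough illustration of how a file is divided into blocks, the sketch below computes the block boundaries for a file of a given size. The 128 MB block size is an assumption made for this example (the real value is configurable in HDFS), and split_into_blocks is a hypothetical helper, not part of any Hadoop library.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size for this example; configurable in HDFS


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block of the file."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    for offset, length in split_into_blocks(300 * MB):
        print(offset // MB, length // MB)
```

With these assumptions a 300 MB file becomes three blocks of 128 MB, 128 MB and 44 MB, and each block can be stored on a different machine.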

What is Big Data?

Big Data is data with the following characteristics:

  • Large volumes of data
  • A wide variety of data sets
  • Data too large to be handled by traditional data management systems

Big Data comes in different formats:

  • Structured
  • Unstructured
  • Semi-structured
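The three formats can be illustrated with a small Python sketch. The sample values below are invented for this example: a CSV table stands in for structured data, a JSON document for semi-structured data, and a free-text sentence for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = "id,name,age\n1,Asha,31\n"

# Semi-structured: self-describing, but fields can vary per record.
semi_structured = '{"id": 2, "name": "Ravi", "tags": ["admin"]}'

# Unstructured: free text with no predefined schema.
unstructured = "Ravi called support on Monday about a login problem."

rows = list(csv.DictReader(io.StringIO(structured)))  # list of dicts keyed by column
record = json.loads(semi_structured)                  # nested dict and list
words = unstructured.split()                          # only raw tokens are available

if __name__ == "__main__":
    print(rows[0]["name"], record["tags"], len(words))
```

The more structure the data has, the more a program can rely on fixed fields; for unstructured data the structure has to be extracted during processing.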

This is exactly where Hadoop fits: it is a framework for processing Big Data in a distributed environment.

 

 
