Hadoop is Apache’s open-source framework for processing large data sets in a distributed environment. It uses a cluster model with thousands of computers/machines to process data of high volume, variety, and velocity. Before looking at Hadoop itself, we need to know where Hadoop came from.
What are the origins of Hadoop?
As shown in the picture above, Google released three whitepapers: GFS, MapReduce, and BigTable.
The Google File System (GFS) is a distributed file system built to serve Google’s data processing needs through Google’s MapReduce model. Apache introduced a similar architecture, HDFS (Hadoop Distributed File System), which is used to process large volumes of data (Big Data) with its own MapReduce implementation.
Google’s GFS and Apache’s HDFS share the same master-slave architecture. In GFS the data is divided into chunks, whereas in HDFS the data is divided into blocks.
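The idea of dividing a file into blocks can be sketched in a few lines. This is a minimal illustration, not Hadoop code: real HDFS blocks default to 128 MB in recent Hadoop versions, and the tiny block size used here is purely hypothetical so the output stays readable.

```python
# Minimal sketch (not Hadoop's API): split a file's bytes into
# fixed-size blocks, the way HDFS divides a file before spreading
# the blocks across DataNodes.
BLOCK_SIZE = 8  # hypothetical tiny block size; HDFS defaults to 128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return the file's blocks; only the last block may be smaller."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"hello distributed world!")
print(len(blocks))            # 3 blocks for a 24-byte "file"
print([len(b) for b in blocks])
```

In a real cluster, the master (NameNode in HDFS, or the GFS master) only tracks which machines hold which blocks; the blocks themselves live on the worker machines.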
What is Big Data?
Big Data is simply data, and we can define its characteristics as follows:
- Large volumes of data
- A wide variety of data sets
- Data too large to be handled by our traditional data management systems
We can find Big Data in different formats, as shown below.
Hadoop is the framework that fits exactly here: it handles and processes Big Data in a distributed environment.
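The MapReduce model mentioned above can be illustrated with the canonical word-count example. This is a plain-Python sketch of the programming model only, not Hadoop’s Java API: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for each word.
    return key, sum(values)

lines = ["big data big cluster", "big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1}
```

On a Hadoop cluster the same three steps run in parallel: map tasks work on separate blocks of the input file on different machines, and the framework handles the shuffle and the distribution of reduce tasks.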