People upload videos, take pictures on their cell phones, text friends, update their Facebook status, leave comments around the web, click on ads, and so forth.
On a day-to-day basis, we create 2.5 quintillion bytes of data; more than 90% of the data in the world today has been created in the last two years.
• Walmart handles more than 1 million customer transactions every hour.
• Facebook handles more than 40 billion photos from its user base.
Challenges for Cutting-Edge Businesses:
The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, and Microsoft.
They needed to go through terabytes and petabytes of data to figure out which websites were popular, what books were in demand, and what kinds of ads appealed to people.
Existing tools were becoming inadequate to process such large data sets.
What is BigData:
How big is "Big Data"? Is 30 to 40 terabytes big data? ….
Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Today it is terabytes, petabytes, exabytes. Tomorrow?
As per Wikipedia, "Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time".
So big that a single dataset may contain a few terabytes to many petabytes of data.
Key Characteristics of BigData:
Volume: This characteristic describes the size of the data relative to processing capability.
Consider terabytes of data arriving within a few minutes. Overcoming the volume issue requires technologies that store vast amounts of data in a scalable fashion and provide distributed approaches to querying that data.
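The distributed-query idea can be sketched in plain Python. This is a toy scatter-gather illustration, not Hadoop code: the "nodes", the round-robin partitioning, and the sample transaction amounts are all made up for the example.

```python
# Toy illustration of a distributed (scatter-gather) query.
# Plain Python lists stand in for machines that each hold one
# partition of a dataset too big for a single box.

def partition(records, num_nodes):
    """Scatter: spread records across num_nodes partitions (round-robin)."""
    parts = [[] for _ in range(num_nodes)]
    for i, rec in enumerate(records):
        parts[i % num_nodes].append(rec)
    return parts

def local_query(partition_data, predicate):
    """Each node counts matching records in its own partition."""
    return sum(1 for rec in partition_data if predicate(rec))

def distributed_count(records, predicate, num_nodes=4):
    """Scatter the data, query each partition, gather the partial counts."""
    parts = partition(records, num_nodes)
    partial_counts = [local_query(p, predicate) for p in parts]
    return sum(partial_counts)

# Example: count transactions over $100 across 4 simulated nodes.
transactions = [25, 250, 99, 101, 500, 75, 130]
over_100 = distributed_count(transactions, lambda amt: amt > 100)
```

Because each partial count depends only on its own partition, the work scales out by adding nodes rather than scaling up one machine.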
Velocity: This characteristic describes the frequency at which data is generated, captured, and shared.
Even at 140 characters per tweet, the high frequency of Twitter data ensures large volumes (over 8 TB per day).
Real-time offers in a world of engagement require fast matching and immediate feedback loops so that promotions align with geolocation data, customer purchase history, and current sentiment.
Variety: Non-traditional data formats exhibit a dizzying rate of change. A proliferation of data types from social, machine, and mobile sources no longer fits into neat, easy-to-consume structures.
Given the exponential growth and huge data sizes described above, along with the other BigData characteristics, we need:
- To store the data in a scalable file system. Hadoop provides a distributed file system, which we refer to as HDFS.
- To process the data in parallel, which is Hadoop MapReduce.
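The parallel-processing idea behind MapReduce can be sketched in a few lines of plain Python. This is a conceptual sketch of the classic word-count flow, not the actual Hadoop Java API: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group.

```python
from collections import defaultdict

# Toy sketch of the MapReduce word-count flow (not the Hadoop API).

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(values) for word, values in groups.items()}

def word_count(lines):
    pairs = []
    for line in lines:  # in Hadoop, map tasks run in parallel, one per data block
        pairs.extend(map_phase(line))
    return reduce_phase(shuffle_phase(pairs))

counts = word_count(["big data big hadoop", "hadoop stores big data"])
```

The point of the split is that map calls are independent of each other, so Hadoop can run them on whichever nodes already hold the data blocks, and only the shuffled groups move across the network.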
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
Hadoop consists of two core components:
– The Hadoop Distributed File System (HDFS)