Scalability: How do MNCs like Facebook manage such a huge amount of data?
Have you ever seen one of those videos on Facebook that shows a “flashback” of posts, likes, or photos, like the ones you might see on your birthday or on an anniversary? If so, you already know that Facebook stores all of that data on its servers.
As per some reports, Facebook processes 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour.
What is Big Data?
Big Data is still data, just at a huge scale. The term describes a collection of data that is enormous in volume and yet keeps growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.
So, how do they handle that much data?
Let's see.
Apache Hadoop
Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of data locality, where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.
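To make the MapReduce model concrete, here is a minimal word-count sketch in Java (the class and variable names are just illustrative, not from any particular project): the map function runs on the nodes that hold the data blocks and emits (word, 1) pairs, and the reduce function adds up the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: runs on each node against its local block of data and
// emits (word, 1) for every word it sees.
class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: receives all the counts for one word and sums them.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```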
The base Apache Hadoop framework is composed of the following modules:
- Hadoop Common — contains libraries and utilities needed by other Hadoop modules;
- Hadoop Distributed File System (HDFS) — a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
- Hadoop YARN — (introduced in 2012) a platform responsible for managing computing resources in clusters and using them for scheduling users’ applications;
- Hadoop MapReduce — an implementation of the MapReduce programming model for large-scale data processing.
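A short driver sketch shows how these modules fit together in one job (it assumes the WordCountMapper and WordCountReducer classes from the sketch above): Hadoop Common supplies the Configuration, the input and output paths live on HDFS, YARN schedules the containers, and MapReduce runs the two phases.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: wires the mapper and reducer into a job that YARN schedules
// across the cluster, reading from and writing to HDFS directories.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                // Hadoop Common
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);               // map phase
        job.setReducerClass(WordCountReducer.class);             // reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar and submitted with `hadoop jar` (for example, `hadoop jar wordcount.jar WordCountDriver /input /output`), the job is broken into tasks that YARN tries to place on the nodes holding the relevant blocks, which is the data-locality idea described above.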
How Hadoop handles big data
As more organizations began to apply Hadoop and contribute to its development, word spread about the efficiency of this tool, which can manage raw data efficiently and cost-effectively. Because Hadoop was able to carry out what had seemed like an impossible task, its popularity grew rapidly.
The Hadoop Distributed File System, as the name suggests, is the component responsible for distributing data across the cluster’s storage nodes, known as DataNodes. It splits each file into blocks, stores those blocks on DataNodes, and maintains the file-system directory (kept on a master node, the NameNode) that records which blocks belong to which file and where they are stored.
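For a sense of what talking to HDFS looks like from client code, here is a small sketch using Hadoop's FileSystem API (the cluster address and file path are made up for the example): the client asks the NameNode where blocks live, but the bytes themselves are streamed to and from DataNodes.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Writes a small file into HDFS and reads it back. Metadata requests go to
// the NameNode; the data itself is stored on and served by DataNodes.
public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical cluster address; normally this comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/demo/hello.txt");  // illustrative path

            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());  // prints "hello hdfs"
            }
        }
    }
}
```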
How does Hadoop process large volumes of data
Hadoop is built to collect and analyze data from a wide variety of sources. It can do this because of its basic design: the framework runs on multiple nodes, which together accommodate the volume of data being received and processed.
Tools based on the Hadoop framework run on a cluster of machines, which allows them to expand to accommodate the required volume of data. Instead of a single storage unit on a single device, Hadoop spreads multiple storage units across multiple devices.
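To see those multiple storage units for a single file, a small sketch like the one below (file path and cluster address are illustrative) can ask HDFS for a file's block size and replication factor, and even request an extra replica so its blocks are copied onto more devices.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Shows how one file is spread across the cluster: the size of its blocks
// and how many copies (replicas) of each block HDFS keeps on different nodes.
public class ReplicationInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");  // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/user/demo/hello.txt");  // illustrative path
            FileStatus status = fs.getFileStatus(path);

            System.out.println("Block size:  " + status.getBlockSize() + " bytes");
            System.out.println("Replication: " + status.getReplication() + " copies per block");

            // Ask HDFS to keep one more copy of each block, spread over more devices.
            fs.setReplication(path, (short) (status.getReplication() + 1));
        }
    }
}
```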
Why Hadoop is used in big data
Before Hadoop, storing and analyzing structured as well as unstructured data at this scale was practically unachievable. Hadoop made these tasks possible, as mentioned above, because of its core and supporting components.
Certain features of Hadoop made it particularly attractive for the processing and storage of big data. Chief among the features that drew more organizations to Hadoop is its core ability to accept and manage data in its raw form.
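That "raw form" point is easiest to see in a mapper: the file goes into HDFS exactly as it arrived, with no schema, and structure is only imposed when the data is read. Here is a small sketch, assuming a made-up access-log format of "ip timestamp url status", that counts requests per HTTP status code.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Raw, unstructured log lines are stored in HDFS as-is; the structure is
// only imposed here, at read time, when the mapper parses each line.
class RawLogMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text statusCode = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed log format: "<ip> <timestamp> <url> <status>"
        String[] fields = value.toString().split("\\s+");
        if (fields.length >= 4) {
            statusCode.set(fields[3]);   // count requests per HTTP status code
            context.write(statusCode, ONE);
        }
        // Malformed lines are simply skipped; no schema is enforced on ingest.
    }
}
```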
Uses of Hadoop
The National Security Agency of the USA uses Hadoop to help prevent terrorist attacks and to detect and prevent cyber-attacks. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data to detect fraudulent transactions.
Hadoop is used for understanding customers' requirements.
Hadoop is used in the trading field, where complex algorithms scan markets against predefined conditions and criteria to find trading opportunities.
Hadoop also plays a very important role in science and research. Many decisions are made by extracting the relevant portion of a huge amount of data, which helps researchers reach conclusions with far less effort than was needed before.
References:
https://www.guru99.com/what-is-big-data.html
https://www.simplilearn.com/how-facebook-is-using-big-data-article
https://www.intellectyx.com/blog/how-hadoop-solve-the-big-data-problem/