BIG DATA — AN UMBRELLA OF PROBLEMS

ivsntejesh
Sep 17, 2020

Have you ever wondered how MNCs like Google answer our search queries in a few seconds, or how Facebook can show a photo that was posted years ago as soon as we search for it?

Did you know that 90% of the world’s data has been created in the last two years alone?

Facebook generates 4 petabytes of data per day, which is four million gigabytes. 350 million photos are uploaded to Facebook each day.
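The conversion is simple arithmetic: 1 PB is 1,000,000 GB, so 4 PB is four million gigabytes. A minimal Python sketch, assuming decimal (SI) units; with binary units (1 PiB = 1,048,576 GiB) the number comes out slightly different.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
GB_PER_TB = 1_000
TB_PER_PB = 1_000
GB_PER_PB = GB_PER_TB * TB_PER_PB  # 1,000,000 GB in one petabyte


def petabytes_to_gigabytes(pb: float) -> float:
    """Convert petabytes to gigabytes using decimal units."""
    return pb * GB_PER_PB


daily_gb = petabytes_to_gigabytes(4)  # Facebook's reported 4 PB per day
print(f"4 PB per day = {daily_gb:,.0f} GB per day")
```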

Every 60 seconds, 510,000 comments are posted, 293,000 statuses are updated, 4 million posts are liked, and 136,000 photos are uploaded.

65 billion messages are sent on WhatsApp every day.

Google processes over 3.5 billion search queries every day. Every day, 306.4 billion emails are sent and 5 million Tweets are made. Organizations need to store this data and retrieve it whenever a user asks for it. The data is huge, and it has to be processed. Facebook cannot throw out or delete the data of each profile; it has to store it somewhere forever. But is there any single server or hard disk that can store petabytes of new data every day and still work super fast?

The answer is no, and this is what led to the problem of big data. Companies receive tons of data every day, and they need to store it and process it in just a few seconds.
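A quick back-of-the-envelope calculation shows why. The Python sketch below assumes a hypothetical 10 TB hard disk and uses the 4 PB per day figure quoted above; it ignores backups and replication, which would only raise the count.

```python
import math

# Assumption for illustration: one large hard disk holds 10 TB.
DISK_CAPACITY_TB = 10
DAILY_INGEST_TB = 4 * 1_000  # 4 PB per day, in decimal terabytes


def disks_needed(total_tb: float, disk_tb: float = DISK_CAPACITY_TB) -> int:
    """Smallest whole number of disks that can hold total_tb terabytes."""
    return math.ceil(total_tb / disk_tb)


per_day = disks_needed(DAILY_INGEST_TB)
per_year = disks_needed(DAILY_INGEST_TB * 365)
print(f"Disks filled per day:  {per_day:,}")
print(f"Disks filled per year: {per_year:,}")
```

Even under these generous assumptions, no single machine could hold a year of data, let alone serve it quickly.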

In today’s world, everything is instant and fast. It would be unacceptable if you asked Facebook for a photo posted many years ago and it took two days to give it back to you.

THREE V’s OF BIG DATA

The main problems associated with big data are known as the three V’s: Velocity, Volume, and Variety.

Velocity: Velocity is the measure of how fast the data is coming in. For example, Facebook has to handle a tsunami of photographs every day. It has to ingest it all, process it, file it, and somehow, later, be able to retrieve it.
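Turning the per-minute figures quoted earlier into per-second rates makes velocity concrete. This Python sketch is just averaging arithmetic; real traffic is bursty, so peak rates would be higher than these averages.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Per-minute Facebook figures quoted earlier in this post.
per_minute = {
    "comments": 510_000,
    "status updates": 293_000,
    "likes": 4_000_000,
    "photos": 136_000,
}


def per_second(count: float, window_seconds: int) -> float:
    """Average arrival rate: events in a time window divided by its length."""
    return count / window_seconds


for event, count in per_minute.items():
    print(f"{event:>15}: {per_second(count, SECONDS_PER_MINUTE):>10,.0f} per second")

# Google's quoted 3.5 billion searches per day, as a per-second rate.
google_qps = per_second(3_500_000_000, SECONDS_PER_DAY)
print(f"{'Google searches':>15}: {google_qps:>10,.0f} per second")
```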

Volume: Volume is the V most associated with big data because, well, the volume can be big. What we’re talking about here is quantities of data that reach almost incomprehensible proportions.

Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, video, audio, stock ticker data, and financial transactions. This variety of unstructured data creates problems for storing, mining, and analyzing data.
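To see why variety is a problem, consider a traditional table with a fixed schema. The Python sketch below, using a hypothetical `TRANSACTION_SCHEMA` and made-up records, checks each incoming item against that schema: only the neatly structured record fits, while the email text, the photo bytes, and a record with different fields would all need some other kind of storage.

```python
from typing import Any

# A fixed schema, like a table in a traditional database (hypothetical example).
TRANSACTION_SCHEMA = {"id": int, "amount": float, "currency": str}


def fits_schema(record: Any, schema: dict[str, type]) -> bool:
    """True if record is a dict with exactly the schema's fields and types."""
    if not isinstance(record, dict) or set(record) != set(schema):
        return False
    return all(isinstance(record[field], kind) for field, kind in schema.items())


incoming = [
    {"id": 1, "amount": 49.99, "currency": "USD"},          # structured transaction
    "Hi team, please find the quarterly report attached.",  # free-text email body
    b"\xff\xd8\xff\xe0",                                     # first bytes of a JPEG photo
    {"id": 2, "amount": 10.0, "note": "gift"},               # similar, but different fields
]

for item in incoming:
    label = "fits the table" if fits_schema(item, TRANSACTION_SCHEMA) else "needs other storage"
    print(f"{type(item).__name__:>5}: {label}")
```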

How do big MNCs like Google, Facebook, and Amazon handle this data?


Google: Google uses big data tools and techniques to understand our requirements based on several parameters like search history, location, trends, etc. This information is fed into algorithms that perform complex calculations, and Google then effortlessly displays search results sorted or ranked by relevance and authority to match the user’s requirements.
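Google’s real ranking algorithm is not public and uses far more signals, but the general idea of ordering results by a mix of relevance and authority can be sketched in a few lines of Python. The scoring formula, the 0.7/0.3 weights, the example pages, and their `authority` values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    authority: float  # hypothetical 0..1 trust score for the page


def relevance(query: str, text: str) -> float:
    """Fraction of query words that appear in the page text."""
    query_words = set(query.lower().split())
    page_words = set(text.lower().split())
    return len(query_words & page_words) / len(query_words) if query_words else 0.0


def rank(query: str, pages: list[Page], w_rel: float = 0.7, w_auth: float = 0.3) -> list[Page]:
    """Sort pages by a weighted mix of relevance and authority, best first."""
    return sorted(
        pages,
        key=lambda p: w_rel * relevance(query, p.text) + w_auth * p.authority,
        reverse=True,
    )


pages = [
    Page("a.example", "big data storage with hadoop", authority=0.4),
    Page("b.example", "cooking recipes for beginners", authority=0.9),
    Page("c.example", "what is big data", authority=0.8),
]
for page in rank("big data hadoop", pages):
    print(page.url)
```

The highly trusted cooking page still ranks last for this query because it matches none of the search words, which is why relevance gets the larger weight here.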

Facebook: The main business strategy of Facebook is to understand who its users are by learning their interests, location, and behavior, and then to customize the ads on each user’s timeline. This helps Facebook earn large profits.

Amazon: Amazon gathers data on every one of its customers while they use the site. As well as what you buy, the company monitors what you look at, your shipping address, and whether you leave reviews/feedback. Amazon uses Big Data gathered from customers while they browse to build and fine-tune its recommendation engine. The more Amazon knows about you, the better it can predict what you want to buy.
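Amazon’s production recommendation engine is far more sophisticated, but a classic starting point is “customers who bought this also bought”: count how often items appear together in the same purchase history. The Python sketch below uses made-up customers and items.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "asha": {"laptop", "mouse", "laptop bag"},
    "ravi": {"laptop", "mouse"},
    "meena": {"laptop", "headphones"},
    "john": {"phone", "phone case"},
}

# For each item, count how often every other item was bought by the same customer.
co_bought: dict[str, Counter] = {}
for basket in purchases.values():
    for a, b in permutations(basket, 2):
        co_bought.setdefault(a, Counter())[b] += 1


def recommend(item: str, k: int = 2) -> list[str]:
    """Items most often bought together with `item`, most frequent first."""
    return [other for other, _ in co_bought.get(item, Counter()).most_common(k)]


print(recommend("laptop"))
```

“mouse” comes first because two customers bought it with a laptop; items tied on count may come back in either order.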

Big data helps in analyzing:

  • Time
  • Cost
  • Product Development
  • Decision Making, etc.

When teamed up with analytics, big data helps you determine the root causes of business failures and analyze sales trends based on customers’ buying history. It also helps detect fraudulent behavior and reduce risks that might affect the organization.
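As a toy illustration of the fraud-detection idea, one simple statistical technique is to flag a transaction whose amount is unusually far from a customer’s past spending (a z-score check). This is a minimal Python sketch on made-up numbers with an assumed threshold of 3 standard deviations; real fraud systems combine many more signals.

```python
from statistics import mean, stdev

# Hypothetical past transaction amounts for one customer.
history = [42.0, 35.5, 50.0, 38.0, 45.0, 41.0, 39.5, 47.0]


def is_suspicious(amount: float, past: list[float], threshold: float = 3.0) -> bool:
    """Flag an amount more than `threshold` standard deviations from the past mean."""
    mu, sigma = mean(past), stdev(past)
    if sigma == 0:
        # All past amounts were identical, so any different amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


for amount in (44.0, 900.0):
    print(amount, "suspicious" if is_suspicious(amount, history) else "looks normal")
```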

Hospitality, hotel, and travel: applications and websites use big data to understand customer needs and set their pricing models and travel packages accordingly.

Retail: businesses like Amazon, Walmart, and many FMCG companies use big data to understand customer behavior and build suitable offers for their customers to increase sales.

Government: with Aadhaar and the huge population database it has created, the government is clearly also using big data for census calculations, providing subsidies, and planning government schemes.

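The core concept used to handle big data is distributed storage, built on a master-slave cluster. Incoming data is split into pieces that are spread across the slaves in parallel, while the master keeps track of where each piece is stored. Because the slaves pool their disks, the cluster can store far more data than any single machine, and it can grow simply by adding more slaves.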
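The idea can be simulated in a few lines of Python. This is a toy, in-memory sketch, not Hadoop’s code: the class name `ToyCluster`, the 4-byte blocks, the two copies per block, and the simple round-robin placement are all illustrative assumptions. Real HDFS uses much larger blocks (128 MB by default), keeps three copies of each block by default, and uses a rack-aware placement policy. The point is the division of labor: the master only records which slave holds which block, while the slaves hold the actual bytes.

```python
class ToyCluster:
    """Master-slave storage sketch: the master keeps only metadata, slaves hold the bytes."""

    def __init__(self, num_slaves: int, block_size: int = 4, replication: int = 2) -> None:
        if not 1 <= replication <= num_slaves:
            raise ValueError("replication must be between 1 and num_slaves")
        self.block_size = block_size      # bytes per block (tiny, for readability)
        self.replication = replication    # copies of each block, on different slaves
        self.slaves: list[dict[int, bytes]] = [{} for _ in range(num_slaves)]
        # Master's index: file name -> list of (block_id, slaves holding that block).
        self.metadata: dict[str, list[tuple[int, list[int]]]] = {}
        self._next_block = 0

    def put(self, name: str, data: bytes) -> None:
        """Split data into blocks and spread each block's copies round-robin."""
        n = len(self.slaves)
        placements = []
        for offset in range(0, len(data), self.block_size):
            block_id = self._next_block
            self._next_block += 1
            targets = [(block_id + r) % n for r in range(self.replication)]
            for slave in targets:
                self.slaves[slave][block_id] = data[offset:offset + self.block_size]
            placements.append((block_id, targets))
        self.metadata[name] = placements

    def get(self, name: str) -> bytes:
        """Reassemble a file by fetching each block from the first slave that has it."""
        return b"".join(self.slaves[targets[0]][block_id]
                        for block_id, targets in self.metadata[name])


cluster = ToyCluster(num_slaves=3)
cluster.put("notes.txt", b"hello big data!")
print(cluster.metadata["notes.txt"])        # where the master says each block lives
print([len(s) for s in cluster.slaves])     # blocks stored on each slave
print(cluster.get("notes.txt"))
```

Adding capacity in this model means appending another empty dictionary to `slaves`; the master’s job stays the same size per block no matter how large the blocks are.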

When data is read back, each slave uses its own RAM, CPU, and hard disk, so all the slaves can return their pieces in parallel, which is much faster than a single machine reading everything by itself. In Hadoop, a widely used software framework that implements this concept, the master is called the NameNode and the slaves are called DataNodes.
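The speed-up on reads comes from parallelism. The Python sketch below simulates it under loudly stated assumptions: four hypothetical slaves each hold one block, and every block read takes a made-up 0.1 s. Reading the blocks one after another takes roughly four times as long as asking all four slaves at once with a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

READ_DELAY_S = 0.1  # assumed time for one slave to read one block from its own disk

# Hypothetical layout: block index -> (slave name, block contents).
blocks = {
    0: ("slave-1", b"big "),
    1: ("slave-2", b"data "),
    2: ("slave-3", b"is "),
    3: ("slave-4", b"fun"),
}


def read_block(index: int) -> bytes:
    """Simulate one slave reading its block using its own CPU, RAM and disk."""
    _slave, data = blocks[index]
    time.sleep(READ_DELAY_S)
    return data


def read_sequential() -> bytes:
    """Fetch blocks one at a time, like a single machine doing all the work."""
    return b"".join(read_block(i) for i in sorted(blocks))


def read_parallel() -> bytes:
    """Ask every slave at once; map() keeps results in block order."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        return b"".join(pool.map(read_block, sorted(blocks)))


for reader in (read_sequential, read_parallel):
    start = time.perf_counter()
    content = reader()
    print(f"{reader.__name__}: {content!r} in {time.perf_counter() - start:.2f}s")
```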

Conclusion:

So, big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

Big data technologies are evolving with the exponential rise in data availability. It is time for enterprises to embrace this trend for a better understanding of their customers, better conversions, better decision making, and so much more.

I hope this gives you a basic idea of what big data is and how big tech companies handle it effectively and efficiently using the concept of distributed storage. Please share if you liked it and found it useful.

I will be posting more blogs on big data on Medium from now on. Here is my LinkedIn profile; feel free to DM me if you have any questions.

https://www.linkedin.com/in/tejesh-itha-59083215a/
