5 Vs of Big Data
“The goal is to turn data into information, and information into insight.” – Carly Fiorina, former CEO of Hewlett-Packard
Big Data is a relatively modern field of data science that deals with data sets too large and complex to process with traditional methods. Such data sets are typically divided into smaller subsets, which are then analyzed in depth to extract insights and information.
In earlier days, conventional data processing solutions were used to capture, store, and analyze data, but they were not efficient at this scale. Companies relying on traditional BI solutions therefore could not fully maximize the value of their data. Today, Big Data technologies handle these tasks far better than the traditional approaches.
Now, in order to understand how Big Data is handled and maintained, we need to understand the 5 Vs of Big Data.
In 2001, the analyst firm META Group (later acquired by Gartner) introduced data scientists and analysts to the 3 Vs of data: Volume, Velocity, and Variety. Over time, data analytics saw a change in how data was captured and processed; data was growing so rapidly in size that it came to be known as Big Data.
With this astronomical growth of data, two new Vs, Value and Veracity, were added to these data processing concepts.
Let us learn the 5 Vs of Big Data one by one.
5 Vs of Big Data
1. Volume
Volume is the amount of data produced. The scale of the data largely determines how it must be stored and processed.
In today’s technological world, data is generated from various sources in many formats: Word and Excel documents, PDFs, and media content such as images and videos, all produced at a great pace.
It is becoming challenging for enterprises to store and process this data using conventional business intelligence and analytics methods. Enterprises need to implement modern business intelligence tools to effectively capture, store, and process such huge amounts of data in real time.
Some interesting facts:
- Today, around 2.7 Zettabytes of data exist in the digital world.
- Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain approximately 2.5 petabytes of data.
2. Value
Data is being produced in large volumes today, but simply collecting it is of no use. Instead, we have to look for data from which business insights can be generated, insights that add “value” to the company. In that sense, Value is the most important of all the 5 Vs.
This is where Big Data analytics comes into the picture. Some companies have invested in data collection and storage infrastructure but fail to understand that aggregating data does not equal adding value; what you do with the collected data is what actually matters. Data analytics helps derive useful insights from that data, and these insights, in turn, add value to the decision-making process.
Now, how do we make sure that the value of Big Data is considerable and worth the time and effort invested? One way is to conduct a cost-versus-benefit analysis: calculate the total cost of processing the Big Data and compare it with the ROI the resulting business insights are expected to generate. Using this comparison, companies can decide whether Big Data analytics adds value to their business.
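The cost-versus-benefit check described above can be sketched in a few lines. All the figures below are illustrative assumptions, not real project costs:

```python
# Hypothetical cost-vs-benefit sketch for a Big Data initiative.
# Every number here is an assumption for illustration only.

def roi(total_cost: float, expected_benefit: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost."""
    return (expected_benefit - total_cost) / total_cost

# Assumed annual figures for a hypothetical analytics project:
infrastructure = 200_000    # storage and compute
staffing = 300_000          # data engineers and analysts
expected_benefit = 650_000  # projected insight-driven gains

total_cost = infrastructure + staffing
print(f"ROI: {roi(total_cost, expected_benefit):.0%}")
```

A positive ROI suggests the analytics effort is worth pursuing; a negative one suggests the processing cost outweighs the expected insight value.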
Some interesting facts:
- Facebook stores, accesses and analyzes around 30+ Petabytes of user-generated data
- According to McKinsey, a retailer using Big Data to its fullest potential could increase its operating margin by more than 60%
3. Velocity
Velocity is the speed at which data is generated, collected, and analyzed. Data flows continuously through multiple channels such as computer systems, networks, social media, and mobile phones, and in today’s data-driven business environment the pace at which it grows can fairly be described as “torrential” and “unprecedented”.
Now, this data needs to be captured as close to real-time as possible, so that the right data can be available at the right time.
The speed at which data can be accessed matters most for making timely and accurate business decisions. A limited amount of data available in real time can yield better business results than a large volume of data that takes a long time to capture and analyze.
Today, there are several Big data technologies that allow us to capture and analyze the data as it is being generated in real-time.
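One common way such technologies act on high-velocity data is to compute insights incrementally as events arrive, instead of storing everything first. The sketch below simulates this with a made-up event stream and a rolling window; it is an illustration of the idea, not a production streaming pipeline:

```python
from collections import deque

# Rolling-average sketch: each incoming event updates the result
# immediately, keeping only a small window in memory.

def rolling_average(stream, window=3):
    buf = deque(maxlen=window)  # old events fall off automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)  # insight available as each event lands

events = [10, 20, 30, 40]  # simulated real-time measurements
print([round(a, 1) for a in rolling_average(events)])
```

Because the window has a fixed size, memory use stays constant no matter how fast or how long the stream runs.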
“We are moving slowly into an era where Big Data is the starting point, not the end.” – Pearl Zhu, Digital Master
4. Variety
The volume and velocity of data add value to an organization, but the diverse data types collected from varied sources are also an important factor of Big Data. Big Data is generally classified as structured, semi-structured, or unstructured.
1. Structured data
Structured data is one whose format, length, and volume are clearly defined.
2. Semi-structured data
Semi-structured data is one that may partially conform to a specific data format.
3. Unstructured data
Unstructured data is unorganized data that doesn’t conform to traditional data formats. Data generated via digital and social media, such as images and videos, is unstructured.
Around 80% of the data produced globally, including videos, photos, mobile data, and social media content, is unstructured in nature.
Some interesting facts:
- 94% of Hadoop users perform analytics on large volumes of data, 88% analyze data in greater detail, while 82% can now retain more of their data.
- Decoding the human genome originally took 10 years to process, but now with the help of Big data it can be achieved in one week
- A 10% increase in data accessibility by a Fortune 1000 company would give that company approximately $65 million more in annual net income.
5. Validity / Veracity
The validity and veracity of Big Data refer to the assurance of quality or credibility of the collected data. Since Big Data is vast and draws on so many sources, there is a possibility that not all the collected data is accurate or of good quality.
Hence, before processing Big Data sets for further analysis, it is important to check their validity by asking questions like: Can you trust the data you have collected? Is the data reliable enough?
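In practice, a first validity pass often means filtering out records that fail basic quality rules before analysis. The field names, sample records, and rules below are assumptions chosen for illustration:

```python
# Minimal validity-check sketch: drop records that fail basic
# quality rules before they reach the analysis stage.

records = [
    {"user_id": 1, "age": 34, "country": "US"},
    {"user_id": 2, "age": -5, "country": "US"},     # invalid age
    {"user_id": None, "age": 28, "country": "IN"},  # missing key field
]

def is_valid(rec: dict) -> bool:
    """Assumed rules: user_id must be present, age must be plausible."""
    return rec["user_id"] is not None and 0 <= rec["age"] <= 120

clean = [r for r in records if is_valid(r)]
print(len(clean))  # only the first record survives
```

Real pipelines add many more rules (deduplication, cross-source consistency, schema checks), but the principle is the same: validate first, analyze second.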
Some interesting facts:
- More than 5 billion people are calling, texting, browsing and tweeting on mobile phones worldwide.
- Bad data or poor-quality data costs organizations as much as 10–20% of their revenue.
- Poor data across businesses and the government costs the U.S. economy $3.1 trillion a year.
“Every company has Big Data in its future and every company will eventually be in the data business.” – Thomas H. Davenport
There is no denying that data is the oil of the 21st century. Organizations today derive insights from high-volume, high-velocity, validated data collected from varied sources, and together these qualities add value to a company’s overall decision-making.
Business intelligence and analytics services help companies build a robust BI strategy, set up a data warehouse, integrate real-time data, and leverage advanced analytics.