According to many specialists, Big Data can be defined by the four V's: Volume, Variety, Velocity and Veracity. These four dimensions characterize Big Data and distinguish it from ordinary data. In the infographic presented in this article, IBM highlights these specificities.
The notion of Big Data was born with the rise of digital technology. The growing use of the Internet and its many applications has increased the amount of data, not only in volume but also in variety and velocity, while reducing its accuracy. For IBM's data scientists, it is this evolution of data that characterizes and defines Big Data.
In its infographic on the subject, IBM begins by recalling how this transformation of data came about. The main drivers of this change are, of course, the emergence of social networks, smart mobile devices and connected objects, as well as the multiplication of transactions on the Internet.
The IT giant also reminds us how companies can use this data. Big Data can help them adapt their products and services to consumers' needs, optimize operations and infrastructure, or find new sources of revenue. IBM also reports that 4.4 million jobs related to Big Data were created worldwide in 2015, 1.9 million of them in the United States.
The four V’s
To make Big Data easier to understand, IBM sums up the phenomenon in four major characteristics, the four V's: Volume, Variety, Velocity and Veracity.
Volume is the main characteristic of Big Data; the term itself comes from the huge amount of data generated every day. According to IBM, an average of 2.5 quintillion bytes of data are created every day, or about 2.5 billion gigabytes.
The amount of data grows considerably from year to year. Over the whole of 2020, 40 zettabytes of data will be created, or 40 trillion gigabytes: 300 times as much as in 2005.
These data come from both businesses and individuals. In the U.S., most companies have at least 100 terabytes of data, or about 100,000 gigabytes, in their storage systems. In addition, out of 7 billion people, 6 billion currently own a mobile phone.
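The unit conversions behind the volume figures above can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where each prefix step is a factor of 1,000 and one gigabyte is 10**9 bytes; the `SI_UNITS` table and the `convert` helper are hypothetical names introduced for illustration. For comparison, it also expresses the daily figure in binary gibibytes (2**30 bytes), a unit some tools also label "GB".

```python
# Assumption: decimal (SI) byte units, each step up being a factor of 1,000.
SI_UNITS = {
    "B": 10**0,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two decimal byte units (hypothetical helper)."""
    return amount * SI_UNITS[from_unit] / SI_UNITS[to_unit]


daily_gb = convert(2.5e18, "B", "GB")      # 2.5 quintillion bytes per day
yearly_2020_gb = convert(40, "ZB", "GB")   # 40 zettabytes created in 2020
us_firm_gb = convert(100, "TB", "GB")      # 100 terabytes stored per firm
implied_2005_zb = 40 / 300                 # 2005 volume implied by "300 times"

# Same daily figure in binary gibibytes (1 GiB = 2**30 bytes), for comparison.
daily_gib = 2.5e18 / 2**30

print(f"{daily_gb:,.0f} GB per day")
print(f"{daily_gib:,.0f} GiB per day")
print(f"{yearly_2020_gb:,.0f} GB in 2020")
print(f"{us_firm_gb:,.0f} GB per firm")
print(f"{implied_2005_zb:.2f} ZB implied for 2005")
```

Under these assumptions, 2.5 quintillion bytes is 2.5 billion gigabytes (about 2.33 billion GiB), 40 zettabytes is 40 trillion gigabytes, and 100 terabytes is 100,000 gigabytes.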
Beyond sheer quantity, these data are also more diverse than ever, a phenomenon linked to the diversification of Internet and digital uses. The origin of the data, their format and the domains they relate to all show unprecedented variety.
In 2011, health-related data already amounted to 150 exabytes, or 150 billion gigabytes. In 2014, data was also being generated by about 420 million connected objects, such as activity trackers and other wireless devices that analyze a user's health status.
Social networks also generate a huge amount of data. Every month, 30 billion pieces of content are shared on Facebook. On Twitter, 200 million active users send about 400 million tweets a day. Likewise, the 4 billion hours of video watched each month on YouTube represent a new form of unstructured data.
Data is also collected faster, since it now arrives as a continuous stream. To illustrate this acceleration, some now speak of Fast Data as the next step after mastering Big Data.
IBM gives the example of modern cars, which are equipped with around 100 sensors on average that measure fuel level, tire pressure and many other parameters in real time. The New York Stock Exchange illustrates the same phenomenon, capturing about 1 terabyte of data during each trading session.
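To give a sense of the velocity involved, the New York Stock Exchange figure can be turned into an average data rate. This sketch assumes a regular trading session of 6.5 hours (9:30 a.m. to 4:00 p.m. Eastern time) and a decimal terabyte of 10**12 bytes; both assumptions are ours rather than IBM's, and real market traffic is far burstier than an average suggests. The `average_rate` function is a hypothetical helper.

```python
# Illustrative assumptions (not from IBM): a regular NYSE session runs from
# 9:30 a.m. to 4:00 p.m. (6.5 hours), and 1 terabyte means 10**12 bytes.
SESSION_SECONDS = 6 * 3600 + 30 * 60   # 23,400 seconds
BYTES_PER_SESSION = 10**12


def average_rate(total_bytes: float, seconds: float) -> float:
    """Return the average throughput, in bytes per second, over an interval."""
    return total_bytes / seconds


rate = average_rate(BYTES_PER_SESSION, SESSION_SECONDS)
print(f"about {rate / 10**6:.1f} MB per second")
```

Under these assumptions, 1 terabyte per session works out to roughly 42.7 megabytes for every second of trading.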
Every activity on the Internet is now tracked precisely, thanks to a total of 18.9 billion network connections worldwide, about 2.5 for every person on Earth. The Internet has never been faster.
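The "about 2.5 for every person" ratio is a simple division. In this sketch, the 18.9 billion connections come from the text, while the world population of 7.5 billion is an assumed round estimate for the mid-2010s, not a figure from IBM's infographic.

```python
# 18.9 billion network connections is the figure quoted in the text; the
# world population below is an assumed round estimate, not an IBM number.
NETWORK_CONNECTIONS = 18.9e9
WORLD_POPULATION = 7.5e9  # assumption

connections_per_person = NETWORK_CONNECTIONS / WORLD_POPULATION
print(f"{connections_per_person:.2f} connections per person")
```

With this population estimate the ratio is about 2.52, consistent with the rounded figure; a larger population estimate would lower it slightly.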
Veracity, the accuracy of the data, remains the main challenge of Big Data today. These data are not yet sufficiently mastered, which undermines the accuracy of analyses.
In a survey conducted by IBM, for example, 27% of respondents admitted to being unsure of the accuracy of the data they collect. Similarly, one in three business leaders does not trust the information they use to make decisions. This lack of data accuracy and quality costs the United States about $3.1 trillion per year.
In conclusion, Big Data is characterized by the volume, variety and velocity of data, while their veracity remains a major issue.