The booming Big Data market is evolving rapidly and in several directions at once. Storage technologies, analytical tools, and a variety of other applications are constantly enriching this new field of entrepreneurship. At the same time, new challenges arise for the players driving this evolution. In this article, discover the 10 main changes to expect for Big Data in 2016.
Hadoop continues its progression
Since 2005, Hadoop has continued to evolve and gain popularity, becoming a leading enterprise solution for storing and analyzing data as part of a big data strategy. Valued at $1.5 billion in 2012, the framework is expected to reach $50.2 billion by 2020 according to Allied Market Research. In 2016 alone, Hadoop could grow by more than 100%.
A recent survey of 2,200 users reveals that only 3% plan to scale back their use of the framework in the coming year, while 76% plan to use it even more over the next three months. Nearly half of the companies not yet using it are preparing to do so.
Hadoop’s success tracks companies’ growing interest in data science. Faced with a shortage of data scientists, these companies are looking for affordable analytical tools, and Hadoop is one of them.
This growing popularity also leads to the democratization of tools related to the famous framework. For example, Apache Sentry secures data and metadata stored on Hadoop clusters through a granular authorization system.
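To make the idea concrete, here is a minimal sketch of a Sentry policy file of the kind used to grant granular rights on Hadoop data; the group, role, database, and table names are invented for illustration:

```ini
[groups]
# map an LDAP/OS group to a Sentry role
analysts = analyst_role

[roles]
# analyst_role may only SELECT from one table of one database
analyst_role = server=server1->db=sales->table=orders->action=select
```

Each role is a list of privileges scoped to a server, database, table, and action, which is what "granular authorization" means in practice.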
The same is true for technologies that speed up its data processing. Systems such as Cloudera Impala, AtScale, Actian Vector, and Jethro Data restore the processing speeds of traditional Business Intelligence, notably marking the return of the OLAP cube.
Apache Spark takes over Hadoop
Initially a simple component of the Hadoop ecosystem, the Apache Spark data processing engine has steadily emancipated itself and is gradually gaining ground on the yellow elephant. Spark owes its success mainly to its processing speed, up to 100 times faster than Hadoop. According to its creator Matei Zaharia, it is now the largest open source platform for Big Data processing, used by prestigious companies such as Goldman Sachs.
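Spark’s appeal to programmers is easiest to see in its API: the canonical word count chains flatMap, map, and reduceByKey over an in-memory dataset. The sketch below mimics that shape in plain Python so it runs anywhere (PySpark itself is not assumed to be installed); the comments note the equivalent Spark calls:

```python
# Plain-Python sketch of Spark's classic word count, mirroring the
# RDD chain sc.textFile().flatMap().map().reduceByKey() step by step.
# In Spark, the same pipeline runs distributed and in memory, which is
# where the speedup over disk-based Hadoop MapReduce comes from.
from collections import Counter

lines = [
    "big data big spark",
    "spark beats mapreduce",
]

# flatMap: split each line into words
words = [w for line in lines for w in line.split()]

# map + reduceByKey: count occurrences of each word
counts = Counter(words)

print(counts["spark"])  # 2
print(counts["big"])    # 2
```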
According to Monte Zweben, CEO of Splice Machine, 9 out of 10 Hadoop projects in 2016 will be linked to Spark. In his eyes, the engine’s speed makes it more natural, more mathematical, and more comfortable for programmers. According to Manish Sood, CEO of Reltio, new solutions to replace Hadoop, such as Spark or Drill, should become available this year.
Data storage in the cloud
More and more companies are choosing to store their data in the cloud, and this trend is expected to continue in 2016. According to IDC, cloud-based Big Data solutions and analytical tools will grow three times faster than on-premises platforms.
Through the cloud, MPP Data Warehouses are making a comeback with solutions such as Amazon Redshift, Google BigQuery, Microsoft Azure SQL Data Warehouse, Teradata, and Snowflake, winner of the Strata + Hadoop World 2015 Startup Showcase.
One of the benefits of the cloud is that it makes it easier for companies to scale the available storage space according to their needs. However, as Oracle’s Big Data vice presidents Neil Mendelson and Jeff Pollock explain, global enterprises will need to opt for hybrid storage to meet the legal standards of the countries in which they operate. A portion of the data will need to be stored in local data centers, while the bulk will reside in the cloud.
According to Roger Levy, Vice President of Products at MariaDB, solutions are expected to emerge in 2016 to assist companies in migrating their data to these hybrid storages. At the moment, the transfer can be difficult and costly.
The Internet of Things, future “killer app” of Big Data
For many analysts, the cloud and the Internet of Things will soon be the raison d’être of Big Data. The data collected by the billions of connected objects in circulation will have to be transferred to the cloud and processed in real time. According to Eric Frenkiel, CEO of MemSQL, the Internet of Things will soon merge into the Industrial Internet. This concept currently encompasses big data, machine learning, and M2M communications.
The data from connected objects and their sensors will be streamed and continuously analyzed in real time thanks to new technologies, which IDC predicts will grow by 30% per year from 2016 over the next five years. The energy, health, and agriculture industries will be able to combine real-time data with historical data to make faster decisions.
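The basic operation behind this kind of continuous analysis is simple: keep a fixed-size window over the latest readings and recompute a statistic as each new value arrives. A minimal sketch, with invented sensor values:

```python
# Hypothetical sketch of stream processing on sensor readings: a
# fixed-size sliding window yields a rolling average as each new
# value arrives, the core move of real-time IoT analytics.
from collections import deque

def rolling_averages(readings, window=3):
    buf = deque(maxlen=window)   # only the latest `window` values are kept
    out = []
    for value in readings:
        buf.append(value)        # oldest value is evicted automatically
        out.append(sum(buf) / len(buf))
    return out

temps = [20.0, 21.0, 23.0, 22.0, 30.0]   # invented temperature stream
print(rolling_averages(temps))           # rolling mean of the last 3 readings
```

A production system would run the same logic per sensor over a message stream rather than a list, but the windowing idea is identical.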
Predictive analysis and machine learning
Predictive analytics and machine learning applications are expected to see accelerated growth in 2016 according to IDC, growing 65% faster than non-predictive analytical tools. In the eyes of Oracle’s Big Data VPs, this is the year artificial intelligence technologies such as Machine Learning, Natural Language Processing, and Property Graphs will confront the challenges of Big Data.
For Monte Zweben, the rise of Machine Learning will require the implementation of turnkey databases facilitating the creation of predictive models for developers, to compensate for the lack of qualified personnel in this specialized field.
For now, according to Roger Levy, companies can turn to standalone services such as Microsoft Azure ML Studio, Google Prediction API, Amazon Machine Learning, or IBM Watson Analytics. These solutions facilitate access to Machine Learning.
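Behind their different APIs, these hosted services all wrap the same loop: fit a model on historical data, then predict on new inputs. A deliberately minimal pure-Python illustration of that loop, using one-variable least squares on invented numbers:

```python
# Minimal predictive model: fit a one-variable least-squares line on
# historical (x, y) pairs, then predict an unseen point.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

# Invented historical data: e.g. ad spend -> sales
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
slope, intercept = fit_line(xs, ys)

prediction = slope * 5.0 + intercept   # predict sales at spend = 5.0
print(prediction)
```

The managed services above do exactly this at scale, with far richer models, which is why they lower the barrier for teams without data scientists.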
Data security, a major challenge
According to David Cass, head of information systems security at IBM Cloud, cyber attacks will continue to grow in 2016. In order to protect data, manufacturers of connected objects will have to redouble their efforts and inventiveness. A new framework of international standards could come into being.
For Oracle’s Big Data VPs, consumers and governments are increasingly aware of the risks of data theft and are demanding more security. To protect data, companies will adopt classification systems that sort data into categories, each strictly regulated with regard to access and modification rights.
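A hypothetical sketch of what such category-based regulation looks like in code: each classification level carries its own read and write rights, and every access is checked against them (categories and roles are invented):

```python
# Hypothetical category-based access control: each data category has
# its own access (read) and modification (write) rights per role.
RIGHTS = {
    "public":    {"read": {"everyone", "employee", "admin"}, "write": {"admin"}},
    "internal":  {"read": {"employee", "admin"},             "write": {"admin"}},
    "sensitive": {"read": {"admin"},                         "write": {"admin"}},
}

def allowed(role, action, category):
    """Return True if `role` may perform `action` on data in `category`."""
    return role in RIGHTS[category][action]

print(allowed("employee", "read", "internal"))   # True
print(allowed("employee", "read", "sensitive"))  # False
```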
According to Roger Levy, native encryption will protect stored data, and SSL encryption will ensure security during transfers between applications. As Scott Zoldi, Chief Analytics Officer at FICO, explains, companies that are less selective and more data-intensive expose themselves to greater risk and disastrous consequences.
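The transport-encryption half of Levy’s point is already one call away in most languages. In Python’s standard library, for instance, a default TLS context comes with certificate checking and hostname verification switched on:

```python
# Python's stdlib ssl module: a default TLS context already enforces
# certificate validation and hostname checks, the kind of encryption
# in transit described for traffic between applications.
import ssl

ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # peer certificate is required
print(ctx.check_hostname)                    # hostname is verified

# An application would then wrap its TCP socket before sending data:
#   with ctx.wrap_socket(sock, server_hostname="example.com") as tls_sock:
#       tls_sock.sendall(payload)
```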
The rise of NoSQL database management systems
SQL database management systems are increasingly being replaced by NoSQL systems, usually associated with unstructured data. Faced with the reality of Big Data, such systems are proving more relevant. Gartner’s latest Magic Quadrant for Operational Database Management Systems highlights the rise of NoSQL vendors such as MongoDB, DataStax, Redis Labs, MarkLogic, and Amazon Web Services with DynamoDB, at the expense of the former big names Oracle, IBM, Microsoft, and SAP.
Paradoxically, Algebraix Data CEO Charles H. Silver points out that SQL is slowly becoming the language of choice for these NoSQL systems. And for good reason: many IT specialists have thirty years of experience with SQL and query more efficiently with it, even on NoSQL systems.
In 2016, as Roger Levy explains, some companies will prefer to combine JSON, SQL, and NoSQL in their databases, using tools such as Cassandra.
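The hybrid pattern is easy to sketch: JSON documents stored in an ordinary SQL column, queried with SQL, and parsed back into objects in the application. The example below uses SQLite from Python’s standard library as a stand-in for systems such as Cassandra:

```python
# Sketch of the JSON + SQL hybrid: schemaless documents live in a TEXT
# column of a relational table; SQL finds the row, the application
# parses the document. (SQLite here is only a stand-in for Cassandra.)
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
con.execute("INSERT INTO events (payload) VALUES (?)",
            (json.dumps({"sensor": "s1", "temp": 21.5}),))

row = con.execute("SELECT payload FROM events WHERE id = 1").fetchone()
doc = json.loads(row[0])
print(doc["temp"])  # 21.5
```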
Self-service is increasingly in vogue
Self-service data preparation tools are becoming increasingly popular. Firms such as Alteryx, Trifacta, Paxata, Lavastorm, and Informatica and its Rev are innovating to offer efficient and intuitive solutions.
For example, IDC highlights the growing demand for visual data analysis tools. In 2016, these solutions are expected to grow 2.5 times faster than the rest of the Business Intelligence market. Oracle predicts that this increased accessibility to data could spark a new wave of innovation.
Significant reduction in data analysis costs
According to Charles H. Silver, the rise of Big Data has prompted many corporate innovations aimed at reducing the cost of data analysis. While these prices have steadily declined in recent years, in 2016 they will reach a level low enough to allow firms of all sizes to process their data.
The lack of qualified personnel persists
Companies are still struggling to find qualified employees for data analysis, processing, and interpretation. According to IDC, the United States will count 181,000 qualified profiles in 2018, while companies will post five times as many Big Data vacancies.
In conclusion, 2016 brings solutions to the need to process data quickly and easily. Progress raises new issues, and companies are trying to solve them in real time. For many analysts, this will also be the year Big Data enters the real world.