Find out what industry experts are predicting for Big Data trends and developments in 2020: a good way to anticipate the changes ahead and build your company’s roadmap.
As 2019 draws to a close, it is time to look to the future. What does 2020 hold in store for Big Data and data analysis? Here are some answers, through predictions from industry experts and other specialists.
Hadoop: should we bury the elephant?
Hadoop had a particularly difficult year in 2019. However, it may still be too early to announce the definitive end of this framework. According to Haoyuan “HY” Li, founder of Alluxio, Hadoop storage in its HDFS form is indeed doomed to disappear. Its cost and complexity, and the fact that compute cannot scale independently of storage on HDFS, make it unsuited to the current constraints of companies.
Users now need computing capacity that can be scaled at will and is available in the Cloud for real-time insights. Data stored on HDFS will therefore need to be transferred to more efficient and optimized systems, whether Cloud or on-premise storage.
On the other hand, the expert believes that other components of the Hadoop ecosystem dedicated to processing and computation still have a bright future ahead of them. This is notably the case for Apache Spark and Presto.
Object storage for structuring data
As HDFS Data Lake deployments slow down, object storage may become the new approach of choice, according to Jon Toor, CMO of Cloudian. He predicts that in 2020 more and more companies will opt for object storage to create structured data from unstructured data. This will allow metadata to be used to “make sense of the tsunami of data generated by AI and ML workloads”.
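The idea of using object metadata to impose structure on unstructured blobs can be sketched in plain Python. The `ObjectStore` class below is purely illustrative, not any vendor’s API: the blobs stay opaque, while their metadata tags make them queryable.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An opaque blob plus the metadata that makes it searchable."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy object store: data is unstructured, metadata gives it structure."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = StoredObject(key, data, metadata)

    def query(self, **criteria):
        """Return keys of objects whose metadata matches every criterion."""
        return [o.key for o in self._objects.values()
                if all(o.metadata.get(k) == v for k, v in criteria.items())]


store = ObjectStore()
store.put("img-001", b"...", source="camera-7", label="defect", model="v3")
store.put("img-002", b"...", source="camera-7", label="ok", model="v3")
store.put("log-001", b"...", source="gateway", label="defect")

# Metadata turns a pile of blobs into something queryable.
print(store.query(label="defect"))  # ['img-001', 'log-001']
```

In real object stores the same principle applies: user-defined metadata is attached at upload time and later used to filter and organize otherwise opaque data.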
Likewise, according to Thoughtspot’s CEO, Sudheesh Nair, the end of Hadoop is just the end of a cycle for Big Data. Now, faced with growing data volumes and the need for agility, companies are building their own Data Lakes or Data Warehouses or turning to the Cloud. This trend is expected to continue into 2020.
The need to understand Data Gravity
For its part, Digital Realty’s CTO considers that companies need to understand the concept of “Data Gravity” to overcome the obstacles of digital transformation. As data grows in volume, a force similar to gravity makes it harder to transfer while attracting even more data.
In addition, according to Sharp, most businesses can’t keep up with the speed at which data is being generated and must manage data from multiple sources that are difficult to migrate or use effectively. It is this “Data Gravity” that prevents digital transformation initiatives from succeeding.
In 2020, companies will therefore try to remedy the Data Gravity problem by bringing their applications closer to data sources rather than transporting resources to a central location. By relocating traffic, analysis and data management, companies will be better able to control them.
AI to the rescue of Big Data
TD Ameritrade’s Beaumont Vance believes that, rather than accumulating ever more data, companies can make better use of the data already at their disposal using existing technologies.
Currently, companies use less than 20% of all their internal data. However, thanks to new artificial intelligence techniques, he estimates that the unused 80% will become usable and easier to interpret.
Artificial intelligence is making Big Data even more useful: questions that have so far been impossible to answer can now be elucidated very easily.
Conversely, in the eyes of Arka Dhar, CEO of Zinier, it is also possible to train AI with Small Data. Rather than using massive data sets to train their algorithms, Data Scientists will be able to achieve the same results with much less data, thanks to advances in AI.
According to Haoyuan “HY” Li of Alluxio, data analysis and artificial intelligence teams will be merged within companies. And with good reason: artificial intelligence is the next step in structured data analysis.
The statistical models of yesteryear converged with computer science to give birth to artificial intelligence and Machine Learning. This is why AI, analytics and data teams need to work together to unlock the full potential of the data they use.
Similarly, Eugene Roytburg of Fractal Analytics believes that AI and Machine Learning will now be part of data analysis and that their scope of application will be better defined. Until now, he believes, many companies have struggled to distinguish between the two.
Nevertheless, issues of confidentiality and data protection will be given more prominence. According to Suraj Amonkar of Fractal Analytics, the AI community will continue to debate and move forward on governance, confidentiality, safety and ethical issues related to artificial intelligence.
From 2020 onwards, companies could therefore take ethical issues into account when designing new applications of artificial intelligence. This could become the new standard. The CEO of Eugenie.ai, Soudip Roy Chowdhary, even believes that research efforts will be made to build an AI ecosystem concerned with confidentiality and fairness.
Memory-based storage is gaining popularity
Memory-based storage systems are expected to gain popularity in 2020, according to GridGain CEO Abe Kleinfeld. And with good reason: memory-based storage offers more application possibilities than disk storage.
For example, only this type of storage currently makes it possible to compare real-time data with historical data stored in a Data Lake at an affordable cost. One example is an in-memory data integration hub based on a platform like Apache Ignite, integrating Spark, Kafka and a Data Lake like Hadoop. Next year should therefore be marked by a massive gain in popularity for in-memory computing solutions.
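The real-time-versus-historical comparison at the heart of that pattern can be sketched with plain Python dictionaries standing in for the in-memory caches. The sensor names, baseline values and 25% tolerance below are invented for the example; in a real deployment the baselines would be preloaded from the Data Lake into a platform like Apache Ignite and the live readings would arrive over Kafka.

```python
# Historical baselines, preloaded into memory from the Data Lake
# (a stand-in for an in-memory data grid's caches).
historical_avg = {"sensor-1": 20.0, "sensor-2": 55.0}


def check_event(sensor_id: str, value: float, tolerance: float = 0.25) -> str:
    """Compare a real-time reading with its in-memory historical
    baseline and flag readings that deviate beyond the tolerance."""
    baseline = historical_avg.get(sensor_id)
    if baseline is None:
        return "unknown-sensor"
    deviation = abs(value - baseline) / baseline
    return "anomaly" if deviation > tolerance else "normal"


print(check_event("sensor-1", 21.0))  # normal: within 25% of baseline
print(check_event("sensor-2", 80.0))  # anomaly: ~45% above baseline
```

Keeping both the baselines and the comparison in memory is what makes this affordable at streaming speeds: no round trip to disk or to the Data Lake is needed per event.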
The Big Data: dream or nightmare?
According to Eric Raab and Kabir Choudry, both vice presidents at Information Builders, Big Data can be a dream or a nightmare. For them, those who have invested in solutions to manage, analyze and use data properly will benefit from a clearer overview of their businesses and the road to success.
On the other hand, those who have not “will be left with a mountain of information” that they won’t really be able to understand or exploit. Their decisions will therefore be misdirected, and their data totally useless.
Todd Wright, Director of Data Privacy and Data Management Solutions at SAS, agrees. According to him, Big Data management will still be difficult in 2020, and orchestration, discovery, data preparation and model management will once again be priorities for companies.
And with good reason: the World Economic Forum predicts that the volume of data produced by mankind will reach 44 zettabytes in 2020. If companies do not have the analytical models and data management programs to access, integrate, cleanse and govern data, and to extract insights from it through advances in AI, Machine Learning and analytics, all of this data will be generated in vain.
The rise of computational storage
According to Nader Salessi, CEO and founder of NGD Systems, companies are now rushing to adopt NVMe drives to speed up the storage and analysis of data, particularly data related to IoT.
However, this expert believes that NVMe alone is not fast enough to analyze and process petabytes of data in real time. The solution to the problem of data management and movement would therefore be computational storage.
Data Fabric to Simplify Data Integration
Faced with the explosion of data volume and the multiplication of AI and Machine Learning use cases, data integration is more complex than ever. According to Denodo, the solution could be the architectural concept of the Data Fabric.
This architecture will allow real-time access to data sets, whether they are structured, semi-structured or unstructured. Companies will therefore be able to focus on Machine Learning and Artificial Intelligence.
Thanks to advances in the Internet of Things, a dynamic Data Fabric offers “fast, secure and reliable access” to large volumes of data through a logical Data Warehouse architecture.
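One way to picture a Data Fabric is as a single logical access layer over heterogeneous physical sources. The sketch below is a deliberately minimal illustration (the source names, records and `fetch` callables are all invented): consumers ask the fabric, never the underlying systems, so structured and semi-structured backends sit behind one interface.

```python
class DataFabric:
    """Toy logical layer: one access point over several physical sources."""
    def __init__(self):
        self._sources = {}

    def register(self, name, fetch_fn):
        """Plug a data source into the fabric under a logical name."""
        self._sources[name] = fetch_fn

    def get(self, name, query):
        """Route a query to whichever backend holds the data."""
        if name not in self._sources:
            raise KeyError(f"no source registered under {name!r}")
        return self._sources[name](query)


fabric = DataFabric()
# Structured source: a relational table, here faked with a callable.
fabric.register("warehouse", lambda q: {"customer": q, "tier": "gold"})
# Semi-structured source: JSON documents.
fabric.register("doc-store", lambda q: [{"id": q, "text": "support ticket"}])

print(fabric.get("warehouse", "acme"))
```

A real Data Fabric adds virtualization, caching and governance on top, but the design choice is the same: callers depend on the logical layer, not on each physical system.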
The end of the talent shortage?
The lack of data analysis specialists remains a scourge at present. According to Hugh Owen of Microstrategy, however, this problem could be solved in 2020 if companies train their employees to acquire the required skills. This approach may be more relevant than recruiting already-trained specialists.
On the other hand, according to John LaRocca of Fractal Analytics, the talent shortage will hit companies head on with regard to artificial intelligence. The demand for AI solutions will far exceed the available talent, and organizations will need to adapt by ensuring that non-specialists can develop AI applications.
This is why, as Marcus Borba of Borba Consulting points out, we are already seeing growing demand for ready-to-use Machine Learning models that can be used without expertise.
The Cloud continues its breakthrough
According to Eric Raab and Kabir Choudry of Information Builders, enterprises will turn to the Cloud in 2020 to take advantage of the convenience, scalability and flexibility of Cloud-Native solutions. And with good reason: these solutions are proven and trusted by enterprises.
Similarly, David Richards, CEO of WANdisco, expects that thousands of companies created before the Cloud’s emergence will decide to “join the party” in 2020. This should lead to strong growth in the Cloud. Companies will start by migrating their data to the Cloud, in order to lay the foundations for an optimal environment for artificial intelligence and Machine Learning applications.
Finally, Fractal Analytics’ Sandeep Dutta expects companies to continue creating Data Lakes in the Cloud, which allows them to improve the quality of their datasets. The Cloud also enables hardware innovation to lower prices while improving performance, as databases will run on increasingly specialized hardware.
AI and EKG against data silos
According to Saurav Chakravorty, Principal Data Scientist at Brillio, semantic artificial intelligence and EKGs (Enterprise Knowledge Graphs) help solve the problem of data silos and disparate data sets.
With EKGs, companies can get rid of semantic inconsistencies within their data sets. By combining them with semantic AI, it is possible to unleash the full potential of Data Lakes and Big Data.
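The semantic inconsistencies in question are often as mundane as the same entity being named differently in each silo. A toy canonicalization step shows the idea (the alias table, company names and records are invented for the example): alias edges in the knowledge graph map every surface form onto one canonical entity, so records from different silos can finally be joined.

```python
# Alias edges from a toy enterprise knowledge graph: every surface
# form points at one canonical entity id.
ALIASES = {
    "IBM": "ibm_corp",
    "I.B.M.": "ibm_corp",
    "International Business Machines": "ibm_corp",
    "Acme Inc": "acme",
    "ACME Incorporated": "acme",
}


def canonicalize(name: str) -> str:
    """Map a raw name from any silo onto its canonical entity id."""
    return ALIASES.get(name, name)


def merge_silos(*silos):
    """Group (name, record) pairs from several silos by canonical entity."""
    merged = {}
    for silo in silos:
        for name, record in silo:
            merged.setdefault(canonicalize(name), []).append(record)
    return merged


crm = [("IBM", {"contact": "j.doe"})]
billing = [("International Business Machines", {"invoice": 42})]
print(merge_silos(crm, billing))
# both records land under the single entity 'ibm_corp'
```

Real EKGs go much further (typed relationships, ontologies, ML-assisted entity resolution rather than a hand-written alias table), but the payoff is the same: once names resolve to shared entities, siloed data sets stop being disparate.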
The Year of the Storage-Class Memory?
Charles Fan, CEO and co-founder of MemVerge, believes that 2020 could be the year storage-class memory takes off. Memory-centric data centers could help meet the growing demands of data center applications while providing increased processing speed.
The expert is convinced that innovations in the field of information technology will make it possible to open up a new topology between CPU units and memory units. Thus, the calculation layer and the storage layer could begin to converge within the memory-centric Data Centers.
This is also the opinion of Bob Moul, CEO of Circonus, who is convinced that we are moving quickly towards a combined storage and processing architecture for Edge deployments. With Gartner predicting 20 billion IoT devices in 2020, enterprises need to be able to adopt scalable storage solutions to accommodate this massive growth.
The Dark Data finally exploited
According to Rob Perry, vice president at ASG Technologies, Dark Data could finally be exploited in 2020. The term “Dark Data” refers to the large amounts of data that companies collect but never use.
However, this data may contain valuable information for the business. It may also contain personal information that needs to be protected or even deleted. In the coming year, companies may finally get to grips with Dark Data.
Data governance is a priority
Governance, data protection and privacy will be more important than ever in 2020. That’s what Wright of SAS predicts. He says the emergence of data protection laws such as the GDPR in Europe is forcing companies to develop data governance programs that take privacy into account.
Similarly, Bailis of Sisu expects new governance tools to ensure data confidentiality and integrity while fading into the background.
Open Source databases are gaining ground
As Karthik Ranganathan, founder and CTO of Yugabyte, reminds us, Open Source databases represented 0% of the market ten years ago. Today, they hold a market share of around 7%.
This trend is expected to accelerate in 2020. In the face of rapid technological advances, database providers are realizing that a 100% open source model is the most advantageous, since it lets software reach a level of maturity that takes longer to achieve with a “freemium” model.
However, it should be remembered that several companies chose, on the contrary, to move away from the Open Source model in 2019, such as Confluent, Redis and MongoDB. Thus, Rockset co-founder Dhruba Borthakur believes it is more likely that the industry will turn to open services than to open source software.
In his view, given that the Cloud has completely changed the way software is distributed and monetized, it is likely that new technologies will be offered less and less in open source.
Instead, it is the data services that will be open. This is particularly likely in the field of data management. Just as the decade now ending was the decade of Open Source, the decade about to begin will be the decade of Open Service.
Kubernetes becomes mainstream
In 2019, the open source container orchestration platform Kubernetes (K8s) was a real success in the tech field. According to Pallanck of O’Reilly, this trend will continue in 2020, and the speed at which companies adopt the technology will only increase.
Similarly, Alluxio’s Haoyuan Li expects data analysis and artificial intelligence workloads to be deployed on Kubernetes more and more often. This is also the opinion of Stephen Fabel of Canonical, according to whom “Kubernetes has become an integral part of the modern Cloud infrastructure” and now serves as a development and experimentation portal for new technologies.
However, this expert also expects companies to examine whether Kubernetes is really the tool that meets their needs. In some situations, this technology can be complex to manage and require specialized skills that organizations may not have.