Data Virtualization allows the integration of data from various sources. This technique simplifies data access and analysis. Find out everything you need to know: how it works, use cases, the best tools…
Data offers many opportunities for businesses, but also many challenges. This data comes from multiple sources and takes different forms. A distinction is made between structured and unstructured data.
All this information is stored in different locations: in databases, SaaS applications or CRM platforms. This can make it difficult to manage Big Data and to get a complete view of the data.
What is Data Virtualization?
One of the solutions to this problem is Data Virtualization. It allows data to be manipulated and retrieved even without knowing where it is stored or in what format.
According to the Data Management Association, “Data Virtualization allows distributed databases and multiple heterogeneous data stores to be accessed and visualized as a single database. Rather than performing ETL on data physically using transformation engines, Data Virtualization servers perform extraction, transformation and integration virtually.”
Data from multiple disparate sources can be integrated without being copied or moved. Data Virtualization does not replicate data from databases, data stores or other source systems; it stores only the metadata needed to provide an overview.
Users benefit from a single virtual layer covering multiple applications, formats and physical locations. This simplifies and accelerates data access. Data Virtualization puts an end to the problem of data silos and format differences.
Data can be collected and processed in real time. Data Virtualization is therefore a valuable asset for Data Mining, Predictive Data Analysis, Machine Learning and Artificial Intelligence.
How does Data Virtualization work?
Data Virtualization software acts as middleware (an intermediary software layer) that virtually integrates data stored in different sources and in different formats. Such a platform allows authorized users to access all of a company’s data from a single access point.
They no longer have to worry about whether the data is stored on a physical server, in an on-premises Data Warehouse or Data Lake, or in the Cloud. Access to and use of data are therefore greatly simplified.
Data Virtualization software aggregates structured and unstructured data sources so that they can be viewed virtually through a dashboard or dataviz tool. It allows “discovery” of metadata, but hides the complexity of accessing different types of data from different sources.
It is not intended to replicate data from the source systems, but only to store the metadata and the integration logic needed to view that data.
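The idea of a layer that keeps only metadata and integration logic can be illustrated with a minimal Python sketch. It is not how any particular product works: the `VirtualLayer` class, the two sources (an in-memory SQLite database and a CSV string) and all names are hypothetical, chosen only to show that rows are fetched from the source at query time rather than copied.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database (in-memory SQLite for this sketch).
crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Hypothetical source 2: a CSV export, for example from a SaaS application.
ORDERS_CSV = "customer_id,amount\n1,120.5\n2,80.0\n1,35.0\n"


def read_customers():
    """Fetch rows from the relational source at query time."""
    cur = crm_db.execute("SELECT id, name FROM customers ORDER BY id")
    return [{"id": row[0], "name": row[1]} for row in cur]


def read_orders():
    """Parse the CSV source at query time."""
    reader = csv.DictReader(io.StringIO(ORDERS_CSV))
    return [{"customer_id": int(r["customer_id"]), "amount": float(r["amount"])}
            for r in reader]


class VirtualLayer:
    """Stores metadata and access logic only; source rows are never copied here."""

    def __init__(self):
        self._catalog = {}

    def register(self, name, columns, source_type, reader):
        # Only a description of the source and a way to reach it are kept.
        self._catalog[name] = {
            "columns": columns,
            "source_type": source_type,
            "reader": reader,
        }

    def describe(self):
        """Metadata discovery: list what exists without reading any data."""
        return {name: (meta["source_type"], meta["columns"])
                for name, meta in self._catalog.items()}

    def query(self, name):
        """Single access point: the caller does not need to know the format."""
        return self._catalog[name]["reader"]()


layer = VirtualLayer()
layer.register("customers", ["id", "name"], "sqlite", read_customers)
layer.register("orders", ["customer_id", "amount"], "csv", read_orders)

print(layer.describe())
print(layer.query("customers"))
```

Because `query` delegates to the source each time it is called, a row added to the underlying database shows up in the next query without any synchronization step.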
What is Data Virtualization used for?
Data Virtualization simplifies Big Data through abstraction and data federation. It allows easy integration of data from platforms such as Hadoop or NoSQL databases, while hiding their complexity.
Data Virtualization makes it possible to reduce storage and data maintenance costs, since it is no longer necessary to replicate data or transform it into different formats. It also facilitates interaction between data from heterogeneous sources, whether structured or unstructured.
Centralized management also offers better data governance, since rules can be applied to all data from a single platform. Finally, Data Virtualization makes it easier to test and deploy data-driven applications, as data sources can be integrated more quickly. This increases productivity.
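As an illustration of applying one rule to all data from a central place, here is a small Python sketch. The rule set, the column names and the sample rows are assumptions made for the example; real platforms expose governance policies through their own interfaces.

```python
# Hypothetical central rule set: column name -> masking function.
MASKING_RULES = {
    "email": lambda value: value[0] + "***@" + value.split("@", 1)[1],
    "phone": lambda value: "*" * (len(value) - 2) + value[-2:],
}


def apply_rules(rows):
    """Apply the central masking rules to rows coming from any source."""
    return [
        {col: MASKING_RULES[col](val) if col in MASKING_RULES else val
         for col, val in row.items()}
        for row in rows
    ]


# Rows as they might arrive from two unrelated (hypothetical) systems.
crm_rows = [{"name": "Ada", "email": "ada@example.com"}]
support_rows = [{"ticket": 17, "phone": "5550142"}]

masked_crm = apply_rules(crm_rows)
masked_support = apply_rules(support_rows)
print(masked_crm)
print(masked_support)
```

The rule is defined once and is applied identically whichever system the rows come from, which is the point of centralizing governance in the virtual layer.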
Use cases and applications
Generally speaking, Data Virtualization simplifies access to data from various sources through dashboards or other visualization tools. Its use cases are therefore numerous.
The most common use case is data integration. All companies today have data from many different sources, and integrating this data has become essential.
This may include, for example, establishing a gateway between an old database stored on a local server and new digital systems such as social networks. Various connection methods can be used, such as Java DAO, ODBC, SOAP or other APIs.
Another benefit of Data Virtualization is the creation of a logical Data Warehouse. It differs from a physical data warehouse in several ways, the main one being that the data is not stored on the platform itself.
It remains at the source, which can be a traditional data warehouse. All the data sources are federated, and the logical Warehouse acts as a single platform allowing integration through various services and APIs.
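A logical view over federated sources can be sketched with Python’s built-in `sqlite3` module. In this sketch, two separate in-memory SQLite databases stand in for a legacy database and a newer system (an assumption made for the example; real deployments would use connectors to different technologies), and a temporary view joins them without copying any rows. The schema and table names are hypothetical.

```python
import sqlite3

# One connection plays the role of the logical warehouse.
conn = sqlite3.connect(":memory:")

# Two independent databases are attached; each stands in for a separate source.
conn.execute("ATTACH DATABASE ':memory:' AS legacy")
conn.execute("ATTACH DATABASE ':memory:' AS webshop")

conn.execute("CREATE TABLE legacy.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE webshop.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO legacy.customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO webshop.orders VALUES (?, ?)",
                 [(1, 120.5), (2, 80.0), (1, 35.0)])

# The view holds only the integration logic; the rows stay in their sources.
# A TEMP view is used because it may reference tables in attached databases.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS name, SUM(o.amount) AS revenue
    FROM legacy.customers AS c
    JOIN webshop.orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
""")


def revenue():
    """Query the logical view; results are computed from the sources on demand."""
    return conn.execute(
        "SELECT name, revenue FROM customer_revenue ORDER BY name"
    ).fetchall()


print(revenue())
```

Since the view is evaluated at query time, a new order inserted into `webshop.orders` is reflected the next time `revenue()` is called.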
In addition, Data Virtualization is closely related to Big Data and predictive analysis. By enabling the integration of data from many heterogeneous sources, data virtualization facilitates data analysis.
This practice can also be very useful for call centres or customer service. Data Virtualization puts an end to data silos, allowing access to all company databases from a single access point.
Conversely, Data Virtualization also allows you to isolate certain data sources in order to limit access. This can be very useful for protecting the most sensitive information, particularly for confidentiality or compliance reasons.
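Restricting access at the single access point might look like the following Python sketch. The roles, source names and sample rows are hypothetical, and the permission table is a simplification of what a real platform would manage.

```python
# Hypothetical sources, each exposed as a function that returns rows on demand.
SOURCES = {
    "sales": lambda: [{"region": "EU", "total": 1200}],
    "payroll": lambda: [{"employee": "Ada", "salary": 5000}],
}

# Hypothetical policy: which role may read which source.
ALLOWED = {
    "analyst": {"sales"},
    "hr": {"sales", "payroll"},
}


def query(role, source):
    """Check the central policy before any source is contacted."""
    if source not in ALLOWED.get(role, set()):
        raise PermissionError(f"role {role!r} may not read {source!r}")
    return SOURCES[source]()


print(query("analyst", "sales"))
print(query("hr", "payroll"))
```

Because every request passes through `query`, a sensitive source such as `payroll` can be isolated by changing one policy entry rather than by securing each consumer separately.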
Data Virtualization vs. Data Federation: concepts not to be confused.
Data Virtualization is often confused with another concept: Data Federation. Data federation is a technology whose goal is to aggregate heterogeneous data from disparate sources and view it from a single access point.
Data Virtualization can serve the same purpose, but its broader aim is simply to conceal the technical details of the data. Data federation is therefore only one of the possibilities offered by virtualization.
Another concept often confused with Data Virtualization is Data Visualization. This practice consists of displaying data in the form of graphs, charts, maps or reports. A virtualization tool provides data to visualization tools, but is not designed specifically and solely for visualization.
Similarly, Data Virtualization technology can be used in the architecture of a logical Data Warehouse, but the two are not synonymous. A logical Data Warehouse is an architecture based on many components, while Data Virtualization is a technology with multiple use cases.
Finally, the term “virtualization” can create ambiguity. Data Virtualization should not be confused with virtualized data storage. Virtual database software and storage hardware virtualization solutions have a variety of uses of their own, but they do not offer real-time data integration or data services between disparate sources.
Data Virtualization Tools
There is a wide variety of Data Virtualization platforms designed to unify disparate data sources. These solutions differ in the methods used to achieve this common goal.
Some market references have now disappeared. Cisco, for example, sold its Data Virtualization product to TIBCO in 2017. IBM entered the market in 2014, but has since stopped selling its SmartCloud Data Virtualization product.
Among the best-known products is DataCurrent. The company specializes in data stored in NoSQL repositories, Cloud services and application data. It also offers Business Intelligence tools to connect to these different sources.
For its part, Denodo specializes in real-time data. Its tool has the advantage of being easy to learn and use.
The database giant Oracle offers its Data Service Integrator, a powerful data integrator compatible with the firm’s other products.
Red Hat, recently acquired by IBM, offers its JBoss Data Virtualization solution. It is a tool written in Java and optimized for JDBC interfaces.
SAS Federation Server, for its part, gives priority to data security. Finally, TIBCO Data Virtualization offers the ability to connect a wide variety of data sources.