Data Virtualization: A Technology Overview

January 15th, 2020

Owing to their widespread operations, enterprises rely on many different systems that manage heterogeneous data. These systems are connected via an intricately knit data infrastructure comprising databases, data warehouses, marts, and lakes, each storing key pieces of business information. However, facilitating data movement and extracting business insights require a myriad of data management technologies, which can be complex to learn and manage. This is where data virtualization tools come into play.

Let’s explore the technology and how it allows businesses to maximize the operational capabilities of their comprehensive data infrastructure.

What is Data Virtualization?

Data virtualization creates an abstraction layer that brings in data from different sources without performing the entire Extract-Transform-Load (ETL) process or creating a separate, integrated platform for viewing data. Instead, it virtually connects to different databases, integrates all the information to provide virtual views, and publishes them as a data service, such as a REST API. This enhances data accessibility, making specific bits of information readily available for reporting, analysis, and decision making.
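As a minimal sketch of the idea (with invented table names and data, and two in-memory SQLite databases standing in for independent source systems), the view below resolves each request by querying the sources live, so no data is extracted or copied ahead of time:

```python
import sqlite3

# Hypothetical stand-ins for two independent source systems; in a real
# deployment these would be remote databases reached through their drivers.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 250.0), (1, 120.0), (2, 90.0)")

def virtual_view(customer_id):
    """Resolve a logical record on demand by querying each source live;
    nothing is staged or replicated in advance."""
    name = crm.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    total = billing.execute(
        "SELECT SUM(amount) FROM invoices WHERE customer_id = ?", (customer_id,)
    ).fetchone()[0]
    return {"customer": name, "total_billed": total}

print(virtual_view(1))  # {'customer': 'Acme', 'total_billed': 370.0}
```

The caller only sees the combined logical record; which system holds which column is an implementation detail of the view.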

By creating an abstraction layer, virtualization tools expose only the required data to users without requiring technical details about the location or structure of the data source. As a result, organizations are able to restrict data access to authorized users only to ensure security and meet data governance requirements.

The technology simplifies key processes, such as data integration, federation, and transformation, making data accessible for dashboards, portals, applications, and other front-end solutions. Moreover, by compressing or deduplicating data across storage systems, businesses can meet their infrastructure needs more efficiently, resulting in substantial cost savings.

Data virtualization tools

Rather than extracting data and loading it onto a single platform through middleware such as an Enterprise Service Bus (ESB) or an Extract-Transform-Load (ETL) pipeline, data virtualization integrates data from heterogeneous sources in place. When utilized properly, a data virtualization tool can serve as an integral part of the data integration strategy. It can provide greater flexibility in data access, limit data silos, and automate query execution for faster time-to-insight.

Businesses nowadays make data virtualization software an integral part of their approach to data management, as it complements processes like data warehousing, data preparation, data quality management, and data integration.

Applications of Data Virtualization

Businesses can leverage virtualization technology to optimize their systems and operations in several ways, such as:

  • Data Delivery: It enables you to publish datasets (requested by users or generated through client applications) as data services or business data views.
  • Data Federation: It works in unison with data federation software to provide integrated views of data sources from disparate databases.
  • Data Transformation: It allows users to apply transformation logic in the presentation layer, thus improving the overall quality of data.
  • Data Movement and Replication: Data virtualization tools don’t copy or move data from the primary system or storage location, saving users from performing extraction processes and keeping multiple copies of inconsistent, outdated data.
  • Virtualized Data Access: It allows you to break down data silos by establishing a single logical data access point to disparate sources.
  • Abstraction: It creates an abstraction layer that hides away the technical aspects of the data, such as storage technology, system language, APIs, storage structure, and location.

Since data virtualization tools offer such a comprehensive set of capabilities, they have proven useful for management, operational, and development purposes.
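The abstraction and virtualized-access points above can be illustrated with a small sketch (a hypothetical catalog class with invented dataset names and DSNs): consumers ask for a logical dataset name and an opaque handle comes back, while the physical location, credentials, and access rules stay behind the layer:

```python
class VirtualCatalog:
    """Hypothetical abstraction layer: consumers see logical names only;
    physical location, dialect, and credentials stay hidden."""

    def __init__(self):
        # logical name -> (physical connection details, roles allowed to read)
        self._sources = {}

    def publish(self, name, dsn, roles):
        self._sources[name] = (dsn, set(roles))

    def open(self, name, role):
        dsn, roles = self._sources[name]
        if role not in roles:
            raise PermissionError(f"role '{role}' may not read '{name}'")
        # A real tool would return a live connection here; we return an
        # opaque handle to show the caller never sees the DSN directly.
        return f"handle:{name}"

catalog = VirtualCatalog()
catalog.publish("sales", "db2://finance-host:50000/SALES", roles={"analyst"})
print(catalog.open("sales", "analyst"))  # handle:sales
```

Because access is checked at the logical layer, the same mechanism is where governance rules like the ones described earlier can be enforced.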

Benefits of Data Virtualization

According to Gartner, by 2020, about 35 percent of enterprises will make data virtualization a part of their data integration strategy. Here is why enterprises are increasingly opting for data virtualization tools:

  • Multi-mode and multi-source data access, making it easy for users at different levels to use data as per their requirements
  • Enhanced security and data governance for keeping critical data safe from unauthorized users
  • Hiding away the complexity of underlying data sources, while presenting the data as if it is from a single database or system
  • Information agility, which is integral in business environments, as data is readily available for swift decision making
  • Infrastructure-agnostic platform, as it enables data from a variety of databases and systems to be easily integrated, leading to reduced operational costs and data redundancy
  • Simplified table structure, which can streamline application development and reduce the need for application maintenance
  • Easy integration of new cloud sources with existing IT systems, allowing users to have a complete picture of external and internal information
  • Hybrid query optimization, enabling you to streamline queries for scheduled push, on-demand pull, and other types of data requests
  • Increased speed-to-market, as it cuts down the time needed to obtain data for improving new or existing products or services to meet consumer demands

Other benefits include cost savings due to fewer hardware requirements and lower operational and maintenance costs associated with performing ETL processes for populating and maintaining databases. In addition, data virtualization tools store metadata information and create reusable virtual layers, allowing you to experience improved data quality and reduced data latency.

Data Virtualization Use Cases

According to Forrester, data virtualization has become a critical asset for any business looking to overcome growing data challenges. With innovations like query pushdown, query optimization, caching, process automation, and data catalogs, data virtualization technology is making headway in addressing a variety of multi-source data integration pain points.

Here are a few use cases and applications that show how data virtualization is helping businesses address master data management challenges:

1. Enhances Logical Data Warehouse Functionality

Data virtualization fuels the logical data warehouse architecture. The technology enables federating queries across traditional and modern enterprise data repositories, such as data warehouses, data lakes, web services, Hadoop, and NoSQL stores, making them appear to users as if they are sourced from a single database or storage location.

In a logical data warehouse architecture, data virtualization allows you to create a single logical place for users to acquire analytical data, irrespective of the application or source. It enables quick data transfer through several commonly used protocols and APIs, such as REST, JDBC, ODBC, and others. It also enables you to assign workloads automatically to ensure compliance with Service Level Agreement (SLA) requirements.
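As a hedged illustration of publishing a virtual view as a data service (the endpoint path, region names, and figures are all invented), the sketch below serves a small federated result set over HTTP using only the Python standard library; a real virtualization layer would assemble this payload from several repositories on demand:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "federated view" standing in for data a
# virtualization layer would gather from several repositories on demand.
def federated_view():
    return [{"region": "EMEA", "revenue": 1200},
            {"region": "APAC", "revenue": 900}]

class ViewHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/views/revenue":
            body = json.dumps(federated_view()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ViewHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/views/revenue"
rows = json.loads(urllib.request.urlopen(url).read())
print(rows[0]["region"])  # EMEA
server.shutdown()
```

The same logical view could equally be exposed over JDBC or ODBC; the point is that consumers bind to the service, not to the underlying repositories.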

2. Addresses the Complexity of Big Data Analytics

Businesses utilize predictive, cognitive, real-time, and historical forms of big data analytics to gain an edge over the competition. However, due to the increasing volume and complexity of data, businesses must adopt a wide range of technologies, such as Hadoop systems, data warehouses, and real-time analytics platforms, to capitalize on emerging opportunities.

Through data federation and abstraction, you can create logical views of data residing in disparate sources, enabling you to use the derived data for advanced analytics faster. In addition, it allows easy integration with your data warehouse, business intelligence tools, and other analytics platforms within your enterprise data infrastructure for information agility.

3. Facilitates Application Data Access

Systems and applications require data to produce the insights needed for decision making. However, one major challenge when working with applications is accessing distributed data types and sources. Moreover, you may need to write extensive code to facilitate sharing data assets among systems and applications. Some operations may also require complex transformations, achievable only through specialized techniques or tools.

For example, if you have two datasets residing in IBM DB2 and PostgreSQL, a data virtualization tool maps to both target databases, automatically executes a separate query against each to fetch the required data, and federates the results into a single integrated view, exposed through a semantic presentation layer. It can also perform joins, filters, or other transformations in the canonical layer to present the data in the desired format.
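This scenario can be sketched as follows, with two local SQLite databases standing in for DB2 and PostgreSQL (table names and rows are invented); one query runs per source, and the join happens in the canonical layer:

```python
import sqlite3

# Two SQLite databases stand in for IBM DB2 and PostgreSQL; a real tool
# would dispatch each query through that system's own driver.
db2 = sqlite3.connect(":memory:")
db2.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)")
db2.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(100, 1, 50.0), (101, 2, 75.0)])

pg = sqlite3.connect(":memory:")
pg.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
pg.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

def federated_orders():
    """Run one query per source, then join the results in the canonical layer."""
    orders = db2.execute("SELECT order_id, cust_id, amount FROM orders").fetchall()
    names = dict(pg.execute("SELECT cust_id, name FROM customers").fetchall())
    return [{"order_id": o, "customer": names[c], "amount": a}
            for o, c, a in orders]

print(federated_orders()[0])  # {'order_id': 100, 'customer': 'Acme', 'amount': 50.0}
```

Neither source ever sees the other; the federation layer is the only place where the two result sets meet.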

4. Optimizes the Enterprise Data Warehouse (EDW)

Data warehouses play a crucial role in helping enterprises handle massive amounts of incoming data from multiple sources and prepare it for query and analysis. While ETL and other traditional data integration methods are good for bulk data movement, users have to work with outdated data from the last ETL operation. Additionally, moving large volumes of data, at petabyte scale and beyond, becomes time-intensive and requires more powerful hardware and software.

Data virtualization streamlines the data integration process. It utilizes a federation mechanism to homogenize data from different databases, creating a single integrated platform that serves as one point of access for users. It offers on-demand integration, providing real-time data for reporting and analysis.
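The query pushdown technique mentioned earlier is one reason this on-demand model stays fast. As a hedged sketch (invented table and data, SQLite standing in for any source), the user's filter is rewritten into the source's own SQL rather than applied after the fact, so only matching rows ever leave the source system:

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (region TEXT, value INTEGER)")
source.executemany("INSERT INTO events VALUES (?, ?)",
                   [("EMEA", 10), ("APAC", 20), ("EMEA", 30)])

def query_view(region=None):
    """Hypothetical pushdown: the user's filter becomes a WHERE clause in
    the source's own SQL, instead of filtering fetched rows in memory."""
    if region is None:
        return source.execute("SELECT region, value FROM events").fetchall()
    return source.execute(
        "SELECT region, value FROM events WHERE region = ?", (region,)
    ).fetchall()

print(query_view("EMEA"))  # [('EMEA', 10), ('EMEA', 30)]
```

Over a network, the difference between shipping two rows and shipping the whole table is what makes on-demand federation practical at scale.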