Applying Data Virtualization in the Enterprise: Operational Data Stores

By | 2019-08-27T09:20:50+00:00 August 27th, 2019|

Operational data stores (ODS) are another key component in high-volume data architectures. Like staging tables, these structures collect heterogeneous data from across the enterprise but while the former is just used for data storage and cleansing, the latter provides additional reporting and analytics functions. Although these capabilities may seem similar to those of a traditional data warehouse, the ODS differs significantly in both its construction and purpose.

Features of the ODS

Time Period of Data

The ODS only holds short-term data (generally anywhere from 60 to 90 days’ worth), and is designed to deliver a current snapshot of specific operations or business applications.

Data Granularity

ODS data is usually stored at the lowest granular level so as to provide a detailed picture of its corresponding source system.

Data Loading Schedule

Data loading in the ODS is far more frequent than in other reporting and storage architectures, such as the EDW. By comparison, the ODS is supposed to maintain real-time (or near real-time) data. As a result, queries are far more time-dependent with results differing significantly based on the point of calculation.

Data Loading Method

The ODS is emptied and updated with each data load and does not contain any historical records. Users must ensure that the latest version of transactional records is loaded into the data store.

Application

ODS reports are used for day-to-day decision-making, which makes these structures better-suited to the needs of operational users such as sales executives, support agents, and line managers.

Complexity and Volume of Data

ODS queries are typically inexpensive and executed on a consistent basis, so a defined query process is required.

Benefits of Integrating an ODS into Your Data Architecture

  • Gain a comprehensive and accessible view of key business processes

 

  • Offers a flexible and scalable repository that can store and process data from several different source systems

 

  • More efficient and targeted reporting

 

  • Provides a single version of current operational data that is available for querying

 

  • Quickly query large datasets

Applying Data Virtualization to Operational Data Stores

If the ODS is to serve as a real-time reflection of specific business processes, it must contain only current and unaltered data at all times. Data virtualization can enable this objective in a number of ways.

In the modern enterprise, data is generated from a variety of platforms in a multitude of formats, from cloud-based applications to flat files and traditional RDBMS. When data virtualization is applied, all of these datasets are mirrored faithfully and abstracted into the ODS where they are presented as a single consolidated source of intelligence. By comparison, physically extracting and loading data from these systems will be a more time- and resource-intensive process that may result in changes to the source schema after replication.

Another key advantage of the virtualized operational data store is that changes to source systems can be immediately updated in ODS tables. Astera Data Virtualization contains an intuitive caching system that captures changes to the source system and preserves them in-database. This updated view is then brought in the virtualized ODS on a user-defined schedule.

Once sources have been integrated into the ODS, they can be cleaned and checked for inaccuracies or redundancies. In Astera Data Virtualization, this can be achieved through subflows, applied to a virtual model. The standardized data can then be queried in real-time using Astera’s proprietary Astera Query language (AQL) which is applied seamlessly across all source datasets.

Operational Data Stores in Astera Data Virtualization

Astera Data Virtualization offers a number of features that enable users to quickly build and deploy their operational data stores.

Fig 1: An Operational Data Store Built Using Astera Data Virtualization

Fig. 1 shows a virtualized operational data store consisting of a number of transactional sources from across the enterprise.

We have:

  • Supplier data recorded in an SQL Server Database

 

  • Stock items information recorded in an Excel Workbook

 

  • Granular order form details recorded in delimited files

 

  • Customer order details recorded in another Excel Workbook

 

  • Customer relationship data retrieved from Salesforce CRM, contained within a database table

 

  • Customer tickets retrieved from Zendesk’s support portal, contained within a REST Client object

As you can see, our intuitive drag-and-drop interface and array of connectors have allowed these entities to be integrated into the model with relative ease.

Furthermore, the REST client object shown above is part of a subflow in which additional transformations have been applied to cleanse and prepare customer tickets for querying.

We have also used our entity relationship builder node to link the Orders table to Order Lines. This feature automatically creates relationships between tables based on shared fields. In case of naming discrepancies these fields can also be manually selected in the Edit Relationship window as shown below.

Entity Relationship Created Based on Common OrderID Field

Creating these relationships will help users during the consumption phase, as linked tables can be imported into a dataflow as one entity in a multi-table query source.

Fig. 2: Combined MultiTable Source Tables Extracted to an Excel Workbook

Now, if we preview the output from this Excel workbook destination, we can see that data from both tables has been merged in the destination dataset despite the fact that no join transformation has been applied.

Merged Order Lines and Orders Table Fields

However, the day-to-day queries that are typical of the ODS are best conducted in-product through our native Astera Query Language (AQL).

To do so, simply verify and publish your virtual model then create a new dataflow using the newly deployed virtual database as your source.

SQL Query Source Connected to Newly Deployed Virtual Database

Once the connection has been established, write your custom query in the next window. In this case, we’ve used AQL dot notation to join together the Order Lines and Orders tables without the need for an expensive join transformation. This will create the same output as a Multi-Table query source.

Joining Order Lines to Orders through AQL in SQL Query Source

The resulting output can be previewed in ADV or exported to a preferred destination for further analysis as in Fig. 3 below.

Fig. 3: Virtual Database Query Output Exported to Sales Reporting Table

Taking Data Virtualization Further with ADV

Perhaps the most critical advantage offered by ADV is its ability to streamline  the publishing and consumption of enterprise data. In the next article of this series we will explore how our technology can optimize the delivery of real-time intelligence to business users through self-service analytics.

If you have a more specific use-case that you would like to explore, you can always contact our sales and support team for a private consultation. Or, try out the product yourself by downloading the trial version of Astera Data Virtualization.