Optimize Performance by Leveraging Subflows as a Source in Astera Data Virtualization

September 25th, 2019

The success of businesses is deeply rooted in their ability to utilize the data at their disposal. And with enterprise data spread across different source systems, finding a strategy that integrates dispersed data without incurring high costs or too much time can be challenging.

Data virtualization is one such approach that offers an efficient way to integrate disparate data. It is fast becoming a significant part of the enterprise information fabric due to its agility and flexibility.

According to Forrester, data virtualization is critical to solving enterprise challenges related to big data. It also states that 56 percent of global technology decision-makers have successfully implemented data virtualization.

However, the challenge lies in finding a solution that delivers high performance without compromising on modularity or reusability. Astera Data Virtualization’s flexible architecture enables you to streamline your data virtualization projects by enclosing reusable logic in small components, such as subflows. Using these objects reduces the deployment load on the virtualization layer, which speeds up project execution.

Performance Considerations in Astera Data Virtualization

The performance of a data virtualization solution is dependent on many factors, ranging from the diversity and volume of sources to the nature and type of source system.

To understand this concept better, we can categorize the performance considerations of Astera Data Virtualization broadly into three segments:

Query execution: It corresponds to the concurrency, throughput, latency, etc., of processing queries through the abstraction layer to retrieve disparate data in near real-time.

Query optimization: It involves using hybrid strategies to execute data virtualization projects by complementing abstraction with caching and pushdown optimization to improve query performance.

Logic impact: It involves applying transformation logic in the pre-deployment phase, before the entity is published as a virtual model on Astera Integration Server. This improves performance at the deployment level regardless of how complex the logic applied at the design stage is.

We’ve already covered query execution and optimization in this whitepaper. Here, we will discuss how to optimize the logic impact in Astera Data Virtualization using subflows. But before going into that, let’s briefly look into subflows.

Introduction to Subflows

Found in the toolbox, under the transformations section, a subflow is an output-only transformation object created just like any other dataflow in Astera Centerprise. The object encapsulates reusable, complex logic using transformations. The subflow output’s layout is used to populate a virtual model entity, establishing a connection between the virtual data model and subflow.

If the transformation logic inside the subflow is altered, it will not impact the dataflow design. You can call the modified subflow by entering the updated file path, and the integration flow will automatically update the dataflow logic. All transformations supported in Astera Centerprise can be used in the subflow.

Subflows reduce complexity in the dataflow design and enable reusability, simplifying your integration flows and data virtualization projects.

Using Subflows as a Source in Astera Data Virtualization

In Astera Data Virtualization, a subflow functions as a dynamic source that can be used as an entity in a virtual data model. It allows you to create a complicated flow at the back end, such as merging two data points or validating incoming data, and write it to a single subflow output.
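As a rough illustration of the idea, the Python sketch below merges two inputs and validates the incoming records before writing everything to a single output. It is not Astera's API (subflows are designed visually in the product); the function name `merge_and_validate`, the field names, and the sample records are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]


def merge_and_validate(customers: List[Record], tickets: List[Record]) -> List[Record]:
    """Join tickets to customers and keep only valid rows (hypothetical logic)."""
    # Index customers by key so each ticket lookup is a dictionary access.
    customers_by_id = {c["customer_id"]: c for c in customers}
    output: List[Record] = []
    for ticket in tickets:
        customer = customers_by_id.get(ticket.get("customer_id"))
        # Validation: skip tickets with no matching customer or an empty subject.
        if customer is None or not ticket.get("subject"):
            continue
        output.append({
            "ticket_id": ticket["ticket_id"],
            "customer_name": customer["name"],
            "subject": ticket["subject"],
        })
    return output


# Hypothetical sample data standing in for two separate sources.
customers = [{"customer_id": 1, "name": "Acme Ltd"}]
tickets = [
    {"ticket_id": 10, "customer_id": 1, "subject": "Late delivery"},
    {"ticket_id": 11, "customer_id": 2, "subject": "Unknown customer"},
    {"ticket_id": 12, "customer_id": 1, "subject": ""},
]
rows = merge_and_validate(customers, tickets)
print(rows)
```

In this sketch, a consumer only ever sees `rows`, much as a consumer of the virtual data model sees only the single subflow output and not the merge and validation steps behind it.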

Figure 1: Using subflow in a virtual data model

To illustrate this functionality with an example, let us consider a scenario in which a business is consolidating data from six different sources, through a virtual layer, to perform analytics on their operational systems.

  • The supplier data is stored in a database table (SQL Server Supplies)
  • The order data is available in delimited file format (Order Lines)
  • Data of received shipments is recorded in an Excel file (Stock Items and Orders)
  • The information about prospects and current customers is regularly updated in Salesforce CRM (Salesforce Customers)
  • Customer complaints are lodged in a customer portal in the form of online support tickets (REST Client Customer Tickets)

Figure 2: Connecting all sources using a virtual data model

In this scenario, we will focus on the Customer Tickets entity, which is accessed using a subflow as a source. In the subflow, we will apply a transformation to the source data, which in this case comes from Zendesk. The subflow output will then be treated as a source in the virtual data model.

You can view the subflow by right-clicking on the object.

Figure 3: Accessing the subflow

Figure 4: The subflow used within the virtual data model

The screenshot above shows the complete subflow. First, we connected to Zendesk through a REST client object. Next, we applied an Expression transformation to convert a field's data type from integer to string before sending it to the subflow output. This subflow is part of the data virtualization project, which can be seen in the Project Explorer sidebar.
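For readers who think in code, the three stages of the subflow described above (REST source, Expression transformation, subflow output) can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the sample records stand in for the Zendesk response, the choice of `id` as the converted field is a guess, and the function names are hypothetical rather than part of any Astera or Zendesk API.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for records returned by the REST client object;
# a real Zendesk response carries many more fields.
SAMPLE_TICKETS: List[Dict[str, object]] = [
    {"id": 101, "subject": "Late delivery", "status": "open"},
    {"id": 102, "subject": "Damaged item", "status": "solved"},
]


def expression_transform(record: Dict[str, object]) -> Dict[str, object]:
    """Convert the integer 'id' field to a string, leaving other fields as-is."""
    converted = dict(record)  # copy so the source record is not mutated
    converted["id"] = str(record["id"])
    return converted


def customer_tickets_subflow(source: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Apply the transformation to every source record and return the output rows."""
    return [expression_transform(record) for record in source]


output_rows = customer_tickets_subflow(SAMPLE_TICKETS)
for row in output_rows:
    print(row)
```

The type conversion happens once, inside the subflow, so anything reading `output_rows` receives string identifiers without having to apply its own transformation.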

Pushing the transformation logic down into the subflow enables fast deployment of the virtual data model and improves the performance of the virtual database. Moreover, the consumer gets a complete picture of the data without having to apply their own transformations to the virtual database.

The subflow object in Astera Data Virtualization is a powerful, modular feature that combines the capabilities of dataflows and virtualization to enable fast data access and better query performance. Want first-hand experience with this feature? Download the trial version of Astera Data Virtualization.