The widespread adoption of cloud databases and transactional reporting systems across the business landscape has led to exponential growth in the volume and variety of data moving through organizations. As a result, analysts that were once tasked with performing ETL or ELT on thousands of records now have to contend with datasets numbering in the millions or even billions.
In this environment, the technology responsible for processing data and delivering business intelligence to relevant decision-makers has come under renewed strain. Traditional ETL tools are tasked with handling increasingly complex and high-volume data operations while transformation servers are pushed to their limit.
While ELT can significantly lessen this burden by offloading resource-intensive data processing to scalable cloud-based RDBMS, pushdown optimization cannot replace ETL altogether. Any dataflows containing non-database sources and destinations, or unsupported transformations will still require extraction to an intermediary server. In addition, certain operations may function more efficiently when pushed through a purpose-built ETL tool.
In the latest release of Centerprise, we aim to provide our users with the ability to customize and combine both ETL and ELT processes in order to ensure optimal performance in even their most complex dataflows. Many of the improvements seen in this build are a direct result of practical feedback received from clients that have struggled to realize the full benefits of pushdown optimization in their own business processes. Here are some of the key changes we’ve made in Centerprise 8.0.
Limit In-Memory Sorts
As block-level transformation, sorts require all input datasets to be cached in-memory before they can be ordered according to the user’s criteria. These operations are not just limited to the sort transformation, other transformations such as aggregates and joins also provide the option to sort incoming data before it is processed further. As a result, a dataflow might involve several instances of this expensive transformation occurring across the dataflow, which will of course add significantly to its runtime.
For this reason, we recommend sorting input datasets at the source before they are imported into the Centerprise server by running this transformation in pushdown optimization mode. Furthermore, any pushdown-excluded transformations that provide options to sort data should have this setting disabled to ensure maximum efficiency in the dataflow pipeline.
Verify and Correct Invalid Boolean Expressions
In pushdown optimization mode, Centerprise will automatically convert boolean values into integers (e.g. 1 and 0 for true or false outputs) to improve the functionality of database SELECT statements. However, this feature can lead to issues when boolean expressions are used in the dataflow. These WHERE statements require additional operators to execute correctly, i.e., IIF(1=1, 1, 0). But in pushdown optimization mode, boolean expressions are converted as follows: IIF (1,1,0). While this statement is logically valid, it will lead to errors when nested boolean expressions are used.
For example, when the inner IIF clause is converted in the following expression:
IIF(IIF(CHARINDEX(‘Accrual’, [DimClaim].[ClaimSource]) > 0, 1, 0), ‘Accrual’, [DimClaim].[ClaimSource])
it produces an output of:
which is a syntactically invalid statement.
In such cases, the user will need to correct the expression themselves by adding the 1=1 operator manually in the dataflow. The example above could be changed to a CONTAINS statement as follows:
If(Contains(“Accrual”, [DimClaim].[ClaimSource])) = true, “Accrual”, [DimClaim].[ClaimSource])
In Centerprise 8, we have made it easier to share expressions between SELECT and WHERE statements by adding support for variables (reusable objects) in pushdown optimization mode.
Exclude Objects From Pushdown Where Preferable
Alternatively, the user can exempt the expressions from pushdown mode altogether using the new “Exclude from Pushdown” attribute. This feature is also useful for running objects that cannot be efficiently mapped to the source or target database, or those that perform differently in the database.
For example, when expressions result in a X=Y output, where both X and Y are null values, Centerprise is able to evaluate the expressions as true. However, if these values are contained within a WHERE clause in the database, they will always be assessed as false. In this case, it would be preferable to execute the expression through the ETL tool rather than in ELT.
In previous instances, we have been able to increase client dataflow runtimes by 30 minutes or more through the judicious exclusion of objects and expressions based on their suitability for pushdown.
Check for Supported Pushdown Transformations
Certain if/else expressions could also be converted into switch transformations which are now supported in Centerprise. In-database, this function performs similarly to a CASE statement and can be used to simplify complex expressions. Newest versions of Centerprise also include support for layout fields that lack an input port as well as a number of other ETL functions which can now be reliably replicated in supported databases.
For a full list of currently supported transformations in Astera Centerprise pushdown mode, reference our comprehensive guide on this topic.
Improve Dataflow Performance with Manual SQL Hints
If a user feels that the database optimizer is executing their dataflow in a less than optimal manner, they can use optimizer hints to modify the execution plan. In Centerprise, these hints can be entered in the scheduler tab for the dataflow, as a result hints will be applied to all queries and sources in the flow.
In a previous use case, our client was able to employ optimizer hints to enable additional degrees of parallelism to their dataflow, thus ensuring that server resources were maximized to execute a resource-intensive ELT process in which several different data sources were brought together in a single query.
The query hint optimizer will also be helpful during the data warehouse loading process when dimension tables from different sources are joined to a fact table in the staging server. In these instances, the user can make use of SQL hints to decide what gets staged and how for optimal performance.
Take Control of Your Enterprise Data Processing
This combination of additional features and optimizations should give our users the ability to control the efficiency and overall performance of their ETL and ELT operations at each step of their data journey. Whether they are engineering new enterprise data warehouses or migrating their records to a cloud database infrastructure, Centerprise’s pushdown capabilities can prove essential.
For more information on how pushdown optimization mode can benefit your business, check out the other blogs in this series or schedule a personal consultation with our sales team today.