Pushdown Optimization Mode in Centerprise

By |2019-07-09T10:34:20+00:00June 3rd, 2019|

ELT process complementing traditional ETL approach for improved performance and agile results

Introduction

ETL (Extract, Transform, Load) and ELT (Extract, Load and Transform) are two approaches to data manipulation that are often mistaken for substitutes. The situation in which either approach is used depends on multiple factors such as the type of source and target databases, the database server’s processing power, volume of data, and complexity of the transformations. Ultimately, the goal is to enable users to efficiently move data in large volumes.

ELT is differentiated from ETL based on the location of the transformation processing. In an ETL process, the data is transformed in the staging area that is usually placed in between the source and target systems (data warehouse). On the contrary, in an ELT process the transformations take place within the target database. The database engine powering the Database Management System (DBMS) performs the transformations in an ELT process while transformations happen on a third machine in an ETL process. Transformations in ETL are usually carried out on a relational database server that is physically and logically separated from the source and target systems. This is done to minimize the impact of periodic ETL activity on source and target systems.

ETL – Extract, Transform, and Load Approach

Enterprises collect and merge data from multiple sources, which results in heterogeneity in the data platforms and formats. An ETL software brings the source data to the server for transformation and writes the transformed data to the target. It performs integrity checks on the incoming data as well.

An ETL tool is an important part of building a data warehouse. It can be used to extract only the relevant information, parse and validate it to fit the business’ standard format, and finally load structured and cleansed data into a data warehouse. Since the data loaded in the data warehouse is already cleansed and transformed, it becomes easier for business users to analyze it.

ELT – Extract, Load, and Transform Approach

In certain situations, instead of moving volumes of data to an ETL software system, users can maximize the performance by placing the staging tables within the destination database. The primary benefit of an ELT approach is that it is time-effective because it loads the data directly on the destination and delegates the processing to the DBMS. This reduces the time between extraction and load process and the data is readily available to the end-user as a result. However, since untransformed data is loaded on the target systems, neat visual interface is the opportunity cost of the reduced data delivery time. Gartner Analyst Mei Yang Selvage says that, “ELT is primarily appropriate for workloads with limited transformation and cleansing.”

Pushdown Optimization Mode

What is Pushdown Optimization Mode?

The Pushdown mode in Centerprise enables the users to choose where data processing happens by pushing down the transformation logic to the source or target database when they reside on the same server. This eliminates the unnecessary movement of the data, reduces network latency by delivering agile results and improves the overall performance.

How Does Pushdown Optimization Work?

The need to extract and migrate large volumes of data, from source systems to a third-party server for transformations, makes users lose valuable time. When source database and target database are on the same server, users can save time by avoiding unnecessary data movement. Running a job in pushdown mode transfers the transformation logic to the source or target database. Instead of running the
transformation logic in parts by extracting the information from source, applying transformations and then loading it on the destination, the commands are issued and executed through automatically generated consolidated SQL queries. When businesses need bulk data processing, this approach yields major performance benefits by eliminating data movements between database and the ETL server.

Types of Pushdown Optimization Mode

There are two types of pushdown optimization modes:

  1. Full pushdown optimization mode
  2. Partial pushdown optimization mode

Centerprise’s intelligent algorithm decides whether the job’s performance will be optimized by running it in partial pushdown optimization
mode or full pushdown optimization mode. Figure 3-4 shows SQL queries for jobs that are executed in partial and full pushdown optimization modes.

The highlighted part in figure 3 is the portion of the job that is executed in the pushdown mode. This example is only partially pushing down the transformation logic (aggregate transformation) to the source database, therefore it is referred to as partial pushdown.

Full pushdown, on the contrary, refers to running a job in pushdown mode from beginning to end. Figure 4 shows an example of a job executed in full pushdown mode.

An important thing to note here is that the corresponding SQL queries (in both the modes) are automatically generated without the need of manual scripting or coding.

Verify Pushdown Job Feature in Centerprise

Not every job can be run in the pushdown mode. For instance, if the source and target destinations are not in the same relational database, the data extraction and transfer step cannot be eliminated. Moreover, certain transformation logics such as ‘find’, name and address parsing or complex string processing cannot be pushed down.

Verify pushdown job feature in Centerprise identifies the transformation logics and SQL statements that can be pushed down to the source or destination database.

 

Database Providers Supported in the Pushdown Optimization Mode

In Centerprise, Pushdown Optimization mode is supported for the following providers:

1. MSSQL
2. Postgres
3. Oracle
4. SQL
5. MySQL (version 7.3)

Conclusion

Instead of comparing ETL and ELT (pushdown optimization) as different approaches to achieve same results, see them as different tools designed to suit different business needs. Both the approaches have their benefits and limitations therefore, using them as complimenting tools instead of substitutes is going to yield best results for your business. Centerprise Data Integrator offers both capabilities. Users can decide on the approach by considering the factors affecting overall data processing.

Want to try the pushdown optimization feature in Centerprise? Download a free trial to integrate massive volumes of data at incredible speeds.