Centerprise | CDC

Moving large amounts of data between multiple disparate systems is an essential requirement in most organizations. With ever-increasing data volumes, it is becoming impractical to move entire tables and databases on every transfer. Throwing hardware and bandwidth at the problem is no longer enough; smarter approaches are needed to make data available in a timely manner.

Change Data Capture (CDC) refers to a variety of approaches that optimize data transfers by reducing the amount of data moved between two systems. The idea is either to filter out source records that have not changed since the last transfer, or to apply diff processing at the destination so that only the records changed since the last transfer are updated. These approaches produce major performance gains over a full data load.


Approaches to Change Data Capture

Centerprise Data Integrator offers three change data capture strategies: data synchronization at the destination, incremental read at the source, and audit fields. Depending on the situation, you can employ one or more of these approaches.

Data Synchronization

This strategy minimizes database writes once data has already reached its destination. It is ideal when two data sets must be synchronized regularly and only a small percentage of records change between transfers. It works with all data sources, including flat files, XML, COBOL files, and databases.

Centerprise data synchronization employs a high-speed multithreaded algorithm to compute the differences between two data sets and then applies only those differences to the destination. This yields substantial performance gains when only a small percentage of records change between synchronizations.

When data arrives in text files, XML, or COBOL files, or when direct access to the data source is not feasible, this approach delivers very high performance and scalability.
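The difference computation described above can be illustrated with a minimal, single-threaded Python sketch. It assumes both data sets fit in memory as lists of dictionaries that share a unique key field. The function names `compute_diff` and `apply_diff` are hypothetical; this is not the Centerprise algorithm, which is multithreaded and whose internals are not described here.

```python
from typing import Dict, Hashable, List, Tuple

Record = Dict[str, object]


def compute_diff(
    source: List[Record], destination: List[Record], key: str
) -> Tuple[List[Record], List[Record], List[Hashable]]:
    """Compare two data sets by key; return (inserts, updates, deleted keys)."""
    src = {row[key]: row for row in source}
    dst = {row[key]: row for row in destination}
    # Keys that exist only in the source are new records.
    inserts = [row for k, row in src.items() if k not in dst]
    # Keys present in both data sets whose field values differ are updates.
    updates = [row for k, row in src.items() if k in dst and dst[k] != row]
    # Keys that exist only in the destination were removed from the source.
    deletes = [k for k in dst if k not in src]
    return inserts, updates, deletes


def apply_diff(
    destination: List[Record],
    key: str,
    inserts: List[Record],
    updates: List[Record],
    deletes: List[Hashable],
) -> Dict[Hashable, Record]:
    """Apply only the computed changes to an in-memory copy of the destination."""
    table = {row[key]: dict(row) for row in destination}
    for k in deletes:
        del table[k]
    for row in inserts + updates:
        table[row[key]] = dict(row)
    return table
```

Unchanged records never appear in the three change lists, so when few records change, the destination receives only a handful of writes.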

Incremental Read

Incremental read minimizes the amount of data transferred by keeping a hash of each record that is read. On subsequent runs, each record's hash is compared against the hash stored from the previous run, and only the records that changed since then are sent to the destination. For frequently running incremental update jobs, this approach reduces load times by as much as 90% in many cases.

This is the least intrusive approach, in that it places very little demand on the source database. It captures all changes to the source data, including deletes, and minimizes updates to the destination. The downside is that it requires a full read of the source on every transfer. The only requirement is that the source must have a unique key.
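A minimal Python sketch of hash-based incremental read follows. It assumes each record is a dictionary with a unique key and that the hashes from the previous run are passed in from memory; a real job would persist them between runs. SHA-256 over a sorted-key JSON serialization is an illustrative choice, not necessarily what Centerprise uses, and `incremental_read` is a hypothetical name.

```python
import hashlib
import json
from typing import Dict, Hashable, List, Tuple


def record_hash(row: dict) -> str:
    """Hash a record's field values; sorted keys make the hash field-order independent."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def incremental_read(
    rows: List[dict], key: str, previous_hashes: Dict[Hashable, str]
) -> Tuple[Dict[str, list], Dict[Hashable, str]]:
    """Return the changes since the last run and the hashes to store for the next run."""
    current_hashes: Dict[Hashable, str] = {}
    inserts, updates = [], []
    for row in rows:  # every run reads the full source
        k = row[key]
        h = record_hash(row)
        current_hashes[k] = h
        if k not in previous_hashes:
            inserts.append(row)
        elif previous_hashes[k] != h:
            updates.append(row)
        # Rows whose hash is unchanged are skipped and never sent downstream.
    # Keys seen in the previous run but missing now were deleted at the source.
    deletes = [k for k in previous_hashes if k not in current_hashes]
    return {"insert": inserts, "update": updates, "delete": deletes}, current_hashes
```

Every run still iterates over all source rows, which is the full-read cost mentioned above, while deletes are detected as keys present in the previous hashes but absent from the current read.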

Audit Fields

This approach uses specific fields, such as create time or update time, to identify the rows that changed since the previous transfer, and transfers only the rows created or updated since then. It requires that the source database maintain these audit fields, and it cannot capture deletes in the source database. If the source application can guarantee proper maintenance of the audit fields and missing deletes is acceptable, this approach can deliver very efficient data transfers.

The Centerprise Audit Field feature is useful where a source application maintains an updated date and/or a created date, or contains timestamp columns. This strategy is highly desirable for Salesforce and some ERP environments, which maintain timestamps and, in some instances, a deleted flag.
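The audit-field strategy amounts to a high-water-mark query. The sketch below uses Python's built-in `sqlite3` module and assumes a hypothetical `customers` table whose `updated_at` column holds ISO-8601 text timestamps, which sort correctly as strings. It returns the changed rows together with the new watermark to save for the next run; `fetch_changed_rows` is a hypothetical name.

```python
import sqlite3
from typing import List, Tuple


def fetch_changed_rows(
    conn: sqlite3.Connection, last_watermark: str
) -> Tuple[List[tuple], str]:
    """Return rows created or updated after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # The newest timestamp read becomes the starting point for the next run.
    new_watermark = rows[-1][2] if rows else last_watermark
    # Deleted rows simply stop appearing, so this query can never report them.
    # Caveat: a row written later with a timestamp equal to the watermark is missed.
    return rows, new_watermark
```

Starting from an empty watermark returns every row; each later call returns only rows whose `updated_at` moved past the saved value, and deleted rows go unnoticed, as described above.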

High Performance Database Writes

In addition to the performance gains afforded by the CDC approaches, Centerprise offers high-performance database write features, including native bulk copy, batched updates and deletes, and other optimizations. Combining CDC with these write optimizations delivers an enormous performance boost to your integration jobs.
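Batching is one of the write optimizations mentioned above. The following Python sketch uses the standard `sqlite3` module as a stand-in for a production database: it groups parameter rows into fixed-size batches and sends each batch with a single `executemany` call inside one transaction, rather than committing row by row. The helper names and the `target` table are hypothetical, and native bulk copy is database-specific, so it is not shown.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def chunked(items: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` parameter tuples."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def execute_in_batches(
    conn: sqlite3.Connection, sql: str, params: Iterable[Sequence], batch_size: int = 500
) -> int:
    """Run one parameterized statement over many rows, one transaction per batch."""
    batch_count = 0
    for chunk in chunked(params, batch_size):
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.executemany(sql, chunk)
        batch_count += 1
    return batch_count
```

The same helper serves batched upserts (for example with `INSERT OR REPLACE`) and batched deletes (`DELETE FROM target WHERE id = ?`); only the SQL text and parameter tuples change.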

An Integrated Environment

Centerprise CDC approaches are fully integrated with the Centerprise Data Mapping engine, which enables you to build complex data conversion jobs with ease. A simple drag-and-drop interface handles direct mapping, and more complex maps can be developed using lookups, expressions, SQL commands, and other mapping patterns.

Centerprise Transformation Engine

The Centerprise Transformation Engine is designed from the ground up as a multithreaded parallel processing component that harnesses the power of today’s multi-core systems. Doubling the number of CPUs or cores on a machine usually doubles the throughput. Parallelism is used extensively throughout the engine, including file reads and writes, database writes, and transformations, and the engine is designed to process very large data sets efficiently.
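The partition-and-process idea behind parallel transformation can be sketched in Python with `concurrent.futures`. This illustrates chunked concurrent processing under stated assumptions and is not the Centerprise engine: the `transform` function, chunk size, and helper names are hypothetical. In CPython, threads mainly speed up I/O-bound work because of the global interpreter lock; for CPU-bound transformations, `ProcessPoolExecutor` offers the same `map` interface.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def transform(row: dict) -> dict:
    """Hypothetical per-record transformation: trim and upper-case the name field."""
    return {**row, "name": str(row["name"]).strip().upper()}


def transform_chunk(chunk: List[dict]) -> List[dict]:
    """Transform one partition of the data set."""
    return [transform(row) for row in chunk]


def parallel_transform(
    rows: List[dict], workers: Optional[int] = None, chunk_size: int = 1000
) -> List[dict]:
    """Split rows into chunks, transform the chunks concurrently, keep input order."""
    workers = workers or os.cpu_count() or 1
    chunks = [rows[i : i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = list(pool.map(transform_chunk, chunks))
    return [row for chunk in results for row in chunk]
```

Partitioning into chunks keeps per-task overhead low, and because `map` preserves submission order, the output lines up with the input even though chunks may finish out of order.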

Data Profiling

Data profiling is built right into the Centerprise transformation engine. You can obtain a detailed profile of your data as part of your transfer process, or run the profiler on its own. The profile provides valuable record- and field-level statistics, including minimum, maximum, average, sum, count, minimum length, maximum length, unique count and percentage, distinct count and percentage, and duplicate count and percentage.
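Several of the listed field-level statistics can be computed with a short Python sketch. The definitions are assumptions, since the page does not spell them out: nulls are excluded, "distinct count" is the number of different values, "unique count" is the number of values that occur exactly once, "duplicate count" is the number of entries that repeat a value seen earlier, and percentages are relative to the non-null count. `profile_column` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Compute a handful of field-level profile statistics for one column."""
    present = [v for v in values if v is not None]  # nulls are excluded
    n = len(present)
    counts = Counter(present)
    distinct = len(counts)  # number of different values
    unique = sum(1 for c in counts.values() if c == 1)  # values occurring exactly once
    duplicate = n - distinct  # entries that repeat a value seen earlier
    lengths = [len(str(v)) for v in present]

    def pct(x: int) -> float:
        return round(100.0 * x / n, 2) if n else 0.0

    profile: Dict[str, Any] = {
        "count": n,
        "min_length": min(lengths, default=None),
        "max_length": max(lengths, default=None),
        "distinct_count": distinct,
        "distinct_pct": pct(distinct),
        "unique_count": unique,
        "unique_pct": pct(unique),
        "duplicate_count": duplicate,
        "duplicate_pct": pct(duplicate),
    }
    # Numeric statistics only make sense when every value is a number.
    is_numeric = n > 0 and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        total = sum(present)
        profile.update(min=min(present), max=max(present), sum=total, average=total / n)
    return profile
```

For a column such as `[10, 20, 20, 30, None]`, this counts four non-null values, three distinct values, two unique values (10 and 30), and one duplicate entry.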

Free Trial with Customer Support

Astera offers a free product trial of Centerprise. Additionally, you have access to our acclaimed customer support during the trial period. To start your free trial, send an email to sales@astera.com or call 1-888-77-ASTERA.