Change Data Capture Strategies in Centerprise Data Integrator
Moving large amounts of data between multiple disparate systems is a common requirement. With ever-increasing data volumes, it is becoming impractical to move entire databases or tables. Throwing hardware and bandwidth at the problem is no longer enough. Smarter approaches are needed to make data available in a timely manner.
Change Data Capture (CDC) refers to a variety of approaches that optimize data transfer by transferring only the data that has changed since the last transfer.
Centerprise Data Integrator offers multiple change data capture strategies, so you can select the one that best fits your situation and requirements.
Incremental Load
The Incremental Load strategy minimizes the amount of data transferred by keeping a copy of, or a computed hash value for, each record that is read. On subsequent runs, each record is compared against its stored copy or hash value, and only the changes since the previous run are sent to the destination.
This strategy is very easy to implement. All it requires is that the source have one or more fields that uniquely identify each row. This approach captures all changes to the source data, including deletes. It also minimizes updates to the destination. The downside is that it requires a full read of the source on every transfer.
Centerprise’s Incremental Load feature is designed to be very easy to set up. You can create a synchronization job for an entire database in a few minutes. Moreover, Incremental Load can substantially reduce the time it takes to move data. It is not uncommon for Incremental Load to reduce load time by as much as 90%.
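The hash comparison described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Centerprise's implementation; the function names row_hash and detect_changes, the choice of SHA-256, and the in-memory dictionary of stored hashes are all assumptions made for the example.

```python
import hashlib


def row_hash(row):
    """Return a stable hash over all field values of a row (hypothetical helper)."""
    # Sort field names so the hash does not depend on dictionary ordering.
    payload = "\x1f".join(f"{name}={row[name]!r}" for name in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(source_rows, key_fields, previous_hashes):
    """Compare a full read of the source against hashes kept from the previous run.

    Returns (inserts, updates, deleted_keys, current_hashes). The caller
    persists current_hashes and passes it back in on the next run.
    """
    inserts, updates, current_hashes = [], [], {}
    for row in source_rows:
        key = tuple(row[field] for field in key_fields)
        digest = row_hash(row)
        current_hashes[key] = digest
        if key not in previous_hashes:
            inserts.append(row)      # key never seen before
        elif previous_hashes[key] != digest:
            updates.append(row)      # key seen, but content changed
    # Keys that were present last time but are missing now were deleted.
    deleted_keys = [key for key in previous_hashes if key not in current_hashes]
    return inserts, updates, deleted_keys, current_hashes
```

On the first run, previous_hashes is an empty dictionary, so every row is reported as an insert. Unchanged rows produce no output on later runs, which is what keeps writes to the destination small even though the source is still read in full.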
Audit Fields
The Audit Fields strategy uses specific fields, such as create time or update time, to identify the records that changed since the previous run, and transfers only the rows created or updated since then.
Centerprise’s Audit Fields strategy is useful where a source application maintains last-updated date or timestamp columns. For Salesforce and some ERP environments that maintain a timestamp and, in some instances, an ‘IsDeleted’ flag, this strategy is highly desirable.
If the source application guarantees proper maintenance of audit fields, or uses the timestamp data type supported by most modern databases, then this approach can provide very efficient data transfers.
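As a rough illustration of the audit-field approach, the sketch below filters a source table on a last-updated column and carries the highest value forward as the starting point for the next run. It uses Python's built-in sqlite3 module as a stand-in source; the customers table, the updated_at column, and the helper names fetch_changed_rows and next_watermark are hypothetical and chosen only for the example.

```python
import sqlite3


def fetch_changed_rows(conn, table, audit_column, last_run):
    """Return rows whose audit column is later than the previous run's watermark."""
    # Table and column names are interpolated into the SQL text, so they must
    # come from trusted configuration; only the watermark is a bound parameter.
    query = f"SELECT * FROM {table} WHERE {audit_column} > ? ORDER BY {audit_column}"
    return conn.execute(query, (last_run,)).fetchall()


def next_watermark(rows, audit_index, last_run):
    """Return the highest audit value seen, or the old watermark if nothing changed."""
    return max((row[audit_index] for row in rows), default=last_run)


# Demo source table; ISO-8601 timestamp strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Ann", "2024-01-01T09:00:00"),
        (2, "Bob", "2024-01-02T10:30:00"),
        (3, "Cy", "2024-01-03T08:15:00"),
    ],
)

last_run = "2024-01-01T12:00:00"
changed = fetch_changed_rows(conn, "customers", "updated_at", last_run)
last_run = next_watermark(changed, 2, last_run)
```

Only the rows for Bob and Cy are read here, because Ann's row was last updated before the previous run. Note that a plain timestamp filter cannot see physically deleted rows, which is why a soft-delete flag such as ‘IsDeleted’ matters for this strategy.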
Data Synchronization at Destination
Unlike the first two CDC strategies, which filter data before it is sent to the destination, this strategy works on minimizing database writes once the data is already at the destination.
Often, source data arrives at the destination in text files, and it is not practical or cost-effective to access the source directly. In such scenarios, the Incremental Load and Audit Fields strategies cannot be used, and the best approach is to efficiently compute the differences between source and destination and apply them to the destination.
The Centerprise Synchronization strategy calculates the differences between the source and the destination data and applies these differences to the destination database. The Centerprise Synchronization Diff Builder is a high-performance component that uses parallelism to compute differences efficiently. The Diff Writer then uses bulk inserts and batch updates to apply these differences to the destination database. This approach is very effective where an existing database is updated with feeds from multiple sources.
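The two-step idea of building a diff and then writing it in batches can be sketched as follows. This is a simplified, single-threaded illustration and not the Diff Builder or Diff Writer themselves; the products table, the CSV feed, and the functions build_diff and apply_diff are assumptions made for the example, with sqlite3 standing in for the destination database.

```python
import csv
import io
import sqlite3


def build_diff(feed_rows, dest_rows):
    """Compare feed rows with destination rows; both are dicts of id -> price."""
    inserts = [(key, value) for key, value in feed_rows.items() if key not in dest_rows]
    # Tuples are ordered (price, id) to match the UPDATE statement's parameters.
    updates = [
        (value, key)
        for key, value in feed_rows.items()
        if key in dest_rows and dest_rows[key] != value
    ]
    return inserts, updates


def apply_diff(conn, inserts, updates):
    """Write the diff in two batched statements inside a single transaction."""
    with conn:
        conn.executemany("INSERT INTO products (id, price) VALUES (?, ?)", inserts)
        conn.executemany("UPDATE products SET price = ? WHERE id = ?", updates)


# A source feed delivered as a text file (inlined here for the demo).
feed = "id,price\n1,9.99\n2,12.50\n4,3.25\n"
feed_rows = {int(r["id"]): r["price"] for r in csv.DictReader(io.StringIO(feed))}

# Existing destination data; prices are kept as text to avoid float comparisons.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "9.99"), (2, "11.00"), (3, "5.00")],
)
dest_rows = dict(conn.execute("SELECT id, price FROM products"))

inserts, updates = build_diff(feed_rows, dest_rows)
apply_diff(conn, inserts, updates)
```

In this sketch, rows missing from the feed are deliberately left alone rather than deleted, on the assumption that a destination fed by multiple sources will hold rows that any single feed does not mention. Row 1 is unchanged and generates no write at all.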
Conclusion
Centerprise delivers a flexible and scalable approach to dealing with data transformation and change data capture challenges. Its clean-cut design and superior usability help you meet your integration challenges efficiently and affordably. If you are contemplating custom development or another integration tool, we encourage you to try Centerprise and see firsthand why it has become a key tool for a number of Fortune 500 companies in such a short time. So, consider Data Integrator.
You can try it risk-free for 14 days!