Moving large amounts of data between multiple disparate systems is a common
requirement. With ever-increasing data volumes, it is becoming impractical to
move entire databases or tables. Throwing hardware and bandwidth at the problem
is no longer enough. Smarter approaches are needed to make data available in a
timely manner.
Change Data Capture (CDC) refers to a variety of approaches that optimize data
transfers by moving only the data that has changed since the last transfer.
Centerprise Data Integrator offers multiple change data capture strategies, so
you can select the one that best fits your situation and requirements.
Incremental Load
The Incremental Load strategy minimizes the amount of data transferred by
keeping a copy or a computed hash value of each record that is read. On
subsequent runs, each source record is compared against its stored copy or hash
value, and only the changes since the previous run are sent to the destination.
This strategy for Change Data Capture is very easy to implement. All it requires
is that the source have one or more fields that can uniquely identify each row.
This approach captures all of the changes to the source data including deletes.
It also avoids unnecessary updates at the destination, because unchanged
records are never rewritten. The downside is that it requires a full read of
the source on every transfer.
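The hash comparison described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not Centerprise's implementation; the function names, the choice of SHA-256, and the in-memory dictionary of saved hashes are all assumptions made for the example.

```python
import hashlib
import json


def row_hash(row, key_fields):
    """Hash the non-key fields of a row in a stable, order-independent way."""
    payload = {k: v for k, v in row.items() if k not in key_fields}
    text = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_changes(source_rows, previous_hashes, key_fields):
    """Compare a full read of the source against hashes saved by the last run.

    previous_hashes maps a key tuple to the hash stored on the previous run.
    Returns (inserts, updates, deleted_keys, current_hashes).
    """
    inserts, updates, current_hashes = [], [], {}
    for row in source_rows:
        key = tuple(row[f] for f in key_fields)
        digest = row_hash(row, key_fields)
        current_hashes[key] = digest
        if key not in previous_hashes:
            inserts.append(row)          # key never seen before
        elif previous_hashes[key] != digest:
            updates.append(row)          # key seen, content changed
        # unchanged rows are skipped, so nothing is sent for them
    # keys present last time but missing now were deleted at the source
    deleted_keys = [k for k in previous_hashes if k not in current_hashes]
    return inserts, updates, deleted_keys, current_hashes
```

After each run, the returned current_hashes would be persisted and passed in as previous_hashes on the next run. Note that the loop still reads every source row, which is the full-read cost mentioned above.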
Centerprise’s Incremental Load feature is designed to be very easy to set up.
You can create a synchronization job for the entire database in a few minutes.
Moreover, Incremental Load can substantially reduce the time it takes to move
data. It is not uncommon for Incremental Load to reduce load time by as much as
90%.
Audit Fields
The Audit Fields strategy for Change Data Capture uses specific fields, such as
create time or update time, to identify the records that have changed since the
previous run and transfers only the rows created or updated since then.
Centerprise’s Audit Field Strategy feature is useful where a source application
maintains the date and time of the last update, or timestamp columns. For Salesforce
and some ERP environments that maintain a timestamp and, in some instances, an
‘IsDeleted’ flag, this strategy is highly desirable.
If the source application guarantees proper maintenance of audit fields or uses
the timestamp data type supported by most modern databases, then this approach
can provide very efficient data transfers.
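As a rough illustration of the audit-field approach, the sketch below selects only rows whose update timestamp is later than the value recorded at the end of the previous run. It uses Python's built-in sqlite3 module purely for demonstration; the function name, the table and column names, and the ISO-8601 text timestamps are assumptions for the example, not part of Centerprise.

```python
import sqlite3


def fetch_changed_rows(conn, table, audit_column, last_run):
    """Return rows changed after last_run, plus the new high-water mark.

    table and audit_column are interpolated into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {audit_column} > ? ORDER BY {audit_column}"
    )
    cur = conn.execute(query, (last_run,))
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    # Rows are ordered by the audit column, so the last one is the newest.
    new_mark = rows[-1][audit_column] if rows else last_run
    return rows, new_mark


def make_demo_source():
    """Build a small in-memory source table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts "
        "(id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT, is_deleted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?, ?)",
        [
            (1, "Acme", "2024-01-01T09:00:00", 0),
            (2, "Globex", "2024-01-03T12:30:00", 0),
            (3, "Initech", "2024-01-04T08:15:00", 1),
        ],
    )
    return conn
```

Calling fetch_changed_rows(make_demo_source(), "accounts", "updated_at", "2024-01-02T00:00:00") would return only the second and third rows. A row carrying is_deleted = 1 can then be treated as a delete at the destination, which is how a soft-delete flag lets this strategy capture removals that a timestamp alone would miss.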
Data Synchronization at Destination
Unlike the first two Change Data Capture strategies, which filter data before it
is sent to the destination, this strategy works on minimizing database writes
once the data is already at the destination.
In many cases source data arrives at the destination in text files and it is not
practical or cost-effective to access the source directly. In such scenarios, the
Incremental Load and Audit Fields strategies cannot be used and the best approach
is to efficiently compute differences between source and destination and apply
these differences to the destination.
Centerprise’s Synchronization Strategy calculates the differences between the
source and the destination data and applies these differences to the destination
database. Centerprise’s Synchronization Diff Builder is a high-performance
component that uses parallelism to efficiently compute differences. The Diff
Writer then uses bulk inserts and batch updates to apply these differences to the
destination database. This approach is very effective where an existing database
is updated with feeds from multiple sources.
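The diff-then-apply idea can be sketched as follows. This is a simplified, single-threaded illustration and not the Diff Builder or Diff Writer themselves: the function names, the customers table, and its columns are hypothetical, and the sketch assumes the feed is a full snapshot, so destination rows missing from the feed are treated as deletes.

```python
import csv
import io
import sqlite3


def read_feed(text):
    """Parse a CSV feed of id,name,city lines into {id: (name, city)}."""
    return {int(r[0]): (r[1], r[2]) for r in csv.reader(io.StringIO(text)) if r}


def read_destination(conn):
    """Load the current destination rows into {id: (name, city)}."""
    return {
        row[0]: (row[1], row[2])
        for row in conn.execute("SELECT id, name, city FROM customers")
    }


def build_diff(source, dest):
    """Compute the row-level differences between the feed and the destination."""
    inserts = [(k, *v) for k, v in source.items() if k not in dest]
    updates = [(*v, k) for k, v in source.items() if k in dest and dest[k] != v]
    deletes = [(k,) for k in dest if k not in source]
    return inserts, updates, deletes


def apply_diff(conn, inserts, updates, deletes):
    """Apply the differences in batches inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", inserts
        )
        conn.executemany(
            "UPDATE customers SET name = ?, city = ? WHERE id = ?", updates
        )
        conn.executemany("DELETE FROM customers WHERE id = ?", deletes)
```

Because only the computed differences are written, rows that are identical in the feed and the destination generate no database writes at all. When several sources feed the same table, the delete step would normally be dropped or scoped to the rows owned by each feed.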
Conclusion
Centerprise delivers a flexible and scalable approach to dealing with data
transformation and change data capture challenges. Its clean-cut design and
superior usability help you meet your integration challenges efficiently and
affordably. If you are contemplating custom development or another integration
tool, we encourage you to try Centerprise and see firsthand why Centerprise has
become a key tool for a number of Fortune 500 companies in such a short time.
So, consider Data Integrator. You can trial it, risk free, for 14 days!