Centerprise Best Practices: Modularity and Reusability in Dataflow Design

November 18th, 2013

Dataflows are the cornerstone of any data integration project in Centerprise. The Dataflow Designer, with its visual interface, drag-and-drop capabilities, instant preview, and full complement of sources, targets, and transformations, ensures users will be able to create and maintain effective and efficient dataflows. This two-part blog offers some best practices for getting the most out of your Centerprise integration projects. Part One addresses modularity and reusability, key design principles in the development of good dataflows. We’ll discuss performance-tweaking best practices in the second part of this blog.

Modularity and Reusability
Modularity enhances the maintainability of your dataflows by making them easier to read and understand. It also promotes reusability by isolating frequently used logic into individual components that can be leveraged as “black boxes” by other flows. Centerprise supports multiple types of reusable components, including subflows, shared actions, shared connections, and detached transformations.

Subflows are reusable blocks of frequently-used dataflow steps that have inputs and outputs. Once created, subflows can be used just like built-in Centerprise transformations. Examples of reusable logic that can be housed in subflows include:

  • Validations that are applied to data coming from multiple sources, frequently in incompatible formats
  • Transformation sequences such as a combination of lookup, expression, and function transformations that occur in multiple places in the project
  • Processing of incoming data that arrives in different formats but must be normalized, validated, and boarded

Example Centerprise subflow
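
For readers who think in code, a subflow behaves much like a function with declared inputs and outputs. The Python sketch below is only an analogy, not Centerprise syntax; the function name `normalize_and_validate`, the field names, and the sample records are all hypothetical.

```python
from datetime import datetime


def normalize_and_validate(record, date_format):
    """Stand-in for a subflow: one input record in, one cleaned record out.

    Raises ValueError when the required name field is missing or empty.
    """
    name = (record.get("name") or "").strip()
    if not name:
        raise ValueError("name is required")
    # Normalize the source-specific date format to ISO 8601 (YYYY-MM-DD).
    parsed = datetime.strptime(record["date"], date_format)
    return {"name": name.title(), "date": parsed.strftime("%Y-%m-%d")}


# Two hypothetical sources with different date formats reuse the same block.
from_source_a = normalize_and_validate(
    {"name": " jane doe ", "date": "11/18/2013"}, "%m/%d/%Y"
)
from_source_b = normalize_and_validate(
    {"name": "JOHN ROE", "date": "2013-11-18"}, "%Y-%m-%d"
)
```

Both calls produce records in the same shape, which is the point of housing the logic in one reusable block: a change to the validation rule is made once and picked up everywhere the block is used.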

Shared Actions are similar to subflows but contain only a single action. They are useful when a source or destination is used in multiple places within a project. If a field is added to or removed from the source, every flow that uses the shared action inherits that change automatically.

Shared Connections contain database connection information that can be shared by multiple actions within a dataflow. They can also be used to enforce transaction management across a number of database destinations. Use them whenever multiple actions in a dataflow use the same database connection information.

Detached Transformations are a capability within Centerprise developed for scenarios where a lookup or expression is used in multiple places within a dataflow. Detached Transformations enable you to create a single instance and use it in multiple places. They are available in expressions as callable functions, enabling you to use them in multiple expressions. Additionally, Detached Transformations allow you to use lookups conditionally. An example of a conditional lookup would be, “if party type is insurer, perform lookup on insurer table else perform lookup on provider table.”

Centerprise Detached Transformation
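
The conditional lookup described above can be pictured in ordinary code. The following Python sketch is only an analogy and does not show Centerprise expression syntax; the table contents, the field names, and the function `lookup_party_name` are hypothetical.

```python
# Hypothetical lookup tables standing in for the insurer and provider tables.
INSURER_TABLE = {"I-100": "Acme Insurance"}
PROVIDER_TABLE = {"P-200": "Lakeside Clinic"}


def lookup_party_name(party_type, party_id):
    """Pick the lookup table based on party type, then look up the id.

    Returns None when the id is not present in the chosen table.
    """
    table = INSURER_TABLE if party_type == "insurer" else PROVIDER_TABLE
    return table.get(party_id)


# The same function is called from more than one place, much as a detached
# transformation is defined once and reused across expressions.
records = [
    {"party_type": "insurer", "party_id": "I-100"},
    {"party_type": "provider", "party_id": "P-200"},
]
names = [lookup_party_name(r["party_type"], r["party_id"]) for r in records]
```

Defining the lookup once and calling it by name keeps the branching rule in a single place, so a change to the rule does not have to be repeated in every expression that needs it.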

Input parameters and output variables make it possible to supply values to dataflows at runtime and return output results from dataflows. Well-designed parameter and output variable structures promote reusability and reduce ongoing maintenance costs.

Input Parameters

Input parameter values can be supplied from a calling workflow using data mapping. When designing dataflows, analyze the values that could change between different runs of the flow and define these values as input parameters. Input parameters can include file paths, database connection information, and other data values.

Output Variables

If you would like to make decisions about subsequent execution paths based on the result of a dataflow run, define output variables and use an expression transformation to store values into output variables. These output variables can be used in subsequent workflow actions to control the execution path.
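
As a rough illustration of how input parameters and output variables fit together, the Python sketch below treats a dataflow as a function whose arguments are its input parameters and whose return value carries its output variables. This is an analogy rather than Centerprise behavior; `run_dataflow`, `run_workflow`, the `min_amount` parameter, and the step names are hypothetical.

```python
import csv
import io


def run_dataflow(source_text, min_amount):
    """Stand-in for a dataflow: the arguments play the role of input parameters.

    Returns a dict that plays the role of the output variables.
    """
    rows = list(csv.DictReader(io.StringIO(source_text)))
    rejected = [r for r in rows if float(r["amount"]) < min_amount]
    return {"rows_read": len(rows), "rows_rejected": len(rejected)}


def run_workflow(source_text, min_amount):
    """Stand-in for a calling workflow: it passes values into the dataflow
    and chooses the next step from the output variables it gets back."""
    outputs = run_dataflow(source_text, min_amount)
    if outputs["rows_rejected"] > 0:
        return "send_error_report"
    return "load_to_destination"


# Hypothetical sample data: one row falls below the threshold of 10.
sample = "id,amount\n1,50\n2,5\n"
next_step = run_workflow(sample, min_amount=10)
```

Because the threshold and the source data arrive as parameters, the same flow can be rerun against different inputs without being edited, and the caller decides what happens next based only on the returned values.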

Centerprise Performance Best Practices

Next week we’ll share performance best practices. Because Centerprise has been designed as a parallel-processing platform to deliver superior speed and performance, designing dataflows to take advantage of the software’s abilities can significantly affect your data integration performance. The performance best practices we’ll discuss next week can result in a major performance boost.