Ingredients of the Metadata-Driven Data Warehouse Architecture

By |2021-06-08T11:50:45+00:00February 22nd, 2021|

Let’s face it, building a data warehouse architecture that serves all of your needs requires a lot of planning and expertise. A modern data warehouse architecture must integrate operational system data accurately with correct formatting and naming conventions in place, it must be flexible enough to adapt to changes in the structure of these underlying sources and must provide optimized performance to support timely reporting.

In Astera DW Builder (ADWB), a data warehouse tool,  we provide a code-free solution that brings scalability, speed, and agility to data warehouse development. From the unified data model designer, you can access a range of in-depth functionalities that drastically save time and cost involved in the design, configuration, and deployment of your BI architecture. Let’s take a look at how these ingredients of an enterprise data warehouse architecture come together:

Control End-to-End Data Warehouse Development

Data Warehouse Development in Astera Dw Builder

Data Warehouse Development

 

With the data warehouse data model designer, ADWB provides a unified interface where source system data can be imported, aligned with the destination schema, denormalized, and further prepared for migration into a dimensional model that is optimized for reporting and analytics. ADWB facilitates this integration process through its reverse engineering and forward engineering functions.

Create Enriched DWH Data Models for Your Source Systems

Our reverse engineer feature takes a source database schema and replicates it in the form of an entity-relationship model. This model shows the logical structure of the underlying database and gives you the ability to enrich this schema in several ways to facilitate loading to the data warehouse.

ADWB offers integrations with a range of leading databases including SQL Server and Oracle Database as well as cloud providers such as Amazon and Microsoft Azure. You can also import data models directly from modeling software such as Erwin Data Modeler, using the same technique.

After database entities have been imported, users can begin to normalize tables based on shared key relationships or establish relationships within the model if these are not automatically identified during the reverse engineering process.

They can also edit individual tables to ensure that relevant fields and naming conventions are reflected in the data warehouse.

Design and Configure a Data Warehouse Schema That Fits Your Reporting Requirements

With ADWB, you can create a dimensional model using your preferred technique, from star and snowflake schemas to data vaults and operational data stores, our platform allows for them all. Again, our data model designer enables users to manage all of these tasks at the logical level without diving into any of the code themselves.

If the enterprise has an existing database in use for data warehousing purposes, they can reverse engineer it and begin modeling or they can build the schema from scratch using drag & drop tables within the data model designer.

With either approach, the basic process remains the same. Once you’ve configured all of the entities in your schema and ensured that relationships are established correctly between them, you define them as facts or dimensions.  We’ve also included a dedicated date dimension entity so that you can group business measures according to the most suitable time period. From fiscal quarters to holiday seasons, we’ve got you covered.

Next, surrogate keys (uniquely identifies each version of records) and business keys (an identifying value assigned in transactional systems based on internal business logic) will be assigned to appropriate fields in the layout builder for each entity.

You can also customize how the data is formatted, whether specific fields are mandatory or not, and decide any default values to be shown if a value doesn’t appear for a particular attribute. Did a business measure arrive in your fact table without an associated dimension? No problem – just set up a placeholder dimension in the relevant entity so that referential integrity is always maintained.

Again, all of these metadata level changes will affect how the data warehouse processes and presents data after deployment.

Automatically Track Changes in Your Source System Data

One of the main aspects of data warehouse maintenance is handling updates, deletions, and additions in source system tables on an ongoing basis. After all, the modern EDW is built to provide both a current and historical view of an organization’s data. In DWB, we automate these processes through slowly changing dimension types. It supports multiple SCD-handling techniques, including SCD Type 1, Type 2, Type 3, and Type 6.

 Slowly Changing Dimension Types

Slowly Changing Dimension Types

In the layout builder, users can choose the most effective slowly changing dimension (SCD) type for each dimension field.

Easily Propagate Changes from the Data Model to your Data Warehouse

Now that the data warehouse schema is set up at the metadata level, you just have to ensure that your database is ready to be populated. This is done through the forward engineering function, which applies all the structural changes made in the dimensional model to your physical database.

Moving forward, you can use this option to quickly propagate changes from your data model to the destination database.

Verify and Deploy Your Data Model in a Few Clicks

With your data model set up, you’re now ready for deployment. But first, you’ll want to check the integrity of your data model using our handy data verification tool and avoid hours of manual troubleshooting.

Our tool performs thorough data verification checks to cut down these repetitive tasks by highlighting any errors in your data model before moving to the next stage of production. From incomplete fields to referential errors, you can catch and correct potential issues at the first pass using this feature.

Radically Accelerate Data Warehouse Loading

In ADWB, all ETL to the data warehouse is handled by dedicated fact and dimension loading objects. Now, instead of building complex dataflows, you can select a single source object or multiple tables from a source data model (multiple tables can be selected using the Data Model Query object in a dataflow) and map them to a loader. Then, just point your loader towards a relevant fact or dimension table in your deployed dimensional model and your mapping is complete.

If you need to apply additional aggregates, filters, or validation rules to your fact or dimension data, you just need to drag-and-drop your desired transformation from the toolset and configure it in this dataflow.

Once you have completed mapping from source to the data warehouse, ADWB will execute the flows. Data is taken from the source and processed through the necessary transformations before being loaded into the relevant tables in the warehouse. Here, appropriate surrogate and business keys will be assigned, and lookups will be performed as defined during the modeling stage. In ADWB, we’ve added a dedicated dimension lookup transformation that automatically cross-references every business key against the relevant SCD table and matches it with an appropriate surrogate key.

With a metadata data warehousing solution, you only need to create the initial dataflow. All the coding involved in populating the data warehouse is generated automatically by our platform in dedicated pushdown (ELT) mode to ensure minimal load is placed on your server during these resource-intensive operations. In other words, you can populate your data warehouse in minutes.

ADWB is Platform-Agnostic!

ADWB offers out-of-the-box connectors for a range of database destinations, so you can set up your data warehouse on the platform of your choice without worrying about compatibility issues. Currently, we support the following industry-leading cloud and on-premise databases:

  • Snowflake​
  • Amazon Redshift​
  • Azure Synapse​ Analytics
  • Oracle Autonomous Data Warehouse​
  • Teradata​
  • SAP​ Data Warehouse
  • SQL Server​
  • MariaDB​
  • Vertica
  • IBM DB2

Query and Visualize Your Enterprise Data from Any Authorized Application

All deployed data models are also made available as OData services. Our metadata data warehouse engine takes these services, and finally to SQL so that tables can be viewed or queried outside applications and browsers.

All you need is the web address of your deployment and a bearer token to authenticate the connection, and your warehouse data is accessible to end-users via any connected application.

You can also consume your data warehouse directly through leading reporting and visualization tools such as Tableau, Power BI, Domo, and more.

Easily Orchestrate All of Your ETL Operations

Once your data warehouse is deployed, our workflow functionality will help you manage exactly how different tables are populated. Once you’ve decided on how to orchestrate these operations, each dataflow will retrieve data from the source system via the staging area and migrate it into the dimensional data model.

Automate Updates and Maintain the Timeliness of Your Enterprise Data

Users can set the frequency of data loads for each dimension based on the frequency with which the related source system tables are updated. With the Job Scheduler feature, you can orchestrate these operations to run continuously, at specific time intervals, or on an incremental basis when modifications are made to the source system.

With a metadata-driven data warehouse, you don’t need to worry about code quality and how it will stand up to high volumes of data. Our solution generates all of the necessary ETL scripts on the backend by the metadata engine, and it’s backed by an industrial-strength ETL engine that is built to scale with your requirements. Add in real-time job monitoring and logging features and major design errors become a thing of the past.

Agile, Scalable, and Accessible Anywhere. Build Your Data Warehouse in Days with Astera Data Warehouse Builder.

Interested in trying out our solution? We’re offering you the opportunity to participate in our exclusive launch campaign right now. Click here to get in touch, and find out how you can get on board.

FAQs

A metadata-driven architecture focuses on metadata management and plays a critical role in ensuring the effectiveness of decision support systems.  Meta-driven data warehousing is a new-gen ETL too, and a unified platform that allows users to design the data warehouse at the logical level. It incapsulates the design of both the ETL and Data Warehouse schema.

In a data warehouse, metadata falls under one of three categories:

  1. Operational Metadata: Source system data is usually filtered, transformed, combined, and further enhanced before being integrated into the data warehouse. As a result, it can be difficult to ascertain where these records originated from. Operational metadata provides the entire history of a dataset, who owns it, the specific transformations it went through, as well as its current status i.e., whether they are current or historical in nature.
  2. ETL Metadata: This metadata is used to guide the transformation and loading process for your data warehouse. It encompasses the physical schema of migrated entities, including tables and column names, contained data types and values, as well as the prescribed layout for destination tables. ETL metadata also includes applicable transformation rules, fact/dimension definitions, load frequencies, and extraction methods.
  3. End-User Metadata: This type of metadata is particularly useful for consumers that query and search the data warehouse on a day-to-day basis. It essentially functions as a map of the data warehouse providing details on the data contained within the architecture, how datasets relate to each other (primary/foreign keys), necessary calculations for mapping from source to destination, specific datasets that need to be reported on and how.

The key benefits of metadata in the EDW are:

  • Provides connective tissue for otherwise disparate data in a high-volume, complex data architecture.
  • Facilitates mapping from source systems to the data warehouse.
  • Optimizes querying by categorizing and summarizing datasets.
  • Is used effectively at several stages in the data warehouse lifecycle, including schema generation, extraction, loading to the data warehouse, transformation in the staging layer, and during the reporting process.