What if you could combine data warehouse design and development in a seamless, automated process that took you from basic reporting to advanced high level analytics in just weeks? Well, get data modeling process right, and you can build a BI architecture that’s fast, scalable, and agile. Read on to get an inside look at how to make enterprise data model development work for your business.
At their core, data warehouses are built to answer business questions. These could range from simple descriptive queries focused on generating periodic reports to far more ambitious analyses geared towards investigating factors behind a scenario for effective data management. To get this kind of targeted intelligence from your BI system, you must ensure that user requirements are tightly interwoven with business concepts during the development phase of the data warehouse.
The best way to connect end-user objectives to your data architecture is by putting enterprise data modeling at the front and center of your data warehouse. These schemas are visualized entity relationship diagrams and constitute source systems, arranging them in an optimized structure suited to quick querying and usability. More importantly, they show the underlying business logic behind different operational processes in a format immediately understandable to non-technical users and developers alike.
If designed correctly, the benefits of an enterprise data model can be realized for your entire data warehouse. They will help you identify where critical data needs to be retrieved from, the platforms that can best support your physical data model and database, and how data pipelines will flow from source to destination.
Let’s take a look at some fundamental data modeling techniques that are essential to this process.
Start at the Source
Replicate Your Source Systems Accurately, and You Might End Up With a Schema That Looks Something Like This
The first thing you need to ensure during schema modeling is that you can replicate source systems at the logical level accurately.
To do so, you must identify where your critical data resides – is it in an on-premise database, a cloud data lake, or on a CRM platform like Salesforce? Of course, only specific tables within these applications will be relevant for BI purposes. If you’ve already been creating reports in your transactional systems, then you’ll have a good idea about what datasets need to be integrated into your data warehouse. Ultimately, you want to make sure that you’re able to run all the same queries as before, without interruption.
Create a Standardized Metadata Framework
As you’re building enterprise logical data models to encompass all of your sources, metadata must be applied consistently across each one.
At the design stage, you want to:
Establish relationships between entities using appropriate primary keys and foreign keys
Ensure that you’re joining tables correctly, and that entity-relationship types are correctly defined so, many-to-many, one-to-many, parent-child etc.
Have proper aliasing in place to ensure the entity type/field is returned when a query is performed in the data warehouse. For example, if you establish the Customers and Orders have a parent-child relationship, it is easy to filter Customers by Order, but if you try to do so the other way around, then you’ll need to ensure orders are linked to a single customer otherwise the query will fail. This issue is resolved using aliasing.
Naming conventions for attributes should also be standardized across your enterprise data model to ensure easy comprehensibility.
Remember, you need to ensure that there are no variances between the metadata in the source system tables and the data warehouse entities; otherwise, incompatibilities will occur when the data warehouse is being populated. These discrepancies will make it far more difficult to create data mapping to move data from source to destination.
Putting a metadata framework in place significantly reduces the technical debt created by variances in the data model and subsequent data pipelines. These discrepancies necessitate more maintenance activities and make it much harder to update different types of schemas consistently. Instead, you have tested and proven, and most importantly, adaptable templates without writing any code.
Scripts can then be generated based on these metadata-enriched data models and propagated data elements directly to a physical database.
Make Your Enterprise Data Model Agile
With agile, you should be able to manage data and integrate changes into your data model a lot more seamlessly than this.
It’s important to understand that data modeling is not a static activity. In the modern enterprise, accurate and timely BI is essential to almost every department. That means more users accessing the data warehouse. Now, as these requirements evolve, the data consumer will need to update the existing architecture to integrate new sources at speed. This capability must be integrated into your design approach.
Schema modeling should focus on providing deliverables based on current business requirements rather than creating an approved enterprise data model before development begins. Creating a data model that represents the entirety of your business operations would take a significant amount of time and upfront design effort. So much so that by the time development actually begins, the schema may well be outdated.
In an agile approach, data models would be built according to current BI requirements in several iterations. Say your sales manager wants to compare performance across multiple regional units after an organizational restructuring. Your team would build a data model to specifically reflect these business rules (also known as a data mart) that would be capable of generating the necessary insights within a couple of weeks. As a result, the data warehouse grows incrementally in a far more consistent manner. At each phase, a new output is produced that the end-user can immediately check for accuracy and relevance.
In practice, several aspects need to be in place to allow such an approach to flourish.
First, the business users must be able to work closely with the data modeler across the design process to ensure high data quality. This means that the schema should be designed in a way that’s readily understood by both teams so that any miscommunications are minimized.
Second, if the design process allows users to access source systems, select relevant tables, and replicate them in a data model, then time-to-deployment will be significantly reduced. Now, data consumers will have the functionalities in place to ensure that the datasets, which need to be reported on, are available in the finished data model. When these requirements change, new tables can also be added to the initial model using the same techniques.
An agile enterprise data modeling approach should also allow for some system of versioning to be put in place so that updates to the original schema can be tracked and monitored. With multiple versions in place, you can rollback any modifications if needed.
Expose Data on a Need-to-Know Basis
Data for me but not for thee
The iterative approach allows you to take a much more granular outlook on the data delivered for BI purposes.
Instead of exposing all of your data stores in an enterprise data model that is then open for access to all organization members, you design customized schemas to fulfill each user groups’ requirements. These teams can then limit their queries to relevant datasets and thus improve their reporting and analytics efficiency. At the same time, from a data security perspective, the organization ensures that sensitive information is only exposed to authorized users.
Adopt a Schema for Your Enterprise Data Model -Agnostic Approach
Which schema design will you choose? So when we’re talking about data modeling best practices, the schema that comes to mind is Kimball’s dimensional model. In this schema, tables are arranged into a rough star-shaped structure with a central fact table providing business measures and linked dimension tables providing context to these numbers.
This structure has been dominant in the industry for over three decades now, for a good reason. The dimensional model is primarily designed for quick query performance and provides a lot of flexibility when adding new sources for reporting purposes. It is also a business process-oriented approach to data warehouse design. In other words, it organizes and presents data in a way that end-users can easily understand.
However, there are reasons why end-users might consider an alternative architecture. For example, a traditional star schema incorporates many joins, which can hamper query performance in some cases. Instead, they may prefer to go with a few wide tables (many columns) as this design better suits some visualization tools. In addition, the approach would allow end-users to simplify queries for a single table rather than applying multiple joins, which increases the risk of errors.
An alternative conceptual model that has gained considerable popularity in recent years is the data vault architecture. This schema results in a flexible architecture that combines the business-oriented approach of the dimensional model with the scalability of the 3NF format espoused by Bill Inmon. The DV consists of hubs representing the identifying aspects of a business, and each one contains natural keys for these processes. There are also links that serve as intersectional tables defining many-to-many relationships between the different hubs in the architecture. Finally, satellites contain the descriptive attributes for both Hubs and Links.
Depending on your BI requirements, any one of these architectures could be preferred, but the data modeling techniques you employ should allow you to design and propagate different types of schema with ease, whether it is a dimensional model, 3NF, or data vault.
Astera DW Builder – An Enterprise Data Modeling Tool for DW Development
Automated Data Modeling Sits at the Very Foundation of ADWB
Astera DW Builder is a metadata-driven data warehouse automation tool that radically accelerates your path to data warehouse development.
This product puts data modeling at the very core of its capabilities. Allowing you to design best practice enterprise data models from scratch or reverse engineer them from existing source systems with equal ease through simple drag & drop commands.
From there, you can enrich your schemas with additional specifications for things like table attributes, data types, primary keys, foreign keys. At the dimensional model level, you can define SCD-types for dynamic fields, effective/expiry dates, and surrogate keys to facilitate effective loading and querying. ADWB also supports other leading design approaches, including data vaults and 3NF data models. These descriptions are then fed to an engine that automatically engineers this entire schema in a physical database.
Working logically in one unified process supercharges your data warehouse development, cutting deployment times from months or even years to just days.
This new approach also encourages collaboration. Non-technical stakeholders can easily understand the logical definitions and visual elements used in a metadata-enriched data model. This knowledge enables them to participate in implementations from inception to completion, ensuring that deliverables are much more closely aligned with their needs.
The data model now essentially serves as an abstraction layer, allowing end-users to view source system data without directly accessing it. This separation means that you can provide relevant data to specific users without worrying about data governance issues.
When it comes time to analyze delivered data, the metadata can be propagated to any BI platform where it is used to identify the datasets to be imported via the data model.
Another significant benefit here is platform-agnosticism because the enterprise data model is not hardcoded to any specific database it serves as a schema that can be read and easily replicated in any database across the cloud or on-premises.
Finally, if you look at the rapidly fluctuating nature of enterprise data, the ability to update and modify metadata elements in the data model is essential, allowing you to iterate at speed and therefore adjust reporting pipelines based on business users’ current needs.
When you put all of these elements together, you’re talking about a fast, agile, and truly resource-efficient process for data warehouse development.
Want to get a firsthand look at how these functionalities can turbocharge your data warehouse development? Register now for our upcoming webinar, where product experts and industry insiders will demonstrate the potential of this best practices approach. You can also get in touch with us directly to arrange a consultation based on your current requirements.