Home > Type > Blog > A Complete Guide to Dimensional Modeling
A Complete Guide to Dimensional Modeling
By Iqbal Ahmed|2022-07-20T07:14:37+00:00February 1st, 2021|
Dimensional modeling involves the use of fact and dimension tables to maintain a record of historical data in data warehouses. Different types of data modeling techniques are optimized for different applications. Normalized Entity-Relationship models (ER models) are designed to eliminate data redundancy and quickly perform the Insert, Update, and Delete operations, and get the data inside a database.
In contrast to that, dimensional models or kimball dimensional data models – data models based on the technique developed by Ralph Kimball – are denormalized structures designed to retrieve data out of a data warehouse. They are optimized to perform the Select operation and are used in the basic design framework to build highly optimized and functional data warehouses.
In this article, we will cover the basics of dimensional modeling and the concepts related to it. We will also discuss different tools and implementation methods for successfully designing dimensional data models.
Dimensional modeling is still the most commonly used data modeling technique for designing enterprise data warehouses because of the benefits it yields. These include:
Faster Retrieval of Data
Dimensional data modeling merges the tables in the model itself, which enables users to retrieve data faster from different data sources by running join queries. The denormalized schema of a dimensional model data warehouse, as opposed to normalized one in snowflake schema, is optimized to run ad hoc queries. As a result, it greatly complements the business intelligence (BI) goals of an organization.
Better Understanding of Business Processes
The principles of dimensional modeling are based on fact and dimension tables. We will cover what facts and dimensions are in the subsequent sections. This categorization of data into facts and dimensions, and the entity-relationship structure of a dimensional model, present complex business processes in an easy-to-understand manner to analysts.
Flexible to Change
Dimensional modeling framework makes the data warehousing process extensible. The design can be easily modified to incorporate any new business requirements or make any adjustments to the central repository. New entities can be added in the model or layout of the existing ones can be changed to reflect modified business processes.
Elements involved in Dimensional Modeling
Fact Tables or Business Measures
Fact tables store the numeric information about business measures and foreign keys to the dimensional tables. Business facts can be additive, semi-additive, or non-additive. Table 1 explains the three types of fact tables.
Type of Facts
Business measures that can be aggregated across all dimensions
Business measures that can be aggregated across some dimensions and not across others (usually date and time dimensions)
Business measures that cannot be aggregated across any dimension
Table 1: Types of facts in a fact table
Fact Types Explained with a Dimensional Data Model
An apparel store maintains the following data in fact table rows for a sales transaction:
Table 2: Transactional table maintained by an apparel store
The columns containing numeric information about the business process are our business facts. In this example, Quantity, Unit Price, Sales Amount, Inventory, and Sales Tax are facts. And the rest of the entities (Date, Store, and Product Type) are dimensions.
Sales Amount can be added across all dimensions therefore, it is an additive fact. Moreover, adding Inventory information across Store dimension provides useful business information but since this is just the snapshot of the amount of goods at a certain point, adding it across the Date dimension does not give any useful business insights. Since Inventory is additive across some dimensions and non-additive across others, it is a semi-additive fact. Now consider Sales Tax. Adding Sales Tax across any dimension will pose problems during analytical processing. Sales Tax is, therefore, a non-additive fact.
Develop and deploy high-volume data warehouses with ADWB
Dimension tables store descriptive information about the business facts to help understand and analyze the data better. In the example presented in Table 2, Date, Store Location, and Product Type are dimension entities, and they give more information about the business facts. Total amount of Sales is an important measure to record, but without the dimensions, a business cannot assess which store location or product type is yielding more sales.
Figure 1: Star schema with fact and dimension tables
Primary key is a column in dimension tables that identifies unique records. For the slowly changing dimensions, the surrogate key will be the primary key.
Foreign keys are used to join two tables (usually fact and dimension tables). The primary key in a dimension table is a foreign key in the related fact table and is used to reference that particular dimension.
Designing a Dimensional Data Model
To understand the process of designing dimensional models, let’s consider the example of an apparel line that sells two kinds of windbreakers – Nylon and Polyester in its two stores across California and Pennsylvania. The sample data for the example is shown in Table 2.
Step 1: Identify the Business Processes
Before modeling the data, you should types of dimensional modeling are appropriate for your data model. The dimensional modeling process (or any type of data modeling) begins with the identification of the business process that you want to track. In this case, we want to track sales for the two types of windbreakers.
Step 2: Identify Facts and Dimensions in Your Dimensional Data Model
The information in a dimensional model is categorized into two tables types – Facts and Dimensions. The next step is to identify the business facts that you want to measure and their associated dimensions. In our example, windbreaker sale is the fact that we want to measure. Date, store location (California and Pennsylvania), and product type (Nylon windbreakers and Polyester windbreakers) are the dimensions that give us further insights into the sales process.
Step 3: Identify the Attributes for Dimensions
After you’ve identified the dimensions and facts for your business process, the next step is to identify attributes and create a separate dimensional table for each of the dimensions. There are different types of dimensional tables for each data type. Each record in the dimension table should have a unique key. This key will be used to uniquely identify the records in the dimension table and will be used as the foreign key in the fact table to reference the particular dimension and join it with the fact table. Tables 3-5 show the different types of dimensions in a data warehouse in our apparel line example.
Table 3: Dimension table for Date
Table 4: Dimension table for Store
Windbreaker – Fall Collection
Windbreaker – Fall Collection
Table 5: Dimension table for Product
Step 4: Define the Granularity for Business Facts
Granularity refers to the level of information that is stored in any table. For instance, in our example, the sales amount is recorded on a daily basis, therefore the granularity, in this case, is daily. The fact tables in a dimensional model should be consistent with the pre-defined granularity.
Step 5: Storing Historical Information (Slowly Changing Dimensions)
An important feature of dimensional models is that the dimensional attributes can be easily modified without changing the complete transaction information. For example, the apparel line decides to continue the Nylon windbreaker from Fall Collection into the Spring Collection and updates the name in the Collection attribute. Making the update is an easy process in the dimensional table, but with the update we will lose our previous data. If the goal of your data modeling and data warehouse is maintaining and storing history, this could be a problem. Dimensions that change slowly over time are called Slowly Changing Dimensions. In addition, the time dimension table in a data warehouse is automatically generated and captures the time at which different transactions occur. You can maintain and store historical data by tracking slowly changing dimensions.
Automation – A Game Changer for Dimensional Modeling
Designing dimensional models is an essential step in building the framework of an enterprise data warehouse. The process can be streamlined with the help of a robust data warehouse automation tool such as Astera Data Warehouse Builder.
With Astera DW Builder, you can quickly build dimensional models in a visual code-free integrated development environment. Entities can be denormalized with simple drag-and-drop and merges. Entity roles (facts and dimensions) can be assigned in bulk, which can save you valuable time when working with hundreds of entities. In addition, the product enables you to manage slowly changing dimensions with built-in support for SCD types 1, 2, 3, and 6.
Astera DW Builder is an end-to-end data warehouse automation platform that comes with built-in dimensional data modeling capabilities, support for a wide range of databases and CRM applications, automated data mapping and data loading features, and native integration with business intelligence platforms such as Tableau and Power BI.