Best Practices for a Scalable Data Mart Architecture Design

October 23rd, 2019

A scalable data mart architecture can reduce the risk of data loss, as well as implementation cost and time, because a data mart focuses on a subset of data rather than the complete enterprise data. This is why data marts are often regarded as one of the most effective mechanisms for providing quick and consistent decision support.

Although a data mart significantly decreases the risk associated with developing a decision support system (DSS), implementing one correctly still requires proficiency and expertise.

In this article, we’ve compiled a list of best practices that’ll help you easily design a scalable data mart architecture for your business needs.

Data Mart Architecture Design – Best Practices

To ensure the efficiency and scalability of your data mart, follow these design tips.

1. Define the Scope

Before jumping to the implementation phase, it’s essential to have a fool-proof plan that takes into consideration all the business needs and priorities of the end-users.

Start by outlining the scope of the project, highlighting all risks and limitations. This helps set the right expectations and estimate expenses.

You may have to adjust the requirements to fit the available resources (human, technical, and financial) in order to meet the planned completion date.

Based on this scope, list the main deliverables and assign responsibilities to your team.

2. Pay Attention to the Logical Data Mart Model

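A logical model is an abstract design, independent of any particular database technology, that organizes data in terms of entities, their attributes, and the relationships between them. An entity is an object the business tracks, such as a customer or a product, whereas an attribute describes a property of that entity; key attributes are the ones that uniquely identify each instance of the entity.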
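To make the distinction between entities and attributes concrete, here is a minimal Python sketch that describes a logical model as plain data, with no database involved. The `Entity` class and the customer and product attribute names are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A logical entity: a named business object described by attributes."""
    name: str
    attributes: list[str]
    # Attributes that together uniquely identify one instance of the entity.
    key: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every key attribute must be one of the entity's declared attributes.
        missing = [a for a in self.key if a not in self.attributes]
        if missing:
            raise ValueError(f"{self.name}: key attributes not declared: {missing}")


# Hypothetical entities for a sales-and-marketing data mart.
customer = Entity("Customer", ["customer_id", "name", "city", "country"], key=["customer_id"])
product = Entity("Product", ["product_code", "product_name", "category"], key=["product_code"])

print(customer.key)  # ['customer_id']
```

Nothing here is tied to a physical database yet; that separation is what lets the same logical model be implemented differently on different systems.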

When designing a logical model, focus on your business needs. Map source data to subject-oriented information in the destination data mart schema. The source data model and end-user requirements are the essential elements used to design a data mart schema.
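Mapping source data to the target schema is often captured as a simple source-to-target table. The sketch below assumes hypothetical source column names (`CUST_NO`, `SALE_AMT`, and so on) and shows how a source row might be renamed into the subject-oriented columns of the data mart, dropping fields the mart doesn’t need.

```python
# Hypothetical mapping from source-system columns to subject-oriented
# columns in the destination data mart schema.
SOURCE_TO_TARGET = {
    "CUST_NO": "customer_id",
    "CUST_NM": "customer_name",
    "PROD_CD": "product_code",
    "SALE_AMT": "sales_amount",
}


def map_row(source_row: dict) -> dict:
    """Rename mapped source fields to target names; unmapped fields are dropped."""
    return {target: source_row[source]
            for source, target in SOURCE_TO_TARGET.items()
            if source in source_row}


row = {"CUST_NO": 42, "CUST_NM": "Acme", "PROD_CD": "P-7",
       "SALE_AMT": 99.5, "AUDIT_TS": "2019-10-01"}
print(map_row(row))
# {'customer_id': 42, 'customer_name': 'Acme', 'product_code': 'P-7', 'sales_amount': 99.5}
```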

You may have to adapt the physical implementation of the logical data model to system parameters such as hardware size, number of users, disk storage, network type, and software.

3. Identify Relevant Data

Generally, data elements are identified based on the business requirements. However, you will often have to look beyond end-user requests and anticipate future requirements.

A good tip is to begin with the business factors that are relevant to your subject area and critical to your department. For instance, if you’re designing a data mart for your sales and marketing department, key factors might be client, location, product, sales, and promotions. Also decide whether you need daily, weekly, or monthly records; this level of detail is known as the grain of the data.
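The choice of grain matters because detailed records can always be rolled up to a coarser level later, but monthly totals can’t be broken back down into days. As a hypothetical illustration, this sketch aggregates daily sales records to a monthly grain:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (sale date, product, amount).
daily_sales = [
    (date(2019, 9, 30), "P-1", 120.0),
    (date(2019, 10, 1), "P-1", 80.0),
    (date(2019, 10, 15), "P-1", 50.0),
    (date(2019, 10, 15), "P-2", 30.0),
]


def roll_up_monthly(records):
    """Aggregate daily amounts to a (year, month, product) grain."""
    totals = defaultdict(float)
    for day, product, amount in records:
        totals[(day.year, day.month, product)] += amount
    return dict(totals)


print(roll_up_monthly(daily_sales))
# {(2019, 9, 'P-1'): 120.0, (2019, 10, 'P-1'): 130.0, (2019, 10, 'P-2'): 30.0}
```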

Next, list the critical data fields based on the needs of the data mart’s users. For instance, fields of interest in a marketing data mart could include product names, promotion characteristics, regions, and countries.

You should also divide the data into numeric measures (called facts) and descriptive attributes (called dimensions). The following tables illustrate facts and dimensions.

Figure 1. Table based on Sales Facts (Source: Datamartist)

Figure 2. Table based on Product Dimensions (Source: Datamartist)
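As a rough illustration of the split, the sketch below separates a hypothetical flat sales record into fact fields and dimension fields. The field names are assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical flat sales record, as it might arrive from a source system.
record = {
    "order_date": "2019-10-23",
    "product_name": "Widget",
    "region": "West",
    "quantity": 3,
    "revenue": 59.97,
}

# Numeric measures that get summed or averaged become facts;
# descriptive context used for filtering and grouping becomes dimensions.
FACT_FIELDS = {"quantity", "revenue"}


def split_record(rec: dict) -> tuple[dict, dict]:
    """Return (facts, dimensions) for one record."""
    facts = {k: v for k, v in rec.items() if k in FACT_FIELDS}
    dimensions = {k: v for k, v in rec.items() if k not in FACT_FIELDS}
    return facts, dimensions


facts, dimensions = split_record(record)
print(facts)       # {'quantity': 3, 'revenue': 59.97}
print(dimensions)  # {'order_date': '2019-10-23', 'product_name': 'Widget', 'region': 'West'}
```

The fact fields are declared explicitly rather than inferred from data types, because a numeric value such as a product code is still descriptive, not a measure.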

4. Narrow Down the Data Sources

Once you’ve listed all dimensions and facts needed to design the data mart, the next step is to identify the sources that will feed the repository. These sources can include databases, Excel files, delimited files, etc.

Next, map dimensions to lookup tables in your operational system, and map facts to transaction tables.
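One way to keep track of this step is a small mapping plan that records which source table feeds each dimension and fact, and flags anything that has no suitable source yet. The table and element names below are hypothetical:

```python
# Hypothetical source-system tables; names are illustrative only.
LOOKUP_TABLES = {"products", "customers", "stores"}
TRANSACTION_TABLES = {"order_lines"}

# Required data mart elements and the source table each is expected to come from.
# None means no source has been identified yet.
dimension_sources = {"product": "products", "customer": "customers", "promotion": None}
fact_sources = {"sales_amount": "order_lines", "quantity": "order_lines"}


def unmapped(plan: dict, allowed: set) -> list:
    """Return elements with no source, or whose source is not an allowed table."""
    return sorted(name for name, table in plan.items() if table not in allowed)


print(unmapped(dimension_sources, LOOKUP_TABLES))   # ['promotion']
print(unmapped(fact_sources, TRANSACTION_TABLES))   # []
```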

You may also find that some of the required data cannot be mapped. This typically happens when the way the source system groups data doesn’t match the groupings the data mart requires.

For instance, a telecom company may group phone calls by area code, while the data mart requires data by postal code. These dimensions are difficult to map, because one area code comprises many postal codes and a postal code can span several area codes. In this situation, translating the data into a common format could involve costly processing.
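A quick way to spot this kind of mismatch is to check the cardinality between the two groupings in sample source data. This sketch uses made-up area code and postal code pairs; a many-to-many result means figures recorded by area code can’t be reassigned to postal codes without some allocation rule.

```python
from collections import defaultdict

# Hypothetical (area_code, postal_code) pairs observed in source data.
pairs = [
    ("212", "10001"),
    ("212", "10002"),
    ("646", "10001"),
    ("718", "11201"),
]


def mapping_cardinality(pairs):
    """Classify how values on the left relate to values on the right."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    left_many = any(len(v) > 1 for v in left_to_right.values())
    right_many = any(len(v) > 1 for v in right_to_left.values())
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"


print(mapping_cardinality(pairs))  # many-to-many
```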

5. Design the Star Schema

When creating a star schema, it’s essential to define the relationships between the fact and dimension tables. This is done using keys, which consist of one or more columns whose values make each row in a table unique. A primary key that includes several columns is known as a composite or concatenated key.
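To illustrate, the sketch below checks whether a candidate composite key actually identifies each row uniquely. The fact rows and column names are hypothetical; in a star schema, a fact table’s key is often the combination of its dimension keys.

```python
# Hypothetical fact rows keyed by (date_key, product_key, store_key).
rows = [
    {"date_key": 20191023, "product_key": 1, "store_key": 10, "revenue": 59.97},
    {"date_key": 20191023, "product_key": 2, "store_key": 10, "revenue": 12.50},
    {"date_key": 20191024, "product_key": 1, "store_key": 10, "revenue": 19.99},
]


def duplicate_keys(rows, key_columns):
    """Return key values that occur more than once, i.e. violate uniqueness."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


print(duplicate_keys(rows, ["date_key", "product_key", "store_key"]))  # set()
print(duplicate_keys(rows, ["date_key", "store_key"]))                 # {(20191023, 10)}
```

No single column is unique here, and even two of the three columns are not enough; only the full three-column combination works as a key.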

To link the facts and the dimensions, it’s a good practice to use surrogate keys instead of the primary key of the actual source table. It allows the data mart manager to control the keys within the data mart environment, even if the keys change in the operational system.

A surrogate key is a system-generated integer, usually assigned sequentially, that is stored in the dimension table along with the source system’s primary key (often called the natural key). It has an advantage over the natural key, which is often a lengthy string of characters: because a surrogate key is a compact integer, joins between the fact and dimension tables are cheaper, which can improve query response time.
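A minimal sketch of surrogate key assignment, assuming natural keys arrive as strings: each distinct natural key receives the next integer in sequence, and repeat lookups return the same key. The `SurrogateKeyMap` class and the SKU values are hypothetical; a real data mart would persist this mapping in the dimension table rather than in memory.

```python
from __future__ import annotations

from itertools import count


class SurrogateKeyMap:
    """Assigns a sequential integer surrogate key to each distinct natural key."""

    def __init__(self, start: int = 1) -> None:
        self._next = count(start)
        self._keys: dict[str, int] = {}

    def key_for(self, natural_key: str) -> int:
        # Reuse the existing surrogate if this natural key was seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]


products = SurrogateKeyMap()
print(products.key_for("SKU-2019-WIDGET-BLUE-LARGE"))  # 1
print(products.key_for("SKU-2019-GADGET-RED-SMALL"))   # 2
print(products.key_for("SKU-2019-WIDGET-BLUE-LARGE"))  # 1
```

Fact rows would then store the small integer instead of the long string, and the data mart, not the source system, decides how keys are issued.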

One Last Thought

Over time, the data volume of your data mart is likely to increase, so it’s essential to consider scalability when physically implementing your logical data mart model. To cater to scalability requirements, minimize the constraints imposed by factors such as hardware size, software capacity, and system bandwidth.

Designing a data mart architecture is a complex process that involves several time-consuming steps and, at times, substantial costs. By following the five best practices in this article, you can reduce the chance of errors and speed up the design process.