We recently introduced our next-gen data warehouse automation (DWA) platform, Astera DW Builder, which offers an agile, metadata-driven approach to building data warehouses. Our solution is aimed at organizations that want to fast-track their project lifecycle and simplify the data warehouse design and development process.
At Astera, we’re always focused on innovation and improving our technology to offer the best experience to our users. To gain deeper insights into data modeling for modern enterprises, we got in touch with experts and industry leaders to find out their thoughts on the topic.
We had a fantastic opportunity to have a conversation with Krish Krishnan — a visionary data thought leader who is ranked among the top data warehouse consultants in the world. He has authored three eBooks and numerous articles, whitepapers, case studies, and other publications on data warehouse appliances and architectures.
In our discussion, he shared his thoughts on the critical role of data modeling in building BI architectures and the transformative potential of automated data modeling in today’s world. He also shed light on the importance of a metadata-driven approach. Let’s look at some key takeaways from the discussion:
What is the importance of data modeling in effective data warehouse implementation?
A data warehouse is a repository of all transactional behaviors that happen across the system. Without data modeling, an efficient solution cannot be built from a consumption perspective, so it’s essential to create a model before moving to data consumption. It means converting semi-structured and unstructured data to a structured format. The recommendation is “do not model when the model comes in but do model when the data is pulled out.”
Should data models be built around your existing data or designed to reflect the underlying business process?
The data is generated to satisfy the process outcomes, so the data model must be built around the end-user requirements. It means you must have a model when data exits the data warehouse. The business consumer model must be put in between the data landing, and a series of transformations should be built around the process. So, it is a combination of both.
But how deep and wide you would want to go is where automation comes in. In today’s data-driven world, it’s necessary to automate the process. Data landing in the warehouse should be closer to the process. Data exiting to serve an analytic should be more data-centric. In between, you can bring in automation to run transformations and maximize efficiencies.
Is it a good idea to build an enterprise data model upfront?
A preferred solution is a business-centric model built on the raw logical data gathered. The raw data is held in a single central source but can serve multiple end-state purposes. It would allow users to spin models in and out as needed.
That’s where data centricity comes into the equation. It demands that every person in an organization must understand who produces data, what comes in data, who will use that data, and how they plan to use that data.
“Data centricity means that you are not aligned to the technology but the process that can be modeled and studied using the technology layer.”
Should a more iterative approach to data modeling be the preferred choice?
The conventional ‘Inmon’ approach was to build a behemoth and fill it. A more iterative approach is the dimensional modeling ‘Kimball’ approach, which gives you the ability to spin up on demand and have a set of integration points by which you can connect each spin-off. Therefore, it’s not essential to build a star schema every time.
Previously, the schema design was primarily driven by a relational database management system. That’s because the schema needs to follow the design. The transactional system requires that discipline. But it’s important to understand that a data warehouse is not transactional; it stores everything that happened in multiple transactional systems.
An iterative approach to data modeling that would focus on business-centricity should be a preferred choice.
What are data vaults?
“Data vaults were introduced around 2010 by Dan Linstedt and Hans Hultgren, but they got noticed and became popular in recent years as more companies are moving to the cloud. Tech giants like Amazon, Google, and Microsoft and a host of other vendors that are doing services with them are working on data vaulting due to the benefits it brings to the table.
It’s a methodology by which you could take critical data that’s master to your systems. You could put it [data] into a vault and secure it. Then you can access it from a wallet on a last native basis by which you can bring in different kinds of data from multiple areas.”
What is the transformative potential of automated data modeling?
Krish believes that more automated data modeling means you can handle data more efficiently. For instance, say there’s a customer table ready in the system. You can add metadata, including customer name, address, city, state, country, zip code, contact details, etc. The defined fields help speed up the entire process of documenting this information.
“What does that automation do for you? It speeds up the necessity of trying to document each requirement. All that’s removed. Gone. That’s the level to which we need to automate.”
That’s quite true. The automated data modeling process greatly facilitates the creation of a meta-repository that establishes relationships, minimizes discrepancies, and integrates disparate systems. It also eliminates data inconsistencies and inaccuracies, thereby increasing the value of the analytics and reporting.
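The customer table example above can be sketched in a few lines of Python. This is only an illustration of the metadata-driven idea, not how Astera DW Builder or any other tool works internally: the `ColumnMeta` and `TableMeta` classes, the column names, and the SQL types are all hypothetical. The point is that once the fields are described as metadata, both the table definition and its documentation can be generated from the same source instead of being written by hand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical metadata record describing one column."""
    name: str
    sql_type: str
    description: str = ""
    nullable: bool = True


@dataclass
class TableMeta:
    """Hypothetical metadata record describing one table."""
    name: str
    columns: list = field(default_factory=list)

    def to_ddl(self) -> str:
        # Generate a CREATE TABLE statement from the column metadata.
        lines = []
        for col in self.columns:
            null_clause = "" if col.nullable else " NOT NULL"
            lines.append(f"    {col.name} {col.sql_type}{null_clause}")
        return f"CREATE TABLE {self.name} (\n" + ",\n".join(lines) + "\n);"

    def to_docs(self) -> str:
        # Generate plain-text documentation from the same metadata.
        return "\n".join(f"{col.name}: {col.description}" for col in self.columns)


# Illustrative customer table; names and types are assumptions.
customer = TableMeta(
    name="customer",
    columns=[
        ColumnMeta("customer_name", "VARCHAR(100)", "Full name of the customer", nullable=False),
        ColumnMeta("address", "VARCHAR(200)", "Street address"),
        ColumnMeta("city", "VARCHAR(50)", "City"),
        ColumnMeta("state", "VARCHAR(50)", "State or province"),
        ColumnMeta("country", "VARCHAR(50)", "Country"),
        ColumnMeta("zip_code", "VARCHAR(10)", "Postal code"),
    ],
)
```

Calling `customer.to_ddl()` returns the table definition and `customer.to_docs()` returns one documentation line per column, so a change to the metadata flows into both outputs without a separate documentation step.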
“We need to start embracing that change, and change means chaos. Chaos is the only constant in the world we live in.
You can see that literally in the world we live in today. So, take a step back and spend the week. Do some reading, listen to some serious podcasts from multiple vendors, and understand what problems are being solved.
[And] then, try to put your mind to it and see how you do the same thing with a new tool in your hand. Those are my parting comments for today’s discussion.”