Last updated on November 30, 2022
Is Dimensional Data Modeling Still Relevant in the Modern Data Stack?
By John Swift
In this blog, we discuss the merits of dimensional data modeling in the modern data stack, given recent technology advancements. We dive into how to think about dimensional data modeling in the context of defining requirements for analytics specifically.
Technology in the data and analytics industry has taken giant leaps forward, leading many organizations to question whether the old ways of working are still valid. Historically, dimensional data modeling was accepted as one of the best standards for presenting data to business users in data warehouses. The (re)rise of the data warehouse as a central concept in modern cloud data platforms raises an important question: Is dimensional data modeling still relevant in the modern data stack? Yes, particularly for defining requirements and for creating a modular solution that presents data for analytics. Before we dive into the nuances of dimensional data modeling specifically, I think it is valuable to consider the broader context of data modeling in general. In my experience, data and analytics professionals often misunderstand what data modeling is and what it is for.
- What is Data Modeling and How is it Related to Analytics?
- The Categories of Data Modeling: From Conceptual to Dimensional
- What is Dimensional Data Modeling?
- Core Attributes of Dimensional Data Modeling
- How Do I Establish Requirements for Analytics?
- Dimensional Data Modeling in the Context of Analytics Requirements
- You Don’t Need Source Data to be Available to Begin Dimensional Modeling
- Dimensional Data Modeling: The Missing Piece to The Way You Approach Analytics in the Modern Data Stack
What is Data Modeling and How is it Related to Analytics?
Data modeling is a discipline that is widely applicable to any intersection of people, data, and technology. It is a well-defined approach to gaining agreement on business needs, understanding requirements, establishing a business solution, and creating a technical design artifact. Without modeling data, you create risk in technical projects by allowing unchecked assumptions to creep into your technical designs. This can lead to incredibly costly mistakes and failed implementations, especially (but not only) in analytics projects. Although data models are often thought of as purely technical design artifacts, that is not their primary purpose. A data model is primarily a communication and consensus-building tool: a way to explain information, gain agreement, and mitigate the risk of unchecked assumptions.
The Categories of Data Modeling: From Conceptual to Dimensional
In his book Data Modeling Made Simple, Steve Hoberman helpfully describes data models as “wayfinding tools designed with the single purpose of simplifying complex information in our real world,” which build consensus among people with different backgrounds, experiences, and perspectives. Hoberman groups data models into three general categories: conceptual, logical, and physical.
- A conceptual data model represents business needs within a defined scope.
- A logical data model represents the business solution.
- A physical data model represents the technical solution.
What is Dimensional Data Modeling?
In 1996, Ralph Kimball broadly introduced the concept of dimensional data modeling to the world. Over the next twenty years, organizations of all sizes adopted dimensional modeling as the way to present data in a data warehouse to business users, who consumed it through reports and dashboards that support decision making. In the mid-2010s, data lakes appeared, and their proponents declared traditional data warehousing dead. That didn’t last long: in less than a decade, early adopters found that data lakes alone could not meet all of the organization’s demands for analytics. Tools like Snowflake, Databricks, and BigQuery emerged as the next-generation wave of alternatives to data lakes. These easy-to-use, highly scalable modern data platforms somehow made the data warehouse cool again, even when it goes by a different name (the data lakehouse, in the case of Databricks).
While dimensional data modeling is traditionally tightly coupled with physical database design in data warehouse implementations, I have found its concepts applicable even outside of data warehouses. It differs from relational entity-relationship modeling, which is often optimized for data capture. Dimensional data modeling aims to make analyzing data as simple as possible for business users while following design principles that allow analytics solutions to be extended iteratively over time. At the most fundamental level, dimensional data modeling aims to model the business rather than just the relationships among data elements. It is a collaborative process that connects the needs of the business with the realities of the underlying source data.
Core Attributes of Dimensional Data Modeling
In dimensional data modeling, data is organized into two kinds of objects: dimensions, which group core business attributes together, and facts, which hold the measurements of business processes.
- Dimension objects contain descriptive attributes about things like customers, products, employees, vendors, and suppliers. Dimensions are conformed, meaning they are defined once and reused consistently regardless of the business process being analyzed. This leads to consistent definitions and a scalable, resilient design framework.
- Fact objects represent the detailed data-capture events of one or more business processes, or exist as point-in-time snapshots of detailed measurements. Many fact tables share the same conformed dimensions, linking everything together in a connected fabric.
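Kimball’s bus matrix is a common way to picture this fabric: business processes as rows, dimensions as columns, and a mark wherever a process uses a dimension. The minimal Python sketch below assumes a few hypothetical processes, grains, and dimension names purely for illustration; any dimension that appears in more than one row is shared across processes and therefore has to be conformed.

```python
# Hypothetical business processes, each with its grain and the dimensions that describe it.
# Every name here is an illustrative assumption, not taken from a real model.
BUS_MATRIX = {
    "order_lines": {
        "grain": "one row per order line",
        "dimensions": {"date", "customer", "product"},
    },
    "shipments": {
        "grain": "one row per shipped item",
        "dimensions": {"date", "customer", "product", "warehouse"},
    },
    "supplier_invoices": {
        "grain": "one row per invoice line",
        "dimensions": {"date", "product", "supplier"},
    },
}


def conformed_dimensions(matrix: dict) -> list:
    """Return dimensions used by two or more processes (these must be conformed)."""
    counts = {}
    for process in matrix.values():
        for dim in process["dimensions"]:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(dim for dim, n in counts.items() if n > 1)


def render(matrix: dict) -> str:
    """Render the matrix as text: one row per process, an X per dimension it uses."""
    dims = sorted({d for p in matrix.values() for d in p["dimensions"]})
    lines = [f"{'process':<18}" + "".join(f"{d:<10}" for d in dims)]
    for name, process in matrix.items():
        marks = "".join(f"{'X' if d in process['dimensions'] else '':<10}" for d in dims)
        lines.append(f"{name:<18}{marks}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(BUS_MATRIX))
    for name, process in BUS_MATRIX.items():
        print(f"{name}: {process['grain']}")
    print("Shared (conform these):", conformed_dimensions(BUS_MATRIX))
```

Recording the grain next to each process keeps the lowest level of detail visible during requirements conversations, before any physical tables exist.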
How Do I Establish Requirements for Analytics?
Data and analytics teams rarely establish their own requirements. Instead, they need to work with non-technical business users to understand the information required to make day-to-day operational decisions and the data required to set the strategic direction of the organization. Requests usually reach data teams as report requests, raw data requests (e.g., can you integrate this new data source?), or some combination of the two. Lawrence Corr, in Agile Data Warehouse Design, describes this phenomenon, where teams lean toward designs driven either by report demands or by the supply of source data, as the design bias of teams.
Dimensional Data Modeling in the Context of Analytics Requirements
Importantly, dimensional modeling deconstructs the specific demands that business users express as report requests and identifies the business processes behind each request. Business processes tend to be more stable than ever-changing requests for information. Isolating business processes and events in the design, and capturing each at the lowest level of detail available in the data, makes solutions more resilient. Dimensional modeling also allows for modular, reusable designs when creating presentation layers for data consumption.
For example, take a report request for revenue. At most organizations, multiple streams of revenue are generated by a variety of business processes. If a team receives a request for revenue by customer, month over month, and designs a solution to exactly that specification, the team must redesign the solution as soon as a follow-up question asks for product-level detail rather than customer-level detail alone. In a dimensional data modeling approach, the team instead deconstructs the request for a revenue report by identifying all the business processes that generate revenue. It can then create a model that shows each process independently, the level of detail available in each, and the dimensions that connect them. With this deeper understanding of requirements, development teams can work incrementally on each business process and dimension as standalone artifacts that are modular and more resilient to changing demands for information.
Dimensional modeling does not prevent the same team from implementing a flattened model that makes consumption easier for business users in the last mile of the physical solution design. It simply keeps the individual components separate until that point, preserving modular design.
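To make the revenue example concrete, here is a minimal sketch using Python’s built-in sqlite3 module. It assumes two hypothetical revenue processes (product orders and subscription billing) that share conformed customer, product, and date dimensions; every table name and value is invented for illustration. Because each fact table keeps the lowest grain available, the follow-up question about products is answered by adding a grouping column rather than by redesigning the solution, and the flattened view v_revenue stands in for the last-mile model.

```python
import sqlite3

# All table names, columns, and sample values are hypothetical, chosen to mirror the
# revenue example: two revenue-generating processes sharing conformed dimensions.
SCHEMA = """
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, month TEXT);

-- Process 1: product orders, stored at the lowest grain (one row per order line).
CREATE TABLE fact_order_line (
    date_key INTEGER, customer_key INTEGER, product_key INTEGER, revenue REAL
);
-- Process 2: subscription billing (one row per invoice line), modeled separately.
CREATE TABLE fact_subscription_invoice (
    date_key INTEGER, customer_key INTEGER, product_key INTEGER, revenue REAL
);

-- Last-mile flattened view for business users, assembled from the modular parts.
CREATE VIEW v_revenue AS
SELECT 'order' AS revenue_stream, d.month AS month, c.customer_name AS customer_name,
       p.product_name AS product_name, f.revenue AS revenue
FROM fact_order_line f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_customer c ON c.customer_key = f.customer_key
JOIN dim_product p ON p.product_key = f.product_key
UNION ALL
SELECT 'subscription', d.month, c.customer_name, p.product_name, f.revenue
FROM fact_subscription_invoice f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_customer c ON c.customer_key = f.customer_key
JOIN dim_product p ON p.product_key = f.product_key;
"""

VIEW_COLUMNS = {"revenue_stream", "month", "customer_name", "product_name"}


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with the schema and a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(10, "Widget"), (20, "Support Plan")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20220105, "2022-01"), (20220210, "2022-02")])
    conn.executemany(
        "INSERT INTO fact_order_line VALUES (?, ?, ?, ?)",
        [(20220105, 1, 10, 100.0), (20220105, 2, 10, 50.0), (20220210, 1, 10, 70.0)],
    )
    conn.executemany(
        "INSERT INTO fact_subscription_invoice VALUES (?, ?, ?, ?)",
        [(20220105, 1, 20, 30.0), (20220210, 1, 20, 30.0), (20220210, 2, 20, 30.0)],
    )
    return conn


def revenue_by(conn: sqlite3.Connection, *columns: str) -> list:
    """Total revenue across every revenue stream, grouped by the requested columns."""
    # Only whitelisted view columns are interpolated into the SQL string.
    if not columns or not set(columns) <= VIEW_COLUMNS:
        raise ValueError(f"columns must be a non-empty subset of {sorted(VIEW_COLUMNS)}")
    cols = ", ".join(columns)
    sql = f"SELECT {cols}, SUM(revenue) FROM v_revenue GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    # The original request: revenue by customer, month over month.
    for row in revenue_by(conn, "customer_name", "month"):
        print(row)
    # The follow-up: add product detail. No redesign, just one more grouping column.
    for row in revenue_by(conn, "customer_name", "month", "product_name"):
        print(row)
```

In a real warehouse the dimensions would carry many more attributes and the view would be written in the platform’s own SQL; sqlite3 is used here only so the sketch runs anywhere.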
You Don’t Need Source Data to be Available to Begin Dimensional Modeling
One of the most underrated aspects of dimensional data modeling is that you can consider the realities of your underlying source data separately from the design of how data is presented for analytics. Because dimensional modeling focuses on presenting data for consumption by business users, you can optimistically model data according to the business rules and processes in place today, even before the underlying data is available. I’ve used this approach to:
- Build analytics solutions that combine data from two very differently structured ERPs during an acquisition and merger.
- Harmonize and analyze data from a legacy human resource information system (HRIS) and a modern cloud-based HRIS during a platform migration.
- Allow for seamless transition of analytics reporting from one eCommerce platform to another, even before the future state platform was available in production.
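One way to sketch this approach: define the presentation-layer dimension from today’s business rules first, then write a separate mapping from each source system into that shape. In this minimal Python sketch, the CustomerDim fields, the legacy and new ERP record layouts, and the source-prefixed keys are all hypothetical assumptions. Downstream analytics depend only on CustomerDim, so a future-state system can be added later with one more mapping function, and the model can be exercised with sample records before that system’s data exists.

```python
from dataclasses import dataclass
from typing import Iterable


# Target shape, defined from business rules before any source is connected.
# The field names are illustrative assumptions, not a prescribed standard.
@dataclass(frozen=True)
class CustomerDim:
    customer_id: str    # business key, prefixed by source system to avoid collisions
    customer_name: str
    country_code: str


def from_legacy_erp(row: dict) -> CustomerDim:
    """Map a hypothetical flat legacy ERP export row into the dimension."""
    return CustomerDim(
        customer_id=f"LEGACY-{row['CUST_NO']}",
        # Assumes legacy names arrive padded and upper case; normalize them.
        customer_name=row["CUST_NM"].strip().title(),
        country_code=row["CTRY_CD"],
    )


def from_new_erp(record: dict) -> CustomerDim:
    """Map a hypothetical nested record from the new ERP into the same dimension."""
    return CustomerDim(
        customer_id=f"NEW-{record['id']}",
        customer_name=record["name"],
        country_code=record["address"]["countryCode"],
    )


def conform_customers(legacy_rows: Iterable[dict], new_records: Iterable[dict]) -> list:
    """Combine both sources into one list of conformed customer rows."""
    return [from_legacy_erp(r) for r in legacy_rows] + [from_new_erp(r) for r in new_records]


if __name__ == "__main__":
    legacy = [{"CUST_NO": 1001, "CUST_NM": "  ACME CORP ", "CTRY_CD": "US"}]
    new = [{"id": "c-42", "name": "Globex", "address": {"countryCode": "DE"}}]
    for customer in conform_customers(legacy, new):
        print(customer)
```

Matching the same real-world customer across both systems (deduplication and survivorship) is a separate step not shown here; the source prefix only keeps keys from colliding.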
Dimensional Data Modeling: The Missing Piece to The Way You Approach Analytics in the Modern Data Stack
If you are struggling to keep up with seemingly ever-changing business demands for information, or you are having trouble combining data from disparate systems in your organization, you are not alone. Consider dimensional modeling as a way to build consensus and understanding of business needs and to conceptualize how you present data for consumption in your organization today.
Key Takeaways
- Dimensional data modeling remains relevant in the modern data stack due to its ability to define requirements and create modular analytics solutions.
- Data modeling is crucial in aligning business needs with technical designs, mitigating risks from unchecked assumptions.
- Conceptual, logical, and physical models are different categories of data modeling, each serving specific purposes from understanding business needs to technical solutions.
- Dimensional data modeling is designed to simplify data analysis for business users while allowing for extensibility in analytics solutions.
- Core attributes of dimensional modeling include dimensions for consistency and fact tables for capturing detailed events across business processes.
- Teams must balance data supply and reporting demands to avoid overfitting solutions to specific requests, which can hinder future adaptability.
- Dimensional modeling isolates business processes, leading to more robust and adaptable solutions that can handle changing demands.
- You can begin dimensional modeling before source data is available by focusing on current business rules and processes.
- Implementing dimensional modeling helps build consensus and understanding of business needs, making it a strategic tool for data presentation.
John Swift
John is a thirty-year veteran of data and analytics based in the greater Boston area. He leads cloud implementations for data warehouses with a focus on system and data design to support analytics and AI use cases. Away from work, John enjoys photography, cycling, philosophy, and spending time with his family.
