In this blog post, we discuss the merits of dimensional data modeling in the modern data stack given the technology advancements of recent years. We dive into how to think about dimensional data modeling specifically in the context of defining requirements for analytics.

Technology in the data and analytics industry has taken giant leaps forward, leading many organizations to question if the old ways of working are still valid. Historically, dimensional data modeling was accepted as one of the best standards to present data for business user consumption in data warehouses.

The (re)rise of the data warehouse as a central concept in modern cloud data platform technologies has led to the important question: Is dimensional data modeling still relevant in the modern data stack?

Yes, specifically for defining requirements and creating a modular solution that presents data for analytics.

Before we dive into the nuances and the question of dimensional data modeling specifically, I think it is valuable to consider the broader context of data modeling generally. In my experience, the definition and purpose of data modeling are often misunderstood by data and analytics professionals.

What is Data Modeling and How is it Related to Analytics?

Data modeling is a discipline that is widely applicable to any intersection of people, data, and technology. It is a well-defined approach to gain agreement on business needs, to understand requirements, to establish a business solution, and to create a technical design artifact.

Without modeling data, you create risk in technical projects by allowing for unchecked assumptions to creep into your technical designs. This can lead to incredibly costly mistakes and failed implementations, especially for (but not limited to) analytics projects.

Although data models are often thought of as purely technical design artifacts, that is not their primary purpose.

A data model is primarily a communication and consensus building tool—a way to explain information, gain agreement, and mitigate risk from unchecked assumptions.

The Categories of Data Modeling: From Conceptual to Dimensional

In his book, Data Modeling Made Simple, Steve Hoberman helpfully describes data models as “wayfinding tools designed with the single purpose of simplifying complex information in our real world”, working to build consensus from people with different backgrounds, experiences, and perspectives.

Hoberman describes data models in three different general categories—conceptual, logical, and physical.

  • A conceptual data model represents business needs within a defined scope.
  • A logical data model represents the business solution.
  • A physical data model represents the technical solution.

There is great value in starting any analytics-related project by creating conceptual models as you gather requirements and delve into business rules, especially since these models define the needs of the business within a defined scope. Conceptual data modeling allows you to tease out nuances and bring forth diverse perspectives to have a more complete understanding of how the business works, to avoid major gotchas down the road caused by misunderstandings and missing requirements.

With a conceptual data model in hand, the next step for an analytics project is to translate it into a dimensional data model.

What is Dimensional Data Modeling?

In 1996, Ralph Kimball broadly introduced the concept of dimensional data modeling to the world. Over the next twenty years, organizations of all sizes adopted dimensional modeling as the way to present data in a data warehouse for business user consumption via reports and dashboards aimed at supporting decision making.

In the mid-2010s, data lakes appeared, and their proponents declared that traditional data warehousing was dead. That didn’t last long. In less than a decade, early adopters found that data lakes alone were insufficient to meet all of an organization’s analytics demands. Tools like Snowflake, Databricks, and BigQuery emerged as the next-generation wave of alternatives to data lakes. These easy-to-use, extremely scalable modern data platforms somehow made the concept of a data warehouse cool again, even if it sometimes goes by a different name (data lakehouse in the case of Databricks).

While dimensional data modeling is traditionally tightly coupled with physical database design within data warehouse implementations, I have found the modeling concepts applicable even outside of data warehouses. It is different from relational entity-relationship modeling, which is often optimized for data capture.

Dimensional data modeling aims to make analyzing data as simple as possible for business users while still maintaining design principles that allow for agile iterative extensibility of analytics solutions over time.

At the most fundamental level, dimensional data modeling aims to model the business rather than just modeling relationships among data elements. Dimensional modeling is a collaborative process to connect the needs of the business with the realities of the underlying source data.

Core Attributes of Dimensional Data Modeling

In dimensional data modeling, core business attributes are grouped together into entities called dimensions.

  • Dimension objects contain information attributes about things like customers, products, employees, vendors, and suppliers. Dimensions are conformed, meaning they are consistently reused regardless of the business processes being analyzed. This leads to consistency in definitions and a scalable framework established with a resilient design.
  • Fact objects represent the detailed events captured by one or more business processes, or they exist as point-in-time snapshots of detailed measurements. Dimensions are linked to one another through many fact tables, forming a connected fabric.
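To make the two object types concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, column names, and sample rows are all hypothetical and chosen only for illustration; a real model would come out of the requirements work described in this post.

```python
import sqlite3

# Hypothetical star schema: two conformed dimensions and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
-- One row per order line: the lowest level of detail captured.
CREATE TABLE fact_order_line (
    order_date TEXT,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL);
INSERT INTO dim_customer VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
INSERT INTO dim_product VALUES
    (10, 'Widget', 'Hardware'), (11, 'Support plan', 'Services');
INSERT INTO fact_order_line VALUES
    ('2024-01-15', 1, 10, 2, 200.0),
    ('2024-01-20', 2, 10, 1, 100.0),
    ('2024-02-03', 1, 11, 1, 50.0);
""")


def amount_by(dimension_table, key_column, attribute):
    """Sum fact amounts grouped by one attribute of a dimension.

    Identifiers are interpolated directly, so this is only suitable for
    trusted, hard-coded names like the ones used below.
    """
    sql = f"""
        SELECT d.{attribute}, SUM(f.amount)
        FROM fact_order_line AS f
        JOIN {dimension_table} AS d ON d.{key_column} = f.{key_column}
        GROUP BY d.{attribute}
        ORDER BY d.{attribute}
    """
    return conn.execute(sql).fetchall()


# The same fact table answers questions along either dimension.
by_region = amount_by("dim_customer", "customer_key", "region")
by_category = amount_by("dim_product", "product_key", "category")
print(by_region)
print(by_category)
```

Because the fact table stores only keys and measurements, the same rows can be sliced by any attribute of any connected dimension without being restructured.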

Like any conceptual data model, a dimensional model does not need to perfectly mirror its physical technical implementation. Dimensional data modeling is a helpful activity because it helps organizations think in modular terms about how they operate their business, identify the dimensions that connect business processes, and understand how things relate to one another. Because of this, dimensional data modeling is a valuable design activity regardless of how you end up implementing a physical technical solution.

How Do I Establish Requirements for Analytics?

Data and analytics teams rarely establish their own requirements, but rather need to work with non-technical business users to understand the information required to make decisions operationally day-to-day and the data required to set the strategic direction of the organization.

We find that requests come to data teams in the form of report requests, raw data requests (for example, “can you integrate this new data source?”), or, most likely, some combination of the two. Lawrence Corr, in Agile Data Warehouse Design, describes this phenomenon as the design bias of teams.

[Figure: two-axis graph illustrating dimensional data modeling with design and analysis.]

Not all data warehouses are created equal, and they often carry an implicit design bias. Does your data warehouse have a data supply-oriented bias or a reporting demand-oriented bias?

When a team is supply-driven and only considers the data available for analysis, we find that the resulting solutions mirror the complexity of the operational source systems themselves. This ultimately makes it more difficult to combine disparate data. Other, more mature teams look at the data available and attempt to model the entire organization in a fully normalized form without considering how it will be used and by whom.

When a team is report-driven and only considers specific requests for data, we find that the resulting solutions tend to overfit those requests and are not easily iterated upon. These teams aggregate, filter, and manipulate the way data is presented to end users to fulfill a specific request, at the expense of a modular design that could have accommodated future requests as well.

Dimensional data modeling takes a different approach.

Dimensional Data Modeling in the Context of Analytics Requirements

Importantly, dimensional modeling deconstructs the specific demands of business users, which arrive as report requests, and identifies the underlying business processes behind each request. Business processes tend to be more resilient than ever-changing requests for information. Isolating business processes and events in the dimensional model, and identifying the lowest level of detail captured in the data for each, makes solutions more resilient.

Dimensional modeling allows for modular and reusable designs when creating presentation layers for data consumption.

For example, take a report request for revenue. At most organizations, there are multiple streams of revenue generated by a variety of business processes. If a team receives a report request for revenue by customer, month over month, and designs a solution exactly to that specification, the team must redesign its solution as soon as a follow-up question asks for product-level detail rather than customer-level detail alone.

In a dimensional data modeling approach, teams will deconstruct a request for a revenue report into the identification of all the business processes that generate revenue. A model can then be created that shows each process, the level of detail available, independent of each other, and the dimensions that connect each process. With this increased understanding of requirements, development teams can then incrementally work on each individual business process and dimension as standalone artifacts that are modular and more resilient to changing demands for information.
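The revenue example can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the two business processes (product sales and subscription billing), the field names, and the sample rows are assumptions made for the sketch, not a prescribed design.

```python
from collections import defaultdict

# Hypothetical fact rows, each kept at the lowest grain its process captures.
product_sales = [  # grain: one row per order line
    {"date": "2024-01-15", "customer": "Acme", "product": "Widget", "amount": 200.0},
    {"date": "2024-01-20", "customer": "Globex", "product": "Widget", "amount": 100.0},
    {"date": "2024-02-03", "customer": "Acme", "product": "Gadget", "amount": 75.0},
]
subscription_billing = [  # grain: one row per invoice (no product recorded)
    {"date": "2024-01-01", "customer": "Acme", "amount": 50.0},
    {"date": "2024-02-01", "customer": "Acme", "amount": 50.0},
]


def month(row):
    """Derive the month (YYYY-MM) from an ISO date string."""
    return row["date"][:7]


def revenue_by(facts, *keys):
    """Total the amount over any combination of dimension keys."""
    totals = defaultdict(float)
    for row in facts:
        group = tuple(month(row) if key == "month" else row[key] for key in keys)
        totals[group] += row["amount"]
    return dict(totals)


# Original request: revenue by customer, month over month, across both processes.
by_customer_month = revenue_by(product_sales + subscription_billing, "customer", "month")

# Follow-up request: product-level detail. No redesign is needed, because the
# product sales process was kept at order-line grain.
by_product_month = revenue_by(product_sales, "product", "month")

print(by_customer_month)
print(by_product_month)
```

Customer and month are the dimensions shared by both processes, so they are the only ones that can be used to combine them. Product applies only to the process that records it, which is exactly the kind of detail a dimensional model surfaces during requirements gathering.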

Dimensional modeling does not prevent that same team from implementing a flattened model to make consumption easier for business users in the last mile of a physical solution design. It preserves the principle of modular design by keeping the individual components separate until the right time to combine them.
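One way to picture this last-mile flattening is a database view layered on top of the modular tables. The sketch below uses Python's built-in sqlite3 module; the table names, the view name, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Modular components: dimensions and facts are stored separately.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (
    order_date TEXT, customer_key INTEGER, product_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_product VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO fact_sales VALUES
    ('2024-01-15', 1, 10, 200.0),
    ('2024-02-03', 2, 11, 75.0);

-- Last-mile flattening: one wide, easy-to-query object for business users.
CREATE VIEW sales_flat AS
SELECT f.order_date, c.customer_name, p.product_name, f.amount
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
JOIN dim_product AS p ON p.product_key = f.product_key;
""")

flat_rows = conn.execute(
    "SELECT * FROM sales_flat ORDER BY order_date").fetchall()
print(flat_rows)

# A change made once in the dimension flows through to the flattened view.
conn.execute(
    "UPDATE dim_customer SET customer_name = 'Acme Corp' WHERE customer_key = 1")
renamed = conn.execute(
    "SELECT customer_name FROM sales_flat WHERE order_date = '2024-01-15'"
).fetchone()[0]
print(renamed)
```

The flattened object is derived from the components rather than replacing them, so the dimensions and facts remain available for the next request.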

You Don’t Need Source Data to be Available to Begin Dimensional Modeling

One of the most underrated aspects of dimensional data modeling is that you can contemplate the realities of your underlying source data separately from the design of how data is presented for analytics solutions. Because it focuses on the presentation of data for consumption by business users, you can optimistically model data according to the business rules and processes in place today, even before underlying data is available.

I’ve used this approach to:

  • Build analytics solutions that combine data from two very differently structured ERPs during an acquisition and merger.
  • Harmonize and analyze data from a legacy human resource information system (HRIS) and a modern cloud-based HRIS during a platform migration.
  • Allow for seamless transition of analytics reporting from one eCommerce platform to another, even before the future state platform was available in production.

These solutions were agnostic to both data source and technology. I started with a dimensional model before ever working on a physical model that depended on a specific technology or data source.
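As a rough sketch of what designing the presentation first can look like, the Python below defines a target customer dimension from business rules alone, then adds one small mapping function per source as each becomes available. The two ERP record layouts, the field names, and the key-prefix convention are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Target customer dimension row, defined before any source is connected."""
    customer_id: str
    name: str
    country: str


def from_erp_a(record: dict) -> Customer:
    # Hypothetical ERP A layout: flat record with upper-case field names.
    return Customer(
        customer_id=f"A-{record['CUST_NO']}",
        name=record["CUST_NAME"].strip(),
        country=record["COUNTRY_CODE"],
    )


def from_erp_b(record: dict) -> Customer:
    # Hypothetical ERP B layout: nested record with lower-case country codes.
    return Customer(
        customer_id=f"B-{record['id']}",
        name=record["profile"]["displayName"],
        country=record["address"]["country"].upper(),
    )


erp_a_rows = [{"CUST_NO": 1001, "CUST_NAME": " Acme Corp ", "COUNTRY_CODE": "US"}]
erp_b_rows = [
    {"id": "77", "profile": {"displayName": "Globex"}, "address": {"country": "de"}}
]

# Both sources land in the same dimension shape that analytics consumes.
dim_customer = [from_erp_a(r) for r in erp_a_rows] + [from_erp_b(r) for r in erp_b_rows]
print(dim_customer)
```

Reports built against the target shape do not need to change when a source is added or retired; only the mapping function for that source does.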

Dimensional Data Modeling: The Missing Piece to The Way You Approach Analytics in the Modern Data Stack

If you are struggling to keep up with seemingly ever-changing business demands for information, or you are having trouble figuring out how to combine data from disparate systems in your organization, you are not alone. Consider dimensional modeling as a way to build consensus and understanding of business needs and to conceptualize how you present data for consumption in your organization today.

John Swift is a Principal Consultant at Analytics8 and a thirty-year veteran of data and analytics based in the greater Boston area. He leads cloud implementations for data warehouses with a focus on system and data design to support analytics and AI use cases. Away from work, John enjoys photography, cycling, philosophy, and spending time with his family.