In this blog, we discuss the merits of dimensional data modeling in the modern data stack given the technology advancements of recent years. We dive into how to think about dimensional data modeling in the context of defining requirements for analytics specifically.

Technology in the data and analytics industry has taken giant leaps forward, leading many organizations to question if the old ways of working are still valid. Historically, dimensional data modeling was accepted as one of the best standards to present data for business user consumption in data warehouses.

The (re)rise of the data warehouse as a central concept in modern cloud data platform technologies has led to the important question: Is dimensional data modeling still relevant in the modern data stack?

Yes—specifically for defining requirements and creating a modular solution presenting data for analytics.

Before we dive into the nuances and the question of dimensional data modeling specifically, I think it is valuable to consider the broader context of data modeling generally. In my experience, the definition and purpose of data modeling are often misunderstood by data and analytics professionals.

What is Data Modeling and How is it Related to Analytics?

Data modeling is a discipline that is widely applicable to any intersection of people, data, and technology. It is a well-defined approach to gain agreement on business needs, understand requirements, establish a business solution, and create a technical design artifact.

Without modeling data, you create risk in technical projects by allowing unchecked assumptions to creep into your technical designs. This can lead to incredibly costly mistakes and failed implementations, especially for (but not limited to) analytics projects.

Although data models are often thought of as purely technical design artifacts, that is not their primary purpose.

A data model is primarily a communication and consensus building tool—a way to explain information, gain agreement, and mitigate risk from unchecked assumptions.

The Categories of Data Modeling: From Conceptual to Dimensional

In his book, Data Modeling Made Simple, Steve Hoberman helpfully describes data models as “wayfinding tools designed with the single purpose of simplifying complex information in our real world”, working to build consensus from people with different backgrounds, experiences, and perspectives.

Hoberman describes data models in three different general categories—conceptual, logical, and physical.

  • A conceptual data model represents business need within a defined scope.
  • A logical data model represents the business solution.
  • A physical data model represents the technical solution.

There is great value in starting any analytics-related project by creating conceptual models as you gather requirements and delve into business rules, especially since these models define the needs of the business within a defined scope. Conceptual data modeling lets you tease out nuances and bring forth diverse perspectives, giving you a more complete understanding of how the business works. That understanding helps you avoid major gotchas down the road caused by misunderstandings and missing requirements.

With a conceptual data model in hand, the next step for an analytics project is to translate it into a dimensional data model.

What is Dimensional Data Modeling?

In 1996, Ralph Kimball broadly introduced the concept of dimensional data modeling to the world. Over the next twenty years, organizations of all sizes adopted dimensional modeling as the way to present data in a data warehouse for business user consumption via reports and dashboards aimed at supporting decision making.

In the mid-2010s, data lakes appeared, and their proponents declared traditional data warehousing dead. That didn’t last long: in less than a decade, early adopters found that data lakes alone were insufficient to meet all of the organization’s demands for analytics. Tools like Snowflake, Databricks, and BigQuery emerged as the next-generation wave of alternatives to data lakes. These easy-to-use, extremely scalable modern data platforms somehow made the concept of a data warehouse cool again, even if it sometimes goes by a different name (data lakehouse in the case of Databricks).

While dimensional data modeling is traditionally tightly coupled with physical database design within data warehouse implementations, I have found the modeling concepts applicable even outside of data warehouses. It is different from relational entity-relationship modeling, which is often optimized for data capture.

Dimensional data modeling aims to make analyzing data as simple as possible for business users while still maintaining design principles that allow for agile iterative extensibility of analytics solutions over time.

At the most fundamental level, dimensional data modeling aims to model the business rather than just modeling relationships among data elements. Dimensional modeling is a collaborative process to connect needs of the business with the realities of underlying source data.

Core Attributes of Dimensional Data Modeling

In dimensional data modeling, core business attributes are grouped together into entities called dimensions.

  • Dimension objects contain descriptive attributes about things like customers, products, employees, vendors, and suppliers. Dimensions are conformed, meaning they are consistently reused regardless of the business processes being analyzed. This leads to consistency in definitions and a scalable framework established with a resilient design.
  • Fact objects represent the detailed events captured by one or many business processes, or they exist as point-in-time snapshots of detailed measurements. Dimensions are connected through many fact tables, which link everything together in a connected fabric.
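To make the two object types concrete, here is a minimal sketch in Python. The entity names, attributes, and sample rows are all hypothetical and chosen only for illustration; a real model would come out of conversations with the business. It shows one conformed customer dimension reused by two fact tables from different business processes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerDim:
    """Conformed dimension: descriptive attributes about a customer."""
    customer_key: int
    name: str
    segment: str


@dataclass(frozen=True)
class OrderLineFact:
    """Fact: one row per order line captured by the sales process."""
    order_date: date
    customer_key: int
    amount: int


@dataclass(frozen=True)
class SupportTicketFact:
    """Fact: one row per ticket captured by the support process."""
    opened_date: date
    customer_key: int
    minutes_to_resolve: int


# Hypothetical sample data, keyed by the dimension's surrogate key.
customers = {
    1: CustomerDim(1, "Acme Co", "Enterprise"),
    2: CustomerDim(2, "Birch LLC", "Small business"),
}
order_lines = [
    OrderLineFact(date(2024, 1, 5), 1, 100),
    OrderLineFact(date(2024, 1, 9), 2, 50),
    OrderLineFact(date(2024, 2, 2), 1, 75),
]
tickets = [
    SupportTicketFact(date(2024, 1, 6), 1, 30),
    SupportTicketFact(date(2024, 1, 15), 2, 45),
]


def total_by_segment(fact_rows, measure):
    """Sum one measure of any fact table, grouped by customer segment."""
    totals = {}
    for row in fact_rows:
        segment = customers[row.customer_key].segment
        totals[segment] = totals.get(segment, 0) + getattr(row, measure)
    return totals


# The same dimension answers questions about both business processes.
revenue_by_segment = total_by_segment(order_lines, "amount")
minutes_by_segment = total_by_segment(tickets, "minutes_to_resolve")
```

Because both fact tables carry the same `customer_key`, "segment" means the same thing whether you are analyzing sales or support, which is the practical payoff of a conformed dimension.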

Like any data model, a dimensional model does not need to perfectly mirror its physical technical implementation. Dimensional data modeling is a helpful activity because it pushes organizations to think in modular terms about how they operate their business, which dimensions connect their business processes, and how things relate to one another. Because of this, dimensional data modeling is a valuable design activity regardless of how you end up implementing a physical technical solution.

How Do I Establish Requirements for Analytics?

Data and analytics teams rarely establish their own requirements. Instead, they need to work with non-technical business users to understand the information required to make day-to-day operational decisions and the data required to set the strategic direction of the organization.

We find that requests come to data teams in the form of report requests, raw data requests (e.g. can you integrate this new data source), or, more likely, some combination of the two. Lawrence Corr in Agile Data Warehouse Design describes this phenomenon as the design bias of teams.

Figure: Graph with X and Y axes illustrating dimensional data modeling with design and analysis.

Not all data warehouses are created equal and often carry an implicit design bias. Does your data warehouse have a data supply-oriented bias or a reporting demand-oriented bias?

When a team is supply driven and only considers the data available for analysis, we find that the resulting solutions mirror the complexity of the operational source systems themselves. This ultimately makes it more difficult to combine disparate data. Other, more mature teams look at the data available and attempt to model the entire organization in a fully normalized form without considering how it will be used and by whom.

When a team is report driven and only considers specific requests for data, we find that the resulting solutions tend to overfit specific report requests and are not easily iterated upon. These teams aggregate, filter, and manipulate the way data is presented to end users to fulfill a specific request, at the expense of a modular design that could have accommodated future requests as well.

Dimensional data modeling takes a different approach.

Dimensional Data Modeling in Context of Analytics Requirements

Importantly, dimensional modeling deconstructs the demands business users express as report requests and identifies the underlying business processes behind each request. Business processes tend to be more resilient than ever-changing requests for information. Isolating business processes and events in the dimensional model, and identifying the lowest level of detail captured in the data for each, makes solutions more resilient.

Dimensional modeling allows for modular and reusable designs when creating presentation layers for data consumption.

For example, take a report request for revenue. At most organizations, there are multiple streams of revenue generated by a variety of business processes. If a team receives a request for revenue by customer, month over month, and designs a solution to exactly that specification, the team must redesign the solution as soon as a follow-up question asks for product-level detail rather than customer alone.

In a dimensional data modeling approach, teams deconstruct a request for a revenue report by identifying all the business processes that generate revenue. A model can then be created that shows each process, the level of detail available for each process independently, and the dimensions that connect the processes. With this increased understanding of requirements, development teams can work incrementally on each business process and dimension as standalone artifacts that are modular and more resilient to changing demands for information.
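The revenue scenario can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two revenue processes (product sales and subscription billing), their grains, and the sample rows are hypothetical. The point is that when each process is stored at its lowest level of detail, both the original request and the follow-up become simple rollups of the same data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows, each kept at the lowest grain its process captures.
# Product sales: one row per order line (date, customer, product, amount).
product_sales = [
    (date(2024, 1, 5), "Acme Co", "Widget", 100),
    (date(2024, 1, 20), "Acme Co", "Gadget", 250),
    (date(2024, 2, 3), "Birch LLC", "Widget", 100),
]
# Subscription billing: one row per invoice (date, customer, amount).
subscription_invoices = [
    (date(2024, 1, 1), "Acme Co", 40),
    (date(2024, 2, 1), "Acme Co", 40),
]


def revenue_by_customer_month():
    """Original request: total revenue by customer and month, all processes."""
    totals = defaultdict(int)
    for day, customer, _product, amount in product_sales:
        totals[(customer, day.strftime("%Y-%m"))] += amount
    for day, customer, amount in subscription_invoices:
        totals[(customer, day.strftime("%Y-%m"))] += amount
    return dict(totals)


def product_sales_by_product():
    """Follow-up request: product-level detail, answered without a redesign."""
    totals = defaultdict(int)
    for _day, _customer, product, amount in product_sales:
        totals[product] += amount
    return dict(totals)
```

Note that product-level detail only exists for the product sales process in this sketch. Subscription invoices have no product, and the model makes that limitation visible up front instead of hiding it inside a pre-aggregated report table.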

Dimensional modeling does not prevent that same team from implementing a flattened model in the last mile of the physical solution design to make consumption easier for business users. It simply ensures that the individual components are kept separate until that point, which preserves the principle of modular design.
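As a rough illustration of that last-mile flattening, the Python sketch below joins a fact table to its dimensions to produce one wide row per event. The table contents and column names are hypothetical.

```python
# Hypothetical dimensions, keyed by surrogate key.
customers = {1: {"customer_name": "Acme Co", "segment": "Enterprise"}}
products = {10: {"product_name": "Widget", "category": "Hardware"}}

# Hypothetical fact rows: dimension keys plus a measure.
order_lines = [{"customer_key": 1, "product_key": 10, "amount": 100}]


def flatten(fact_rows):
    """Build one wide row per fact row by looking up each dimension."""
    wide_rows = []
    for row in fact_rows:
        wide_rows.append({
            **customers[row["customer_key"]],
            **products[row["product_key"]],
            "amount": row["amount"],
        })
    return wide_rows


flat_orders = flatten(order_lines)
```

The flattened rows are derived from the modular pieces, so a change to a dimension is made once and flows into every flattened view built on top of it.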

 

You Don’t Need Source Data to be Available to Begin Dimensional Modeling

One of the most underrated aspects of dimensional data modeling is that you can contemplate the realities of your underlying source data separately from the design of how data is presented for analytics solutions. Because it focuses on the presentation of data for consumption by business users, you can optimistically model data according to the business rules and processes in place today even before underlying data is available.

I’ve used this approach to:

  • Build analytics solutions that combine data from two very differently structured ERPs during an acquisition and merger.
  • Harmonize and analyze data from a legacy human resource information system (HRIS) and a modern cloud-based HRIS during a platform migration.
  • Allow for seamless transition of analytics reporting from one eCommerce platform to another, even before the future state platform was available in production.

These solutions were data source and technology agnostic. In each case, I started with a dimensional model before ever working on a physical model that depended on a specific technology or data source.
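One way to picture this is to define the target dimension first and treat each source system as an adapter that maps into it. The Python sketch below assumes an employee dimension fed by two HRIS platforms. The field names on both source records (`EMP_NO`, `displayName`, and so on) and the department code lookup are invented for illustration and do not reflect any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployeeDim:
    """Target dimension, designed from business rules before any source exists."""
    employee_id: str
    full_name: str
    department: str


# Hypothetical lookup from legacy department codes to business names.
LEGACY_DEPT_CODES = {"FIN": "Finance", "ENG": "Engineering"}


def from_legacy_hris(record):
    """Map a (hypothetical) legacy HRIS record into the target dimension."""
    return EmployeeDim(
        employee_id=str(record["EMP_NO"]),
        full_name=f'{record["FIRST_NM"]} {record["LAST_NM"]}',
        department=LEGACY_DEPT_CODES[record["DEPT_CD"]],
    )


def from_cloud_hris(record):
    """Map a (hypothetical) cloud HRIS record into the same dimension."""
    return EmployeeDim(
        employee_id=record["id"],
        full_name=record["displayName"],
        department=record["orgUnit"]["name"],
    )


legacy_row = {"EMP_NO": 1042, "FIRST_NM": "Dana", "LAST_NM": "Lee", "DEPT_CD": "FIN"}
cloud_row = {"id": "1042", "displayName": "Dana Lee", "orgUnit": {"name": "Finance"}}

# Both sources land in the same shape, so reporting does not care which one fed it.
same_employee = from_legacy_hris(legacy_row) == from_cloud_hris(cloud_row)
```

Reports built on `EmployeeDim` keep working through the migration. Only the adapter changes when the new platform goes live.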

Dimensional Data Modeling: The Missing Piece to The Way You Approach Analytics in the Modern Data Stack

If you are struggling with keeping up with seemingly ever-changing business demands for information, or you are having trouble figuring out how to combine data from disparate systems in your organization today, you are not alone. Consider dimensional modeling as a way to build consensus and understanding of business need and to conceptualize how you present data for consumption in your organization today.

Watch: Tony Dahlager and John Barcheski present “Back to the Future: Where Dimensional Modeling Enters the Modern Data Stack” for dbt Coalesce

 

About the author: Tony Dahlager is the VP of Account Management at Analytics8. With a focus on ensuring client satisfaction, he leads the company’s efforts in nurturing our client relationships and designing strategic service solutions. His background in technical consulting enriches his approach, integrating sales, marketing, consulting, technology, and partnerships to foster robust and effective client engagements.