Data Warehousing #5: Dimensional Design Process

Data Warehousing #5: Dimensional Design Process

This is the 5th in series of posts on datawarehousing.  To see the entire post list, click here.

The Dimensional Design Process

The four key decisions that are made when designing a Kimball-style dimensional star schema are as follows:

  1. Identify the Business Process
  2. Identify the Grain
  3. Identify the Dimensions
  4. Identify the Facts

When embarking upon the design process, your final deliverable will be a bus matrix of the processes, conformed dimensions, and facts that will be the foundation of your star schema.

1 – Identify the Business Process

Clearly before starting a design project, you need a clear understanding of the business requirements associated with each organization.  Each organization has a series of business processes (and hopefully a source system in which you’ll be able to collect data relevant to that business process).

Examples of business processes:

  • Sales process
  • Inventory management
  • Customer support
  • Shipping
  • Returns

2 – Identify the Grain

The grain of a business process is the level of detail in which information is stored in your data warehouse.

Examples of levels of grain in business processes:

  • Sales process
    • one interaction per lead per channel
  • Inventory management
    • one inventory snapshot per day per warehouse
  • Customer support
    • one response per ticket per customer
  • Shipping
    • one shipping event per order line item
  • Returns
    • one return event per order per order line item

Sometimes it will not be possible to get information at the finest level of grain due to constraints in source systems, but my recommendation is to try to get the lowest level of grain possible, to enable maximum flexibility down the line.

Note that there will be times in which you’ll want to store information in coarser grains, for performance or simplicity reasons.  The best practice in this case is to build the information in coarser grained tables on top of the finer grained information.  This is so that you can decompose the reasons why the coarser grained tables are the way they are if needed.

3 – Identify the Dimensions

Dimensions represent the way business people talk about the data coming out of a business process.  They are the criteria you will SELECT upon in a WHERE clause. (click here for a refresher on conformed dimensions)

4 – Identify the Facts

Facts are the metrics that result from a business process.  (click here for a refresher on facts)

Your bus matrix

After you’ve gone through your Dimensional Design process, the information gathered is typically put together into a a bus matrix that looks like this:

Bus Matrix, courtesy Kimball Group
Bus Matrix, courtesy Kimball Group

For more information on bus matrix design, check out Kimball’s post on this topic at http://www.kimballgroup.com/2005/12/the-matrix-revisited/ and http://www.kimballgroup.com/2008/09/practical-steps-for-designing-a-dimensional-model/.

You can follow along with the data warehouse post series here.

 

Comments are closed.