Data Warehousing #3: Dimensions vs. Facts
This is the 3rd in series of posts on datawarehousing. To see the entire post list, click here.
Onto the schema design portion of the the series! There are two main types of tables used to store information in a data warehouse:
Table Type #1: Dimension Table
A dimension table contains the attributes by which users will query your data warehouse. They are the content of the WHERE clause in your SELECT statement.
They are typically denormalized many-to-one hierarchies. An example of a dimension would be a product:
| dim _product |
| Product Code
| Product Desc
| Brand ID
| Brand Desc
Note that the product and brand and pricing information is denormalized into the above table This simplifies the model for the downstream user and makes join paths much less painful (both for a user and an optimizer) and is a common pattern in star schemas (more on them later).
Another example of a dimension that is used across all product verticals is a date dimension. By adding values to the date table that represent relationships between time periods (day of week, day of month, different formats, etc), you as a schema designer enable all sorts of fun relationships for users to analyze downstream.
Dimensions can change over time. I’ll be analyzing different ways to handle the evolution of dimensions over time in an upcoming post.
Table Type #2: Fact Table
A fact table represents a measurement event of an action or state of a business system. A great example of a fact event is a purchase:
| fac_purchase |
| Date ID
| Product ID
| Customer ID
| Order Number
| Order Amount
| Order Currency
Facts are typically numeric, and recorded in an additive pattern. Additivity is important so that BI applications can SUM, AVERAGE, or compute other mathematical functions over time (ie. revenue and median order size in March ).
There are a few subsets of fact tables:
- A transactional fact table record facts about a specific event (like the purchase example, above).
- A snapshot fact table is a fact that records a snapshot of the state of a business entity at a state in time.
- An accumulating snapshot records aggregate facts at a given point in time.
Those are the two patterns for table types in a dimensional model. There are permutations of the above, which I will cover in an upcoming post. You can follow along with the post series if you click here.