Dimensional modeling: Identify the dimensions

After you have determined the grain of your model, you identify the dimensions that are true to that grain. You create columns, hierarchies, and cases for snowflaking.

The following metadata is collected when you identify dimension tables:

Dimension names
Business definitions
Hierarchies
Handling dimension changes
Load frequency and statistics
Usage statistics
Archive rules and statistics
Purge rules and statistics
Data quality and accuracy
Primary and foreign keys and how the keys are generated
Data source information
Facts

To fully define the dimensions of your dimensional model, you perform the following steps:

Identify dimensions

Identify the dimensions that are true to the grain of your model.

Dimension tables contain columns that describe the fact records in the fact table. Some of these columns provide descriptive information. Other columns specify how the data in the fact table is summarized to provide useful information. Dimension tables contain hierarchies that help to summarize data. Dimension tables are smaller, denormalized lookup tables that contain descriptive columns that you reference when you define queries.

To learn more about dimension tables, see Dimension tables and entities.

To identify dimensions, you perform the following steps:

Use the grain definition to locate possible dimensions.
List all of the dimensions that are associated with this grain. Define the level of detail to include in that dimension.
Create surrogate keys for the primary keys of the dimensions. To learn more about surrogate keys, see Surrogate keys.

Identify shared dimensions

Identify any shared dimensions that are available, instead of redesigning the dimensions. Identify whether a dimension that is used exists inside the enterprise data warehouse or dimensional model. To learn more about shared dimensions, see Shared dimensions.

To identify a shared dimension, perform the following steps:

Identify whether a dimension exists. If the dimension exists, then you must use that dimension. If the dimension exists, then you do not identify dimensional attributes for that particular dimension. Share a dimension with an existing dimension even if the shared dimension contains only a subset of attributes from the primary dimension.
If the dimension does not exist, create a dimension that you use across the enterprise. If the dimensional model requires a dimension that does not exist, design a new dimension. When you design this dimension, interact with subject matter experts to find out how they plan to use the dimension.

Identify dimensional columns and hierarchies

After you identify the dimensions, fill the dimensions with columns. Use the descriptive columns to define the restriction criteria for queries.

Consider the following points when you design your dimensions:

Preserve the source system key

Use a separate entry in the dimension table to preserve the natural source system key of the entity being used in the source system.

Use surrogate keys

Use a surrogate key for the primary key of a dimension. You do not analyze the surrogate key. To learn more about surrogate keys, see Surrogate keys.

Create descriptive, unique columns in your model

The columns of a dimension reflect the potential areas of interest that you can use to aggregate data or to create constraints and report breaks.

Create columns that are descriptive and easily understood.

Define columns that can contain a NULL value when a column does not apply to a specific item or its value is unknown.

Define unique column names within the model. If you have duplicate names in different dimension tables, create a distinction. For example, if you have multiple columns that are named Address Type Code, rename one column Beneficiary Address Type Code and another column Premium Address Type Code.

Properly document all columns.

Handling codes

If you use codes in your dimension, provide documentation of the code. For example, if you identify branches by a branch code, where each code represents a branch name, then include both the code and the name.

After you have defined the columns, you can define the hierarchies of the dimension. A hierarchy is a cascaded series of many-to-one relationships. A hierarchy contains different levels, each corresponding to a dimension attribute. To learn more about hierarchies, see Hierarchies.

Identify the granularity of date and time dimensions

Identify the granularity of the date and time dimensions. The date and time dimensions help determine the granularity of the overall dimensional model and the level of information that is stored in the model. If you define an incorrect grain in either the date or time dimension, you may omit important dimensions. For example, if you design a dimensional model for an order management system and design the grain of the date dimension as Quarter, you may miss many other dimensions (such as Time and Employee). You can include those other dimensions if the model defines the grain of the date dimension been at the day level.

Date dimension

Since all dimensional models are based on units of time, every data mart has a date dimension. For example, you may want to measure how the business performs over a period of time. A dimensional model may contain several date dimensions.

The date dimension typically does not have an OLTP source system that is connected to the dimension. You can develop the date dimension before you design the dimensional model. To build a date dimension, perform the following steps:

Identify all of the columns that you need.
Use SQL statements to populate the date dimension table for columns such as Date, Day, Month, and Year. You may need to manually enter data for special columns. For example, if you are tracking non-business days (such as holidays), you should enter Christmas or Boxing Day manually. You should also manually enter dates such as fiscal date, fiscal month, fiscal quarter, and fiscal years.

Date data is often stored in a separate date dimension (instead of a Date column in a fact table) for two reasons:

There are several date attributes that the SQL date functions do not support. These functions include fiscal periods, holidays, seasons, weekdays, weekends, and national events. When you create a date dimension, the business can look at the performance indicators of their business across various fiscal and date-related attributes. The performance indicators are not apparent if you use a SQL date or time column inside the fact table.
It is much easier to drag the columns from a date table instead of using complex SQL functions to create the logic for the reports.

Time dimension

You can handle time in dimensional modeling in two ways:

Time of day as a separate dimension
Time of day as a fact

You should handle time data in a dimension table and not a fact inside the fact table. You should create a time dimension if you must support the roll-up of time periods into more summarized groupings for reporting and analysis. For example, create a time dimension for the following summarized groupings:

Hours
Business-specific time groupings (weekday mornings, late-night shifts, or late-evenings)

You should also create a time dimension if you want to represent different hierarchies for the time that you are measuring. For example, create different hierarchies for Standard and Military time.

However, if you do not roll up or filter time-of-day groups, express time as a fact inside the fact table. In this case, time is treated as a simple numeric fact, in the timestamp data type.

Note: Expressing time as a fact inside the fact table makes the reporting and summarization process difficult if you need to analyze data that was stored over a period of time. You can easily summarize and report on data if you create a separate time dimension.

Identify slowly changing dimensions

You must identify slowly changing dimensions and determine how you will handle the changing data.

A slowly changing dimension is a dimension whose attributes for a record change slowly over time. For example, you may need to track employee transfers within the company.

The workbench supports the following types of slowly changing dimensions:

Type 0: No changes are made to the dimension.
Type 1: New data overwrites the old data. Historical changes are not tracked.
Type 2: With this method, two separate entries are created. The original record and the new record are both available in the table. The new row gets its own primary key (surrogate key).
Type 4: Separate tables are created to store some or all historical data. Only one table contains the current data, and when updates occur, the old data is moved into a historical table.
Type 6: Type 1, Type 2, and Type 3 (storing data in separate columns in the same table) are combined. Typically, the old data is moved into separate records and columns, and the new data is placed in the original records.

Identify cases for snowflaked dimensions

You create a snowflake schema deign if you normalize and expand the dimension tables in a star schema. A dimension table is snowflaked when the low-cardinality attributes in the dimension have been removed to separate normalized tables and these normalized tables are then joined back into the original dimension table.

Note: Snowflaking in the dimensional model environment is not recommended. Snowflaking makes the dimensional model harder to understand. Snowflaking can also cause decreased performance, because you must join more tables.

You must only snowflake a dimension in a dimensional model in two cases:

The dimension table contains two or more sets of attributes that define information at different grains.
You populate the sets of attributes of the same dimension table via different source systems.

Do not snowflake hierarchies of one dimension table into separate tables. Hierarchies should belong to the dimension table only. Do not snowflake the hierarchies. Multiple hierarchies can belong to the same dimension if the dimension has been designed at the lowest possible detail.

If hierarchies are split into separate tables, performance is impacted, because more joins are required. In some situations, you may snowflake the hierarchies of a main table. When you use an aggregate of the fact table, only snowflake dimensions with hierarchies to avoid joins to large dimension tables. For example, if you have brand information that you want to separate from a product dimension table, create a Brand snowflaked dimension that contains a single row for each brand, using fewer rows than the Product dimension table.