Step 2: Identify the grain

Identify the granularity of each fact table and business process. During this process, you identify the fact table types and the preliminary candidates for dimensions and measures.
The following list contains several characteristics of grain identification:
Specifying what the records contain
When you identify the grain, you specify exactly what a fact table record contains. The grain conveys the level of detail that is associated with the fact table measurements. When you identify the grain, you also decide on the level of detail you want to make available in the dimensional model. If more detail is included, the level of granularity is lower. If less detail is included, the level of granularity is higher.
Identifying the level of detail
The level of detail that is available in a star schema is known as the grain. Each fact and dimension table has its own grain or granularity. Each table (either fact or dimension) contains some level of detail that is associated with it. The grain of the dimensional model is the finest level of detail that is implied when the fact and dimension tables are joined. For example, the granularity of a dimensional model that consists of the dimensions Date, Store, and Product is product sold in store by day.
Identifying the data
Each row holds the same type of data. For example, each row could contain daily sales by store by product or daily line items by store.
For example, grain definitions can include the following items:

The fact and dimension tables have a granularity associated with them. In dimensional modeling, granularity refers to the level of detail stored in a table. For example, a dimension such as Date (with Year and Quarter hierarchies) has a granularity at the quarter level but does not have information for individual days or months. Alternately, a Date dimension table (with Year, Quarter, and Month hierarchies) has granularity at the Month level, but does not contain information at the day level.

You can handle different data granularities by using multiple fact tables (daily, monthly, and yearly tables). You can also use a single table with a granularity flag, or a column that indicates the grain of the table. However, do not store data with different granularities in the same fact table.

When you identify the grains of your data objects, perform the following steps:
  1. Determine the granularity of the fact table.
  2. Determine how to handle multiple separate grains.
  3. Determine the type of fact table to use.
  4. Check the atomicity of your grains.
  5. Determine the high-level dimensions and measures based on your grain definitions.
  6. Create a report of your grain definitions.

Identify the metadata of the grain

You collect the following metadata during this phase:

Determining the granularity of the fact table

The grain detail is based on the requirements findings that were analyzed and documented in Step 1: Identify business process requirements. Collect documents, such as invoices, receipts, and order memos. These documents often have information that you can use to define the grain. These documents also have information which helps identify the dimensions and measures for the dimensional models.

The grain you choose determines the level of detailed information that can be made available to the dimensional model.

The grain definition is the base of every dimensional model. The grain definition determines the level of information that is available. Guidelines for choosing the grain definition include the following considerations:
  • During the business requirements gathering phase, try to collect any documents, such as invoice forms, order forms, and sales receipts. Typically, these documents have transactional data associated with them, such as order number and invoice number.
  • Documents can often point you to the important elements of the business, such as customer and the products. The documents often contain information at the lowest level that may be required by the business.
  • Another important point to consider is the date. Understand what level of detail is associated with a customer, product, or supplier. Is the information in the source systems available at the day, month, or year level?

Handling multiple separate grains

Determine if there are multiple grains associated with the business process for which you are designing your dimensional model. There can be more than one grain definition associated with a single business process. In these cases, design separate fact tables with separate grains. Do not force all measures into a single fact table.

Different data granularities can be handled by using multiple fact tables (daily, monthly, and yearly tables, for example). Also, consider the amount of data, space, and the performance requirements when you decide how to handle multiple granularities.

Criteria for one or multiple fact tables
To determine whether to use one or multiple fact tables, consider the following criteria:
  • Consider the measures. Decide whether to group the measures in one fact table or separate in fact tables with different grains.
  • Are multiple OLTP source systems involved? Each source system is designed with a particular, specific purpose. If two source systems do not serve different purposes, consolidate the systems into a single source. If you must keep the systems separate, each source system caters to a particular requirement of the business. If the business processes include order management, store inventory, or warehouse inventory, separate source systems are probably involved. In this case, use separate fact tables.
  • Determine if multiple, unrelated business processes are involved. Create separate fact tables for unrelated business processes. If a single business process requires different levels of granularity, create separate fact tables to handle those levels.
  • If a dimension is not true to the grain definition, design a new fact table with its own grain definition.
  • Consider the timing and sequencing of events. You might need separate processes to handle a single event. For example, a company markets its product. Customers order the products. The accounts receivable department produces an invoice. The customer pays the invoice. After the purchase, the customer may return some of the products or send some of the products back for repairs. If any of the products are out of warranty, this process requires new charges. Several processes are involved in the sequence of single purchase event. Each of these processes are likely working with a particular different point in time. Each of these processes are handled using separate fact tables.
Multiple granularities in a single fact table
If you use multiple grains in one fact table, add a column called the granularity flag. This column would indicate the grain of the table. The column defines whether the information is stored at the daily, weekly, monthly, or yearly level.
Note: You can store multiple grains in a single fact table, but this approach is not recommended. Design separate fact tables and star schemas for each grain definition.

Identifying the types of fact tables to use

Identify the type of fact table involved in the design of the dimensional model. To learn more about fact table types, see Fact tables and entities.

Checking the atomicity of the grain

Review the atomicity (level of detail) of the grain to make sure that it is at the most detailed level. This decision includes consideration for anticipated future needs in order to minimize the potential for a required redesign as business requirements change.

The grain of the dimensional model is important when you design the dimensional model. Even if the business requirements need information at the monthly or quarterly level, make this information available at the daily level. If dimensions are more detailed (atomic), the business can retrieve more detailed information.

For example, consider a Date dimension that has only a Year attribute. Since there is only one attribute, you cannot query for information at the quarter, month, or day level. To maximize available information, choose a detailed atomic grain. In this example, you can define the grain to the Day level.

For example, assume a grain of one product sold in a store. You cannot associate a customer with a particular product that is purchased, because there is only one row for a product. If the product is purchased a thousand times by a thousand different customers, you cannot discover that information.

You can always declare higher-level grains for a business process by using aggregations of the most atomic and detailed data. However, when a higher-level grain is selected, the number of dimensions are limited and may be less granular. You cannot drill down into these less granular dimensions to get a lower level of detail.

Granularity provides the opportunity for a trade-off between important issues in data warehousing:
  • The performance versus the volume of data (and the related cost of storing that data)
  • The ability to access data at a detailed level versus performance (and the cost of storing and accessing large volumes of data)
Selecting the appropriate level of granularity significantly affects the volume of data in the data warehouse. Selecting the appropriate level of granularity can also determine the capability of the data warehouse to satisfy query requirements.

When you consider disk space and volume of data, a higher granularity provides a more efficient way of storing data than a lower granularity. You also must consider the disk space for the index of the data. By using an index, you save more disk space. Also consider the manipulation of large volumes of data. Data manipulation can affect performance at the cost of more processing power.

There are always trade-offs when you process data. For example, as the granularity becomes higher, the ability to answer different types of queries (that require data at a more detailed level) diminishes. If the table uses a low level of granularity, you can support any queries that using that data at the cost of increased storage space and diminished performance.

While the granularity does not affect the ability to answer a query, the system may need more resources to run the same query. Suppose that you have two tables with different levels of granularity: a table with transaction details and a table that summarizes accounts by month. To create a monthly account report, you can use either table without any dependency on the granularity. However, querying the detailed transaction table searches through more of the data, which requires processing to calculate the results. The monthly account summary table requires less resource.

When you decide the level of granularity, consider the trade-off between the cost of the volume of data and the ability to answer queries.

Identify high-level dimensions and measures

Identify high-level preliminary dimensions and measures from what you understand of the grain definition. No detailed analysis is carried out to identify these preliminary dimensions and measures. When you define the grain appropriately, you can easily find the preliminary dimensions and measures.

Preliminary measures are measures that can be easily identified by looking at the grain definition. For example, measures such as unit price, quantity, and discount are easily identifiable by looking at the grain. However, detailed measures such as cost, manufacturing price, and transportation cost are not preliminary measures that are identified by the grain. These types of measures are hidden and typically never visible on a report. Preliminary measures are not the final set of measures. The formal detailed measure identification occurs when you identify the measures.

These preliminary high-level dimensions and measures are helpful when you formally identify dimensions.

Create a report of the grain identification phase

The grain definition report is created for this phase. The report contains one or more definitions for the grain of the business process and defines the type of fact table. The report also includes the high-level preliminary dimensions and measures.


Feedback