Identify the granularity of each fact table and business
process. During this process, you identify the fact table types and
the preliminary candidates for dimensions and measures.
The following list contains several characteristics of grain identification:
- Specifying what the records contain
- When you identify the grain, you specify exactly what a fact table
record contains. The grain conveys the level of detail that is associated
with the fact table measurements. When you identify the grain, you
also decide on the level of detail you want to make available in the
dimensional model. If more detail is included, the level of granularity
is lower. If less detail is included, the level of granularity is
higher.
- Identifying the level of detail
- The level of detail that is available in a star schema is known
as the grain. Each fact and dimension table has its own
grain or granularity. Each table (either fact or dimension) contains
some level of detail that is associated with it. The grain of the
dimensional model is the finest level of detail that is implied when
the fact and dimension tables are joined. For example, the granularity
of a dimensional model that consists of the dimensions Date, Store,
and Product is product sold in store by day.
- Identifying the data
- Each row holds the same type of data. For example, each row could
contain daily sales by store by product or daily line items by store.
For example, grain definitions can include the following
items:
- A line item on a grocery receipt
- A monthly snapshot of a bank account statement
- A single airline ticket purchased on a day
The fact and dimension tables have a granularity associated with them.
In dimensional modeling, granularity refers to the level
of detail stored in a table. For example, a dimension such as Date
(with Year and Quarter hierarchies) has a granularity at the quarter
level but does not have information for individual days or months.
Alternately, a Date dimension table (with Year, Quarter, and Month
hierarchies) has granularity at the Month level, but does not contain
information at the day level.
You can handle different data granularities by using multiple fact
tables (daily, monthly, and yearly tables). You can also use a single
table with a granularity flag, or a column that indicates the grain
of the table. However, do not store data with different granularities
in the same fact table.
Identify the metadata of the grain
You collect
the following metadata during this phase:
- One or more grain definitions for the business process that you
are modeling
- Type of fact table that you use
- High-level preliminary dimensions and measures
Determining the granularity of
the fact table
The grain detail is based on the requirements
findings that were analyzed and documented in Step 1: Identify business process requirements. Collect documents, such as invoices, receipts,
and order memos. These documents often have information that you can use
to define the grain. These documents also have information which helps
identify the dimensions and measures for the dimensional models.
The grain you choose determines the level of detailed information
that can be made available to the dimensional model.
The grain
definition is the base of every dimensional model. The grain definition
determines the level of information that is available. Guidelines
for choosing the grain definition include the following considerations:
- During the business requirements gathering phase, try to collect
any documents, such as invoice forms, order forms, and sales receipts.
Typically, these documents have transactional data associated with
them, such as order number and invoice number.
- Documents can often point you to the important elements of the
business, such as customer and the products. The documents often contain
information at the lowest level that may be required by the business.
- Another important point to consider is the date. Understand what
level of detail is associated with a customer, product, or supplier.
Is the information in the source systems available at the day, month,
or year level?
Handling multiple separate grains
Determine if there are multiple grains associated with the business
process for which you are designing your dimensional model. There
can be more than one grain definition associated with a single business
process. In these cases, design separate fact tables with separate
grains. Do not force all measures into a single fact table.
Different
data granularities can be handled by using multiple fact tables (daily,
monthly, and yearly tables, for example). Also, consider the amount
of data, space, and the performance requirements when you decide how
to handle multiple granularities.
- Criteria for one or multiple fact tables
- To determine whether to use one or multiple fact tables, consider
the following criteria:
- Consider the measures. Decide whether to group the measures in
one fact table or separate in fact tables with different grains.
- Are multiple OLTP source systems involved? Each source system
is designed with a particular, specific purpose. If two source systems
do not serve different purposes, consolidate the systems into a single
source. If you must keep the systems separate, each source system
caters to a particular requirement of the business. If the business
processes include order management, store inventory, or warehouse
inventory, separate source systems are probably involved. In this
case, use separate fact tables.
- Determine if multiple, unrelated business processes are involved.
Create separate fact tables for unrelated business processes. If a
single business process requires different levels of granularity, create
separate fact tables to handle those levels.
- If a dimension is not true to the grain definition, design a new
fact table with its own grain definition.
- Consider the timing and sequencing of events. You might need separate
processes to handle a single event. For example, a company markets
its product. Customers order the products. The accounts receivable
department produces an invoice. The customer pays the invoice. After
the purchase, the customer may return some of the products or send
some of the products back for repairs. If any of the products are
out of warranty, this process requires new charges. Several processes
are involved in the sequence of single purchase event. Each of these
processes are likely working with a particular different point in time.
Each of these processes are handled using separate fact tables.
- Multiple granularities in a single fact table
- If you use multiple grains in one fact table, add a column called
the granularity flag. This column would indicate the
grain of the table. The column defines whether the information is
stored at the daily, weekly, monthly, or yearly level.
Note: You can
store multiple grains in a single fact table, but this approach is
not recommended. Design separate fact tables and star schemas for
each grain definition.
Identifying the types of fact
tables to use
Identify the type of fact table involved in the
design of the dimensional model. To learn more about fact table types, see
Fact tables and entities.
Checking the atomicity of the
grain
Review the atomicity (level of detail)
of the grain to make sure that it is at the most detailed level. This
decision includes consideration for anticipated future needs in order
to minimize the potential for a required redesign as business requirements
change.
The grain of the dimensional model is important when
you design the dimensional model. Even if the business requirements
need information at the monthly or quarterly level, make this information
available at the daily level. If dimensions are more detailed (atomic),
the business can retrieve more detailed information.
For example,
consider a Date dimension that has only a Year attribute. Since there
is only one attribute, you cannot query for information at the quarter,
month, or day level. To maximize available information, choose a detailed
atomic grain. In this example, you can define the grain to the Day
level.
For example, assume a grain of one product sold in a
store. You cannot associate a customer with a particular product that
is purchased, because there is only one row for a product. If the
product is purchased a thousand times by a thousand different customers,
you cannot discover that information.
You can always declare
higher-level grains for a business process by using aggregations of
the most atomic and detailed data. However, when a higher-level grain
is selected, the number of dimensions are limited and may be less
granular. You cannot drill down into these less granular dimensions
to get a lower level of detail.
Granularity provides the opportunity
for a trade-off between important issues in data warehousing:
- The performance versus the volume of data (and the related cost
of storing that data)
- The ability to access data at a detailed level versus performance
(and the cost of storing and accessing large volumes of data)
Selecting the appropriate level of granularity significantly affects
the volume of data in the data warehouse. Selecting the appropriate
level of granularity can also determine the capability of the data
warehouse to satisfy query requirements.
When you consider disk
space and volume of data, a higher granularity provides a more efficient
way of storing data than a lower granularity. You also must consider
the disk space for the index of the data. By using an index, you save
more disk space. Also consider the manipulation of large volumes of
data. Data manipulation can affect performance at the cost of more
processing power.
There are always trade-offs when you process
data. For example, as the granularity becomes higher, the ability
to answer different types of queries (that require data at a more
detailed level) diminishes. If the table uses a low level of granularity,
you can support any queries that using that data at the cost of increased
storage space and diminished performance.
While the granularity
does not affect the ability to answer a query, the system may need
more resources to run the same query. Suppose that you have two tables
with different levels of granularity: a table with transaction details
and a table that summarizes accounts by month. To create a monthly
account report, you can use either table without any dependency on
the granularity. However, querying the detailed transaction table
searches through more of the data, which requires processing to calculate
the results. The monthly account summary table requires less resource.
When you decide the level of granularity, consider the trade-off
between the cost of the volume of data and the ability to answer queries.
Identify high-level dimensions
and measures
Identify high-level preliminary dimensions
and measures from what you understand of the grain definition. No
detailed analysis is carried out to identify these preliminary dimensions
and measures. When you define the grain appropriately, you can easily
find the preliminary dimensions and measures.
Preliminary
measures are measures that can be easily identified by looking
at the grain definition. For example, measures such as unit price,
quantity, and discount are easily identifiable by looking at the grain. However,
detailed measures such as cost, manufacturing price, and transportation
cost are not preliminary measures that are identified by the grain.
These types of measures are hidden and typically never visible on
a report. Preliminary measures are not the final set of measures.
The formal detailed measure identification occurs when you identify the measures.
These preliminary
high-level dimensions and measures are helpful when you formally identify
dimensions.
Create a report of the grain
identification phase
The grain definition report is created
for this phase. The report contains one or more definitions for the
grain of the business process and defines the type of fact table.
The report also includes the high-level preliminary dimensions and
measures.