Step 1: Identify business process requirements

Select the business process for which the dimensional model will be designed. Based on the selection, the requirements for the business process are gathered. A business process require more than one dimensional model.

In dimensional modeling, the best unit of analysis is the business process in which the organization has the most interest. A business process is a set of related activities. Business processes are classified by the topics of interest to the business. When you create a candidate list of high potential business processes, you must prioritize the requirements. Examples of business processes are customers, profit, sales, organizations, and products.

Business processes are not always business departments. For example, consider a scenario where the sales and marketing department accesses the order data. In this case, you would build a single dimensional model to handle order data instead of building separate dimensional models for the sales and marketing departments. If you create dimensional models based on departments, you would store duplicate data. Duplication, or data redundancy, can result in many data quality and data consistency issues.

When you select a single business process (out of all of the possible processes that exist in a company), you must prioritize the business processes according to certain criteria. Criteria might include business process significance, quality of data in the source systems, and the feasibility and complexity of the business processes.

When you identify the business processes of a dimensional model, you collect the following metadata:
Dimensional models are typically used in two environments: the data warehouse and OLTP systems.
The data warehouse and the dimensional model
When you partition the data in a data warehouse, you divide the data based on subject area. A data warehouse is subject-oriented. The data warehouse contains specific, selected subject areas in the organization, such as customer and product. For a practical implementation of a data warehouse, the most important data is contained in a specific business process. This requirement is quite different from OLTP requirements.

Queries in the data warehouse environment are more strategic in nature and ask questions that are concerned with a larger scope. An example of a query is, "What products are selling well?" or "Where are my weakest sales offices?" To answer those queries, the data warehouse is structured and oriented to subject areas such as product or organization. These subject areas are the most common unit of logical partitioning in the data warehouse.

OLTP systems and the dimensional model
In the operational environment, you partition data by application or function because the operational environment is built around transaction-oriented applications that perform a specific set of functions. The objective of the operational environment is to perform those functions as quickly as possible. If there are queries performed in the operational environment, they are more tactical in nature and should answer questions that are concerned with that instant in time. An example query could consist of the question, "Has the check from Mr. Smith been processed?"

Create and study the enterprise business process list

Create a complete enterprise-wide business process list. Consider the following assessment factors when you create your list:
Tip: It might be helpful to assign values to each assessment factor and business process. When you assign values, you can determine the priority of each business process.

Identify the business process that you want to model

Prioritize the business processes. You should identify the most and least feasible processes when you build a dimensional model. This step summarizes the assessment factors that you determined above. The processes that are the most significant to the business should be modeled first.

Identify high-level entities and measures that are common across several processes

Determine the high-level business entities that are involved in each process. Determine which entities are common across several business processes. After the common entities are identified, you can tie business processes together through these common (shared) dimensions.

To create shared dimensions that are used across the enterprise, you must ensure that the various parts of the business agree on the definitions for these common entities. This process can take some time, because the definitions for common entities can vary between various parts of the business. You should define the common entities early, because if you need to change this definition in the future, existing applications can be impacted.

A data warehouse must provide consistent information for queries that request similar information. One method to maintain consistency is to create dimension tables that are shared and used by all applications and data marts (dimensional models) in the data warehouse. Candidates for shared dimensions include customers, time, products, and geographical dimensions, such as the store dimension.

Developing a set of shared dimensions is a significant challenge. Any dimensions that are common across the business processes must represent the dimension information in the same way. That is, the information and underlying data must be shared. Each business process will typically have its own schema that contains a fact table, several shared dimension tables, and dimension tables that are unique to the specific business function.

Identify data sources

Identify the data sources that are involved with the business processes. A dimensional model is created from one of the following sources:
  • An enterprise-wide data warehouse
  • OLTP source systems (in the case of independent or dependent data mart architectures)
  • Independent data marts (in this situation, you might be interested in consolidating the independent data marts into another data mart or data warehouse)

Select the requirements gathering method

Requirements are typically difficult to define. Typically, only after seeing a result can you decide that the result does (or does not) satisfy a requirement. The requirements of an organization also change over time. What is valid one day might no longer be valid the next day. Regardless, you use the requirements identified at this point in the development cycle to build the dimensional model.

The questions are:
  • How can you build something that cannot be precisely defined?
  • How do you know when you have successfully identified the requirements?
Although there is no definitive test, you can typically begin the modeling process if your requirements address the following questions:
To gather the full set of requirements, you should consider the following questions:
  • Who are the people, groups, and organizations of interest?
  • What functions need to be analyzed?
  • Why is the data required?
  • When does the data need to be recorded?
  • Where, geographically and organizationally, do relevant processes occur?
  • How is performance of the functions measured?
  • How is performance of the business process measured? What factors determine the success or failure?
  • What is the method of information distribution? Is it a data report, paper, email, or another method?
  • What types of information are lacking for analysis and decision making?
  • What steps are currently taken to fulfill the information gap?
  • What level of detail would enable data analysis?
In general, most of the methods for deriving business requirements conform to one of two approaches: source-driven and user-driven.
Source-driven
Source-driven requirements gathering is based on defining the requirements by using the source data in production operational systems. You can define the requirements by analyzing a source data model if one is available or the actual physical record layouts and selecting data of interest.

The major advantage of this method is that you know from the beginning that you can supply all the data, because you are already limiting yourself to what is available. A second benefit is that you can minimize the time required by the users in the early stages of the project. However, there is no substitute for the importance and value you get when you involve the users.

Of course there are also disadvantages to this method:
  • By minimizing user involvement, you increase the risk of producing an incorrect set of requirements.
  • Depending on the volume of source data that you have, and the availability of source models for the data, this can also be a very time-consuming method.
  • Some users might need to access data that is currently unavailable.
Without the opportunity to identify all of the requirements, there is no chance to investigate what is needed to obtain external data. External data is data that exists outside the enterprise. Even so, external data can often be of significant value to the business users.
The result of the source-driven method is to provide what you have, which is appropriate in at least two cases:
  • The method can be used to develop a fairly comprehensive list of the major dimensions of interest to the enterprise. If you plan to create an enterprise-wide data warehouse, this could minimize the proliferation of duplicate dimensions across separately developed data marts.
  • Analyzing relationships in the source data can identify areas on which to focus your data warehouse development efforts.
User-driven
User-driven requirements gathering is a method based on defining the requirements by investigating the functions the users perform. This is usually done through a series of meetings or interviews with users.

The major advantage to this method is that the focus is on providing what is really needed, rather than what is available. In general, this method has a smaller scope than the source-driven method. Therefore, the user-driven method generally produces a useful data warehouse or a data mart in a shorter time span.

However, expectations must be closely managed. The users must clearly understand that some of the data they need can simply not be made available for a variety of reasons. However, you should not to limit the things for which the user asks. Alternative ideas should be promoted when you define the requirements for a data warehouse. These requirements prevent you from eliminating requirements simply because you think they might not be possible. If a user is too tightly focused, you can miss useful data that is available in the production systems.

User-driven requirements gathering is typically the method of choice, especially when developing dependent data marts or populating data marts from a business-wide enterprise warehouse.

Gather the requirements

When you gather the requirements, the needs of business users are collected and documented. When you gather the requirements, you study the business processes and information analysis activities with which users are involved. A user typically needs to evaluate or analyze some aspect of the business. Focus your efforts on gathering the two key elements of analysis that business users are involved with on a day-to-day basis:
When you gather the requirements, you should work to understand the problem domain for which the modeling is done. Typically, the requirements at this stage are documented informally, and schemas are not fully detailed. When you gather these requirements, identify the following areas of interest:
  • Identify the most important questions that the business needs to address. You can assign importance values to each question to determine the most important questions to address.
  • Determine how the business wants to record the data as it changes. For example, you will want to know how to manage historical data for discontinued products or employee records.

Analyze the requirements

Analyze the business requirements. Determine informal requirements, and set up high-level measures and high-level entities. These objects might become dimensions when you model the data. Use this draft as a starting point to filter down the entities and measures. You can start to draft the structure of the requirements, drafting dimensions, hierarchies, and the measures for each part of the data model.

Summarize the business process analysis

Create a report from the analysis. The report should contain the following information:

Feedback