Troubleshooting
Problem
With a relatively small amount of data, the READ DATA SOURCE step takes a very long time to complete. This occurs regardless of whether the data source is a Cognos Package, an IQD, or a flat file.
Symptom
Compared to other Transformer models with equivalent volumes of data, the affected model can take 10 times longer to read the same volume of data. The same model may even show a large difference in read time between two different sets of data (prod, uat, dev, etc.). The problem also occurs when only generating categories. It may be limited to specific data sources or specific Dimensions. An example of poor data read performance: 100,000 rows take 4 hours to read.
Cause
When a data source contains non-unique data that populates Levels with the Unique and Move properties set, read times increase because Transformer performs Category Moves to maintain Category uniqueness. Multiple drill-down paths and Custom Views multiply the effect and degrade performance further.
Diagnosing The Problem
Identify the source of poor cube build performance by analyzing the build log file. Check the number of records processed and the READ DATA SOURCE timing.
Example:
4:21:59 PM Timing, OPEN DATA SOURCE,00:00:00
9:25:03 PM End processing 90004 records from uat.csv
9:25:03 PM Timing, READ DATA SOURCE,05:03:04
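The check above can be scripted when several logs need to be compared. The following is a minimal sketch that assumes the log lines look like the excerpt shown here; real build logs may carry additional columns, so the regular expressions and the function name `summarize_read` are illustrative assumptions rather than a documented log format.

```python
import re
from datetime import timedelta

# Patterns assumed from the log excerpt above; adjust them for the real log layout.
TIMING_RE = re.compile(r"Timing,\s*([A-Z ]+?),\s*(\d+):(\d{2}):(\d{2})\s*$")
RECORDS_RE = re.compile(r"End processing (\d+) records from (\S+)")


def summarize_read(log_lines):
    """Return record count, read time in seconds, and rows per second.

    Returns None if the record count or READ DATA SOURCE timing is missing.
    """
    records = None
    source = None
    read_time = None
    for line in log_lines:
        match = RECORDS_RE.search(line)
        if match:
            records = int(match.group(1))
            source = match.group(2)
        match = TIMING_RE.search(line)
        if match and match.group(1).strip() == "READ DATA SOURCE":
            hours, minutes, seconds = (int(part) for part in match.groups()[1:])
            read_time = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if records is None or read_time is None:
        return None
    total_seconds = read_time.total_seconds()
    rate = records / total_seconds if total_seconds else float("inf")
    return {
        "source": source,
        "records": records,
        "read_seconds": total_seconds,
        "rows_per_second": rate,
    }


sample_log = [
    "4:21:59 PM Timing, OPEN DATA SOURCE,00:00:00",
    "9:25:03 PM End processing 90004 records from uat.csv",
    "9:25:03 PM Timing, READ DATA SOURCE,05:03:04",
]
summary = summarize_read(sample_log)
print(summary)
```

For the excerpt above this reports 90004 records read in 18184 seconds, which is fewer than 5 rows per second. Comparing this rate across the prod, uat, and dev logs makes it easier to see which data set triggers the slow read.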
Check the Dimensions associated with the slow data source for Levels with properties Unique and Move checked.
Test by unchecking the Move option on those Levels and then generating categories for the slow data source. To do so, right-click the data source in Transformer and select 'Generating Categories for Selected Data Source'. If the data is now read quickly and you get an error indicating attempts to create a category in more than one path (for example, TR2318 Transformer has detected 101372 attempts to create a category in more than one path.), the slow read is being caused by Category Moves on non-unique data.
Resolving The Problem
The problem can be attributed either to data that is not properly conformed or to Levels that are incorrectly set to Unique. The solution depends on the business requirement.
If the data should contain Categories with the same name in different paths, then the Levels should not be set to Unique.
If the Categories should be unique, then determine why the data source contains non-unique data.
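To help determine where a flat-file data source contains non-unique data, the rows can be scanned for category values that appear under more than one parent. The sketch below is only an illustration: the column names `region` and `city`, the sample data, and the helper `find_multi_path_categories` are hypothetical, and it checks a single parent/child pair of columns rather than reproducing Transformer's own uniqueness logic.

```python
import csv
import io
from collections import defaultdict


def find_multi_path_categories(rows, parent_col, child_col):
    """Map each child value to its parents, keeping only children with 2+ parents."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child_col]].add(row[parent_col])
    return {
        child: sorted(parent_set)
        for child, parent_set in parents.items()
        if len(parent_set) > 1
    }


# Hypothetical sample data; in practice, open the real source file with csv.DictReader.
sample = io.StringIO(
    "region,city,sales\n"
    "East,Springfield,10\n"
    "West,Springfield,7\n"
    "East,Boston,5\n"
)
conflicts = find_multi_path_categories(csv.DictReader(sample), "region", "city")
print(conflicts)  # {'Springfield': ['East', 'West']}
```

Each value reported here would reach the child Level through more than one path. Whether that is a data quality issue or a sign that the Level should not be Unique depends on the business requirement described above.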
Document Information
Modified date:
15 June 2018
UID
swg21616176