Precision analysis

Precision analysis is used to refine the existing precision metadata definition for selected columns (for example, columns of a numeric data type) based on the actual data values that are present in the column.

Function

Precision analysis is useful if the original column precision was set without knowledge of, or regard for, the actual data values that the column would contain. If analysis determines a different precision for a column, you can change the column's existing metadata in the original data source, or you can use the new precision to define the column in a new target schema for the data.

Technique

Each data value in a column's frequency distribution is analyzed to infer the precision required to store that individual value. All of the individual precision inferences for the column are then summarized by precision length to produce a frequency distribution of inferred precisions for the column. From that distribution, the system selects the longest inferred precision length, because it is the only length in the distribution that can store all of the column's data values.
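The technique can be illustrated with a minimal Python sketch. It assumes that "precision" means the total number of integer plus fractional digits, as in a SQL DECIMAL(p, s) column, and that the column's frequency distribution is available as a mapping from each distinct value to its row count. The function names are hypothetical and do not reflect how the product implements the analysis.

```python
from collections import Counter
from decimal import Decimal


def inferred_precision_of(value: str) -> int:
    """Digits needed to store one numeric value.

    Assumption: precision is the total number of integer plus fractional digits,
    as in a SQL DECIMAL(p, s) column. Leading zeros and the sign are not counted.
    Inputs are assumed to be finite numeric literals (no NaN or Infinity).
    """
    _sign, digits, exponent = Decimal(value).as_tuple()
    integer_digits = max(len(digits) + exponent, 0)
    fractional_digits = max(-exponent, 0)
    return max(integer_digits + fractional_digits, 1)


def precision_distribution(value_freqs: Counter) -> Counter:
    """Summarize a value frequency distribution by inferred precision length."""
    by_precision: Counter = Counter()
    for value, count in value_freqs.items():
        # Every row holding this value needs the same precision.
        by_precision[inferred_precision_of(value)] += count
    return by_precision


def infer_column_precision(value_freqs: Counter) -> int:
    """Pick the longest inferred precision: the only length in the distribution that fits every value."""
    return max(precision_distribution(value_freqs))


# Hypothetical column frequency distribution: distinct value -> number of rows holding it.
freqs = Counter({"12.50": 40, "7": 55, "1234.5": 5})
print(precision_distribution(freqs))   # Counter({1: 55, 4: 40, 5: 5})
print(infer_column_precision(freqs))   # 5
```

Working from distinct values rather than from every row means each value is analyzed only once, and its row count is simply carried into the summary.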

System capability

During column analysis processing, the system constructs each column's frequency distribution and then analyzes each distinct value to determine the precision length required to store that value. After all of the values have been analyzed, the system summarizes the individual results into a frequency distribution of inferred precision lengths for the column. The system uses the longest precision length as the column's inferred precision because it can hold all of the existing data values. This system-inferred precision is recorded in the repository as the inferred selection and, by default, is also recorded as the chosen selection, as shown in the following figure.

Figure 1. An example of inferred precision that is recorded in the repository as the inferred selection

User responsibility

If precision analysis applies to a column, you can view its results when you review the column analysis results. In the detailed column view, precision analysis results appear in their own panel on the properties analysis tab. From this panel, you can accept the system-inferred precision or use a drop-down list to override it with another precision length. If you override the system-inferred precision, your selection is recorded in the repository as the chosen precision. The process is complete when you have reviewed all of the column properties and marked the column property function as reviewed.
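The relationship between the inferred selection and the chosen selection can be modeled with a small Python sketch. The `PrecisionSelection` class, its fields, and the `ORDER_AMT` column name are hypothetical illustrations, not the repository's actual schema or API. The point is that the inferred value never changes, while the chosen value starts out equal to it and changes only when you override it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrecisionSelection:
    """Hypothetical model of one column's precision entry in the repository."""

    column: str
    inferred: int                  # precision inferred by the system; not changed by the user
    chosen: Optional[int] = None   # precision the user accepts; defaults to the inferred value

    def __post_init__(self) -> None:
        # Accepting the inference is the default: chosen starts out equal to inferred.
        if self.chosen is None:
            self.chosen = self.inferred

    def override(self, new_precision: int) -> None:
        """Record a user-selected precision (for example, picked from a drop-down list)."""
        if new_precision < 1:
            raise ValueError("precision must be a positive number of digits")
        self.chosen = new_precision


sel = PrecisionSelection("ORDER_AMT", inferred=5)
print(sel.chosen)                # 5: the inference is accepted by default
sel.override(7)
print(sel.inferred, sel.chosen)  # 5 7
```

Keeping both values means the original inference remains available for comparison after an override.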

Interpreting results

Typically, a quick look at the precision analysis summary makes a column's required precision obvious. Sometimes all of a column's data values result in a single inferred precision. In that case, unless you know of future data values that the inferred precision cannot store, you should accept the system's inference.

However, when there are multiple inferred precisions, note the frequency count for the selected inferred precision. If that count is low relative to the table's row count, a few invalid data values might be causing an excessive precision length to be inferred. (Drilling down from the inferred precision in the summary shows which data values require that precision length.) In that case, you can either override the inferred precision length or flag those data values as invalid and have the system rerun the inference.
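The review heuristic can be sketched in Python under a few stated assumptions. Precision is again taken to be integer plus fractional digits. A "low" frequency count is defined by a hypothetical `min_share` threshold of 1% of the table's row count. Flagging values as invalid is modeled as passing them in an `invalid` set before rerunning the inference. None of these names or thresholds come from the product.

```python
from collections import Counter
from decimal import Decimal
from typing import FrozenSet, List, Tuple


def inferred_precision_of(value: str) -> int:
    # Assumption: precision = integer digits + fractional digits, as in SQL DECIMAL(p, s).
    _sign, digits, exponent = Decimal(value).as_tuple()
    return max(max(len(digits) + exponent, 0) + max(-exponent, 0), 1)


def review_inferred_precision(
    value_freqs: Counter,
    row_count: int,
    invalid: FrozenSet[str] = frozenset(),
    min_share: float = 0.01,  # hypothetical threshold for a "low" frequency count
) -> Tuple[int, bool, List[str]]:
    """Re-infer the column precision, ignoring values flagged as invalid.

    Returns the inferred precision, whether its frequency count looks suspiciously
    low relative to the table's row count, and the values that need that precision
    (the drill-down list).
    """
    valid = Counter({v: c for v, c in value_freqs.items() if v not in invalid})
    by_precision: Counter = Counter()
    for value, count in valid.items():
        by_precision[inferred_precision_of(value)] += count
    inferred = max(by_precision)
    # Only meaningful when several precisions were inferred.
    suspicious = len(by_precision) > 1 and by_precision[inferred] / row_count < min_share
    drill_down = sorted(v for v in valid if inferred_precision_of(v) == inferred)
    return inferred, suspicious, drill_down


freqs = Counter({"19.99": 600, "5.00": 399, "99999999.99": 1})
print(review_inferred_precision(freqs, row_count=1000))
# (10, True, ['99999999.99'])
print(review_inferred_precision(freqs, row_count=1000, invalid=frozenset({"99999999.99"})))
# (4, False, ['19.99'])
```

In this example, one outlying value forces a precision of 10. After that value is flagged as invalid, rerunning the inference yields 4, and the remaining count is no longer suspiciously low.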

As with other column properties, it is beneficial to keep precision assignments consistent across columns in the data environment.

Decisions and actions

You can either accept the system-inferred precision or override the inferred precision by selecting another.

Once you have made that decision, you can continue to review the other column properties or mark the column properties review as complete.

Performance considerations

There are no significant system performance considerations for the precision analysis function.