Validation of the ETL process

The validation module reports information about the data that the ETL process produces, so that you can make any corresponding changes to the input data.

After the ETL procedure, the data is saved in the .tmp store. Running the validation procedure on this data is optional.

Any error must be corrected before the data is used by the analysis module. You can also define customized validation rules in this module.

The files are located in the following directories:
  • The configuration file: /home/<utility>/conf/
  • The Python source code: /home/<utility>/bin
  • The shell files: /home/<utility>/bin

Master data validation

Validation of the master data varies according to the asset.

Depending on which assumption about the data is violated, the validation module reports one of two validation levels: warning or error.

The input data for validation is read from: /user/<utility>/cm/tmp/<asset entity>.
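The two validation levels, and the idea of a customized validation rule, can be illustrated with a minimal Python sketch. The names here (Level, Finding, validate, and the two example rules with their "id" and "description" fields) are hypothetical and are not taken from the product's source code. They only show one way that a rule could be tied to a warning or an error level.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Level(Enum):
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Finding:
    level: Level
    rule: str
    message: str


# A rule inspects one record and returns a message when the record violates it.
Rule = Callable[[Dict[str, str]], Optional[str]]


def missing_id(record: Dict[str, str]) -> Optional[str]:
    # Hypothetical rule: treated as an error in this sketch.
    return None if record.get("id") else "record has no id"


def missing_description(record: Dict[str, str]) -> Optional[str]:
    # Hypothetical rule: treated as a warning in this sketch.
    return None if record.get("description") else "record has no description"


RULES: List[Tuple[Level, str, Rule]] = [
    (Level.ERROR, "missing_id", missing_id),
    (Level.WARNING, "missing_description", missing_description),
]


def validate(records: List[Dict[str, str]],
             rules: List[Tuple[Level, str, Rule]] = RULES) -> List[Finding]:
    """Apply every rule to every record and collect the findings."""
    findings = []
    for record in records:
        for level, name, rule in rules:
            message = rule(record)
            if message is not None:
                findings.append(Finding(level, name, message))
    return findings


def has_errors(findings: List[Finding]) -> bool:
    """True when at least one finding is at the error level."""
    return any(f.level is Level.ERROR for f in findings)
```

In this sketch, a customized rule is one more entry in the RULES list, and has_errors indicates whether the data still needs correction before analysis.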

Voltage data validation

Validation of the voltage data focuses on the timestamps and the voltage value levels.

The timestamps describe how the voltage data varies over time, which is required for the phase analysis in the analysis module.

The level of the voltage values describes the voltage level of different meters in the feeders. In practice, the voltage drop between the meters and their feeder should not be large.

The input data for validation is read from /user/<utility>/cm/tmp/feeder_voltage or /user/<utility>/cm/tmp/meter_voltage. The validation has the following parts:
  • Basic validation
  • Timestamp validation - validates the fixed time interval and the timestamp alignment between the meters and their feeder. The fixed time interval is a basic assumption: the SCADA device is expected to provide readings at a fixed time interval. The alignment check verifies that the timestamps of the feeder and the meters match. This alignment is the basis of the analysis, which compares the variation trend of the feeder with the variation trend of the meters in the feeder.
  • Voltage data level validation - the validation of the voltage data level is in three parts: the feeder voltage level, the meter voltage level, and the voltage level between the meter and the feeder. In practice, the voltage level of the feeder and the meter must be the same. The population standard deviation is used as the index to verify the voltage level. You can design different indices to check the voltage level.
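The two timestamp checks described in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's own code: the function names are hypothetical, the readings are assumed to be available as lists of datetime values, and the expected interval is supplied by the caller.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Set


def interval_violations(timestamps: Sequence[datetime],
                        expected: timedelta) -> List[datetime]:
    """Return each timestamp whose gap to the previous reading is not `expected`."""
    ordered = sorted(timestamps)
    return [cur for prev, cur in zip(ordered, ordered[1:])
            if cur - prev != expected]


def unaligned_timestamps(feeder: Sequence[datetime],
                         meter: Sequence[datetime]) -> Set[datetime]:
    """Return the timestamps that appear in only one of the two series."""
    return set(feeder) ^ set(meter)
```

An empty result from both functions means that the series has a fixed interval and that the meter timestamps match the feeder timestamps.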

Load data validation

Validation of the load data focuses on the timestamps and the load value levels.

The timestamps describe how the load data varies over time, which is used in the phase analysis in the analysis module. The level of the load values describes the load level of the different meters in the feeders. In practice, the load gaps between the meters and their feeders should not be large.

The input data for validation is read from /user/<utility>/cm/tmp/feeder_load or /user/<utility>/cm/tmp/meter_load. The validation has the following parts:
  • Basic validation
  • Timestamp validation - validates the fixed time interval and the timestamp alignment between the meters and their feeder. The fixed time interval is a basic assumption: the SCADA device is expected to provide readings at a fixed time interval. The alignment check verifies that the timestamps of the feeder and the meters match. This alignment is the basis of the analysis, which compares the variation trend of the feeder with the variation trend of the meters in the feeder.
  • Load data level validation - the validation of the load data level is in three parts: the feeder load level, the meter load level, and the load level between the meter and the feeder. In practice, the load level of the feeder and the meter must be the same. The population standard deviation is used as the index to verify the load level. You can design different indices to check the load level.
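The population standard deviation index mentioned in the list above can be computed with the pstdev function from Python's standard statistics module. The following sketch is hypothetical: the function names, the use of the meter-minus-feeder difference as the "between" index, and the caller-supplied threshold are all assumptions, because the text does not state how the index is compared.

```python
from statistics import pstdev
from typing import Dict, List, Sequence


def level_indices(feeder: Sequence[float],
                  meter: Sequence[float]) -> Dict[str, float]:
    """Population standard deviation of the feeder, the meter, and their gap."""
    if len(feeder) != len(meter):
        raise ValueError("feeder and meter series must have the same length")
    # Assumption: the level "between" meter and feeder is the pointwise gap.
    gaps = [m - f for f, m in zip(feeder, meter)]
    return {
        "feeder": pstdev(feeder),
        "meter": pstdev(meter),
        "gap": pstdev(gaps),
    }


def exceeds(indices: Dict[str, float], threshold: float) -> List[str]:
    """Names of the indices that are larger than the (hypothetical) threshold."""
    return [name for name, value in indices.items() if value > threshold]
```

A different index, for example a range or a percentile, could replace pstdev in level_indices without changing the rest of the sketch.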

Move data from the tmp store to the ana store

After ETL, or ETL with validation, you move the data from the .tmp store to the ana store.

The input raw data: /user/<utility>/cm/tmp/*

The output data: /user/<utility>/cm/ana/*

The master data in the ana store is overwritten by the master data from the .tmp store.
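The move-with-overwrite behavior can be illustrated with a short Python sketch. It assumes, for illustration only, that both stores are ordinary local directories, which might not be the case on the real system, and it overwrites every entry rather than only the master data. The function name move_store is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def move_store(tmp_dir: Path, ana_dir: Path) -> List[str]:
    """Move every entry from tmp_dir to ana_dir, replacing existing entries."""
    ana_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(tmp_dir.iterdir()):
        target = ana_dir / entry.name
        # Remove the old copy first so that the new data replaces it.
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

After the call, the source directory is empty and the returned list names the entries that were moved.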

Populate the operational store

After the ETL procedure, the Python script py_PopulateOperationalStore writes to the master data in HBase.

The input raw data: /user/<utility>/cm/raw/feeder_voltage.