The function pipeline

In Analytics Service, a function might generate a KPI that is used as input to a secondary calculation, which in turn might feed into the calculation of other functions. This staged calculation of KPIs is known as a function pipeline. By calculating and refining KPIs in stages, you can produce precise metrics for your organization.

Analytics Service automatically builds the function pipeline for each device type based on the data items that are configured for it. You can configure how frequently the calculations in the pipeline run. By default, calculations run every 5 minutes. See Scheduling calculations for more information.

In the following example, the output of f1 feeds into f2. The output of f2 feeds into f3.

[Figure: Function pipeline. The output of f1 feeds into f2, and the output of f2 feeds into f3.]
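To make the chaining concrete, the following minimal Python sketch wires three functions together so that each stage consumes the output of the one before it. The function bodies are illustrative placeholders, not actual Analytics Service functions.

    # Minimal sketch of a three-stage chain: each stage consumes the
    # output of the previous one. The bodies are illustrative placeholders.
    def f1(raw_value: float) -> float:
        # Stage 1: derive a KPI from raw device data.
        return raw_value * 2.0

    def f2(f1_output: float) -> float:
        # Stage 2: refine the stage-1 KPI.
        return f1_output + 1.0

    def f3(f2_output: float) -> float:
        # Stage 3: refine the stage-2 KPI.
        return f2_output / 10.0

    # The output of f1 feeds into f2; the output of f2 feeds into f3.
    result = f3(f2(f1(5.0)))   # 5.0 -> 10.0 -> 11.0 -> 1.1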

Each time Analytics Service runs the pipeline, it performs these steps (steps 3 and 4 are sketched in code after the list):

  1. Looks for new device data
  2. Identifies the calculations that are needed
  3. Builds a pipeline that orders functions into stages
  4. Runs the functions at each stage
  5. Aggregates the results, if required
  6. Writes the results to the data lake
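Steps 3 and 4 can be pictured with a small sketch: functions are grouped into stages according to the columns they read, and each stage runs once its inputs are available. The names and data structures below are assumptions for illustration, not Analytics Service internals.

    # Illustrative sketch of steps 3 and 4: order functions into stages by
    # their data dependencies, then run each stage over the device data.
    def build_stages(functions):
        # A function can run only after all the columns it reads exist.
        available = {"rpm", "temperature"}          # raw device columns
        remaining = dict(functions)                 # name -> (inputs, callable)
        stages = []
        while remaining:
            stage = [name for name, (inputs, _) in remaining.items()
                     if inputs <= available]
            if not stage:
                raise ValueError("unresolvable dependency in pipeline")
            stages.append(stage)
            available |= set(stage)                 # outputs become new columns
            for name in stage:
                del remaining[name]
        return stages

    functions = {
        "f1": ({"rpm", "temperature"}, lambda row: row["temperature"] / row["rpm"]),
        "f2": ({"f1"}, lambda row: 1 if row["f1"] > 0.047 else 0),
    }

    rows = [{"rpm": 2050, "temperature": 96}]
    for stage in build_stages(functions):           # step 3: order into stages
        for name in stage:                          # step 4: run each function
            _, fn = functions[name]
            for row in rows:
                row[name] = fn(row)
    # rows[0] now holds f1 (~0.0468) and f2 (0) alongside the raw data.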

Secondary calculations in a function pipeline might transform data or aggregate data that is produced at an earlier stage.

Example 1

The first example shows a pipeline with transform functions. In company A, device A101 on the factory floor captures data about the revolutions per minute (RPM) and the temperature of a machine.

Table 1. Captured input data from entities

RPM    Temperature (°C)
2050   96
2060   97
2043   89

The operations manager reviews the data in Table 1 but wants an analysis that makes it easier to interpret. The manager wants to see at a glance when the temperature-to-RPM ratio for this machine exceeds a threshold of 0.047.

A data scientist in the company devises two functions to run on the data: f1 divides the temperature by the RPM to produce a ratio, and f2 outputs 1 when the f1 ratio exceeds the 0.047 threshold and 0 otherwise.

Custom functions are created for f1 and f2, added to the catalog, and applied to the device type.
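As an illustration, and assuming the f1 and f2 definitions above (which the values in Table 2 bear out), the two transforms might look like this in pandas. This is a sketch, not the actual catalog function code.

    # A sketch of the two transforms as pandas operations, assuming
    # f1 = temperature / RPM and f2 = threshold flag (see Table 2).
    import pandas as pd

    df = pd.DataFrame({"rpm": [2050, 2060, 2043],
                       "temperature": [96, 97, 89]})

    THRESHOLD = 0.047
    df["f1"] = df["temperature"] / df["rpm"]          # stage 1: ratio
    df["f2"] = (df["f1"] > THRESHOLD).astype(int)     # stage 2: flag

    print(df)
    #     rpm  temperature        f1  f2
    # 0  2050           96  0.046829   0
    # 1  2060           97  0.047087   1
    # 2  2043           89  0.043563   0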

Analytics Service builds a pipeline with two transform stages: f1 runs in the first stage, and f2 runs in the second stage on the output of f1.

The transform stages are run in sequence and create new columns in the database.

Table 2. Transformed data for device A101

RPM    Temperature (°C)   f1            f2
2050   96                 0.046829268   0
2060   97                 0.047087379   1
2043   89                 0.043563387   0

In f2, a value of 1 shows that the threshold is exceeded. The operations manager can see at a glance that the device exceeded the threshold for one of its data points.

In this example, the transform created new data. Depending on how you design your functions, a transform can also update or overwrite existing data.

Example 2

The second example shows a pipeline with aggregation functions. Aggregation functions take the transformed results and summarize them. In the example from company A, the operations manager also wants to know how many times the threshold is exceeded. An aggregation function is added to sum (∑) the output of f2.
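As a sketch of this aggregation stage in pandas, the f2 flags are summed per entity per day and the result is added to the stored summary. The dates are assumptions because Table 2 shows no timestamps, so the totals below are illustrative rather than a reproduction of Table 4.

    # A sketch of the aggregation stage: sum this run's f2 flags per entity
    # per day, then fold them into the existing stored summary (Table 3).
    import pandas as pd

    transformed = pd.DataFrame({
        "entity": ["A101"] * 3,
        "date":   ["09/14/2018", "09/14/2018", "09/15/2018"],  # assumed dates
        "f2":     [0, 1, 0],                                   # from Table 2
    })

    # Step 5: aggregate this run's results.
    new_counts = transformed.groupby(["entity", "date"])["f2"].sum()

    # Step 6: add them to the existing stored summary before writing it back.
    existing = pd.DataFrame({
        "entity":   ["A101", "A101"],
        "date":     ["09/14/2018", "09/15/2018"],
        "f2_count": [2, 0],
    }).set_index(["entity", "date"])["f2_count"]

    updated = existing.add(new_counts, fill_value=0).astype(int)
    print(updated)   # A101 / 09/14/2018 -> 3, A101 / 09/15/2018 -> 0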

Before the pipeline is run, an existing stored summary table shows:

Table 3. Existing stored summary

Entity   Date         f2 count
A101     09/14/2018   2
A101     09/15/2018   0

After the pipeline is run, the stored summary table is updated to show:

Table 4. Updated stored summary

Entity   Date         f2 count
A101     09/14/2018   3
A101     09/15/2018   1

You develop custom aggregation functions in the same way that you build custom transform functions.
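As a plain-Python illustration of that parallel structure, a transform and an aggregation can share the same shape: parameters on the constructor and an execute method. This mirrors the development workflow only; it is not the Analytics Service API.

    # Illustrative only: a transform and an aggregation sharing the same
    # shape (constructor parameters plus an execute method).
    from typing import List

    class ThresholdFlag:
        """Transform: emit 1 for each value above the threshold."""
        def __init__(self, threshold: float):
            self.threshold = threshold

        def execute(self, values: List[float]) -> List[int]:
            return [1 if v > self.threshold else 0 for v in values]

    class FlagCount:
        """Aggregation: sum the flags into a single count."""
        def execute(self, flags: List[int]) -> int:
            return sum(flags)

    ratios = [0.046829268, 0.047087379, 0.043563387]   # f1 output (Table 2)
    flags = ThresholdFlag(0.047).execute(ratios)       # [0, 1, 0]
    print(FlagCount().execute(flags))                  # 1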