The function pipeline

In Analytics Service, a function might generate a KPI that is used as input to a secondary calculation, which in turn might feed into the calculation of other functions. This staged calculation of KPIs is known as a function pipeline. By calculating and refining KPIs in stages, you can produce precise metrics for your organization.

Analytics Service automatically builds the function pipeline for each device type based on the data items that are configured for it. You can configure how frequently the calculations in the pipeline run. By default, calculations run every 5 minutes. See Scheduling calculations for more information.

In the following example, the output of f1 feeds into f2. The output of f2 feeds into f3.

[Figure: Function pipeline. The output of f1 feeds into f2, and the output of f2 feeds into f3.]
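To make the chaining concrete, the following minimal Python sketch wires three functions together so that each stage consumes the output of the one before it. The function bodies are illustrative placeholders, not actual Analytics Service functions.

    # Minimal sketch of a three-stage chain: each stage consumes the
    # output of the previous one. The bodies are illustrative placeholders.
    def f1(raw_value: float) -> float:
        # Stage 1: derive a KPI from raw device data.
        return raw_value * 2.0

    def f2(f1_output: float) -> float:
        # Stage 2: refine the stage-1 KPI.
        return f1_output + 1.0

    def f3(f2_output: float) -> float:
        # Stage 3: refine the stage-2 KPI.
        return f2_output / 10.0

    # The output of f1 feeds into f2; the output of f2 feeds into f3.
    result = f3(f2(f1(5.0)))   # 5.0 -> 10.0 -> 11.0 -> 1.1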

Each time Analytics Service runs the pipeline, it performs these steps (steps 3 and 4 are sketched in code after the list):

  1. Looks for new device data
  2. Identifies the calculations that are needed
  3. Builds a pipeline that orders functions into stages
  4. Runs the functions at each stage
  5. Aggregates the results, if required
  6. Writes the results to the data lake
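Steps 3 and 4 can be pictured with a small sketch: functions are grouped into stages according to the columns they read, and each stage runs once its inputs are available. The names and data structures below are assumptions for illustration, not Analytics Service internals.

    # Illustrative sketch of steps 3 and 4: order functions into stages by
    # their data dependencies, then run each stage over the device data.
    def build_stages(functions):
        # A function can run only after all the columns it reads exist.
        available = {"rpm", "temperature"}          # raw device columns
        remaining = dict(functions)                 # name -> (inputs, callable)
        stages = []
        while remaining:
            stage = [name for name, (inputs, _) in remaining.items()
                     if inputs <= available]
            if not stage:
                raise ValueError("unresolvable dependency in pipeline")
            stages.append(stage)
            available |= set(stage)                 # outputs become new columns
            for name in stage:
                del remaining[name]
        return stages

    functions = {
        "f1": ({"rpm", "temperature"}, lambda row: row["temperature"] / row["rpm"]),
        "f2": ({"f1"}, lambda row: 1 if row["f1"] > 0.047 else 0),
    }

    rows = [{"rpm": 2050, "temperature": 96}]
    for stage in build_stages(functions):           # step 3: order into stages
        for name in stage:                          # step 4: run each function
            _, fn = functions[name]
            for row in rows:
                row[name] = fn(row)
    # rows[0] now holds f1 (~0.0468) and f2 (0) alongside the raw data.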

Secondary calculations in a function pipeline might transform data or aggregate data that is produced at an earlier stage.

Example 1

The first example shows a pipeline with transform functions. In company A, device A101 on the factory floor captures data about the revolutions per minute (RPM) and the temperature of a machine.

Table 1. Captured input data from entities

RPM    Temperature (°C)
2050   96
2060   97
2043   89

The operations manager reviews the data in Table 1 but wants an analysis that makes it easier to interpret. The manager wants to see at a glance when the temperature-to-RPM ratio for this machine exceeds a threshold of 0.047.

A data scientist in the company devises two functions to run on the data: f1 divides the temperature by the RPM to produce a ratio, and f2 outputs 1 when the f1 ratio exceeds the 0.047 threshold and 0 otherwise.

Custom functions are created for f1 and f2, added to the catalog, and applied to the device type.
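As an illustration, and assuming the f1 and f2 definitions above (which the values in Table 2 bear out), the two transforms might look like this in pandas. This is a sketch, not the actual catalog function code.

    # A sketch of the two transforms as pandas operations, assuming
    # f1 = temperature / RPM and f2 = threshold flag (see Table 2).
    import pandas as pd

    df = pd.DataFrame({"rpm": [2050, 2060, 2043],
                       "temperature": [96, 97, 89]})

    THRESHOLD = 0.047
    df["f1"] = df["temperature"] / df["rpm"]          # stage 1: ratio
    df["f2"] = (df["f1"] > THRESHOLD).astype(int)     # stage 2: flag

    print(df)
    #     rpm  temperature        f1  f2
    # 0  2050           96  0.046829   0
    # 1  2060           97  0.047087   1
    # 2  2043           89  0.043563   0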

Analytics Service builds a pipeline with two transform stages: f1 runs in the first stage, and f2 runs in the second stage on the output of f1.

The transform stages are run in sequence and create new columns in the database.

Table 2. Transformed data for device A101

RPM    Temperature (°C)   f1            f2
2050   96                 0.046829268   0
2060   97                 0.047087379   1
2043   89                 0.043563387   0

In f2, a value of 1 shows that the threshold is exceeded. The operations manager can see at a glance that the device exceeded the threshold for one of its data points.

In this example, the transform created new data. Depending on how you design your functions, a transform can also update or overwrite existing data.

Example 2

The second example shows a pipeline with aggregation functions. Aggregation functions take the transformed results and summarize them. In the example from company A, the operations manager also wants to know how many times the threshold is exceeded. An aggregation function is added to sum (∑) the output of f2.
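As a sketch of this aggregation stage in pandas, the f2 flags are summed per entity per day and the result is added to the stored summary. The dates are assumptions because Table 2 shows no timestamps, so the totals below are illustrative rather than a reproduction of Table 4.

    # A sketch of the aggregation stage: sum this run's f2 flags per entity
    # per day, then fold them into the existing stored summary (Table 3).
    import pandas as pd

    transformed = pd.DataFrame({
        "entity": ["A101"] * 3,
        "date":   ["09/14/2018", "09/14/2018", "09/15/2018"],  # assumed dates
        "f2":     [0, 1, 0],                                   # from Table 2
    })

    # Step 5: aggregate this run's results.
    new_counts = transformed.groupby(["entity", "date"])["f2"].sum()

    # Step 6: add them to the existing stored summary before writing it back.
    existing = pd.DataFrame({
        "entity":   ["A101", "A101"],
        "date":     ["09/14/2018", "09/15/2018"],
        "f2_count": [2, 0],
    }).set_index(["entity", "date"])["f2_count"]

    updated = existing.add(new_counts, fill_value=0).astype(int)
    print(updated)   # A101 / 09/14/2018 -> 3, A101 / 09/15/2018 -> 0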

Before the pipeline is run, an existing stored summary table shows:

Table 3. Existing stored summary

Entity   Date         f2 count
A101     09/14/2018   2
A101     09/15/2018   0

After the pipeline is run, the stored summary table is updated to show:

Table 4. Updated stored summary

Entity   Date         f2 count
A101     09/14/2018   3
A101     09/15/2018   1

You develop custom aggregation functions in the same way that you build custom transform functions.
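As a plain-Python illustration of that parallel structure, a transform and an aggregation can share the same shape: parameters on the constructor and an execute method. This mirrors the development workflow only; it is not the Analytics Service API.

    # Illustrative only: a transform and an aggregation sharing the same
    # shape (constructor parameters plus an execute method).
    from typing import List

    class ThresholdFlag:
        """Transform: emit 1 for each value above the threshold."""
        def __init__(self, threshold: float):
            self.threshold = threshold

        def execute(self, values: List[float]) -> List[int]:
            return [1 if v > self.threshold else 0 for v in values]

    class FlagCount:
        """Aggregation: sum the flags into a single count."""
        def execute(self, flags: List[int]) -> int:
            return sum(flags)

    ratios = [0.046829268, 0.047087379, 0.043563387]   # f1 output (Table 2)
    flags = ThresholdFlag(0.047).execute(ratios)       # [0, 1, 0]
    print(FlagCount().execute(flags))                  # 1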