Window size and scheduling frequency

Anomaly functions apply a sliding window to a time series signal to capture patterns in the signal. The window size parameter sets the number of data points in each window.

In the Analytics Service, a signal represents all data points that are included when the pipeline runs an anomaly function. For example, the pipeline might include 60 data points.

An anomaly function breaks up a signal into segments or windows. It uses the user-defined window size parameter to define the segments. For example, you might set the window size to 12 data points.

To capture each segment, the anomaly function slides the window by one data point (see Figure 1). Consecutive segments are offset by one data point, so each segment shares all but one of its data points with the next segment.

Figure 1. Sliding window
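The segmentation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Analytics Service implementation; the function name sliding_windows and the sample signal are assumptions made for the example.

```python
def sliding_windows(signal, window_size):
    """Return every full window of `window_size` points, advancing one point at a time."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    last_start = len(signal) - window_size
    # If the signal is shorter than the window, the range is empty and no window is returned.
    return [signal[i:i + window_size] for i in range(last_start + 1)]


signal = list(range(60))               # stand-in for a signal of 60 data points
windows = sliding_windows(signal, 12)  # window size of 12 data points

print(len(windows))                    # number of full windows: 60 - 12 + 1
print(windows[0][:3], windows[1][:3])  # the second window starts one point later
```

A signal of n data points and a window size of w yields n - w + 1 full windows when the window advances one data point at a time.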

Window size

An anomaly function builds an internal catalog of patterns from the segments it captures. The function uses these patterns to match each segment to an existing pattern. For example, the KMeansAnomalyScore function groups similar patterns into categories over time. Then, for each new window, the function finds the closest pattern from its catalog. It subtracts the window signal from the category signal. The result is a noisy residual signal. The function has built-in maximum and minimum thresholds based on a normal distribution. If any part of the residual signal falls outside these thresholds, an anomaly is detected.
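The matching step can be illustrated with a simplified sketch. It is not the KMeansAnomalyScore implementation: the catalog is supplied by hand instead of being learned by clustering, and the names closest_pattern, residual, is_anomalous, the sigma value, and the 3-sigma threshold are all assumptions made for the example.

```python
import math


def closest_pattern(window, catalog):
    """Return the catalog pattern with the smallest Euclidean distance to the window."""
    return min(catalog, key=lambda pattern: math.dist(window, pattern))


def residual(window, pattern):
    """Subtract the window signal from the category (pattern) signal, point by point."""
    return [p - w for w, p in zip(window, pattern)]


def is_anomalous(window, catalog, sigma, z=3.0):
    """Flag the window if any residual point lies outside +/- z * sigma.

    `sigma` stands in for the spread of normal residuals; the z = 3 band is an
    assumed threshold, not a documented value.
    """
    pattern = closest_pattern(window, catalog)
    return any(abs(r) > z * sigma for r in residual(window, pattern))


# Hand-written catalog of two patterns (hypothetical data, window size 4).
catalog = [[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 1.0, 2.0]]

print(is_anomalous([1.05, 1.95, 1.0, 2.1], catalog, sigma=0.1))  # close to the second pattern
print(is_anomalous([1.0, 2.0, 5.0, 2.0], catalog, sigma=0.1))    # spike in the third point
```

In this sketch the first window stays within the threshold band around its closest pattern, while the spike in the second window pushes one residual point far outside it.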

Table 1 displays a typical window size and minimum window size for each function. It also displays the minimum number of data points that are required to support the typical window size.

Anomaly function                      Typical window size  Minimum window size  Minimum number of data points
FFTbasedGeneralizedAnomalyScore       12                   3                    24
GeneralizedAnomalyScore               12                   6                    24
KMeansAnomalyScore                    12                   1                    24
NoDataAnomalyScore                    12                   6                    24
SpectralAnomalyScore                  12                   6                    24
SaliencybasedGeneralizedAnomalyScore  12                   6                    24
MatrixProfileAnomalyScore             12                   6                    24

Cap the window size at the typical window size values in Table 1. A small window size is preferable because you gather more patterns for analysis. However, if the volume of data points is small, the anomaly detector might not have enough data points to perform the analysis. For example, if you run a KMeansAnomalyScore function with only 12 data points and you set a window size of 12, the function has a single segment and nothing to compare it to.

A minimum number of data points must be available in a pipeline run for the anomaly detectors to work effectively. As a rule, each signal or pipeline run needs at least twice as many data points as the window size (see Table 1).
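The rules in this section can be expressed as a small pre-flight check. The sketch below is illustrative only: the minimum window sizes and the typical size of 12 are copied from Table 1, while the helper names min_data_points and check_settings are hypothetical and are not part of the Analytics Service.

```python
# Values copied from Table 1.
TYPICAL_WINDOW_SIZE = 12
MIN_WINDOW_SIZE = {
    "FFTbasedGeneralizedAnomalyScore": 3,
    "GeneralizedAnomalyScore": 6,
    "KMeansAnomalyScore": 1,
    "NoDataAnomalyScore": 6,
    "SpectralAnomalyScore": 6,
    "SaliencybasedGeneralizedAnomalyScore": 6,
    "MatrixProfileAnomalyScore": 6,
}


def min_data_points(window_size):
    """Rule of thumb from the text: at least twice the window size."""
    return 2 * window_size


def check_settings(function_name, window_size, points_per_run):
    """Return a list of problems; an empty list means the settings pass every check."""
    problems = []
    if window_size < MIN_WINDOW_SIZE[function_name]:
        problems.append("window size is below the minimum for " + function_name)
    if window_size > TYPICAL_WINDOW_SIZE:
        problems.append("window size is above the typical value of 12")
    if points_per_run < min_data_points(window_size):
        problems.append("fewer data points per run than twice the window size")
    return problems


# The example from the text: 12 data points with a window size of 12.
print(check_settings("KMeansAnomalyScore", 12, 12))
# The same function with 24 data points per run.
print(check_settings("KMeansAnomalyScore", 12, 24))
```

With 12 data points and a window size of 12, the check reports that the run has too few data points; with 24 data points it reports no problems.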

For an understanding of the impact that window size has on computational complexity, see Computational complexity of anomaly models.

Scheduling frequency

Before you define the scheduling criteria for an anomaly function, identify the number of data points you expect for each pipeline run. The size of a pipeline run is determined by these factors:

  1. The frequency at which your devices are sending data.
  2. The timing of a pipeline run.
  3. The scheduling frequency of the anomaly function in the pipeline.
  4. The volume of historical data included for the function.

Example:

Your devices send events roughly every 5 seconds. You schedule an anomaly function to run every 5 minutes as part of a pipeline run. You do not include historical data in your analysis. The pipeline for your device type typically completes within 5 minutes, so the start of the next pipeline run is not delayed. The signal therefore has approximately 60 data points (300 seconds divided by 5 seconds per event). If the signal has 60 data points and the window size is 10, the anomaly function breaks the signal into about 50 sliding windows (60 - 10 + 1 = 51 when the window advances one data point at a time).
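The arithmetic in this example can be reproduced with a short sketch. The function names expected_points and window_count are hypothetical, and the sketch assumes that events arrive at a steady rate and that, without historical data, the signal covers exactly one scheduling interval.

```python
def expected_points(event_interval_s, span_s):
    """Estimate the number of data points in a signal that covers `span_s` seconds."""
    return span_s // event_interval_s


def window_count(points, window_size):
    """Count full windows when the window advances one data point at a time."""
    return max(points - window_size + 1, 0)


points = expected_points(event_interval_s=5, span_s=5 * 60)
print(points)                    # 300 seconds / 5 seconds per event = 60
print(window_count(points, 10))  # 60 - 10 + 1 = 51
```

Counting every full window gives 51, which is in line with the roughly 50 windows in the example.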

Table 2 provides some guidance on how to set window size and configure the schedule parameters for anomaly functions based on the frequency of your data:

Data frequency     Window size     Historical data        Schedule: Critical data  Schedule: Non-critical data
1 event per day    12 data points  Last 24 days of data   Run once a day           Run every 12 days
1 event per hour   12 data points  Last 24 hours of data  Run once an hour         Run every 12 hours
1 event per 5 min  12 data points  Last 2 hours of data   Run every 5 minutes      Run every 60 minutes
1 event per min    12 data points  Last 1 hour of data    Run every 5 minutes      Run every 12 minutes
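One way to read Table 2 is to convert each row into data point counts. The sketch below does this under the assumption that events arrive at exactly the stated frequency; the row values are copied from Table 2, and the summarize helper and its field names are hypothetical.

```python
HOUR, DAY = 60, 24 * 60  # durations in minutes

# (data frequency, event interval, window size, historical data,
#  critical schedule, non-critical schedule), all durations in minutes
TABLE_2 = [
    ("1 event per day", DAY, 12, 24 * DAY, DAY, 12 * DAY),
    ("1 event per hour", HOUR, 12, 24 * HOUR, HOUR, 12 * HOUR),
    ("1 event per 5 min", 5, 12, 2 * HOUR, 5, 60),
    ("1 event per min", 1, 12, 1 * HOUR, 5, 12),
]


def summarize(row):
    """Convert the durations in one table row into data point counts."""
    _, interval, window, history, critical, non_critical = row
    history_points = history // interval
    return {
        "history_points": history_points,
        "new_points_critical": critical // interval,
        "new_points_non_critical": non_critical // interval,
        # Rule of thumb from the text: at least twice the window size.
        "meets_minimum": history_points >= 2 * window,
    }


for row in TABLE_2:
    print(row[0], summarize(row))
```

Under these assumptions, every row supplies at least 24 historical data points, which is twice the window size, and the non-critical schedule adds 12 new data points, one window's worth, between runs.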

Verify that the parameters you specify for anomaly models work well in your environment.
