Trend detection

For the purposes of Surveillance Insight complaint trend detection, a trend is defined as a significant and prolonged change in the number of complaints per day, compared with the number that you would expect to see. The trend detection algorithm identifies all such shifts in the complaint counts, either up or down, and presents them in an easy-to-read user interface.

The following diagram shows one example of a trend: the number of complaints per day for one complaint theme (or complaint type) has been plotted for 30 days. This theme is trending because the number of complaints started to increase rapidly after June 4th, over and above the normal historic variability.

Diagram showing trend detection example — Figure 1. Trend detection example

The lighter blue line is the raw number of complaints per day. It is very jagged (noisy) because the number of complaints varies between weekdays and weekends. To make the chart easier to understand, Surveillance Insight plots a smoothed line (dark blue line) that shows, for each day, the average over the previous 7 days. This removes the noise caused by the weekday/weekend cycle. The red dot indicates the day on which the detected trend started.
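
The smoothing step can be illustrated with a minimal sketch. It assumes that the 7-day window includes the current day and that the first few days use a shorter window; neither detail is stated above, and the function name and sample counts are invented for illustration.

```python
def trailing_mean(counts, window=7):
    """Average each day's count with up to window-1 preceding days.

    Assumption: the window includes the current day, and early days
    use however many days are available.
    """
    smoothed = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)
        chunk = counts[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Two weeks of made-up daily counts: busy weekdays, quiet weekends.
daily = [10, 12, 11, 13, 12, 3, 2] * 2
smooth = trailing_mean(daily)
```

Because every full 7-day window contains exactly one of each weekday, the weekly cycle cancels out and the smoothed values settle at a constant level for this sample data.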

Business problem

Financial institutions require a method of identifying unusual increases in complaints, and of identifying the specific causes of those complaints, outside of the usual internal methods. In particular, a method is required for finding sudden increases in the number of complaints that are related to a specific theme, which may help in the early detection of a business problem or a customer pain point.

Approach to solving the business problem

Surveillance Insight uses a combinatorial and statistical approach to systematically check and find groups of related complaints that show trending behavior.

All complaints are classified by the Natural Language Classifier and Natural Language Understanding models, and the results are combined with customer data. The result is a set of data points for each complaint, such as category, process, product, theme, sub-product, age, zip code, state, and gender.

Diagram showing the grouped trends — Figure 2. Grouped complaints

The trend detection algorithm groups all of the complaints according to these features. Then the algorithm calculates all of the combinations of these features, and then checks each one to see if a trend can be detected.

For example, the trend detection algorithm derives all of the combinations of these fields:

complaint category
complaint category + process
complaint category + process + zip code

This forms a hierarchy of complaint types: the top level is very generic (complaint category only) but more specific details appear as further columns are added. For example:

complaint category = ‘burdensome request’, …
complaint category = ‘burdensome request’ + process = ‘Mortgage Application’
complaint category = ‘burdensome request’ + process = ‘Mortgage Application’ + zip code = 90410
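
As a rough sketch of how the feature combinations could be enumerated, the following uses the Python standard library. The field names are placeholders, and generating every non-empty subset is an assumption about what "all of the combinations" means; the three combinations listed above are a subset of the result.

```python
from itertools import combinations


def feature_combinations(features):
    """Return every non-empty subset of feature names, smallest first."""
    combos = []
    for size in range(1, len(features) + 1):
        combos.extend(combinations(features, size))
    return combos


# Hypothetical field names, chosen to mirror the example above.
fields = ["category", "process", "zip_code"]
combos = feature_combinations(fields)
for combo in combos:
    print(" + ".join(combo))
```

The number of subsets doubles with each added feature, so a real implementation would probably limit the size of the combinations that it checks.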

The trend detection algorithm then counts the complaints in each combination of feature-value pairs over periods of 30, 60, and 90 days. Each count is evaluated to see if a trend can be found using a trend detection rule. This method enables the detection of cross-feature trends that would otherwise be hard to detect; for example, complaints about loan applications for customers younger than 30 in the mid-western states.
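
The counting step might look like the following sketch. It assumes that each complaint is a dictionary with a "date" key plus feature keys, and that a count is kept per day inside each period; the record layout, the function name, and the sample complaints are all invented for illustration.

```python
from datetime import date, timedelta


def daily_counts(complaints, selector, end_day, window_days):
    """Count matching complaints per day in the window ending on end_day.

    selector maps feature names to required values, for example
    {"category": "burdensome request"}.
    """
    start_day = end_day - timedelta(days=window_days - 1)
    series = [0] * window_days
    for c in complaints:
        matches = all(c.get(k) == v for k, v in selector.items())
        if matches and start_day <= c["date"] <= end_day:
            series[(c["date"] - start_day).days] += 1
    return series


# Made-up complaints for illustration only.
complaints = [
    {"date": date(2024, 6, 28), "category": "burdensome request",
     "process": "Mortgage Application"},
    {"date": date(2024, 6, 10), "category": "burdensome request",
     "process": "Mortgage Application"},
    {"date": date(2024, 4, 20), "category": "burdensome request",
     "process": "Loan Application"},
]
end = date(2024, 6, 30)
selector = {"category": "burdensome request"}
series_by_period = {
    days: daily_counts(complaints, selector, end, days)
    for days in (30, 60, 90)
}
```

Each resulting series can then be passed to whatever trend detection rule is in use.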

Trend detection rule

Different statistical methods are used to identify trends:

A Mann-Kendall score is a statistical assessment of whether there is a consistent upward or downward trend in the count over time.
A Poisson distribution model is a statistical model that is used to evaluate how unusual a point in a time-series is when it is compared to previous points. It helps identify sudden increases (or decreases) in the number of complaints that may indicate the presence of a trend.
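
To make the two measures concrete, here is a minimal sketch of the textbook Mann-Kendall S statistic and a Poisson upper-tail probability. It is not the product's implementation: the function names are invented, and using the mean of the preceding days as the Poisson rate is an assumption.

```python
import math


def mann_kendall_s(series):
    """Mann-Kendall S: rising pairs minus falling pairs over all i < j."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), with k a non-negative integer."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for i in range(k):  # accumulate P(X = 0) .. P(X = k - 1)
        cdf += term
        term *= mean / (i + 1)
    return max(0.0, 1.0 - cdf)


# Made-up history: about 5 complaints per day, then a day with 15.
history = [4, 6, 5, 5, 4, 6, 5]
baseline = sum(history) / len(history)
p_spike = poisson_upper_tail(15, baseline)
s_rising = mann_kendall_s([1, 2, 3, 4])
```

A large positive S suggests a consistent upward trend and a large negative S a downward one, while a very small tail probability marks a day whose count is unexpectedly high relative to the baseline.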

These statistical measures are combined to determine if a trend is occurring:

a significant and sustained increase (or decrease) in the count
at least one day where the count is unexpectedly large or small given the preceding counts (using a Poisson distribution model)
at least one day greater than the average count
a clearly defined start and end
for an increasing trend, the end-count must be greater than the start-count
for a decreasing trend, the end-count must be less than the start-count

Figure 3. Diagram showing the trend detection rule
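
The way these conditions combine can be sketched as a single check. This is an illustration under stated assumptions: the start and end of the candidate trend are supplied as list indices, and the results of the Mann-Kendall and Poisson checks are passed in as already-computed values. The function and parameter names are hypothetical.

```python
def classify_trend(counts, start, end, sustained_direction, has_unusual_day):
    """Return "increasing", "decreasing", or None for a candidate trend.

    counts: daily complaint counts for the period.
    start, end: indices that bound the candidate trend.
    sustained_direction: +1, -1, or 0 from a monotonic-trend test.
    has_unusual_day: True if a Poisson check flagged at least one day.
    """
    if not (0 <= start < end < len(counts)):
        return None  # no clearly defined start and end
    if sustained_direction == 0 or not has_unusual_day:
        return None
    average = sum(counts) / len(counts)
    if not any(c > average for c in counts[start:end + 1]):
        return None  # no day above the average count
    if sustained_direction > 0 and counts[end] > counts[start]:
        return "increasing"
    if sustained_direction < 0 and counts[end] < counts[start]:
        return "decreasing"
    return None


# Made-up counts: flat at first, then rising from index 4 onward.
counts = [5, 4, 6, 5, 5, 9, 12, 15, 18]
label = classify_trend(counts, 4, 8, +1, True)
```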

The parameters that Surveillance Insight uses to detect trends can be tuned for each customer in the configuration files. The configurable parameters include:

Relative importance of rate of increase or decrease in a trend
Relative importance of absolute count of complaints
Relative importance of consistency of trend: once a trend starts, does it only increase or does it fluctuate?
Duration of trend
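
One way to picture such tuning is as a set of weights applied to per-trend measurements. The sketch below is purely illustrative: the parameter names, the default values, the weighted-average formula, and the assumption that each measurement is scaled between 0 and 1 are all hypothetical and do not reflect the actual configuration file format.

```python
from dataclasses import dataclass


@dataclass
class TrendTuning:
    """Hypothetical tuning parameters; names and defaults are invented."""
    rate_weight: float = 1.0         # importance of rate of change
    count_weight: float = 1.0        # importance of absolute count
    consistency_weight: float = 1.0  # importance of a steady direction
    min_duration_days: int = 7       # shortest trend worth reporting


def trend_score(rate, count, consistency, duration_days, tuning):
    """Weighted average of three measurements, each assumed to be in 0..1."""
    if duration_days < tuning.min_duration_days:
        return 0.0
    total = tuning.rate_weight + tuning.count_weight + tuning.consistency_weight
    if total == 0:
        return 0.0
    weighted = (tuning.rate_weight * rate
                + tuning.count_weight * count
                + tuning.consistency_weight * consistency)
    return weighted / total


# A customer who cares only about how fast complaints are rising.
rate_only = TrendTuning(rate_weight=2.0, count_weight=0.0, consistency_weight=0.0)
score = trend_score(0.5, 1.0, 1.0, duration_days=10, tuning=rate_only)
```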

Early trend detection

The trend detection algorithm also shows the date on which a time-series began to show a trend. This is called the “early trend detection” date. An example is shown by the red dot in the following diagram.

To explain this better, the chart also shows a horizontal dotted red line that marks the average number of complaints per day for this combination over the whole 30-day period. The dotted line does not appear in Surveillance Insight; it is used here only for explanation.

The red dot shows the earliest date on which a trend was detected because:

All counts after this date are increasing: that is, there is a trend
The red dot shows the first date (in raw count) with a count above the 30-day average
Early Trend Detection is a way of indicating “the trend started on this date…”
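
The two conditions can be read literally as the following sketch for an increasing trend. It assumes that "increasing" means the raw counts never fall from one day to the next after the detected date, which is stricter than a noisy real series would usually allow; the function name and sample data are invented.

```python
def early_trend_index(counts):
    """Index of the early trend detection day, or None if there is none.

    Finds the final run of non-decreasing counts, then returns the first
    day in that run whose count is above the average for the whole period.
    """
    if not counts:
        return None
    average = sum(counts) / len(counts)
    # Walk backwards to find where the final non-decreasing run begins.
    run_start = len(counts) - 1
    while run_start > 0 and counts[run_start - 1] <= counts[run_start]:
        run_start -= 1
    for i in range(run_start, len(counts)):
        if counts[i] > average:
            return i
    return None


# Made-up counts: the average is about 8.8, first exceeded at index 5.
counts = [5, 4, 6, 5, 5, 9, 12, 15, 18]
start_index = early_trend_index(counts)
```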

Assumptions

Trends are derived based on aggregate data from the complaints meta-data.

The NLC and NLU models that create the meta-data (complaint type, process, and so on) have accurately classified the complaints in most cases.

Accuracy and limitations

There is no absolute definition of a trend.

Trend detection is a compromise between detecting significant (but possibly small) trends, and ignoring noise due to natural (i.e. random) variability.

Trends may vary widely in terms of the number of complaints per day, the speed of increase in the count per day, and other factors. It is difficult to find a “one-size-fits-all” method that will discover trends both for themes at a high-level, general view (e.g. a countrywide perspective) and for low-level, highly specific characteristics. For example, there may be tens of thousands of complaints related to ‘delay’, while there may be only tens of complaints related to ‘poor customer service for female customers over 50 in New Mexico’.