Timeseries Anomaly Detection

3 min read · Jun 2, 2022

Anomalies are rare events or observations that deviate significantly from the standard behaviour of a system. In a timeseries, an anomaly can be either a significant spike or drop at a specific point in time, or a disruptive trend over a specific time period.

For example, the recent COVID-19 pandemic drastically affected almost every industry. If we visualise industrial data over time, the disruptive behaviour stands out clearly.

There are multiple techniques for anomaly detection, such as One-Class SVM, statistical methods and hierarchical methods, but to identify disruptions or anomalies in temporal data we have some specific approaches:

Timeseries model based Confidence Interval
Domain knowledge based statistical profiling
Unsupervised learning based Clustering Approach

Timeseries model based Confidence Interval

This approach builds a predictive model on historical data, which is later used to determine bounds for the timeseries. The prediction model can be any timeseries model, such as AR, MA or ARIMA, chosen on best fit, and it generates a predicted trend line. A confidence interval analysis around the predicted line gives the upper and lower bounds, and observations that fall outside those bounds are flagged as anomalies.
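As a rough illustration of the idea, the sketch below fits the simplest possible autoregressive model, AR(1), by least squares and flags new observations that fall outside a band around the one-step-ahead forecast. The function names (fit_ar1, detect_anomalies), the band width z = 3 and the sine-wave data are all assumptions made for this example. A real setup would more likely use a library ARIMA model and its own confidence intervals.

```python
import math
import statistics


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, nxt = series[:-1], series[1:]
    mean_prev, mean_next = statistics.fmean(prev), statistics.fmean(nxt)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(prev, nxt))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var if var else 0.0
    c = mean_next - phi * mean_prev
    return c, phi


def detect_anomalies(history, new_values, z=3.0):
    """Flag new values outside predicted +/- z * sigma (z is an assumed width)."""
    c, phi = fit_ar1(history)
    # In-sample one-step-ahead errors give the spread used for the bounds.
    residuals = [b - (c + phi * a) for a, b in zip(history[:-1], history[1:])]
    sigma = statistics.pstdev(residuals)
    flagged = []
    last = history[-1]
    for i, value in enumerate(new_values):
        predicted = c + phi * last
        lower, upper = predicted - z * sigma, predicted + z * sigma
        if lower <= value <= upper:
            last = value
        else:
            flagged.append((i, value))
            last = predicted  # keep the anomaly from distorting the next forecast
    return flagged


# Illustrative data: a smooth seasonal signal with one injected spike.
history = [10 + math.sin(t / 3) for t in range(60)]
new_values = [10 + math.sin(t / 3) for t in range(60, 70)]
new_values[5] = 25.0
print(detect_anomalies(history, new_values))
```

The dependence on the predictive model listed under CONS is visible here: if the model fits poorly, sigma grows and the bounds become too wide to catch anything.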

PROS

Can solve domain-agnostic problems
High explainability
Good with local outliers

CONS

Highly dependent on the predictive model
Performance issues
The lower the granularity, the higher the risk of false positives

Domain knowledge based statistical profiling

Statistical profiling is a domain-enriched method in which domain experts create a statistical profile of the time series and use that profile to build a rule-based engine, which later checks incoming data against the rules to detect anomalies.
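A minimal sketch of such a rule-based engine could look like the following. The Profile fields, the rule names and the room-temperature numbers are hypothetical. They stand in for whatever limits a domain expert would actually define.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical expert-defined profile for a single metric."""
    min_value: float
    max_value: float
    max_step: float  # largest plausible change between consecutive readings


def check_rules(series, profile):
    """Return (index, rule_name) for every reading that breaks a rule."""
    violations = []
    last_good = None  # most recent reading that passed all rules
    for i, value in enumerate(series):
        if not (profile.min_value <= value <= profile.max_value):
            violations.append((i, "out_of_range"))
        elif last_good is not None and abs(value - last_good) > profile.max_step:
            violations.append((i, "step_too_large"))
        else:
            last_good = value
    return violations


# Assumed example: room temperature in degrees Celsius.
profile = Profile(min_value=15.0, max_value=30.0, max_step=3.0)
readings = [21.0, 21.5, 22.0, 35.0, 22.5, 22.0, 26.5, 22.0]
print(check_rules(readings, profile))
```

Each check is a constant-time comparison, which is where the high throughput comes from. The rules only make sense for this one metric, which is why the approach does not carry over to other domains.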

PROS

Easily captures edge cases in the domain
High throughput

CONS

Requires a lot of domain expertise
Low model explainability
Can't be scaled to different domains
Unable to detect local outliers

Unsupervised learning based Clustering Approach

An unsupervised, clustering-based approach can be an ideal choice for anomaly detection in time series, as unsupervised learning does not require explicitly labelled data. We can apply hard or soft clustering techniques to the time series data and then use the resulting clusters to flag specific data points as anomalies. However, many clustering algorithms, such as k-means, require the number of clusters as an input, which can be a bottleneck for detection. Although there are multiple techniques for estimating the number of clusters, it is not efficient to estimate it at run time.

In this case, density-based models like Density-Based Spatial Clustering of Applications with Noise (DBSCAN) become the obvious choice. DBSCAN has two parameters, the minimum number of points needed to form a cluster and the maximum distance between two points for them to count as neighbours, which are easier to find.

How does DBSCAN work?

DBSCAN detects neighbouring points and connects them into a single cluster. Below is a step-by-step approach for DBSCAN:

1. Select a random unvisited point in the feature space.
2. Find all neighbours that lie within the distance threshold of the selected point.
3. If the number of neighbouring points is equal to or greater than the minimum criterion, assign them to one cluster.
4. Select the neighbouring points in turn and repeat step 2 and step 3.
5. Check the distance between clusters; if it is less than the distance threshold, merge them into the same cluster.
6. Repeat until all points and small clusters are covered. Points that end up in no cluster are labelled as noise, and these are the anomalies.
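The steps above can be sketched in plain Python as follows. This is a simplified illustration, not a reference implementation: it visits points in index order instead of randomly, uses Euclidean distance, and the names eps, min_samples and region_query are chosen for this example. Cluster merging happens implicitly, because a cluster keeps growing through every dense point it reaches.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point; NOISE (-1) marks the anomalies."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # dense point: keep growing
    return labels


# Illustrative series: each reading becomes a one-dimensional point.
series = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.3]
points = [(value,) for value in series]
print(dbscan(points, eps=0.5, min_samples=3))
```

In this example the reading 25.0 has no neighbours within eps, so it is the only point labelled -1.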

PROS

Requires little to no domain expertise
Better model explainability
Easily detects local outliers
Scalable as a generalised approach

CONS

Difficult to implement
Generates false negatives when anomalies are repetitive

To reduce the false negatives caused by repetitive spatial points, a rolling-window based DBSCAN can be helpful, as it captures the local anomalies within each window.
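The sketch below illustrates why the window helps. It uses a compact one-dimensional shortcut that returns only the points DBSCAN would label as noise (points that are not dense themselves and have no dense neighbour). The window size, eps, min_samples and the rule "flag a point if it is noise in any window" are assumptions made for this example.

```python
def dbscan_noise(values, eps, min_samples):
    """Indices DBSCAN would label as noise for one-dimensional values."""
    n = len(values)
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_samples for nb in neighbours]
    # Noise: not a dense (core) point and not within eps of any core point.
    return [
        i for i in range(n)
        if not core[i] and not any(core[j] for j in neighbours[i])
    ]


def rolling_dbscan_anomalies(values, window, eps, min_samples):
    """Flag a point if it is noise in at least one window that contains it."""
    flagged = set()
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        for k in dbscan_noise(chunk, eps, min_samples):
            flagged.add(start + k)
    return sorted(flagged)


# Illustrative series: a stable baseline with the same spike repeated 3 times.
values = [10.0 + 0.1 * (i % 3) for i in range(30)]
for spike_index in (5, 15, 25):
    values[spike_index] = 50.0

print(dbscan_noise(values, eps=1.0, min_samples=3))
print(rolling_dbscan_anomalies(values, window=8, eps=1.0, min_samples=3))
```

On the full series the three identical spikes are close enough to each other to form their own cluster, so the global run reports nothing. Inside any window of 8 points there is at most one spike, so it stays isolated and is flagged.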

CONCLUSION

Anomalies are rare observations that deviate significantly from standard behaviour. There are multiple techniques for anomaly detection, but only a few specific approaches can be applied to temporal data. Unsupervised approaches perform best when a domain-agnostic, generalised anomaly detector is needed.

Written by Ritik Jain