Photo by Maxim Hopman on Unsplash

Timeseries Anomaly Detection

Ritik Jain

Anomalies are rare events or observations that deviate significantly from the standard behaviour of a system. In a time series, an anomaly can be either a significant spike or drop at a specific point in time, or a disruptive trend over a specific period.

For example, the recent COVID-19 pandemic drastically affected almost every industry. If we visualise industrial data over time, the disruptive behaviour clearly stands out.

Temporal Data with outliers

There are multiple techniques for anomaly detection, such as One-Class SVM, statistical methods and hierarchical methods, but for identifying disruptions or anomalies in temporal data there are some specific approaches:

  • Timeseries model based Confidence Interval
  • Domain knowledge based statistical profiling
  • Unsupervised learning based Clustering Approach

Timeseries model based Confidence Interval

In this approach, we build a predictive model on historical data and use it to determine bounds for the time series. The prediction model can be any time series model, such as AR, MA or ARIMA, chosen by best fit, and it generates a predicted trend line. We then perform a confidence interval analysis around the predicted line to derive upper and lower bounds. Observations that fall outside those bounds are flagged as anomalies.
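To make the idea concrete, here is a minimal sketch using only the Python standard library. It is not an AR/MA/ARIMA fit: the forecast is a naive trailing moving average used as a stand-in for a fitted model, and the band is the forecast plus or minus `z` standard deviations of past residuals, which assumes the residuals are roughly normal. The function name and the `window` and `z` parameters are illustrative choices, not part of any library.

```python
from statistics import mean, stdev


def detect_with_interval(series, window=5, z=3.0):
    """Return indices whose value falls outside forecast +/- z * residual std.

    The forecast is the mean of the previous `window` cleaned points,
    a stand-in for a properly fitted time series model.
    """
    history = list(series[:window])  # cleaned copy used for forecasting
    residuals = []                   # forecast errors of non-anomalous points
    anomalies = []
    for t in range(window, len(series)):
        forecast = mean(history[-window:])
        residual = series[t] - forecast
        # Only test once there are enough residuals to estimate the spread
        if len(residuals) >= window:
            sigma = stdev(residuals)
            if abs(residual) > z * sigma:
                anomalies.append(t)
                # Store the forecast instead of the outlier so that later
                # forecasts are not distorted by it
                history.append(forecast)
                continue
        residuals.append(residual)
        history.append(series[t])
    return anomalies


# Hypothetical data: a small repeating pattern with one injected spike
series = [10 + (i % 3) * 0.5 for i in range(40)]
series[25] = 30.0
print(detect_with_interval(series))  # [25]
```

With a real model, the same structure applies: replace the moving average with the model's forecast and the residual-based band with the model's own prediction interval.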

PROS

  • Can solve domain-agnostic problems
  • High explainability
  • Good with local outliers

CONS

  • Highly dependent on Predictive Models
  • Performance issues
  • The lower the granularity, the higher the risk of false positives

Domain knowledge based statistical profiling

Statistical profiling is a domain-enriched method in which domain experts create a statistical profile of the time series and use that profile to build a rule-based engine. The engine then checks the data against those rules to perform the anomaly analysis.
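A rule-based engine of this kind might look like the sketch below. The `Profile` fields (a plausible value range and a maximum step between consecutive points) and the rule names are hypothetical examples of what a domain expert could supply; a real profile would carry whatever statistics matter in that domain.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    lower: float     # lowest plausible value according to the experts
    upper: float     # highest plausible value
    max_step: float  # largest plausible change between consecutive points


def check_rules(series, profile):
    """Return (index, rule_name) for every point that breaks a rule."""
    violations = []
    for t, value in enumerate(series):
        if not profile.lower <= value <= profile.upper:
            violations.append((t, "out_of_range"))
        elif t > 0 and abs(value - series[t - 1]) > profile.max_step:
            violations.append((t, "jump_too_large"))
    return violations


# Hypothetical readings checked against a hypothetical profile
readings = [20, 21, 22, 35, 23, 22, 90]
profile = Profile(lower=0, upper=50, max_step=5)
print(check_rules(readings, profile))
# [(3, 'jump_too_large'), (4, 'jump_too_large'), (6, 'out_of_range')]
```

Note that the return to normal at index 4 is also reported, because the step rule only looks at consecutive differences. Handling such cases is part of the rule design work that falls on the domain experts.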

PROS

  • Easily captures edge cases in the domain
  • High performance throughput

CONS

  • Requires a lot of domain expertise
  • Low model explainability
  • Can’t be scaled to different domains
  • Unable to detect local outliers

Unsupervised learning based Clustering Approach

An unsupervised, clustering-based approach can be an ideal choice for anomaly detection in time series, since unsupervised learning does not require explicitly labelled data. We can apply hard or soft clustering techniques to the time series data and then use the resulting clusters to flag specific data points as anomalies. However, many clustering algorithms, such as k-means, require the number of clusters as an input, which can be a bottleneck for detection. Although there are multiple techniques for estimating the number of clusters, it is not efficient to estimate it at run time.

In this case, density-based models like Density-Based Spatial Clustering of Applications with Noise (DBSCAN) become the obvious choice. DBSCAN has two parameters (the minimum number of points needed to form a cluster, and the maximum distance at which two points count as neighbours), which are easier to find.

How does DBSCAN work?

DBSCAN detects neighbouring points and connects them into a single cluster. Below is a step-by-step outline of DBSCAN:

  1. Select a random unvisited point in the feature space.
  2. Find all neighbours of the selected point that lie within the distance threshold.
  3. If the number of neighbouring points is equal to or greater than the minimum criterion, assign them to one cluster.
  4. Select the neighbour points and repeat steps 2 and 3.
  5. Check the distance between clusters; if it is less than the distance threshold, assign them to the same cluster.
  6. Repeat until all points and small clusters are covered. Points that end up in no cluster are labelled as noise, and these are the anomaly candidates.
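The steps above can be sketched in plain Python as follows. This is a simplified, quadratic-time illustration rather than a production implementation: it visits points in index order instead of randomly (which does not change which points are core or noise), and the names `dbscan`, `eps` and `min_pts` are illustrative. Merging of nearby dense regions happens implicitly, because a cluster keeps expanding through every core point it reaches.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical 1-D values wrapped as tuples; 9.0 has no dense neighbourhood
points = [(1.0,), (1.1,), (0.9,), (1.2,), (5.0,), (5.1,), (4.9,), (9.0,)]
print(dbscan(points, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

For anomaly detection, the points labelled `-1` are the ones of interest.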

PROS

  • Requires little to no domain expertise
  • Better model explainability
  • Easily detects local outliers
  • Scalable as a generalised approach

CONS

  • Difficult to implement
  • Generates false negatives if anomalies are repetitive

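Repeated anomalies with similar values can form a dense cluster of their own and therefore go undetected. To reduce these false negatives, a rolling-window DBSCAN can help, as it looks only at a local slice of the series at a time and so maps local anomalies.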
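The sketch below illustrates the rolling-window idea on one-dimensional values. To stay short it does not build clusters; it only computes which points DBSCAN would label as noise (points that are not core points and have no core point within `eps`). The function names, the window size and the choice to flag a point if it is noise in any window are all assumptions made for illustration.

```python
def window_noise(values, eps, min_pts):
    """Indices that DBSCAN would label as noise for 1-D values."""
    def near(i):
        return [j for j, v in enumerate(values) if abs(v - values[i]) <= eps]

    # Core points have at least min_pts neighbours (counting themselves)
    core = {i for i in range(len(values)) if len(near(i)) >= min_pts}
    # Noise: not core and not within eps of any core point
    return [i for i in range(len(values))
            if i not in core and not any(j in core for j in near(i))]


def rolling_dbscan(series, window, eps, min_pts):
    """Flag a point if it is noise in at least one sliding window."""
    flagged = set()
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        for i in window_noise(chunk, eps, min_pts):
            flagged.add(start + i)
    return sorted(flagged)


# Hypothetical series with the same spike value repeated three times
series = [10.0] * 30
for t in (5, 15, 25):
    series[t] = 50.0

print(window_noise(series, eps=1.0, min_pts=3))              # []
print(rolling_dbscan(series, window=8, eps=1.0, min_pts=3))  # [5, 15, 25]
```

Run over the whole series, the three identical spikes form a cluster and nothing is flagged. Within a window of 8 points each spike is isolated, so all three are reported. The window needs to be short enough that repeated anomalies do not share a window, which is a tuning decision.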

CONCLUSION

Anomalies are rare observations that deviate significantly from standard behaviour. There are multiple techniques for anomaly detection, but only a few specific approaches can be applied to temporal data. Unsupervised approaches outperform the others when a domain-agnostic or generalised anomaly detector is needed.
