model list - isolation forest (#5235)

* model list - isolation forest

* curse of dimensionality

* isol forest definition change, minor fixes

* blank line fix
Daria Karavaieva 2023-10-26 12:25:54 +02:00 committed by GitHub
parent 68b1b3c4d4
commit b60bb1d98a


@ -17,13 +17,13 @@ Please [contact us](https://victoriametrics.com/contact-us/) to find out more._*
## About
**VictoriaMetrics Anomaly Detection** is a service that continuously scans Victoria Metrics time
**VictoriaMetrics Anomaly Detection** is a service that continuously scans VictoriaMetrics time
series and detects unexpected changes within data patterns in real-time. It does so by utilizing
user-configurable machine learning models.
It periodically queries user-specified metrics, computes an “anomaly score” for them, based on how
well they fit a predicted distribution, taking into account periodical data patterns with trends,
and pushes back the computed “anomaly score” to Victoria Metrics. Then, users can enable alerting
and pushes back the computed “anomaly score” to VictoriaMetrics. Then, users can enable alerting
rules based on the “anomaly score”.
Compared to classical alerting rules, anomaly detection is more “hands-off” i.e. it allows users to
@ -37,7 +37,7 @@ metrics.
## How?
Victoria Metrics Anomaly Detection service (**vmanomaly**) allows you to apply several built-in
VictoriaMetrics Anomaly Detection service (**vmanomaly**) allows you to apply several built-in
anomaly detection algorithms. You can also plug in your own detection models; the code doesn't make any
distinction between built-in and external models.
@ -94,6 +94,12 @@ Currently, vmanomaly ships with a few common models:
A simple moving window of quantiles. Easy to use, easy to understand, but not as powerful as
other models.
1. **Isolation Forest**
Detects anomalies using binary trees. It works for both univariate and multivariate data. Be aware of [the curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) in the case of multivariate data: we advise against using a single model when handling multiple time series *if the number of these series significantly exceeds their average length (number of data points)*.
The algorithm has linear time complexity and a low memory requirement, which makes it well suited to high-volume data. See the [scikit-learn.org documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html) for Isolation Forest.
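
To illustrate the idea of detecting anomalies with binary trees, here is a minimal, standard-library-only sketch for univariate data. It is not vmanomaly's implementation (the link above points to scikit-learn's `IsolationForest`). The function names, default parameters, and sample data are hypothetical and chosen only for illustration. Each "tree" repeatedly splits a random subsample at a random value and follows the side containing the query point; points that end up alone after few splits are more likely to be anomalous.

```python
import random


def path_length(x, sample, rng, depth=0, max_depth=8):
    """Count the random splits needed to isolate x within sample (capped at max_depth)."""
    if len(sample) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(sample), max(sample)
    if lo == hi:  # all remaining values are identical, cannot split further
        return depth
    split = rng.uniform(lo, hi)
    # Follow only the branch of the binary tree that contains x.
    if x < split:
        side = [v for v in sample if v < split]
    else:
        side = [v for v in sample if v >= split]
    return path_length(x, side, rng, depth + 1, max_depth)


def mean_path_length(x, data, n_trees=200, sample_size=64, seed=0):
    """Average isolation depth of x over n_trees random subsamples of data."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(x, sample, rng)
    return total / n_trees


# Illustrative data: 256 points around 10.0 plus one obvious outlier.
data_rng = random.Random(1)
data = [data_rng.gauss(10.0, 1.0) for _ in range(256)] + [25.0]

outlier_depth = mean_path_length(25.0, data)
typical_depth = mean_path_length(10.0, data)
print(f"outlier: {outlier_depth:.2f} splits, typical point: {typical_depth:.2f} splits")
```

A shorter average path means the point is easier to isolate. The full algorithm turns that average into a normalized score, which this sketch omits.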
### Examples
For example, here's what Prophet predictions could look like on a real-data example
@ -115,7 +121,7 @@ Then, reads new data from VictoriaMetrics, according to schedule, and invokes it
“anomaly score” for each data point. The anomaly score ranges from 0 to positive infinity.
Values less than 1.0 are considered “not an anomaly”, values greater than or equal to 1.0 are
considered “anomalous”, with greater values corresponding to larger anomalies.
Then, VMAnomaly pushes the metric to vminsert (under the user-configured metric name,
Then, vmanomaly pushes the metric to vminsert (under the user-configured metric name,
optionally preserving labels).
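
The threshold rule described above can be sketched as a small helper. The constant and function names are hypothetical and not part of vmanomaly; only the 1.0 boundary and the non-negative score range come from the text.

```python
ANOMALY_THRESHOLD = 1.0  # boundary between "not an anomaly" and "anomalous"


def is_anomalous(score: float, threshold: float = ANOMALY_THRESHOLD) -> bool:
    """Return True when the score is greater than or equal to the threshold."""
    if score < 0:
        raise ValueError("anomaly score is expected to be non-negative")
    return score >= threshold


# Illustrative scores only; real values would be read back from VictoriaMetrics.
scores = [0.2, 0.99, 1.0, 3.7]
flags = [is_anomalous(s) for s in scores]
print(flags)
```

In practice the same comparison would more likely live in an alerting rule on the pushed metric than in application code.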