docs/vmanomaly: v1.12 updates & fixes (#6046)

* docs/vmanomaly: v1.12.0 & link updates

* add autotuned description to model section

* - update refs of vmanomaly on enterprise and vmalert pages
- add diagrams for model types
- update self-monitoring section

* - fix typos
- remove .index.html from links
Fred Navruzov 2024-04-01 16:41:55 +03:00 committed by GitHub
parent c79bf3925c
commit c300ce659f
12 changed files with 146 additions and 96 deletions


@ -17,6 +17,20 @@ Please find the changelog for VictoriaMetrics Anomaly Detection below.
> **Important note: Users are strongly encouraged to upgrade to `vmanomaly` [v1.9.2](https://hub.docker.com/repository/docker/victoriametrics/vmanomaly/tags?page=1&ordering=name) or later versions for optimal performance and accuracy. <br><br> This recommendation is crucial for configurations with a low `infer_every` parameter [in your scheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#parameters-1), and in scenarios where data exhibits significant high-order seasonality patterns (such as hourly or daily cycles). Previous versions from v1.5.1 to v1.8.0 were identified to contain a critical issue impacting model training, where models were inadvertently trained on limited data subsets, leading to suboptimal fits, affecting the accuracy of anomaly detection. <br><br> Upgrading to v1.9.2 addresses this issue, ensuring proper model training and enhanced reliability. For users utilizing Helm charts, it is recommended to upgrade to version [1.0.0](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-anomaly/CHANGELOG.md#100) or newer.**
## v1.12.0
Released: 2024-03-31
- FEATURE: Introduction of `AutoTunedModel` model class to optimize any [built-in model](/anomaly-detection/components/models/#built-in-models) on data during `fit` phase. Specify as little as `anomaly_percentage` param from `(0, 0.5)` interval and `tuned_class_name` (e.g. [`model.zscore.ZscoreModel`](/anomaly-detection/components/models/#z-score)) to get it working with best settings that match your data. See details [here](/anomaly-detection/components/models/#autotuned).
<!--
- FEATURE: Preset support enablement. From now users will be able to specify only a few parameters (like `datasource_url`) + a new (backward-compatible) `preset: preset_name` field in a config file and get a service run with **predefined queries, scheduling and models**. Also, now preset assets (guide, configs, dashboards) will be available at `:8490/presets` endpoint.
-->
- IMPROVEMENT: Better logging of model lifecycle (fit/infer stages).
- IMPROVEMENT: Introduce `provide_series` arg to all the [built-in models](/anomaly-detection/components/models/#built-in-models) to define what output fields to generate for writing (e.g. `provide_series: ['anomaly_score']` means only scores are being produced).
- FIX: [Self-monitoring metrics](/anomaly-detection/components/monitoring/#models-behaviour-metrics) are now aggregated to `queries` aliases level (not to label sets of individual timeseries) and aligned with [reader, writer and model sections](/anomaly-detection/components/monitoring/#metrics-generated-by-vmanomaly) description, so `/metrics` endpoint holds only necessary information for scraping.
- FIX: Self-monitoring metric `vmanomaly_models_active` now has additional labels `model_alias`, `scheduler_alias`, `preset` to align with model-centric [self-monitoring](https://docs.victoriametrics.com/anomaly-detection/components/monitoring/#models-behaviour-metrics).
- IMPROVEMENT: Add possibility to use temporal information in [IsolationForest models](/anomaly-detection/components/models/#isolation-forest-multivariate) via [cyclical encoding](https://towardsdatascience.com/cyclical-features-encoding-its-about-time-ce23581845ca). This is particularly helpful to detect multivariate [seasonality](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#seasonality)-dependent anomalies.
- BREAKING CHANGE: **ARIMA** model is removed from [built-in models](/anomaly-detection/components/models/#built-in-models). For affected users, it is suggested to replace ARIMA by [Prophet](/anomaly-detection/components/models/#prophet) or [Holt-Winters](/anomaly-detection/components/models/#holt-winters).
## v1.11.0
Released: 2024-02-22
- FEATURE: Multi-scheduler support. Now users can use multiple [model specs](https://docs.victoriametrics.com/anomaly-detection/components/models/) in a single config (via aliasing), each spec can be run with its own (even multiple) [schedulers](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/).


@ -16,18 +16,19 @@ aliases:
For service introduction visit [README](/anomaly-detection/) page
and [Overview](/anomaly-detection/overview.html) of how `vmanomaly` works.
## How to install and run `vmanomaly`
## How to install and run vmanomaly
> To run `vmanomaly` you need to have VictoriaMetrics Enterprise license. You can get a trial license key [**here**](https://victoriametrics.com/products/enterprise/trial/index.html).
> To run `vmanomaly`, you need to have a VictoriaMetrics Enterprise license. You can get a trial license key [**here**](https://victoriametrics.com/products/enterprise/trial/).
The following options are available:
- [To run Docker image](#docker-image)
- [To run in Kubernetes with Helm charts](#helm-charts)
- [To run Docker image](#docker)
- [To run in Kubernetes with Helm charts](#kubernetes-with-helm-charts)
### Docker
> To run `vmanomaly` you need to have a VictoriaMetrics Enterprise [licence](https://victoriametrics.com/products/enterprise/) or request a trial [here](https://victoriametrics.com/products/enterprise/trial/index.html).
> To run `vmanomaly`, you need to have a VictoriaMetrics Enterprise license. You can get a trial license key [**here**](https://victoriametrics.com/products/enterprise/trial/).
Below are the steps to get `vmanomaly` up and running inside a Docker container:
@ -62,6 +63,8 @@ See also:
### Kubernetes with Helm charts
> To run `vmanomaly`, you need to have a VictoriaMetrics Enterprise license. You can get a trial license key [**here**](https://victoriametrics.com/products/enterprise/trial/).
You can run `vmanomaly` in Kubernetes environment
with [these Helm charts](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-anomaly/README.md).


@ -17,23 +17,23 @@ Begin your VictoriaMetrics Anomaly Detection journey with ease using our guides
- **Quickstart**: Check out how to get `vmanomaly` up and running [here](/anomaly-detection/QuickStart.html).
- **Overview**: Find out how `vmanomaly` service operates [here](/anomaly-detection/Overview.html)
- **Integration**: Integrate anomaly detection into your observability ecosystem. Get started [**here**](/anomaly-detection/guides/guide-vmanomaly-vmalert.html).
- **Integration**: Integrate anomaly detection into your observability ecosystem. Get started [here](/anomaly-detection/guides/guide-vmanomaly-vmalert.html).
- **Installation Options**: Select the method that aligns with your technical requirements:
- **Docker Installation**: Suitable for containerized environments. See [Docker guide](/anomaly-detection/Overview.html#run-vmanomaly-docker-container).
- **Helm Chart Installation**: Appropriate for those using Kubernetes. See our [Helm charts](https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-anomaly).
> **Note**: starting from [v1.5.0](./CHANGELOG.md#v150) `vmanomaly` requires a [license key](/anomaly-detection/Overview.html#licensing) to run. You can obtain a trial license key [**here**](https://victoriametrics.com/products/enterprise/trial/index.html).
> **Note**: starting from [v1.5.0](./CHANGELOG.md#v150) `vmanomaly` requires a [license key](/anomaly-detection/Overview.html#licensing) to run. You can obtain a trial license key [**here**](https://victoriametrics.com/products/enterprise/trial/).
## Key Components
Explore the integral components that configure VictoriaMetrics Anomaly Detection:
* [Explore components and their interaction](/anomaly-detection/components)
- [Models](/anomaly-detection/components/models.html)
- [Reader](/anomaly-detection/components/reader.html)
- [Scheduler](/anomaly-detection/components/scheduler.html)
- [Writer](/anomaly-detection/components/writer.html)
- [Monitoring](/anomaly-detection/components/monitoring.html)
- [Models](/anomaly-detection/components/models)
- [Reader](/anomaly-detection/components/reader)
- [Scheduler](/anomaly-detection/components/scheduler)
- [Writer](/anomaly-detection/components/writer)
- [Monitoring](/anomaly-detection/components/monitoring)
## Deep Dive into Anomaly Detection
Enhance your knowledge with our handbook on Anomaly Detection & Root Cause Analysis and stay updated:
@ -46,12 +46,12 @@ Enhance your knowledge with our handbook on Anomaly Detection & Root Cause Analy
## Frequently Asked Questions (FAQ)
Got questions about VictoriaMetrics Anomaly Detection? Chances are, we've got the answers ready for you.
Dive into [our FAQ section](/anomaly-detection/FAQ.html) to find responses to common questions.
Dive into [our FAQ section](/anomaly-detection/FAQ) to find responses to common questions.
## Get in Touch
We're eager to connect with you and tailor our solutions to your specific needs. Here's how you can engage with us:
* [Book a Demo](https://calendly.com/victoriametrics-anomaly-detection) to discover what our product can do.
* Interested in exploring our [Enterprise features](https://victoriametrics.com/products/enterprise), including Anomaly Detection? [Request your trial license](https://victoriametrics.com/products/enterprise/trial/) today and take the first step towards advanced system observability.
* Interested in exploring our [Enterprise features](https://victoriametrics.com/products/enterprise), including [Anomaly Detection](https://victoriametrics.com/products/enterprise/anomaly-detection)? [Request your trial license](https://victoriametrics.com/products/enterprise/trial/) today and take the first step towards advanced system observability.
---
Our [CHANGELOG is just a click away](./CHANGELOG.md), keeping you informed about the latest updates and enhancements.

Binary files not shown (5 images added: 15 KiB, 66 KiB, 66 KiB, 29 KiB, 18 KiB).


@ -25,18 +25,18 @@ vmanomaly includes various [built-in models](#built-in-models) and you can integ
```yaml
models:
  model_univariate_1:
    class: "model.zscore.ZscoreModel"
    class: 'model.zscore.ZscoreModel'
    z_threshold: 2.5
    queries: ["query_alias2"] # referencing queries defined in `reader` section
    queries: ['query_alias2'] # referencing queries defined in `reader` section
  model_multivariate_1:
    class: "model.isolation_forest.IsolationForestMultivariateModel"
    contamination: "auto"
    class: 'model.isolation_forest.IsolationForestMultivariateModel'
    contamination: 'auto'
    args:
      n_estimators: 100
      # i.e. to assure reproducibility of produced results each time model is fit on the same input
      random_state: 42
    # if there is no explicit `queries` arg, then the model will be run on ALL queries found in reader section
...
# ...
```
Old-style configs (< [1.10.0](/anomaly-detection/changelog#v1100))
@ -44,9 +44,9 @@ Old-style configs (< [1.10.0](/anomaly-detection/changelog#v1100))
```yaml
model:
class: "model.zscore.ZscoreModel"
z_threshold: 2.5
z_threshold: 3.0
# no explicit `queries` arg is provided
...
# ...
```
will be **implicitly** converted to
@ -55,10 +55,10 @@ will be **implicitly** converted to
models:
default_model: # default model alias, backward compatibility
class: "model.zscore.ZscoreModel"
z_threshold: 2.5
z_threshold: 3.0
# queries arg is created and propagated with all query aliases found in `queries` arg of `reader` section
queries: ["q1", "q2", "q3"] # i.e., if your `queries` in `reader` section has exactly q1, q2, q3 aliases
...
queries: ['q1', 'q2', 'q3'] # i.e., if your `queries` in `reader` section has exactly q1, q2, q3 aliases
# ...
```
@ -77,7 +77,7 @@ This arg is **backward compatible** - if there is no explicit `queries` arg, the
```yaml
models:
model_alias_1:
...
# ...
# no explicit `queries` arg is provided
```
@ -86,9 +86,9 @@ will be implicitly converted to
```yaml
models:
model_alias_1:
...
# queries arg is created and propagated with all query aliases found in `queries` arg of `reader` section
queries: ["q1", "q2", "q3"] # i.e., if your `queries` in `reader` section has exactly q1, q2, q3 aliases
# ...
# if not set, `queries` arg is created and propagated with all query aliases found in `queries` arg of `reader` section
queries: ['q1', 'q2', 'q3'] # i.e., if your `queries` in `reader` section has exactly q1, q2, q3 aliases
```
### Schedulers
@ -102,7 +102,7 @@ This arg is **backward compatible** - if there is no explicit `schedulers` arg,
```yaml
models:
model_alias_1:
...
# ...
# no explicit `schedulers` arg is provided
```
@ -111,11 +111,24 @@ will be implicitly converted to
```yaml
models:
model_alias_1:
...
# queries arg is created and propagated with all query aliases found in `queries` arg of `reader` section
schedulers: ["s1", "s2", "s3"] # i.e., if your `schedulers` section has exactly s1, s2, s3 aliases
# ...
# if not set, `schedulers` arg is created and propagated with all scheduler aliases found in `schedulers` section
schedulers: ['s1', 's2', 's3'] # i.e., if your `schedulers` section has exactly s1, s2, s3 aliases
```
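The implicit conversions described above (the legacy `model` section, plus missing `queries` and `schedulers` args) can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not `vmanomaly`'s actual code: the helper name `apply_implicit_defaults` is hypothetical, and it assumes `reader.queries` and `schedulers` are mappings keyed by alias.

```python
from copy import deepcopy


def apply_implicit_defaults(config: dict) -> dict:
    """Return a copy of `config` with the documented implicit defaults filled in."""
    cfg = deepcopy(config)

    # Old-style single `model` section becomes `models` with the default alias.
    if "model" in cfg and "models" not in cfg:
        cfg["models"] = {"default_model": cfg.pop("model")}

    # Assumption: both sections are mappings of alias -> settings.
    query_aliases = list(cfg.get("reader", {}).get("queries", {}))
    scheduler_aliases = list(cfg.get("schedulers", {}))

    for spec in cfg.get("models", {}).values():
        # Explicit user settings always win; only missing args are filled in.
        spec.setdefault("queries", list(query_aliases))
        spec.setdefault("schedulers", list(scheduler_aliases))
    return cfg


config = {
    "reader": {"queries": {"q1": "expr_1", "q2": "expr_2", "q3": "expr_3"}},
    "schedulers": {"s1": {}, "s2": {}, "s3": {}},
    "models": {"model_alias_1": {"class": "model.zscore.ZscoreModel"}},
}
resolved = apply_implicit_defaults(config)
print(resolved["models"]["model_alias_1"])
```

A model that sets `queries` or `schedulers` explicitly keeps its own lists, which is what makes both args backward compatible.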
### Provide Series
Introduced in [1.12.0](/anomaly-detection/changelog#v1120), the `provide_series` arg limits the [output generated](#vmanomaly-output) by `vmanomaly` for writing. For example, if the model produces the default output series `['anomaly_score', 'yhat', 'yhat_lower', 'yhat_upper']`, specifying `provide_series` as below limits the data being written to only `['anomaly_score']` for each metric subject to anomaly detection.
```yaml
models:
  model_alias_1:
    # ...
    provide_series: ['anomaly_score'] # only `anomaly_score` metric will be available for writing back to the database
```
**Note**: If `provide_series` is not specified in the model config, the model will produce its default [model-dependent output](#vmanomaly-output). The output can't be less than `['anomaly_score']`. Even if the `timestamp` column is omitted, it will be implicitly added to the `provide_series` list, as it's required for metrics to be properly written.
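The rules in the note can be summarized with a small Python sketch. The function `resolve_provide_series` is a hypothetical helper written for illustration. In particular, raising an error when `anomaly_score` is missing is an assumption about how the "can't be less than" rule might be enforced; the service may handle that case differently.

```python
DEFAULT_OUTPUT = ["anomaly_score", "yhat", "yhat_lower", "yhat_upper"]


def resolve_provide_series(provide_series=None, default_output=DEFAULT_OUTPUT):
    """Return the columns that would be written, following the rules in the note."""
    # Not specified: fall back to the model-dependent default output.
    requested = list(default_output if provide_series is None else provide_series)

    # Assumption: a request without `anomaly_score` is rejected outright.
    if "anomaly_score" not in requested:
        raise ValueError("provide_series must contain at least 'anomaly_score'")

    # `timestamp` is always needed to write metrics, so it is added implicitly.
    if "timestamp" not in requested:
        requested.insert(0, "timestamp")
    return requested


print(resolve_provide_series(["anomaly_score"]))
print(resolve_provide_series())
```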
## Model types
There are **2 model types**, supported in `vmanomaly`, resulting in **4 possible combinations**:
@ -139,7 +152,8 @@ If during an inference, you got a series having **new labelset** (not present in
**Examples:** [Prophet](#prophet), [Holt-Winters](#holt-winters)
<!-- TODO: add schema -->
<p></p>
<img alt="vmanomaly-model-type-univariate" src="/anomaly-detection/components/model-lifecycle-univariate.webp" width="800px"/>
### Multivariate Models
@ -149,11 +163,12 @@ For example, if you have some **multivariate** model to use 3 [MetricQL queries]
If during an inference, you got a **different amount of series** or some series having a **new labelset** (not present in any of fitted models), the inference will be skipped until you get a model, trained particularly for such labelset during forthcoming re-fit step.
**Implications:** Multivariate models are a go-to default, when your queries returns **fixed** amount of **individual** time series (say, some aggregations), to be used for adding cross-series (and cross-query) context, useful for catching [collective anomalies](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-2/index.html#collective-anomalies) or [novelties](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-2/index.html#novelties) (expanded to multi-input scenario). For example, you may set it up for anomaly detection of CPU usage in different modes (`idle`, `user`, `system`, etc.) and use its cross-dependencies to detect **unseen (in fit data)** behavior.
**Implications:** Multivariate models are a go-to default when your queries return a **fixed** number of **individual** time series (say, some aggregations), to be used for adding cross-series (and cross-query) context, useful for catching [collective anomalies](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-2/#collective-anomalies) or [novelties](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-2/#novelties) (expanded to multi-input scenario). For example, you may set it up for anomaly detection of CPU usage in different modes (`idle`, `user`, `system`, etc.) and use its cross-dependencies to detect **unseen (in fit data)** behavior.
**Examples:** [IsolationForest](#isolation-forest-multivariate)
<!-- TODO: add schema -->
<p></p>
<img alt="vmanomaly-model-type-multivariate" src="/anomaly-detection/components/model-lifecycle-multivariate.webp" width="800px"/>
### Rolling Models
@ -169,7 +184,8 @@ Such models put **more pressure** on your reader's source, i.e. if your model sh
**Examples:** [RollingQuantile](#rolling-quantile)
<!-- TODO: add schema -->
<p></p>
<img alt="vmanomaly-model-type-rolling" src="/anomaly-detection/components/model-type-rolling.webp" width="800px"/>
### Non-Rolling Models
@ -185,7 +201,8 @@ Produced model instances are **stored in-memory** between consecutive re-fit cal
**Examples:** [Prophet](#prophet)
<!-- TODO: add schema -->
<p></p>
<img alt="vmanomaly-model-type-non-rolling" src="/anomaly-detection/components/model-type-non-rolling.webp" width="800px"/>
## Built-in Models
@ -199,18 +216,52 @@ VM Anomaly Detection (`vmanomaly` hereinafter) models support 2 groups of parame
**Models**:
* [AutoTuned](#autotuned) - designed to take the cognitive load off the user, allowing any of the built-in models below to be re-tuned for best params on data seen during each `fit` phase of the algorithm. The tradeoff is between increased computational time and optimized results / simpler maintenance.
* [Prophet](#prophet) - the most versatile one for production usage, especially for complex data ([trends](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#trend), [change points](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-2/#novelties), [multi-seasonality](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#seasonality))
* [Z-score](#z-score) - useful for testing and for simpler data ([de-trended](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#trend) data without strict [seasonality](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#seasonality) and with anomalies of similar magnitude as your "normal" data)
* [Holt-Winters](#holt-winters) - well-suited for **data with moderate complexity**, exhibiting distinct [trends](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#trend) and/or [seasonal patterns](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#seasonality).
* [MAD (Median Absolute Deviation)](#mad-median-absolute-deviation) - similarly to Z-score, is effective for **identifying outliers in relatively consistent data** (useful for detecting sudden, stark deviations from the median)
* [Rolling Quantile](#rolling-quantile) - best for **data with evolving patterns**, as it adapts to changes over a rolling window.
* [Seasonal Trend Decomposition](#seasonal-trend-decomposition) - similarly to Holt-Winters, is best for **data with pronounced [seasonal](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#seasonality) and [trend](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#trend) components**
* [ARIMA](#arima) - use when your data shows **clear patterns or autocorrelation (the degree of correlation between values of the same series at different periods)**. However, good understanding of machine learning is required to tune.
* [Isolation forest (Multivariate)](#isolation-forest-multivariate) - useful for **metrics data interaction** (several queries/metrics -> single anomaly score) and **efficient in detecting anomalies in high-dimensional datasets**
* [Custom model](#custom-model-guide) - benefit from your own models and expertise to better support your **unique use case**.
### AutoTuned
Tuning hyperparameters of a model can be tricky and often requires in-depth knowledge of Machine Learning. `AutoTunedModel` is designed specifically to take the cognitive load off the user - specify as little as `anomaly_percentage` param from `(0, 0.5)` interval and `tuned_class_name` (e.g. [`model.zscore.ZscoreModel`](/anomaly-detection/components/models/#z-score)) to get it working with best settings that match your data.
*Parameters specific for vmanomaly*:
* `class` (string) - model class name `"model.auto.AutoTunedModel"`
* `tuned_class_name` (string) - Built-in model class to tune, i.e. `model.zscore.ZscoreModel`.
* `optimization_params` (dict) - Optimization parameters for unsupervised model tuning. Control % of found anomalies, as well as a tradeoff between time spent and the accuracy. The higher `timeout` and `n_trials` are, the better the model configuration that can be found for `tuned_class_name`, but the longer it takes, and vice versa. Set `n_jobs` to `-1` to use all the CPUs available; this makes sense only if you have a big dataset to train on during `fit` calls, otherwise the overhead isn't worth it.
- `anomaly_percentage` (float) - expected percentage of anomalies that can be seen in training data, from (0, 0.5) interval.
- `seed` (int) - Random seed for reproducibility and deterministic nature of underlying optimizations.
- `n_splits` (int) - How many folds to create for hyperparameter tuning out of your data. The higher, the longer it takes but the better the results can be. Defaults to 3.
- `n_trials` (int) - How many trials to sample from hyperparameter search space. The higher, the longer it takes but the better the results can be. Defaults to 128.
- `timeout` (float) - How many seconds in total can be spent on each model to tune hyperparameters. The higher, the longer it takes, but the more of the defined `n_trials` can be tested and the better the results can be.
<img alt="vmanomaly-autotune-schema" src="/anomaly-detection/components/autotune.webp" width="800px"/>
```yaml
# ...
models:
  your_desired_alias_for_a_model:
    class: 'model.auto.AutoTunedModel'
    tuned_class_name: 'model.zscore.ZscoreModel'
    optimization_params:
      anomaly_percentage: 0.004 # required. i.e. we expect <= 0.4% of anomalies to be present in training data
      seed: 42 # fix reproducibility & determinism
      n_splits: 4 # how many folds are created for internal cross-validation
      n_trials: 128 # how many configurations to sample from search space during optimization
      timeout: 10 # how many seconds to spend on optimization for each trained model during `fit` phase call
      n_jobs: 1 # how many jobs in parallel to launch. Consider making it > 1 only if you have a fit window containing > 10000 datapoints for each series
# ...
```
**Note**: Autotune can't be applied to your [custom model](#custom-model-guide). Also, it can't be applied to itself (like `tuned_class_name: 'model.auto.AutoTunedModel'`).
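To build intuition for how `anomaly_percentage`, `n_trials` and `seed` interact, here is a deliberately simplified toy in Python. It is **not** how `AutoTunedModel` is implemented: there is no cross-validation (`n_splits`) and no `timeout`, and the search space `0.5..6.0` is made up. It only shows the general idea of sampling candidate settings with a fixed seed and keeping the one whose share of flagged points is closest to the expected anomaly percentage.

```python
import random
import statistics


def tune_z_threshold(values, anomaly_percentage, n_trials=128, seed=42):
    """Toy random search for a z-score threshold (illustrative only)."""
    if not 0 < anomaly_percentage < 0.5:
        raise ValueError("anomaly_percentage must lie in the (0, 0.5) interval")
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("constant series: z-scores are undefined")
    mean = statistics.fmean(values)
    zscores = [abs(v - mean) / std for v in values]

    rng = random.Random(seed)  # a fixed seed makes the search reproducible
    best_threshold, best_gap = None, float("inf")
    for _ in range(n_trials):
        threshold = rng.uniform(0.5, 6.0)  # hypothetical search space
        flagged_share = sum(z > threshold for z in zscores) / len(zscores)
        gap = abs(flagged_share - anomaly_percentage)
        if gap < best_gap:
            best_threshold, best_gap = threshold, gap
    return best_threshold


# Mostly regular data with two obvious outliers at the end.
series = [10.0 + 0.1 * (i % 7) for i in range(500)] + [25.0, 30.0]
threshold = tune_z_threshold(series, anomaly_percentage=0.004)
print(f"chosen z_threshold: {threshold:.2f}")
```

More trials explore the search space more thoroughly at the cost of time, which is the same tradeoff the `n_trials` and `timeout` parameters above describe.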
### [Prophet](https://facebook.github.io/prophet/)
Here we utilize the Facebook Prophet implementation, as detailed in their [library documentation](https://facebook.github.io/prophet/docs/quick_start.html#python-api). All parameters from this library are compatible and can be passed to the model.
@ -218,7 +269,6 @@ Here we utilize the Facebook Prophet implementation, as detailed in their [libra
* `class` (string) - model class name `"model.prophet.ProphetModel"`
* `seasonalities` (list[dict], optional) - Extra seasonalities to pass to Prophet. See [`add_seasonality()`](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#modeling-holidays-and-special-events:~:text=modeling%20the%20cycle-,Specifying,-Custom%20Seasonalities) Prophet param.
* `provide_series` (dict, optional) - model resulting metrics. If not specified [standard metrics](#vmanomaly-output) will be provided.
**Note**: Apart from standard vmanomaly output Prophet model can provide [additional metrics](#additional-output-metrics-produced-by-fb-prophet).
@ -237,7 +287,8 @@ Depending on chosen `seasonality` parameter FB Prophet can return additional met
```yaml
models:
your_desired_alias_for_a_model:
class: "model.prophet.ProphetModel"
class: 'model.prophet.ProphetModel'
provide_series: ['anomaly_score', 'yhat', 'yhat_lower', 'yhat_upper', 'trend']
seasonalities:
- name: 'hourly'
period: 0.04166666666
@ -383,40 +434,6 @@ Resulting metrics of the model are described [here](#vmanomaly-output).
* `trend` - The trend component of the data series.
* `seasonal` - The seasonal component of the data series.
### [ARIMA](https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average)
Here we use ARIMA implementation from `statsmodels` [library](https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMA.html)
*Parameters specific for vmanomaly*:
* `class` (string) - model class name `"model.arima.ArimaModel"`
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to `2.5`.
* `provide_series` (list[string], optional) - List of columns to be produced and returned by the model. Defaults to `["anomaly_score", "yhat", "yhat_lower" "yhat_upper", "y"]`. Output can be **only a subset** of a given column list.
* `resample_freq` (string, optional) - Frequency to resample input data into, e.g. data comes at 15 seconds resolution, and resample_freq is '1m'. Then fitting data will be downsampled to '1m' and internal model is trained at '1m' intervals. So, during inference, prediction data would be produced at '1m' intervals, but interpolated to "15s" to match with expected output, as output data must have the same timestamps.
*Default model parameters*:
* `order` (list[int]) - ARIMA's (p,d,q) order of the model for the autoregressive, differences, and moving average components, respectively.
* `args` (dict, optional) - Inner model args (key-value pairs). See accepted params in [model documentation](https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMA.html). Defaults to empty (not provided). Example: {"trend": "c"}
*Config Example*
```yaml
models:
your_desired_alias_for_a_model:
class: "model.arima.ArimaModel"
# ARIMA's (p,d,q) order
order: [1, 1, 0]
z_threshold: 2.7
resample_freq: '1m'
# Inner model args (key-value pairs) accepted by statsmodels.tsa.arima.model.ARIMA
args:
trend: 'c'
```
### [Isolation forest](https://en.wikipedia.org/wiki/Isolation_forest) (Multivariate)
Detects anomalies using binary trees. The algorithm has a linear time complexity and a low memory requirement, which works well with high-volume data. It can be used on both univariate and multivariate data, but it is more effective in the multivariate case.
@ -431,6 +448,15 @@ Here we use Isolation Forest implementation from `scikit-learn` [library](https:
* `contamination` (float or string, optional) - The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples. Default value - "auto". Should be either `"auto"` or be in the range (0.0, 0.5].
* `seasonal_features` (list of string) - List of seasonalities to encode through [cyclical encoding](https://towardsdatascience.com/cyclical-features-encoding-its-about-time-ce23581845ca), e.g. `dow` (day of week). **Introduced in [1.12.0](/anomaly-detection/CHANGELOG/#v1120)**.
- Empty by default for backward compatibility.
- Example: `seasonal_features: ['dow', 'hod']`.
- Supported seasonalities:
- "minute" - minute of hour (0-59)
- "hod" - hour of day (0-23)
- "dow" - day of week (1-7)
- "month" - month of year (1-12)
* `args` (dict, optional) - Inner model args (key-value pairs). See accepted params in [model documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html). Defaults to empty (not provided). Example: {"random_state": 42, "n_estimators": 100}
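For readers unfamiliar with cyclical encoding, the sketch below shows the general technique in Python: each seasonal value is mapped to a point on the unit circle, so that the end of a cycle (e.g. hour 23) lands next to its start (hour 0). The periods and offsets follow the value ranges listed for `seasonal_features`. The function and the `SEASONAL_FEATURES` table are illustrative names, and how `vmanomaly` computes these features internally is not specified here.

```python
import math

# (period, first value) per feature, following the ranges listed above.
SEASONAL_FEATURES = {
    "minute": (60, 0),  # minute of hour, 0-59
    "hod": (24, 0),     # hour of day, 0-23
    "dow": (7, 1),      # day of week, 1-7
    "month": (12, 1),   # month of year, 1-12
}


def cyclical_encode(feature: str, value: int) -> tuple[float, float]:
    """Map a seasonal value to (sin, cos) coordinates on the unit circle."""
    period, first = SEASONAL_FEATURES[feature]
    angle = 2 * math.pi * (value - first) / period
    return math.sin(angle), math.cos(angle)


# Hour 23 is as close to hour 0 as hour 1 is, unlike with raw integer values.
wrap_gap = math.dist(cyclical_encode("hod", 23), cyclical_encode("hod", 0))
step_gap = math.dist(cyclical_encode("hod", 0), cyclical_encode("hod", 1))
print(math.isclose(wrap_gap, step_gap))
```

Each enabled feature therefore adds two numeric columns (sine and cosine) that a multivariate model can use alongside the metric values.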
*Config Example*
@ -441,7 +467,9 @@ models:
your_desired_alias_for_a_model:
# To use univariate model, substitute class argument with "model.isolation_forest.IsolationForestModel".
class: "model.isolation_forest.IsolationForestMultivariateModel"
contamination: "auto"
contamination: 0.01
provide_series: ['anomaly_score']
seasonal_features: ['dow', 'hod']
args:
n_estimators: 100
# i.e. to assure reproducibility of produced results each time model is fit on the same input
@ -499,6 +527,7 @@ In the `CustomModel` class there should be three required methods - `__init__`,
* `__init__` method should initiate parameters for the model.
**Note**: if your model relies on configs that have `arg` [key-value pair argument](./models.md#section-overview), do not forget to use Python's `**kwargs` in method's signature and to explicitly call
```python
super().__init__(**kwargs)
```
@ -556,7 +585,6 @@ class CustomModel(Model):
```
### 2. Configuration file
Next, we need to create `config.yaml` file with VM Anomaly Detection configuration and model input parameters.
@ -565,13 +593,14 @@ You can find out more about configuration parameters in [vmanomaly config docs](
```yaml
scheduler:
infer_every: "1m"
fit_every: "1m"
fit_window: "1d"
schedulers:
s1:
infer_every: "1m"
fit_every: "1m"
fit_window: "1d"
models:
your_desired_alias_for_a_model:
custom_model:
# note: every custom model should implement this exact path, specified in `class` field
class: "model.model.CustomModel"
# custom model params are defined here
@ -588,7 +617,6 @@ writer:
metric_format:
__name__: "custom_$VAR"
for: "$QUERY_KEY"
model: "custom"
run: "test-format"
monitoring:
@ -631,8 +659,7 @@ Please find more detailed instructions (license, etc.) [here](/anomaly-detection
As a result, this model will return metrics with the labels configured previously in `config.yaml`.
In this particular example, 2 metrics will be produced. Other metrics from the input query result will be added as well.
```
{__name__="custom_anomaly_score", for="ingestion_rate", model="custom", run="test-format"}
{__name__="custom_anomaly_score", for="churn_rate", model="custom", run="test-format"}
```text
{__name__="custom_anomaly_score", for="ingestion_rate", model_alias="custom_model", scheduler_alias="s1", run="test-format"},
{__name__="custom_anomaly_score", for="churn_rate", model_alias="custom_model", scheduler_alias="s1", run="test-format"}
```
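The way the `metric_format` section of the config turns into the label sets above can be sketched with Python's `string.Template`, which uses the same `$NAME` placeholder syntax. This is an illustration only: `render_labels` is a hypothetical helper, and the source does not state how `vmanomaly` performs the substitution or attaches the `model_alias` and `scheduler_alias` labels internally.

```python
from string import Template

# Mirrors the `metric_format` section of the example config.
metric_format = {
    "__name__": "custom_$VAR",
    "for": "$QUERY_KEY",
    "run": "test-format",
}


def render_labels(metric_format, var, query_key, extra_labels=None):
    """Fill `$VAR` and `$QUERY_KEY` placeholders and add alias labels."""
    values = {"VAR": var, "QUERY_KEY": query_key}
    labels = {
        name: Template(pattern).safe_substitute(values)
        for name, pattern in metric_format.items()
    }
    labels.update(extra_labels or {})
    return labels


labels = render_labels(
    metric_format,
    var="anomaly_score",
    query_key="ingestion_rate",
    extra_labels={"model_alias": "custom_model", "scheduler_alias": "s1"},
)
print(labels)
```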


@ -144,37 +144,37 @@ Label names [description](#labelnames)
<td><code>vmanomaly_model_runs</code></td>
<td>Counter</td>
<td>How many times models ran (per model)</td>
<td><code>stage, query_key, model_alias</code></td>
<td><code>stage, query_key, model_alias, scheduler_alias, preset</code></td>
</tr>
<tr>
<td><code>vmanomaly_model_run_duration_seconds</code></td>
<td>Summary</td>
<td>How much time (in seconds) model invocations took</td>
<td><code>stage, query_key, model_alias</code></td>
<td><code>stage, query_key, model_alias, scheduler_alias, preset</code></td>
</tr>
<tr>
<td><code>vmanomaly_model_datapoints_accepted</code></td>
<td>Counter</td>
<td>How many datapoints did models accept</td>
<td><code>stage, query_key, model_alias</code></td>
<td><code>stage, query_key, model_alias, scheduler_alias, preset</code></td>
</tr>
<tr>
<td><code>vmanomaly_model_datapoints_produced</code></td>
<td>Counter</td>
<td>How many datapoints were generated by models</td>
<td><code>stage, query_key, model_alias</code></td>
<td><code>stage, query_key, model_alias, scheduler_alias, preset</code></td>
</tr>
<tr>
<td><code>vmanomaly_models_active</code></td>
<td>Gauge</td>
<td>How many models are currently inferring</td>
<td><code>query_key</code></td>
<td><code>query_key, model_alias, scheduler_alias, preset</code></td>
</tr>
<tr>
<td><code>vmanomaly_model_runs_skipped</code></td>
<td>Counter</td>
<td>How many times a run was skipped (per model)</td>
<td><code>stage, query_key, model_alias</code></td>
<td><code>stage, query_key, model_alias, scheduler_alias, preset</code></td>
</tr>
</tbody>
</table>
@ -284,14 +284,18 @@ Label names [description](#labelnames)
</table>
### Labelnames
<code>stage</code> - stage of model - 'fit', 'infer' or 'fit_infer' for models that do it simultaneously.
<code>stage</code> - stage of model - 'fit', 'infer' or 'fit_infer' for models that do it simultaneously, see [model types](/anomaly-detection/components/models/#model-types).
<code>query_key</code> - query alias from [`reader`](/anomaly-detection/components/reader.html) config section.
<code>model_alias</code> - model alias from [`models`](/anomaly-detection/components/models.html) config section. **Introduced in [v1.10.0](/anomaly-detection/changelog/#v1100).**
<code>scheduler_alias</code> - scheduler alias from [`schedulers`](/anomaly-detection/components/scheduler/) config section. **Introduced in [v1.11.0](/anomaly-detection/changelog/#v1110).**
<code>preset</code> - preset alias for forthcoming `preset` section compatibility. **Introduced in [v1.12.0](/anomaly-detection/changelog/#v1120).**
<code>url</code> - writer or reader url endpoint.
<code>code</code> - response status code or `connection_error`, `timeout`.
<code>step</code> - json or dataframe reading step.
<code>step</code> - json or dataframe reading step.


@ -55,6 +55,7 @@ On top of this, Enterprise package of VictoriaMetrics includes the following fea
by specifying different retentions for different datasets.
- [Automatic discovery of vmstorage nodes](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#automatic-vmstorage-discovery) -
this feature allows updating the list of `vmstorage` nodes at `vminsert` and `vmselect` without the need to restart these services.
- [Anomaly Detection Service](https://docs.victoriametrics.com/anomaly-detection) - this feature allows automation and simplification of your alerting rules, covering [complex anomalies](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-2/) found in metrics data.
- [Backup automation](https://docs.victoriametrics.com/vmbackupmanager.html).
- [Advanced per-tenant stats](https://docs.victoriametrics.com/PerTenantStatistic.html).
- [Advanced auth and rate limiter](https://docs.victoriametrics.com/vmgateway.html).
@ -66,7 +67,6 @@ On top of this, Enterprise package of VictoriaMetrics includes the following fea
- [Multitenant support in vmalert](https://docs.victoriametrics.com/vmalert.html#multitenancy).
- [Ability to read alerting and recording rules from Object Storage](https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage).
- [Ability to filter incoming requests by IP at vmauth](https://docs.victoriametrics.com/vmauth.html#ip-filters).
- [Anomaly Detection Service](https://docs.victoriametrics.com/anomaly-detection).
Contact us via [this page](https://victoriametrics.com/products/enterprise/) if you are interested in VictoriaMetrics Enterprise.


@ -114,6 +114,8 @@ groups:
[ - <rule_group> ]
```
> Explore how to integrate `vmalert` with [VictoriaMetrics Anomaly Detection](/anomaly-detection/) in the following [guide](/anomaly-detection/guides/guide-vmanomaly-vmalert/).
### Groups
Each group has the following attributes: