mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2024-11-21 14:44:00 +00:00
docs/vmanomaly - release 1.13.0 preparation (#6436)
### Describe Your Changes

[vmanomaly docs](https://docs.victoriametrics.com/anomaly-detection/) update for changes introduced in v1.13.0

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).
parent
33d07e915f
commit
1feb5d04d7
16 changed files with 285 additions and 78 deletions
|
@ -15,8 +15,19 @@ aliases:
|
||||||
|
|
||||||
Please find the changelog for VictoriaMetrics Anomaly Detection below.
|
Please find the changelog for VictoriaMetrics Anomaly Detection below.
|
||||||
|
|
||||||
> **Important note: Users are strongly encouraged to upgrade to `vmanomaly` [v1.9.2](https://hub.docker.com/repository/docker/victoriametrics/vmanomaly/tags?page=1&ordering=name) or later versions for optimal performance and accuracy. <br><br> This recommendation is crucial for configurations with a low `infer_every` parameter [in your scheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#parameters-1), and in scenarios where data exhibits significant high-order seasonality patterns (such as hourly or daily cycles). Previous versions from v1.5.1 to v1.8.0 were identified to contain a critical issue impacting model training, where models were inadvertently trained on limited data subsets, leading to suboptimal fits, affecting the accuracy of anomaly detection. <br><br> Upgrading to v1.9.2 addresses this issue, ensuring proper model training and enhanced reliability. For users utilizing Helm charts, it is recommended to upgrade to version [1.0.0](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-anomaly/CHANGELOG.md#100) or newer.**
|
> **Important note: Users are strongly encouraged to upgrade to `vmanomaly` [v1.9.2](https://hub.docker.com/repository/docker/victoriametrics/vmanomaly/tags?page=1&ordering=name) or newer for optimal performance and accuracy. <br><br> This recommendation is crucial for configurations with a low `infer_every` parameter [in your scheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#parameters-1), and in scenarios where data exhibits significant high-order seasonality patterns (such as hourly or daily cycles). Previous versions from v1.5.1 to v1.8.0 were identified to contain a critical issue impacting model training, where models were inadvertently trained on limited data subsets, leading to suboptimal fits, affecting the accuracy of anomaly detection. <br><br> Upgrading to v1.9.2 addresses this issue, ensuring proper model training and enhanced reliability. For users utilizing Helm charts, it is recommended to upgrade to version [1.0.0](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-anomaly/CHANGELOG.md#100) or newer.**
|
||||||
|
|
||||||
|
## v1.13.0
|
||||||
|
Released: 2024-06-11
|
||||||
|
- FEATURE: Introduced `min_dev_from_expected` [model common arg](/anomaly-detection/components/models/#minimal-deviation-from-expected), aimed at **reducing [false positives](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#false-positive)** in scenarios where deviations between the real value `y` and the expected value `yhat` are **relatively** high and may cause models to generate high [anomaly scores](https://docs.victoriametrics.com/anomaly-detection/faq/#what-is-anomaly-score). However, these deviations are not significant enough in **absolute values** to be considered anomalies based on domain knowledge.
|
||||||
|
- FEATURE: Introduced `detection_direction` [model common arg](/anomaly-detection/components/models/#detection-direction), enabling domain-driven anomaly detection strategies. Configure models to identify anomalies occurring *above, below, or in both directions* relative to the expected values.
|
||||||
|
- FEATURE: added the `n_jobs` arg to [`BacktestingScheduler`](/anomaly-detection/components/scheduler/#backtesting-scheduler) to allow *proportionally faster (yet more resource-intensive)* evaluations of a config on historical data. The default value is 1, which implies *sequential* execution.
|
||||||
|
- FEATURE: allow anomaly detection models to be dumped to the host filesystem after the `fit` stage (instead of being kept in memory). Resource-intensive setups (many models, many metrics, bigger [`fit_window` arg](/anomaly-detection/components/scheduler/#periodic-scheduler-config-example)) and/or 3rd-party models that store fit data (like [ProphetModel](/anomaly-detection/components/models/index.html#prophet) or [HoltWinters](/anomaly-detection/components/models/index.html#holt-winters)) will have RAM consumption greatly reduced at the cost of a slightly slower `infer` stage. See how to enable it [here](/anomaly-detection/faq/#resource-consumption-of-vmanomaly).
|
||||||
|
- IMPROVEMENT: Reduced the resources used by each fitted [`ProphetModel`](/anomaly-detection/components/models/index.html#prophet) by up to 6 times. This includes both RAM for in-memory models and disk space for on-disk model storage. For more details, refer to [this discussion on Facebook's Prophet](https://github.com/facebook/prophet/issues/1159#issuecomment-537415637).
|
||||||
|
- IMPROVEMENT: the class of a config [component](/anomaly-detection/components/index.html) can now be referenced by a short alias instead of a full class path, e.g. `model.zscore.ZscoreModel` becomes `zscore`, `reader.vm.VmReader` becomes `vm`, `scheduler.periodic.PeriodicScheduler` becomes `periodic`, etc.
|
||||||
|
- FIX: when using a multi-scheduler setup (introduced in [v1.11.0](/anomaly-detection/changelog/#v1110)), prevent schedulers (and corresponding services) that are not attached to any model (so neither found in the ['schedulers' arg](/anomaly-detection/components/models/index.html#schedulers) nor left blank in the `model` section) from being spawned, which caused resource overhead and slight interference with existing ones.
|
||||||
|
- FIX: set a random seed for [ProphetModel](/anomaly-detection/components/models#prophet) to ensure that uncertainty estimates (like `yhat_lower`, `yhat_upper`) and dependent series (like `anomaly_score`) produced during `.infer()` calls are always deterministic given the same input. See the [initial issue](https://github.com/facebook/prophet/issues/1124) for details.
|
||||||
|
- FIX: prevent *orphan* queries (those not attached to any model or scheduler) found in the `queries` arg of the [Reader config section](/anomaly-detection/components/reader/index.html#vm-reader) from being fetched from VictoriaMetrics TSDB, avoiding redundant data processing. A warning is logged if such queries exist in a parsed config.
|
||||||
|
|
||||||
## v1.12.0
|
## v1.12.0
|
||||||
Released: 2024-03-31
|
Released: 2024-03-31
|
||||||
|
|
|
@ -76,7 +76,7 @@ Starting from [v1.7.2](/anomaly-detection/changelog/#v172) you can produce (and
|
||||||
```yaml
|
```yaml
|
||||||
schedulers:
|
schedulers:
|
||||||
scheduler_alias:
|
scheduler_alias:
|
||||||
class: "scheduler.backtesting.BacktestingScheduler"
|
class: 'backtesting' # or "scheduler.backtesting.BacktestingScheduler" until v1.13.0
|
||||||
# define historical period to backtest on
|
# define historical period to backtest on
|
||||||
# should be bigger than at least (fit_window + fit_every) time range
|
# should be bigger than at least (fit_window + fit_every) time range
|
||||||
from_iso: '2024-01-01T00:00:00Z'
|
from_iso: '2024-01-01T00:00:00Z'
|
||||||
|
@ -116,6 +116,37 @@ Configuration above will produce N intervals of full length (`fit_window`=14d +
|
||||||
## Resource consumption of vmanomaly
|
## Resource consumption of vmanomaly
|
||||||
`vmanomaly` itself is a lightweight service, resource usage is primarily dependent on [scheduling](/anomaly-detection/components/scheduler.html) (how often and on what data to fit/infer your models), [# and size of timeseries returned by your queries](/anomaly-detection/components/reader.html#vm-reader), and the complexity of the employed [models](anomaly-detection/components/models.html). Its resource usage is directly related to these factors, making it adaptable to various operational scales.
|
`vmanomaly` itself is a lightweight service, resource usage is primarily dependent on [scheduling](/anomaly-detection/components/scheduler.html) (how often and on what data to fit/infer your models), [# and size of timeseries returned by your queries](/anomaly-detection/components/reader.html#vm-reader), and the complexity of the employed [models](anomaly-detection/components/models.html). Its resource usage is directly related to these factors, making it adaptable to various operational scales.
|
||||||
|
|
||||||
|
> **Note**: Starting from [v1.13.0](/anomaly-detection/changelog/#v1130), there is a mode to save anomaly detection models on the host filesystem after the `fit` stage (instead of keeping them in-memory by default). **Resource-intensive setups** (many models, many metrics, bigger [`fit_window` arg](/anomaly-detection/components/scheduler/#periodic-scheduler-config-example)) and/or 3rd-party models that store fit data (like [ProphetModel](/anomaly-detection/components/models/index.html#prophet) or [HoltWinters](/anomaly-detection/components/models/index.html#holt-winters)) will have RAM consumption greatly reduced at the cost of a slightly slower `infer` stage. To enable it, set the environment variable `VMANOMALY_MODEL_DUMPS_DIR` to the desired location. [Helm charts](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-anomaly/README.md) are being updated accordingly ([`StatefulSet`](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) for persistent storage starting from chart version `1.3.0`).
|
||||||
|
|
||||||
|
Here's an example of how to set it up in docker-compose using volumes:
|
||||||
|
```yaml
services:
  # ...
  vmanomaly:
    container_name: vmanomaly
    image: victoriametrics/vmanomaly:latest
    # ...
    ports:
      - "8490:8490"
    restart: always
    volumes:
      - ./vmanomaly_config.yml:/config.yaml
      - ./vmanomaly_license:/license
      # mount the named volume (declared below) to the container directory used for model dumps
      - vmanomaly_model_dump_dir:/vmanomaly/tmp/models
    environment:
      # set the environment variable for the model dump directory
      - VMANOMALY_MODEL_DUMPS_DIR=/vmanomaly/tmp/models/
    platform: "linux/amd64"
    command:
      - "/config.yaml"
      - "--license-file=/license"

volumes:
  # ...
  vmanomaly_model_dump_dir: {}
```
|
||||||
|
|
||||||
## Scaling vmanomaly
|
## Scaling vmanomaly
|
||||||
> **Note:** As of latest release we don't support cluster or auto-scaled version yet (though, it's in our roadmap for - better backends, more parallelization, etc.), so proposed workarounds should be addressed manually.
|
> **Note:** As of latest release we don't support cluster or auto-scaled version yet (though, it's in our roadmap for - better backends, more parallelization, etc.), so proposed workarounds should be addressed manually.
|
||||||
|
|
||||||
|
@ -128,6 +159,7 @@ Configuration above will produce N intervals of full length (`fit_window`=14d +
|
||||||
- or **schedulers** (in case you want the same models to be trained under several schedules) - see `schedulers` arg [model section](/anomaly-detection/components/models/#schedulers) and `scheduler` [component itself](/anomaly-detection/components/scheduler/)
|
- or **schedulers** (in case you want the same models to be trained under several schedules) - see `schedulers` arg [model section](/anomaly-detection/components/models/#schedulers) and `scheduler` [component itself](/anomaly-detection/components/scheduler/)
|
||||||
|
|
||||||
|
|
||||||
|
Here's an example of how to split on `extra_filters` param
|
||||||
```yaml
|
```yaml
|
||||||
# config file #1, for 1st vmanomaly instance
|
# config file #1, for 1st vmanomaly instance
|
||||||
# ...
|
# ...
|
||||||
|
|
|
@ -167,7 +167,7 @@ schedulers:
|
||||||
|
|
||||||
models:
|
models:
|
||||||
prophet: # or use a model alias of your choice here
|
prophet: # or use a model alias of your choice here
|
||||||
class: "model.prophet.ProphetModel"
|
class: "prophet" # or "model.prophet.ProphetModel" until v1.13.0
|
||||||
args:
|
args:
|
||||||
interval_width: 0.98
|
interval_width: 0.98
|
||||||
|
|
||||||
|
|
|
@ -25,6 +25,7 @@ The following options are available:
|
||||||
- [To run Docker image](#docker)
|
- [To run Docker image](#docker)
|
||||||
- [To run in Kubernetes with Helm charts](#kubernetes-with-helm-charts)
|
- [To run in Kubernetes with Helm charts](#kubernetes-with-helm-charts)
|
||||||
|
|
||||||
|
> **Note**: Starting from [v1.13.0](/anomaly-detection/changelog/#v1130) there is a mode to keep anomaly detection models on the host filesystem after the `fit` stage (instead of keeping them in-memory by default). This may lead to a **noticeable reduction of RAM used** on bigger setups. See instructions [here](/anomaly-detection/faq/#resource-consumption-of-vmanomaly).
|
||||||
|
|
||||||
### Docker
|
### Docker
|
||||||
|
|
||||||
|
@ -83,7 +84,7 @@ scheduler:
|
||||||
|
|
||||||
models:
|
models:
|
||||||
prophet_model:
|
prophet_model:
|
||||||
class: "model.prophet.ProphetModel"
|
class: "prophet" # or "model.prophet.ProphetModel" until v1.13.0
|
||||||
args:
|
args:
|
||||||
interval_width: 0.98
|
interval_width: 0.98
|
||||||
|
|
||||||
|
|
|
@ -23,8 +23,68 @@ This chapter describes different components, that correspond to respective secti
|
||||||
|
|
||||||
> **Note**: starting from [v1.7.0](/anomaly-detection/CHANGELOG.html#v172), once the service starts, automated config validation is performed. Please see container logs for errors that need to be fixed to create fully valid config, visiting sections above for examples and documentation.
|
> **Note**: starting from [v1.7.0](/anomaly-detection/CHANGELOG.html#v172), once the service starts, automated config validation is performed. Please see container logs for errors that need to be fixed to create fully valid config, visiting sections above for examples and documentation.
|
||||||
|
|
||||||
|
> **Note**: starting from [v1.13.0](/anomaly-detection/CHANGELOG.html#v1130), a component's class can be referenced by a short alias instead of a full class path, e.g. `model.zscore.ZscoreModel` becomes `zscore`, `reader.vm.VmReader` becomes `vm`, `scheduler.periodic.PeriodicScheduler` becomes `periodic`, etc. Please see the corresponding sections for details.
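
As a minimal sketch of what alias support means in practice, the fragment below uses only the three class paths named in the note. The component aliases (`my_scheduler`, `my_model`) are hypothetical, and all other required arguments are elided:

```yaml
schedulers:
  my_scheduler: # hypothetical alias
    class: 'periodic' # short alias for 'scheduler.periodic.PeriodicScheduler'
    # ...
models:
  my_model: # hypothetical alias
    class: 'zscore' # short alias for 'model.zscore.ZscoreModel'
    # ...
reader:
  class: 'vm' # short alias for 'reader.vm.VmReader'
  # ...
```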
|
||||||
|
|
||||||
Below, you will find an example illustrating how the components of `vmanomaly` interact with each other and with a single-node VictoriaMetrics setup.
|
Below, you will find an example illustrating how the components of `vmanomaly` interact with each other and with a single-node VictoriaMetrics setup.
|
||||||
|
|
||||||
> **Note**: [Reader](/anomaly-detection/components/reader.html#vm-reader) and [Writer](/anomaly-detection/components/writer.html#vm-writer) also support [multitenancy](/Cluster-VictoriaMetrics.html#multitenancy), so you can read/write from/to different locations - see `tenant_id` param description.
|
> **Note**: [Reader](/anomaly-detection/components/reader.html#vm-reader) and [Writer](/anomaly-detection/components/writer.html#vm-writer) also support [multitenancy](/Cluster-VictoriaMetrics.html#multitenancy), so you can read/write from/to different locations - see `tenant_id` param description.
|
||||||
|
|
||||||
<img alt="vmanomaly-components" src="vmanomaly-components.webp" width="800px"/>
|
<img alt="vmanomaly-components" src="vmanomaly-components.webp" width="800px"/>
|
||||||
|
|
||||||
|
Here's a minimal full config example demonstrating a many-to-many configuration (valid for the [latest version](/anomaly-detection/CHANGELOG/)):
|
||||||
|
|
||||||
|
```yaml
# how and when to run the models is defined by schedulers
# https://docs.victoriametrics.com/anomaly-detection/components/scheduler/
schedulers:
  periodic_1d: # alias
    class: 'periodic' # scheduler class
    infer_every: "30s"
    fit_every: "10m"
    fit_window: "24h"
  periodic_1w:
    class: 'periodic'
    infer_every: "15m"
    fit_every: "1h"
    fit_window: "7d"

# what model types and with what hyperparams to run on your data
# https://docs.victoriametrics.com/anomaly-detection/components/models/
models:
  zscore: # alias
    class: 'zscore' # model class
    z_threshold: 3.5
    provide_series: ['anomaly_score'] # what series to produce
    queries: ['host_network_receive_errors'] # what queries to run particular model on
    schedulers: ['periodic_1d'] # will be attached to 1-day schedule, fit every 10m and infer every 30s
  prophet: # alias
    class: 'prophet'
    provide_series: ['anomaly_score', 'yhat', 'yhat_lower', 'yhat_upper']
    queries: ['cpu_seconds_total']
    schedulers: ['periodic_1w'] # will be attached to 1-week schedule, fit every 1h and infer every 15m
    args: # model-specific arguments
      interval_width: 0.98

# where to read data from
# https://docs.victoriametrics.com/anomaly-detection/components/reader/
reader:
  datasource_url: "http://victoriametrics:8428/"
  tenant_id: "0:0"
  class: 'vm'
  sampling_period: "30s" # the resolution (step) of the data to read
  queries: # aliases to MetricsQL expressions
    cpu_seconds_total: 'avg(rate(node_cpu_seconds_total[5m])) by (mode)'
    host_network_receive_errors: 'rate(node_network_receive_errs_total[3m]) / rate(node_network_receive_packets_total[3m])'

# where to write data to
# https://docs.victoriametrics.com/anomaly-detection/components/writer/
writer:
  datasource_url: "http://victoriametrics:8428/"

# enable self-monitoring in pull and/or push mode
# https://docs.victoriametrics.com/anomaly-detection/components/monitoring/
monitoring:
  pull: # Enable /metrics endpoint.
    addr: "0.0.0.0"
    port: 8490
```
|
|
@ -26,11 +26,11 @@ This section describes `Models` component of VictoriaMetrics Anomaly Detection (
|
||||||
```yaml
|
```yaml
|
||||||
models:
|
models:
|
||||||
model_univariate_1:
|
model_univariate_1:
|
||||||
class: 'model.zscore.ZscoreModel'
|
class: 'zscore' # or 'model.zscore.ZscoreModel' until v1.13.0
|
||||||
z_threshold: 2.5
|
z_threshold: 2.5
|
||||||
queries: ['query_alias2'] # referencing queries defined in `reader` section
|
queries: ['query_alias2'] # referencing queries defined in `reader` section
|
||||||
model_multivariate_1:
|
model_multivariate_1:
|
||||||
class: 'model.isolation_forest.IsolationForestMultivariateModel'
|
class: 'isolation_forest_multivariate' # or model.isolation_forest.IsolationForestMultivariateModel until v1.13.0
|
||||||
contamination: 'auto'
|
contamination: 'auto'
|
||||||
args:
|
args:
|
||||||
n_estimators: 100
|
n_estimators: 100
|
||||||
|
@ -44,7 +44,7 @@ Old-style configs (< [1.10.0](/anomaly-detection/changelog#v1100))
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
model:
|
model:
|
||||||
class: "model.zscore.ZscoreModel"
|
class: "zscore" # or 'model.zscore.ZscoreModel' until v1.13.0
|
||||||
z_threshold: 3.0
|
z_threshold: 3.0
|
||||||
# no explicit `queries` arg is provided
|
# no explicit `queries` arg is provided
|
||||||
# ...
|
# ...
|
||||||
|
@ -69,7 +69,7 @@ From [1.10.0](/anomaly-detection/changelog#1100), **common args**, supported by
|
||||||
|
|
||||||
### Queries
|
### Queries
|
||||||
|
|
||||||
Introduced in [1.10.0](/anomaly-detection/changelog#1100), as a part to support multi-model configs, `queries` arg is meant to define [queries from VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/?highlight=queries#config-parameters) particular model should be run on (meaning, all the series returned by each of these queries will be used in such model for fitting and inferencing).
|
Introduced in [1.10.0](/anomaly-detection/changelog#1100), as a part to support multi-model configs, `queries` arg is meant to define [queries from VmReader](/anomaly-detection/components/reader/?highlight=queries#config-parameters) particular model should be run on (meaning, all the series returned by each of these queries will be used in such model for fitting and inferencing).
|
||||||
|
|
||||||
`queries` arg is supported for all [the built-in](#built-in-models) (as well as for [custom](#custom-model-guide)) models.
|
`queries` arg is supported for all [the built-in](#built-in-models) (as well as for [custom](#custom-model-guide)) models.
|
||||||
|
|
||||||
|
@ -130,6 +130,76 @@ models:
|
||||||
|
|
||||||
**Note** If `provide_series` is not specified in model config, the model will produce its default [model-dependent output](#vmanomaly-output). The output can't be less than `['anomaly_score']`. Even if `timestamp` column is omitted, it will be implicitly added to `provide_series` list, as it's required for metrics to be properly written.
|
**Note** If `provide_series` is not specified in model config, the model will produce its default [model-dependent output](#vmanomaly-output). The output can't be less than `['anomaly_score']`. Even if `timestamp` column is omitted, it will be implicitly added to `provide_series` list, as it's required for metrics to be properly written.
|
||||||
|
|
||||||
|
### Detection direction
|
||||||
|
Introduced in [1.13.0](/anomaly-detection/CHANGELOG/#v1130), the `detection_direction` arg can help reduce the number of [false positives](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/index.html#false-positive) and increase accuracy when domain knowledge suggests identifying anomalies only when actual values (`y`) are *above, below, or in both directions* relative to the expected values (`yhat`). Available choices are: `both`, `above_expected`, `below_expected`.
|
||||||
|
|
||||||
|
Here's what the default (backward-compatible) behavior looks like: anomalies are tracked in `both` directions (`y > yhat` or `y < yhat`). This is useful when there is no domain expertise to narrow detection to a single direction.
|
||||||
|
|
||||||
|
<img src="schema_detection_direction=both.webp" width="800px" alt="schema_detection_direction=both"/>
|
||||||
|
|
||||||
|
When set to `above_expected`, anomalies are tracked only when `y > yhat`.
|
||||||
|
|
||||||
|
*Example metrics*: Error rate, response time, page load time, number of failed transactions - metrics where *lower values are better*, so **higher** values are typically tracked.
|
||||||
|
|
||||||
|
<img src="schema_detection_direction=above_expected.webp" width="800px" alt="schema_detection_direction=above_expected"/>
|
||||||
|
|
||||||
|
When set to `below_expected`, anomalies are tracked only when `y < yhat`.
|
||||||
|
|
||||||
|
*Example metrics*: Service Level Agreement (SLA) compliance, conversion rate, Customer Satisfaction Score (CSAT) - metrics where *higher values are better*, so **lower** values are typically tracked.
|
||||||
|
|
||||||
|
<img src="schema_detection_direction=below_expected.webp" width="800px" alt="schema_detection_direction=below_expected"/>
|
||||||
|
|
||||||
|
Example config that splits models by detection direction:
|
||||||
|
|
||||||
|
```yaml
models:
  model_above_expected:
    class: 'zscore'
    z_threshold: 3.0
    # track only cases when y > yhat, otherwise anomaly_score would be explicitly set to 0
    detection_direction: 'above_expected'
    # for this query we do not need to track lower values, thus, set anomaly detection tracking for y > yhat (above_expected)
    queries: ['query_values_the_lower_the_better']
  model_below_expected:
    class: 'zscore'
    z_threshold: 3.0
    # track only cases when y < yhat, otherwise anomaly_score would be explicitly set to 0
    detection_direction: 'below_expected'
    # for this query we do not need to track higher values, thus, set anomaly detection tracking for y < yhat (below_expected)
    queries: ['query_values_the_higher_the_better']
  model_bidirectional_default:
    class: 'zscore'
    z_threshold: 3.0
    # track in both directions, the same backward-compatible behavior as when this arg is missing
    detection_direction: 'both'
    # for this query both directions can be equally important for anomaly detection, thus, setting it bidirectional (both)
    queries: ['query_values_both_direction_matters']
reader:
  # ...
  queries:
    query_values_the_lower_the_better: metricql_expression1
    query_values_the_higher_the_better: metricql_expression2
    query_values_both_direction_matters: metricql_expression3
# other components like writer, scheduler, monitoring
```
|
||||||
|
|
||||||
|
### Minimal deviation from expected
|
||||||
|
|
||||||
|
Introduced in [v1.13.0](/anomaly-detection/CHANGELOG/#v1130), the `min_dev_from_expected` argument is designed to **reduce [false positives](https://victoriametrics.com/blog/victoriametrics-anomaly-detection-handbook-chapter-1/#false-positive)** in scenarios where deviations between the actual value (`y`) and the expected value (`yhat`) are **relatively** high. Such deviations can cause models to generate high [anomaly scores](/anomaly-detection/faq/#what-is-anomaly-score), even though they may not be significant enough in **absolute values** from a business perspective to be considered anomalies. This parameter ensures that anomaly scores for data points where `|y - yhat| < min_dev_from_expected` are explicitly set to 0. If the parameter is not set, it behaves as `min_dev_from_expected=0` to maintain backward compatibility.
|
||||||
|
|
||||||
|
> **Note**: `min_dev_from_expected` must be >= 0. The higher the value of `min_dev_from_expected`, the fewer data points will be available for anomaly detection, and vice versa.
|
||||||
|
|
||||||
|
*Example*: Consider a scenario where CPU utilization is low and oscillates around 0.3% (0.003). A sudden spike to 1.3% (0.013) represents a +333% increase in **relative** terms, but only a +1 percentage point (0.01) increase in **absolute** terms, which may be negligible and not warrant an alert. Setting the `min_dev_from_expected` argument to `0.01` (1%) will ensure that all anomaly scores for deviations <= `0.01` are set to 0.
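
The CPU utilization scenario above could be configured roughly as follows. This is a sketch rather than a tested config: the model alias `zscore_cpu_utilization`, the query alias `cpu_utilization` and the MetricsQL placeholder are assumptions made for illustration, and `min_dev_from_expected` is placed at the model level in the same way as the other common args.

```yaml
models:
  zscore_cpu_utilization: # hypothetical model alias
    class: 'zscore'
    z_threshold: 3.0
    # anomaly scores are set to 0 when y deviates from yhat by less than
    # 0.01 in absolute terms (1 percentage point of CPU utilization)
    min_dev_from_expected: 0.01
    queries: ['cpu_utilization'] # hypothetical query alias
reader:
  # ...
  queries:
    cpu_utilization: metricsql_expression # placeholder, replace with a real query
# other components like writer, scheduler, monitoring
```

Choosing the value in the units of the underlying query (here, a fraction between 0 and 1) keeps the threshold meaningful from a domain perspective.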
|
||||||
|
|
||||||
|
Visualizations below demonstrate this concept; the green zone defined as the `[yhat - min_dev_from_expected, yhat + min_dev_from_expected]` range excludes actual data points (`y`) from generating anomaly scores if they fall within that range.
|
||||||
|
|
||||||
|
<img src="schema_min_dev_from_expected=0.webp" width="800px" alt="min_dev_from_expected-default"/>
|
||||||
|
|
||||||
|
<img src="schema_min_dev_from_expected=1.0.webp" width="800px" alt="min_dev_from_expected-small"/>
|
||||||
|
|
||||||
|
<img src="schema_min_dev_from_expected=5.0.webp" width="800px" alt="min_dev_from_expected-big"/>
|
||||||
|
|
||||||
|
|
||||||
## Model types
|
## Model types
|
||||||
|
|
||||||
There are **2 model types**, supported in `vmanomaly`, resulting in **4 possible combinations**:
|
There are **2 model types**, supported in `vmanomaly`, resulting in **4 possible combinations**:
|
||||||
|
@ -137,7 +207,7 @@ There are **2 model types**, supported in `vmanomaly`, resulting in **4 possible
|
||||||
- [Univariate models](#univariate-models)
|
- [Univariate models](#univariate-models)
|
||||||
- [Multivariate models](#multivariate-models)
|
- [Multivariate models](#multivariate-models)
|
||||||
|
|
||||||
Each of these models can be
|
Each of these models can also be
|
||||||
- [Rolling](#rolling-models)
|
- [Rolling](#rolling-models)
|
||||||
- [Non-rolling](#non-rolling-models)
|
- [Non-rolling](#non-rolling-models)
|
||||||
|
|
||||||
|
@ -154,7 +224,7 @@ If during an inference, you got a series having **new labelset** (not present in
|
||||||
**Examples:** [Prophet](#prophet), [Holt-Winters](#holt-winters)
|
**Examples:** [Prophet](#prophet), [Holt-Winters](#holt-winters)
|
||||||
|
|
||||||
<p></p>
|
<p></p>
|
||||||
<img alt="vmanomaly-model-type-univariate" src="/anomaly-detection/components/model-lifecycle-univariate.webp" width="800px"/>
|
<img alt="vmanomaly-model-type-univariate" src="model-lifecycle-univariate.webp" width="800px"/>
|
||||||
|
|
||||||
### Multivariate Models
|
### Multivariate Models
|
||||||
|
|
||||||
|
@ -169,7 +239,7 @@ If during an inference, you got a **different amount of series** or some series
|
||||||
**Examples:** [IsolationForest](#isolation-forest-multivariate)
|
**Examples:** [IsolationForest](#isolation-forest-multivariate)
|
||||||
|
|
||||||
<p></p>
|
<p></p>
|
||||||
<img alt="vmanomaly-model-type-multivariate" src="/anomaly-detection/components/model-lifecycle-multivariate.webp" width="800px"/>
|
<img alt="vmanomaly-model-type-multivariate" src="model-lifecycle-multivariate.webp" width="800px"/>
|
||||||
|
|
||||||
### Rolling Models
|
### Rolling Models
|
||||||
|
|
||||||
|
@ -186,7 +256,7 @@ Such models put **more pressure** on your reader's source, i.e. if your model sh
|
||||||
**Examples:** [RollingQuantile](#rolling-quantile)
|
**Examples:** [RollingQuantile](#rolling-quantile)
|
||||||
|
|
||||||
<p></p>
|
<p></p>
|
||||||
<img alt="vmanomaly-model-type-rolling" src="/anomaly-detection/components/model-type-rolling.webp" width="800px"/>
|
<img alt="vmanomaly-model-type-rolling" src="model-type-rolling.webp" width="800px"/>
|
||||||
|
|
||||||
### Non-Rolling Models
|
### Non-Rolling Models
|
||||||
|
|
||||||
|
@ -203,7 +273,7 @@ Produced model instances are **stored in-memory** between consecutive re-fit cal
|
||||||
**Examples:** [Prophet](#prophet)
|
**Examples:** [Prophet](#prophet)
|
||||||
|
|
||||||
<p></p>
|
<p></p>
|
||||||
<img alt="vmanomaly-model-type-non-rolling" src="/anomaly-detection/components/model-type-non-rolling.webp" width="800px"/>
|
<img alt="vmanomaly-model-type-non-rolling" src="model-type-non-rolling.webp" width="800px"/>
|
||||||
|
|
||||||
## Built-in Models
|
## Built-in Models
|
||||||
|
|
||||||
|
@ -233,8 +303,8 @@ Tuning hyperparameters of a model can be tricky and often requires in-depth know
|
||||||
|
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.auto.AutoTunedModel"`
|
* `class` (string) - model class name `"model.auto.AutoTunedModel"` (or `auto` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#v1130) with class alias support)
|
||||||
* `tuned_class_name` (string) - Built-in model class to tune, i.e. `model.zscore.ZscoreModel`.
|
* `tuned_class_name` (string) - Built-in model class to tune, e.g. `model.zscore.ZscoreModel` (or `zscore` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#v1130) with class alias support).
|
||||||
* `optimization_params` (dict) - Optimization parameters for unsupervised model tuning. They control the expected percentage of anomalies found, as well as the tradeoff between time spent and accuracy. The higher `timeout` and `n_trials` are, the better the model configuration that can be found for `tuned_class_name`, but the longer tuning takes, and vice versa. Set `n_jobs` to `-1` to use all available CPUs; this only makes sense if you have a big dataset to train on during `fit` calls, otherwise the overhead isn't worth it.
|
* `optimization_params` (dict) - Optimization parameters for unsupervised model tuning. They control the expected percentage of anomalies found, as well as the tradeoff between time spent and accuracy. The higher `timeout` and `n_trials` are, the better the model configuration that can be found for `tuned_class_name`, but the longer tuning takes, and vice versa. Set `n_jobs` to `-1` to use all available CPUs; this only makes sense if you have a big dataset to train on during `fit` calls, otherwise the overhead isn't worth it.
|
||||||
- `anomaly_percentage` (float) - expected percentage of anomalies that can be seen in training data, from (0, 0.5) interval.
|
- `anomaly_percentage` (float) - expected percentage of anomalies that can be seen in training data, from (0, 0.5) interval.
|
||||||
- `seed` (int) - Random seed for reproducibility and deterministic nature of underlying optimizations.
|
- `seed` (int) - Random seed for reproducibility and deterministic nature of underlying optimizations.
|
||||||
|
@ -242,14 +312,14 @@ Tuning hyperparameters of a model can be tricky and often requires in-depth know
|
||||||
- `n_trials` (int) - How many trials to sample from hyperparameter search space. The higher, the longer it takes but the better the results can be. Defaults to 128.
|
- `n_trials` (int) - How many trials to sample from hyperparameter search space. The higher, the longer it takes but the better the results can be. Defaults to 128.
|
||||||
- `timeout` (float) - How many seconds in total can be spent on each model to tune hyperparameters. A higher value takes longer but lets more of the defined `n_trials` be tested, so the results can be better.
|
- `timeout` (float) - How many seconds in total can be spent on each model to tune hyperparameters. A higher value takes longer but lets more of the defined `n_trials` be tested, so the results can be better.
|
||||||
|
|
||||||
<img alt="vmanomaly-autotune-schema" src="/anomaly-detection/components/autotune.webp" width="800px"/>
|
<img alt="vmanomaly-autotune-schema" src="autotune.webp" width="800px"/>
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# ...
|
# ...
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
class: 'model.auto.AutoTunedModel'
|
class: 'auto' # or 'model.auto.AutoTunedModel' until v1.13.0
|
||||||
tuned_class_name: 'model.zscore.ZscoreModel'
|
tuned_class_name: 'zscore' # or 'model.zscore.ZscoreModel' until v1.13.0
|
||||||
optimization_params:
|
optimization_params:
|
||||||
anomaly_percentage: 0.004 # required. i.e. we expect <= 0.4% of anomalies to be present in training data
|
anomaly_percentage: 0.004 # required. i.e. we expect <= 0.4% of anomalies to be present in training data
|
||||||
seed: 42 # fix reproducibility & determinism
|
seed: 42 # fix reproducibility & determinism
|
||||||
|
@ -268,7 +338,7 @@ Here we utilize the Facebook Prophet implementation, as detailed in their [libra
|
||||||
|
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.prophet.ProphetModel"`
|
* `class` (string) - model class name `"model.prophet.ProphetModel"` (or `prophet` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support)
|
||||||
* `seasonalities` (list[dict], optional) - Extra seasonalities to pass to Prophet. See [`add_seasonality()`](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#modeling-holidays-and-special-events:~:text=modeling%20the%20cycle-,Specifying,-Custom%20Seasonalities) Prophet param.
|
* `seasonalities` (list[dict], optional) - Extra seasonalities to pass to Prophet. See [`add_seasonality()`](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#modeling-holidays-and-special-events:~:text=modeling%20the%20cycle-,Specifying,-Custom%20Seasonalities) Prophet param.
|
||||||
|
|
||||||
**Note**: Apart from standard `vmanomaly` output, Prophet model can provide [additional metrics](#additional-output-metrics-produced-by-fb-prophet).
|
**Note**: Apart from standard `vmanomaly` output, Prophet model can provide [additional metrics](#additional-output-metrics-produced-by-fb-prophet).
|
||||||
|
@ -288,7 +358,7 @@ Depending on chosen `seasonality` parameter FB Prophet can return additional met
|
||||||
```yaml
|
```yaml
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
class: 'model.prophet.ProphetModel'
|
class: 'prophet' # or 'model.prophet.ProphetModel' until v1.13.0
|
||||||
provide_series: ['anomaly_score', 'yhat', 'yhat_lower', 'yhat_upper', 'trend']
|
provide_series: ['anomaly_score', 'yhat', 'yhat_lower', 'yhat_upper', 'trend']
|
||||||
seasonalities:
|
seasonalities:
|
||||||
- name: 'hourly'
|
- name: 'hourly'
|
||||||
|
@ -308,7 +378,7 @@ Resulting metrics of the model are described [here](#vmanomaly-output)
|
||||||
### [Z-score](https://en.wikipedia.org/wiki/Standard_score)
|
### [Z-score](https://en.wikipedia.org/wiki/Standard_score)
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.zscore.ZscoreModel"`
|
* `class` (string) - model class name `"model.zscore.ZscoreModel"` (or `zscore` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support)
|
||||||
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculation boundaries and anomaly score. Defaults to `2.5`.
|
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculation boundaries and anomaly score. Defaults to `2.5`.
|
||||||
|
|
||||||
*Config Example*
|
*Config Example*
|
||||||
|
@ -317,8 +387,8 @@ Resulting metrics of the model are described [here](#vmanomaly-output)
|
||||||
```yaml
|
```yaml
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
class: "model.zscore.ZscoreModel"
|
class: "zscore" # or 'model.zscore.ZscoreModel' until v1.13.0
|
||||||
z_threshold: 2.5
|
z_threshold: 3.5
|
||||||
```
|
```
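To make the role of `z_threshold` more concrete, here is a minimal Python sketch of a z-score check. It is not the `vmanomaly` implementation: the function names `zscore_bounds` and `is_anomaly` and the sample data are assumptions made for illustration, and the actual anomaly score calculation may differ.

```python
import statistics


def zscore_bounds(train, z_threshold=2.5):
    """Return (lower, upper) as mean -/+ z_threshold * std of the fit data."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    return mean - z_threshold * std, mean + z_threshold * std


def is_anomaly(value, bounds):
    """A point is flagged when it falls outside the (lower, upper) bounds."""
    lower, upper = bounds
    return value < lower or value > upper


# Hypothetical fit window; z_threshold matches the config example above.
train = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]
bounds = zscore_bounds(train, z_threshold=3.5)
print(is_anomaly(10.5, bounds), is_anomaly(13.0, bounds))
```

A larger `z_threshold` widens the bounds, so fewer points are flagged.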
|
||||||
|
|
||||||
|
|
||||||
|
@ -329,7 +399,7 @@ Here we use Holt-Winters Exponential Smoothing implementation from `statsmodels`
|
||||||
|
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.holtwinters.HoltWinters"`
|
* `class` (string) - model class name `"model.holtwinters.HoltWinters"` (or `holtwinters` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support)
|
||||||
|
|
||||||
* `frequency` (string) - Must be set equal to sampling_period. Model needs to know expected data-points frequency (e.g. '10m'). If omitted, frequency is guessed during fitting as **the median of intervals between fitting data timestamps**. During inference, if incoming data doesn't have the same frequency, then it will be interpolated. E.g. data comes at 15 seconds resolution, and our resample_freq is '1m'. Then fitting data will be downsampled to '1m' and internal model is trained at '1m' intervals. So, during inference, prediction data would be produced at '1m' intervals, but interpolated to "15s" to match with expected output, as output data must have the same timestamps. As accepted by pandas.Timedelta (e.g. '5m').
|
* `frequency` (string) - Must be set equal to sampling_period. Model needs to know expected data-points frequency (e.g. '10m'). If omitted, frequency is guessed during fitting as **the median of intervals between fitting data timestamps**. During inference, if incoming data doesn't have the same frequency, then it will be interpolated. E.g. data comes at 15 seconds resolution, and our resample_freq is '1m'. Then fitting data will be downsampled to '1m' and internal model is trained at '1m' intervals. So, during inference, prediction data would be produced at '1m' intervals, but interpolated to "15s" to match with expected output, as output data must have the same timestamps. As accepted by pandas.Timedelta (e.g. '5m').
|
||||||
|
|
||||||
|
@ -354,7 +424,7 @@ Used to compute "seasonal_periods" param for the model (e.g. '1D' or '1W').
|
||||||
```yaml
|
```yaml
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
class: "model.holtwinters.HoltWinters"
|
class: "holtwinters" # or 'model.holtwinters.HoltWinters' until v1.13.0
|
||||||
seasonality: '1d'
|
seasonality: '1d'
|
||||||
frequency: '1h'
|
frequency: '1h'
|
||||||
# Inner model args (key-value pairs) accepted by statsmodels.tsa.holtwinters.ExponentialSmoothing
|
# Inner model args (key-value pairs) accepted by statsmodels.tsa.holtwinters.ExponentialSmoothing
|
||||||
|
@ -371,7 +441,7 @@ The MAD model is a robust method for anomaly detection that is *less sensitive*
|
||||||
|
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.mad.MADModel"`
|
* `class` (string) - model class name `"model.mad.MADModel"` (or `mad` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support)
|
||||||
* `threshold` (float, optional) - The threshold multiplier for the MAD to determine anomalies. Defaults to `2.5`. Higher values will identify fewer points as anomalies.
|
* `threshold` (float, optional) - The threshold multiplier for the MAD to determine anomalies. Defaults to `2.5`. Higher values will identify fewer points as anomalies.
|
||||||
|
|
||||||
*Config Example*
|
*Config Example*
|
||||||
|
@ -380,7 +450,7 @@ The MAD model is a robust method for anomaly detection that is *less sensitive*
|
||||||
```yaml
|
```yaml
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
class: "model.mad.MADModel"
|
class: "mad" # or 'model.mad.MADModel' until v1.13.0
|
||||||
threshold: 2.5
|
threshold: 2.5
|
||||||
```
|
```
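The sketch below illustrates why a MAD-based check is less sensitive to outliers in the training data. It assumes the plain definition (median of absolute deviations from the median, multiplied by `threshold`); `mad_bounds` is a hypothetical helper, and `vmanomaly` may scale or score differently.

```python
import statistics


def mad_bounds(train, threshold=2.5):
    """Return (lower, upper) as median -/+ threshold * MAD of the fit data."""
    median = statistics.median(train)
    mad = statistics.median([abs(x - median) for x in train])
    return median - threshold * mad, median + threshold * mad


# Hypothetical fit window that already contains one extreme value (100.0).
train = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 100.0]
bounds = mad_bounds(train, threshold=2.5)
print(bounds)
```

Even with the extreme value present in the fit data, the median and MAD stay close to the bulk of the points, so the bounds remain tight.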
|
||||||
|
|
||||||
|
@ -391,7 +461,7 @@ Resulting metrics of the model are described [here](#vmanomaly-output).
|
||||||
|
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.rolling_quantile.RollingQuantileModel"`
|
* `class` (string) - model class name `"model.rolling_quantile.RollingQuantileModel"` (or `rolling_quantile` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support)
|
||||||
* `quantile` (float) - quantile value, from 0.5 to 1.0. This constraint is implied by 2-sided confidence interval.
|
* `quantile` (float) - quantile value, from 0.5 to 1.0. This constraint is implied by 2-sided confidence interval.
|
||||||
* `window_steps` (integer) - size of the moving window. (see 'sampling_period')
|
* `window_steps` (integer) - size of the moving window. (see 'sampling_period')
|
||||||
|
|
||||||
|
@ -400,7 +470,7 @@ Resulting metrics of the model are described [here](#vmanomaly-output).
|
||||||
```yaml
|
```yaml
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
class: "model.rolling_quantile.RollingQuantileModel"
|
class: "rolling_quantile" # or 'model.rolling_quantile.RollingQuantileModel' until v1.13.0
|
||||||
quantile: 0.9
|
quantile: 0.9
|
||||||
window_steps: 96
|
window_steps: 96
|
||||||
```
|
```
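As a rough illustration of how `quantile` and `window_steps` interact, the following Python sketch derives bounds for each point from the preceding `window_steps` values. Treating the lower bound as the `1 - quantile` quantile is an assumption based on the two-sided interval mentioned above; the helper names are hypothetical and this is not the `vmanomaly` code.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def rolling_quantile_bounds(series, q, window_steps):
    """For every index i >= window_steps, compute (lower, upper) bounds
    from the window_steps points that precede it."""
    bounds = []
    for i in range(window_steps, len(series)):
        window = series[i - window_steps:i]
        bounds.append((quantile(window, 1 - q), quantile(window, q)))
    return bounds


# Hypothetical series: five regular points followed by a spike.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 50.0]
result = rolling_quantile_bounds(series, q=0.9, window_steps=5)
print(result)
```

With `window_steps: 96` and a 15-minute `sampling_period`, such a window would cover one day of data.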
|
||||||
|
@ -412,7 +482,7 @@ Here we use Seasonal Decompose implementation from `statsmodels` [library](https
|
||||||
|
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.std.StdModel"`
|
* `class` (string) - model class name `"model.std.StdModel"` (or `std` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support)
|
||||||
* `period` (integer) - Number of datapoints in one season.
|
* `period` (integer) - Number of datapoints in one season.
|
||||||
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to `2.5`.
|
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to `2.5`.
|
||||||
|
|
||||||
|
@ -423,7 +493,7 @@ Here we use Seasonal Decompose implementation from `statsmodels` [library](https
|
||||||
```yaml
|
```yaml
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
class: "model.std.StdModel"
|
class: "std" # or 'model.std.StdModel' until v1.13.0
|
||||||
period: 2
|
period: 2
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -445,7 +515,7 @@ Here we use Isolation Forest implementation from `scikit-learn` [library](https:
|
||||||
|
|
||||||
*Parameters specific for vmanomaly*:
|
*Parameters specific for vmanomaly*:
|
||||||
|
|
||||||
* `class` (string) - model class name `"model.isolation_forest.IsolationForestMultivariateModel"`
|
* `class` (string) - model class name `"model.isolation_forest.IsolationForestMultivariateModel"` (or `isolation_forest_multivariate` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support)
|
||||||
|
|
||||||
* `contamination` (float or string, optional) - The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples. Default value - "auto". Should be either `"auto"` or be in the range (0.0, 0.5].
|
* `contamination` (float or string, optional) - The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples. Default value - "auto". Should be either `"auto"` or be in the range (0.0, 0.5].
|
||||||
|
|
||||||
|
@ -467,7 +537,7 @@ Here we use Isolation Forest implementation from `scikit-learn` [library](https:
|
||||||
models:
|
models:
|
||||||
your_desired_alias_for_a_model:
|
your_desired_alias_for_a_model:
|
||||||
# To use univariate model, substitute class argument with "model.isolation_forest.IsolationForestModel".
|
# To use univariate model, substitute class argument with "model.isolation_forest.IsolationForestModel".
|
||||||
class: "model.isolation_forest.IsolationForestMultivariateModel"
|
class: "isolation_forest_multivariate" # or 'model.isolation_forest.IsolationForestMultivariateModel' until v1.13.0
|
||||||
contamination: "0.01"
|
contamination: "0.01"
|
||||||
provide_series: ['anomaly_score']
|
provide_series: ['anomaly_score']
|
||||||
seasonal_features: ['dow', 'hod']
|
seasonal_features: ['dow', 'hod']
|
||||||
|
@ -589,10 +659,9 @@ class CustomModel(Model):
|
||||||
### 2. Configuration file
|
### 2. Configuration file
|
||||||
|
|
||||||
Next, we need to create `config.yaml` file with `vmanomaly` configuration and model input parameters.
|
Next, we need to create `config.yaml` file with `vmanomaly` configuration and model input parameters.
|
||||||
In the config file's `models` section we need to put our model class `model.custom.CustomModel` and all parameters used in `__init__` method.
|
In the config file's `models` section we need to set our model class to `model.custom.CustomModel` (or `custom` starting from [v1.13.0](/anomaly-detection/CHANGELOG/#1130) with class alias support) and define all parameters used in `__init__` method.
|
||||||
You can find out more about configuration parameters in `vmanomaly` [config docs](/anomaly-detection/components/).
|
You can find out more about configuration parameters in `vmanomaly` [config docs](/anomaly-detection/components/).
|
||||||
|
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
schedulers:
|
schedulers:
|
||||||
s1:
|
s1:
|
||||||
|
@ -603,7 +672,7 @@ schedulers:
|
||||||
models:
|
models:
|
||||||
custom_model:
|
custom_model:
|
||||||
# note: every custom model should implement this exact path, specified in `class` field
|
# note: every custom model should implement this exact path, specified in `class` field
|
||||||
class: "model.model.CustomModel"
|
class: "custom" # or 'model.model.CustomModel' until v1.13.0
|
||||||
# custom model params are defined here
|
# custom model params are defined here
|
||||||
percentage: 0.9
|
percentage: 0.9
|
||||||
|
|
||||||
|
|
|
@ -36,7 +36,7 @@ Future updates will introduce additional readers, expanding the range of data so
|
||||||
<tbody>
|
<tbody>
|
||||||
<tr>
|
<tr>
|
||||||
<td><code>class</code></td>
|
<td><code>class</code></td>
|
||||||
<td><code>"reader.vm.VmReader"</code></td>
|
<td><code>"reader.vm.VmReader" (or "vm" starting from <a href="https://docs.victoriametrics.com/anomaly-detection/changelog/#v1130">v1.13.0</a>)</code></td>
|
||||||
<td>Name of the class needed to enable reading from VictoriaMetrics or Prometheus. VmReader is the default option, if not specified.</td>
|
<td>Name of the class needed to enable reading from VictoriaMetrics or Prometheus. VmReader is the default option, if not specified.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
|
@ -57,7 +57,7 @@ Future updates will introduce additional readers, expanding the range of data so
|
||||||
<tr>
|
<tr>
|
||||||
<td><code>sampling_period</code></td>
|
<td><code>sampling_period</code></td>
|
||||||
<td><code>"1h"</code></td>
|
<td><code>"1h"</code></td>
|
||||||
<td>Frequency of the points returned. Will be converted to <code>"/query_range?step=%s"</code> param (in seconds). **Required** since <a href="https://docs.victoriametrics.com/anomaly-detection/changelog/#v190">1.9.0</a>.</td>
|
<td>Frequency of the points returned. Will be converted to <code>"/query_range?step=%s"</code> param (in seconds). <b>Required</b> since <a href="https://docs.victoriametrics.com/anomaly-detection/changelog/#v190">v1.9.0</a>.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><code>query_range_path</code></td>
|
<td><code>query_range_path</code></td>
|
||||||
|
@ -106,7 +106,7 @@ Config file example:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
reader:
|
reader:
|
||||||
class: "reader.vm.VmReader"
|
class: "vm" # or "reader.vm.VmReader" until v1.13.0
|
||||||
datasource_url: "http://localhost:8428/"
|
datasource_url: "http://localhost:8428/"
|
||||||
tenant_id: "0:0"
|
tenant_id: "0:0"
|
||||||
queries:
|
queries:
|
||||||
|
|
|
@ -20,12 +20,12 @@ Is specified in `scheduler` section of a config for VictoriaMetrics Anomaly Dete
|
||||||
```yaml
|
```yaml
|
||||||
schedulers:
|
schedulers:
|
||||||
scheduler_periodic_1m:
|
scheduler_periodic_1m:
|
||||||
# class: "scheduler.periodic.PeriodicScheduler"
|
# class: "periodic" # (or class: "scheduler.periodic.PeriodicScheduler" until v1.13.0 with class alias support)
|
||||||
infer_every: "1m"
|
infer_every: "1m"
|
||||||
fit_every: "2m"
|
fit_every: "2m"
|
||||||
fit_window: "3h"
|
fit_window: "3h"
|
||||||
scheduler_periodic_5m:
|
scheduler_periodic_5m:
|
||||||
# class: "scheduler.periodic.PeriodicScheduler"
|
# class: "periodic" # (or class: "scheduler.periodic.PeriodicScheduler" until v1.13.0 with class alias support)
|
||||||
infer_every: "5m"
|
infer_every: "5m"
|
||||||
fit_every: "10m"
|
fit_every: "10m"
|
||||||
fit_window: "3h"
|
fit_window: "3h"
|
||||||
|
@ -36,7 +36,7 @@ Old-style configs (< [1.11.0](/anomaly-detection/changelog#v1110))
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
scheduler:
|
scheduler:
|
||||||
# class: "scheduler.periodic.PeriodicScheduler"
|
# class: "periodic" # (or class: "scheduler.periodic.PeriodicScheduler" until v1.13.0 with class alias support)
|
||||||
infer_every: "1m"
|
infer_every: "1m"
|
||||||
fit_every: "2m"
|
fit_every: "2m"
|
||||||
fit_window: "3h"
|
fit_window: "3h"
|
||||||
|
@ -47,8 +47,8 @@ will be **implicitly** converted to
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
schedulers:
|
schedulers:
|
||||||
default_scheduler: # default scheduler alias, backward compatibility
|
default_scheduler: # default scheduler alias added, for backward compatibility
|
||||||
# class: "scheduler.periodic.PeriodicScheduler"
|
class: "scheduler.periodic.PeriodicScheduler"
|
||||||
infer_every: "1m"
|
infer_every: "1m"
|
||||||
fit_every: "2m"
|
fit_every: "2m"
|
||||||
fit_window: "3h"
|
fit_window: "3h"
|
||||||
|
@ -64,6 +64,8 @@ options={`"scheduler.periodic.PeriodicScheduler"`, `"scheduler.oneoff.OneoffSche
|
||||||
- `"scheduler.oneoff.OneoffScheduler"`: runs the process once and exits. Useful for testing.
|
- `"scheduler.oneoff.OneoffScheduler"`: runs the process once and exits. Useful for testing.
|
||||||
- `"scheduler.backtesting.BacktestingScheduler"`: imitates consecutive backtesting runs of OneoffScheduler. Runs the process once and exits. Use to get more granular control over testing on historical data.
|
- `"scheduler.backtesting.BacktestingScheduler"`: imitates consecutive backtesting runs of OneoffScheduler. Runs the process once and exits. Use to get more granular control over testing on historical data.
|
||||||
|
|
||||||
|
> **Note**: starting from [v1.13.0](/anomaly-detection/CHANGELOG/#v1130), class aliases are supported, so `"scheduler.periodic.PeriodicScheduler"` can be replaced with `"periodic"`, `"scheduler.oneoff.OneoffScheduler"` with `"oneoff"`, and `"scheduler.backtesting.BacktestingScheduler"` with `"backtesting"`.
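The alias mapping in the note can be pictured as a simple lookup table. The sketch below is only an illustration of that idea: the alias and class names come from the note, while `SCHEDULER_ALIASES` and `resolve_scheduler_class` are hypothetical names, not part of `vmanomaly`.

```python
# Short aliases and the full class paths they stand for (taken from the note).
SCHEDULER_ALIASES = {
    "periodic": "scheduler.periodic.PeriodicScheduler",
    "oneoff": "scheduler.oneoff.OneoffScheduler",
    "backtesting": "scheduler.backtesting.BacktestingScheduler",
}


def resolve_scheduler_class(name):
    """Return the full class path for an alias; full paths pass through unchanged."""
    return SCHEDULER_ALIASES.get(name, name)


print(resolve_scheduler_class("oneoff"))
```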
|
||||||
|
|
||||||
**Depending on selected class, different parameters should be used**
|
**Depending on selected class, different parameters should be used**
|
||||||
|
|
||||||
## Periodic scheduler
|
## Periodic scheduler
|
||||||
|
@ -139,11 +141,13 @@ Examples: `"50s"`, `"4m"`, `"3h"`, `"2d"`, `"1w"`.
|
||||||
### Periodic scheduler config example
|
### Periodic scheduler config example
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
scheduler:
|
schedulers:
|
||||||
class: "scheduler.periodic.PeriodicScheduler"
|
periodic_scheduler_alias:
|
||||||
fit_window: "14d"
|
class: "periodic"
|
||||||
infer_every: "1m"
|
# (or class: "scheduler.periodic.PeriodicScheduler" until v1.13.0 with class alias support)
|
||||||
fit_every: "1h"
|
fit_window: "14d"
|
||||||
|
infer_every: "1m"
|
||||||
|
fit_every: "1h"
|
||||||
```
|
```
|
||||||
|
|
||||||
This part of the config means that `vmanomaly` will take the previous 14 days as the time window and use it to train a model. Every hour the model will be retrained on the latest 14 days of data, which includes 1 hour of new data. The window length stays fixed at 14 days and does not grow with subsequent retrains. Every minute `vmanomaly` will produce model inferences for newly added data points using the model kept in memory at that time.
|
This part of the config means that `vmanomaly` will take the previous 14 days as the time window and use it to train a model. Every hour the model will be retrained on the latest 14 days of data, which includes 1 hour of new data. The window length stays fixed at 14 days and does not grow with subsequent retrains. Every minute `vmanomaly` will produce model inferences for newly added data points using the model kept in memory at that time.
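The sliding behaviour of the training window can be sketched in a few lines of Python. This is an illustration under assumed values (the start date and the helper `fit_ranges` are hypothetical), not the scheduler's actual code.

```python
from datetime import datetime, timedelta, timezone


def fit_ranges(first_fit_at, fit_window, fit_every, n_fits):
    """Return the (start, end) training range used by each of the first n_fits fits."""
    ranges = []
    for i in range(n_fits):
        fit_at = first_fit_at + i * fit_every
        ranges.append((fit_at - fit_window, fit_at))
    return ranges


# fit_window: "14d", fit_every: "1h", as in the config example above.
first = datetime(2024, 1, 15, tzinfo=timezone.utc)
ranges = fit_ranges(first, timedelta(days=14), timedelta(hours=1), n_fits=3)
for start, end in ranges:
    print(start.isoformat(), "->", end.isoformat())
```

Each range has the same 14-day length and is shifted by one hour relative to the previous one.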
|
||||||
|
@ -244,23 +248,27 @@ If a time zone is omitted, a timezone-naive datetime is used.
|
||||||
|
|
||||||
### ISO format scheduler config example
|
### ISO format scheduler config example
|
||||||
```yaml
|
```yaml
|
||||||
scheduler:
|
schedulers:
|
||||||
class: "scheduler.oneoff.OneoffScheduler"
|
oneoff_scheduler_alias:
|
||||||
fit_start_iso: "2022-04-01T00:00:00Z"
|
class: "oneoff"
|
||||||
fit_end_iso: "2022-04-10T00:00:00Z"
|
# (or class: "scheduler.oneoff.OneoffScheduler" until v1.13.0 with class alias support)
|
||||||
infer_start_iso: "2022-04-11T00:00:00Z"
|
fit_start_iso: "2022-04-01T00:00:00Z"
|
||||||
infer_end_iso: "2022-04-14T00:00:00Z"
|
fit_end_iso: "2022-04-10T00:00:00Z"
|
||||||
|
infer_start_iso: "2022-04-11T00:00:00Z"
|
||||||
|
infer_end_iso: "2022-04-14T00:00:00Z"
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
### UNIX time format scheduler config example
|
### UNIX time format scheduler config example
|
||||||
```yaml
|
```yaml
|
||||||
scheduler:
|
schedulers:
|
||||||
class: "scheduler.oneoff.OneoffScheduler"
|
oneoff_scheduler_alias:
|
||||||
fit_start_iso: 1648771200
|
class: "oneoff"
|
||||||
fit_end_iso: 1649548800
|
# (or class: "scheduler.oneoff.OneoffScheduler" until v1.13.0 with class alias support)
|
||||||
infer_start_iso: 1649635200
|
fit_start_s: 1648771200
|
||||||
infer_end_iso: 1649894400
|
fit_end_s: 1649548800
|
||||||
|
infer_start_s: 1649635200
|
||||||
|
infer_end_s: 1649894400
|
||||||
```
|
```
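The ISO and UNIX examples describe the same points in time. A short Python check of that correspondence is shown below; the helper name `iso_to_unix_seconds` is an assumption made for illustration.

```python
from datetime import datetime


def iso_to_unix_seconds(iso):
    """Convert an ISO 8601 timestamp with a trailing 'Z' (UTC) to Unix seconds."""
    # Replace 'Z' with an explicit offset so older Python versions can parse it.
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    return int(dt.timestamp())


for iso in (
    "2022-04-01T00:00:00Z",
    "2022-04-10T00:00:00Z",
    "2022-04-11T00:00:00Z",
    "2022-04-14T00:00:00Z",
):
    print(iso, iso_to_unix_seconds(iso))
```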
|
||||||
|
|
||||||
## Backtesting scheduler
|
## Backtesting scheduler
|
||||||
|
@ -275,6 +283,26 @@ ISO format supported time zone offset formats are:
|
||||||
|
|
||||||
If a time zone is omitted, a timezone-naive datetime is used.
|
If a time zone is omitted, a timezone-naive datetime is used.
|
||||||
|
|
||||||
|
### Parallelization
|
||||||
|
<table>
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Parameter</th>
|
||||||
|
<th>Type</th>
|
||||||
|
<th>Example</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr>
|
||||||
|
<td><code>n_jobs</code></td>
|
||||||
|
<td>int</td>
|
||||||
|
<td><code>1</code></td>
|
||||||
|
<td>Allows <i>proportionally faster (yet more resource-intensive)</i> evaluations of a config on historical data. The default value is 1, which implies <i>sequential</i> execution. Introduced in <a href="https://docs.victoriametrics.com/anomaly-detection/changelog/#v1130">v1.13.0</a>.</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
### Defining overall timeframe
|
### Defining overall timeframe
|
||||||
|
|
||||||
This timeframe will be sliced into intervals `(fit_window, infer_window == fit_every)`, starting from the *latest available* time point, which is `to_*`, and going back until no full `fit_window + infer_window` interval exists within the provided timeframe.
|
This timeframe will be sliced into intervals `(fit_window, infer_window == fit_every)`, starting from the *latest available* time point, which is `to_*`, and going back until no full `fit_window + infer_window` interval exists within the provided timeframe.
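The slicing rule can be sketched as follows. This is a minimal interpretation of the description, assuming that consecutive inference windows are adjacent and that a slice is kept only if its whole `fit_window + infer_window` span fits into the timeframe; `backtesting_slices` and the dates are hypothetical, and the real scheduler may handle edge cases differently.

```python
from datetime import datetime, timedelta, timezone


def backtesting_slices(start, end, fit_window, fit_every):
    """Return (fit_start, fit_end, infer_end) tuples, newest slice first.

    The inference window of each slice is (fit_end, infer_end] and has
    the length of fit_every.
    """
    slices = []
    infer_end = end
    # Walk back from the latest time point while a full slice still fits.
    while infer_end - fit_every - fit_window >= start:
        fit_end = infer_end - fit_every
        slices.append((fit_end - fit_window, fit_end, infer_end))
        infer_end = fit_end
    return slices


start = datetime(2021, 1, 1, tzinfo=timezone.utc)
end = datetime(2021, 1, 10, tzinfo=timezone.utc)
slices = backtesting_slices(start, end, timedelta(days=7), timedelta(days=1))
for fit_start, fit_end, infer_end in slices:
    print(fit_start.date(), fit_end.date(), infer_end.date())
```

In this example a 9-day timeframe with a 7-day `fit_window` and a 1-day `fit_every` leaves room for two full slices; a timeframe shorter than `fit_window + fit_every` would produce none.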
|
||||||
|
@ -374,20 +402,26 @@ In `BacktestingScheduler`, the inference window is *implicitly* defined as a per
|
||||||
|
|
||||||
### ISO format scheduler config example
|
### ISO format scheduler config example
|
||||||
```yaml
|
```yaml
|
||||||
scheduler:
|
schedulers:
|
||||||
class: "scheduler.backtesting.BacktestingScheduler"
|
backtesting_scheduler_alias:
|
||||||
from_iso: '2021-01-01T00:00:00Z'
|
class: "backtesting"
|
||||||
to_iso: '2021-01-14T00:00:00Z'
|
# (or class: "scheduler.backtesting.BacktestingScheduler" until v1.13.0 with class alias support)
|
||||||
fit_window: 'P14D'
|
from_iso: '2021-01-01T00:00:00Z'
|
||||||
fit_every: 'PT1H'
|
to_iso: '2021-01-14T00:00:00Z'
|
||||||
|
fit_window: 'P14D'
|
||||||
|
fit_every: 'PT1H'
|
||||||
|
n_jobs: 1 # default = 1 (sequential), set it up to # of CPUs for parallel execution
|
||||||
```
|
```
|
||||||
|
|
||||||
### UNIX time format scheduler config example
|
### UNIX time format scheduler config example
|
||||||
```yaml
|
```yaml
|
||||||
scheduler:
|
schedulers:
|
||||||
class: "scheduler.backtesting.BacktestingScheduler"
|
backtesting_scheduler_alias:
|
||||||
from_s: 167253120
|
class: "backtesting"
|
||||||
to_s: 167443200
|
# (or class: "scheduler.backtesting.BacktestingScheduler" until v1.13.0 with class alias support)
|
||||||
fit_window: '14d'
|
from_s: 167253120
|
||||||
fit_every: '1h'
|
to_s: 167443200
|
||||||
|
fit_window: '14d'
|
||||||
|
fit_every: '1h'
|
||||||
|
n_jobs: 1 # default = 1 (sequential), set it up to # of CPUs for parallel execution
|
||||||
```
|
```
|
Binary file not shown.
After Width: | Height: | Size: 32 KiB |
Binary file not shown.
After Width: | Height: | Size: 29 KiB |
Binary file not shown.
After Width: | Height: | Size: 32 KiB |
Binary file not shown.
After Width: | Height: | Size: 30 KiB |
Binary file not shown.
After Width: | Height: | Size: 30 KiB |
Binary file not shown.
After Width: | Height: | Size: 32 KiB |
|
@ -31,7 +31,7 @@ Future updates will introduce additional export methods, offering users more fle
|
||||||
<tbody>
|
<tbody>
|
||||||
<tr>
|
<tr>
|
||||||
<td><code>class</code></td>
|
<td><code>class</code></td>
|
||||||
<td><code>"writer.vm.VmWriter"</code></td>
|
<td><code>"writer.vm.VmWriter" (or "vm" starting from <a href="https://docs.victoriametrics.com/anomaly-detection/changelog/#v1130">v1.13.0</a>)</code></td>
|
||||||
<td>Name of the class needed to enable writing to VictoriaMetrics or Prometheus. VmWriter is the default option, if not specified.</td>
|
<td>Name of the class needed to enable writing to VictoriaMetrics or Prometheus. VmWriter is the default option, if not specified.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
|
@ -103,7 +103,7 @@ Config example:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
writer:
|
writer:
|
||||||
class: "writer.vm.VmWriter"
|
class: "vm" # or "writer.vm.VmWriter" until v1.13.0
|
||||||
datasource_url: "http://localhost:8428/"
|
datasource_url: "http://localhost:8428/"
|
||||||
tenant_id: "0:0"
|
tenant_id: "0:0"
|
||||||
metric_format:
|
metric_format:
|
||||||
|
|
|
@ -151,14 +151,14 @@ Below is an illustrative example of a `vmanomaly_config.yml` configuration file.
|
||||||
``` yaml
|
``` yaml
|
||||||
schedulers:
|
schedulers:
|
||||||
periodic:
|
periodic:
|
||||||
# class: "scheduler.periodic.PeriodicScheduler"
|
# class: 'periodic' # or "scheduler.periodic.PeriodicScheduler" until v1.13.0
|
||||||
infer_every: "1m"
|
infer_every: "1m"
|
||||||
fit_every: "2m"
|
fit_every: "2m"
|
||||||
fit_window: "3h"
|
fit_window: "3h"
|
||||||
|
|
||||||
models:
|
models:
|
||||||
prophet:
|
prophet:
|
||||||
class: "model.prophet.ProphetModel"
|
class: "prophet" # or "model.prophet.ProphetModel" until v1.13.0
|
||||||
args:
|
args:
|
||||||
interval_width: 0.98
|
interval_width: 0.98
|
||||||
|
|
||||||
|
|