Models
This section describes the Models component of VictoriaMetrics Anomaly Detection (or simply vmanomaly) and explains how to define the corresponding section of a config to launch the service.
vmanomaly includes various built-in models, and you can also integrate your own custom model (see custom model).
Note: Starting from v1.10.0, the model section in a config supports multiple models via aliasing. Also, vmanomaly expects the model section to be named models. Using the old (flat) format with the model key is deprecated and will be removed in future versions. Having model and models sections simultaneously in a config will result in only models being used:
models:
  model_univariate_1:
    class: 'model.zscore.ZscoreModel'
    z_threshold: 2.5
    queries: ['query_alias2']  # referencing queries defined in `reader` section
  model_multivariate_1:
    class: 'model.isolation_forest.IsolationForestMultivariateModel'
    contamination: 'auto'
    args:
      n_estimators: 100
      # i.e. to assure reproducibility of produced results each time model is fit on the same input
      random_state: 42
    # if there is no explicit `queries` arg, then the model will be run on ALL queries found in reader section
# ...
Old-style configs (< 1.10.0)
model:
  class: "model.zscore.ZscoreModel"
  z_threshold: 3.0
  # no explicit `queries` arg is provided
# ...
will be implicitly converted to
models:
  default_model:  # default model alias, backward compatibility
    class: "model.zscore.ZscoreModel"
    z_threshold: 3.0
    # queries arg is created and propagated with all query aliases found in `queries` arg of `reader` section
    queries: ['q1', 'q2', 'q3']  # i.e., if your `queries` in `reader` section has exactly q1, q2, q3 aliases
# ...
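The implicit conversion described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the code vmanomaly runs: the function name upgrade_legacy_config and the query expressions are hypothetical, and the config is assumed to be already parsed into nested dictionaries.

```python
from copy import deepcopy


def upgrade_legacy_config(config: dict) -> dict:
    """Return a copy of `config` with a flat `model` section moved under `models`."""
    upgraded = deepcopy(config)
    legacy_model = upgraded.pop("model", None)
    # When both sections are present only `models` is used, so convert only if it is absent.
    if legacy_model is not None and "models" not in upgraded:
        upgraded["models"] = {"default_model": legacy_model}
    # Aliases of all queries defined in the `reader` section.
    reader_queries = list(upgraded.get("reader", {}).get("queries", {}))
    for model_config in upgraded.get("models", {}).values():
        # A model without an explicit `queries` arg is run on all reader queries.
        model_config.setdefault("queries", list(reader_queries))
    return upgraded


legacy_config = {
    "model": {"class": "model.zscore.ZscoreModel", "z_threshold": 3.0},
    # hypothetical query expressions, only the aliases matter here
    "reader": {"queries": {"q1": "up", "q2": "up", "q3": "up"}},
}
print(upgrade_legacy_config(legacy_config)["models"])
```

The input dictionary is left untouched, which makes it easy to compare the old and the converted layout side by side.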
Common args
Starting from 1.10.0, common args supported by every model (and model type) were introduced.
Queries
Introduced in 1.10.0 as part of multi-model config support, the queries arg defines which queries from VmReader a particular model should be run on (meaning that all the series returned by each of these queries will be used by that model for fitting and inference).
The queries arg is supported for all built-in (as well as custom) models.
This arg is backward compatible - if there is no explicit queries arg, then the model defined in a config will be run on ALL queries found in the reader section:
models:
  model_alias_1:
    # ...
    # no explicit `queries` arg is provided
will be implicitly converted to
models:
  model_alias_1:
    # ...
    # if not set, `queries` arg is created and propagated with all query aliases found in `queries` arg of `reader` section
    queries: ['q1', 'q2', 'q3']  # i.e., if your `queries` in `reader` section has exactly q1, q2, q3 aliases
Schedulers
Introduced in 1.11.0 as part of multi-scheduler config support, the schedulers arg defines which schedulers a particular model should be attached to.
The schedulers arg is supported for all built-in (as well as custom) models.
This arg is backward compatible - if there is no explicit schedulers arg, then the model defined in a config will be attached to ALL the schedulers found in the schedulers section:
models:
  model_alias_1:
    # ...
    # no explicit `schedulers` arg is provided
will be implicitly converted to
models:
  model_alias_1:
    # ...
    # if not set, `schedulers` arg is created and propagated with all scheduler aliases found in `schedulers` section
    schedulers: ['s1', 's2', 's3']  # i.e., if your `schedulers` section has exactly s1, s2, s3 aliases
Provide Series
Introduced in 1.12.0, the provide_series arg limits the output generated by vmanomaly for writing. For example, if the model produces the default output series ['anomaly_score', 'yhat', 'yhat_lower', 'yhat_upper'], then by specifying the provide_series section as below you limit the data being written to only ['anomaly_score'] for each metric subject to anomaly detection.
models:
  model_alias_1:
    # ...
    provide_series: ['anomaly_score']  # only `anomaly_score` metric will be available for writing back to the database
Note: If provide_series is not specified in the model config, the model will produce its default model-dependent output. The output can't be less than ['anomaly_score']. Even if the timestamp column is omitted, it will be implicitly added to the provide_series list, as it's required for metrics to be properly written.
Model types
There are 2 model types supported in vmanomaly, resulting in 4 possible combinations:
- Univariate models
- Multivariate models
Each of these models can be:
- Rolling
- Non-rolling
Univariate Models
For a univariate type, one separate model is fit and used for inference for each time series defined in its queries arg.
For example, if you have a univariate model defined to use 3 MetricsQL queries, each returning 5 time series, there will be 3*5=15 models created in total. Each such model produces an individual output for each time series.
If, during inference, you get a series with a new labelset (not present in any of the fitted models), inference for it will be skipped until a model is trained for that labelset during a forthcoming re-fit step.
Implications: Univariate models are the go-to default when your queries return a changing number of individual time series of different magnitude, trend or seasonality, so you won't be mixing incompatible data with different behavior within a single fit model (context isolation).
Examples: Prophet, Holt-Winters
Multivariate Models
For a multivariate type, one shared model is fit and used for inference on all time series defined in its queries arg simultaneously.
For example, if you have a multivariate model defined to use 3 MetricsQL queries, each returning 5 time series, there will be one shared model created in total. Once fit, this model will expect exactly 15 time series with exactly the same labelsets as input. This model will produce one shared output.
If, during inference, you get a different number of series or some series with a new labelset (not present in any of the fitted models), inference will be skipped until a model is trained for such a labelset during a forthcoming re-fit step.
Implications: Multivariate models are the go-to default when your queries return a fixed number of individual time series (say, some aggregations), to be used for adding cross-series (and cross-query) context, useful for catching collective anomalies or novelties (expanded to a multi-input scenario). For example, you may set it up for anomaly detection of CPU usage in different modes (idle, user, system, etc.) and use its cross-dependencies to detect behavior unseen in the fit data.
Examples: IsolationForest
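The difference between the two types can be illustrated by counting how many model instances end up being fit for one model alias. The helper below is a hypothetical sketch (count_fitted_models is not part of vmanomaly) that only encodes the rule stated above: one model per series for univariate models, one shared model for multivariate ones.

```python
def count_fitted_models(series_per_query: dict[str, int], multivariate: bool) -> int:
    """Number of model instances fit for a single model alias."""
    total_series = sum(series_per_query.values())
    if multivariate:
        # One shared model consumes all series of all queries at once.
        return 1 if total_series else 0
    # One separate model per individual time series.
    return total_series


# 3 queries returning 5 series each (hypothetical aliases)
series_per_query = {"q1": 5, "q2": 5, "q3": 5}
print(count_fitted_models(series_per_query, multivariate=False))  # 15
print(count_fitted_models(series_per_query, multivariate=True))   # 1
```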
Rolling Models
A rolling model is a model that, once trained, cannot (naturally) be used to make inference on data not seen during its fit phase.
An instance of a rolling model is simultaneously fit and used for inference during its infer method call.
As a result, such model instances are not stored between consecutive re-fit calls (defined by the fit_every arg in PeriodicScheduler), leading to lower RAM consumption.
Such models put more pressure on your reader's source, because each inference call requires the whole (fit + infer) window of data: the model is fit on it first and only then used for inference. This is noticeable if your model should be fit on a large amount of data (say, 14 days with 1-minute resolution) and at the same time you have frequent inference (say, once per minute) on new chunks of data.
Note: Rolling models require fit_every to be set equal to infer_every in your PeriodicScheduler.
Examples: RollingQuantile
Non-Rolling Models
Everything that is not classified as rolling.
Produced models can be explicitly used to infer on data not seen during the fit phase, so they don't require a re-fit before each inference.
Such models put less pressure on your reader's source, e.g. if you fit on a large amount of data (say, 14 days with 1-minute resolution) but do it occasionally (say, once per day), while having frequent inference (say, once per minute) on new chunks of data.
Note: However, it's still highly recommended to keep your model up-to-date with the tendencies found in your data as it evolves over time.
Produced model instances are stored in memory between consecutive re-fit calls (defined by the fit_every arg in PeriodicScheduler), leading to higher RAM consumption.
Examples: Prophet
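To get a feeling for the difference in reader load between rolling and non-rolling models, the back-of-the-envelope estimate below counts datapoints read per series in 24 hours. It is a rough sketch under simplifying assumptions: every inference call of a non-rolling model reads only the datapoints that arrived since the previous call, and the function name datapoints_read_per_day is hypothetical.

```python
from datetime import timedelta


def datapoints_read_per_day(fit_window, fit_every, infer_every, step, rolling):
    """Rough number of datapoints per series read from the datasource in 24 hours."""
    day = timedelta(days=1)
    inferences = day // infer_every
    fit_points = fit_window // step
    infer_points = infer_every // step
    if rolling:
        # Every inference call re-reads the whole (fit + infer) window.
        return inferences * (fit_points + infer_points)
    # Non-rolling: the fit window is read only on (re-)fit, inference reads new points only.
    fits = day // fit_every
    return fits * fit_points + inferences * infer_points


common = dict(
    fit_window=timedelta(days=14),
    infer_every=timedelta(minutes=1),
    step=timedelta(minutes=1),
)
print(datapoints_read_per_day(fit_every=timedelta(minutes=1), rolling=True, **common))   # 29031840
print(datapoints_read_per_day(fit_every=timedelta(days=1), rolling=False, **common))     # 21600
```

With the numbers from the text (14 days of data at 1-minute resolution, inference once per minute) the rolling setup reads roughly three orders of magnitude more data per day than a non-rolling model that is re-fit once per day.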
Built-in Models
Overview
VM Anomaly Detection (vmanomaly hereinafter) models support 2 groups of parameters:
- vmanomaly-specific arguments - please refer to Parameters specific for vmanomaly and Default model parameters subsections for each of the models below.
- Arguments to inner model (say, Facebook's Prophet), passed in an args argument as key-value pairs, that will be directly given to the model during initialization to allow granular control. Optional.
Note: For users who may not be familiar with Python data types such as list[dict]
, a dictionary in Python is a data structure that stores data values in key-value pairs. This structure allows for efficient data retrieval and management.
Models:
- AutoTuned - designed to take the cognitive load off the user, allowing any of built-in models below to be re-tuned for best params on data seen during each fit phase of the algorithm. Tradeoff is between increased computational time and optimized results / simpler maintenance.
- Prophet - the most versatile one for production usage, especially for complex data (trends, change points, multi-seasonality)
- Z-score - useful for testing and for simpler data (de-trended data without strict seasonality and with anomalies of similar magnitude as your "normal" data)
- Holt-Winters - well-suited for data with moderate complexity, exhibiting distinct trends and/or seasonal patterns.
- MAD (Median Absolute Deviation) - similarly to Z-score, is effective for identifying outliers in relatively consistent data (useful for detecting sudden, stark deviations from the median)
- Rolling Quantile - best for data with evolving patterns, as it adapts to changes over a rolling window.
- Seasonal Trend Decomposition - similarly to Holt-Winters, is best for data with pronounced seasonal and trend components
- Isolation forest (Multivariate) - useful for metrics data interaction (several queries/metrics -> single anomaly score) and efficient in detecting anomalies in high-dimensional datasets
- Custom model - benefit from your own models and expertise to better support your unique use case.
AutoTuned
Tuning hyperparameters of a model can be tricky and often requires in-depth knowledge of Machine Learning. AutoTunedModel is designed specifically to take the cognitive load off the user - specify as little as the anomaly_percentage param from the (0, 0.5) interval and tuned_class_name (i.e. model.zscore.ZscoreModel) to get it working with the best settings that match your data.
Parameters specific for vmanomaly:
- class (string) - model class name "model.auto.AutoTunedModel"
- tuned_class_name (string) - Built-in model class to tune, i.e. model.zscore.ZscoreModel.
- optimization_params (dict) - Optimization parameters for unsupervised model tuning. Control % of found anomalies, as well as a tradeoff between time spent and the accuracy. The higher timeout and n_trials are, the better model configuration can be found for tuned_class_name, but the longer it takes, and vice versa. Set n_jobs to -1 to use all the CPUs available; it makes sense only if you have a big dataset to train on during fit calls, otherwise the overhead isn't worth it.
  - anomaly_percentage (float) - expected percentage of anomalies that can be seen in training data, from (0, 0.5) interval.
  - seed (int) - Random seed for reproducibility and deterministic nature of underlying optimizations.
  - n_splits (int) - How many folds to create for hyperparameter tuning out of your data. The higher, the longer it takes but the better the results can be. Defaults to 3.
  - n_trials (int) - How many trials to sample from hyperparameter search space. The higher, the longer it takes but the better the results can be. Defaults to 128.
  - timeout (float) - How many seconds in total can be spent on each model to tune hyperparameters. The higher, the longer it takes, allowing to test more trials out of defined n_trials, but the better the results can be.
# ...
models:
  your_desired_alias_for_a_model:
    class: 'model.auto.AutoTunedModel'
    tuned_class_name: 'model.zscore.ZscoreModel'
    optimization_params:
      anomaly_percentage: 0.004  # required. i.e. we expect <= 0.4% of anomalies to be present in training data
      seed: 42  # fix reproducibility & determinism
      n_splits: 4  # how many folds are created for internal cross-validation
      n_trials: 128  # how many configurations to sample from search space during optimization
      timeout: 10  # how many seconds to spend on optimization for each trained model during `fit` phase call
      n_jobs: 1  # how many jobs in parallel to launch. Consider making it > 1 only if you have fit window containing > 10000 datapoints for each series
# ...
Note: Autotuning can't be applied to your custom model. Also, it can't be applied to itself (like tuned_class_name: 'model.auto.AutoTunedModel').
Prophet
Here we utilize the Facebook Prophet implementation, as detailed in their library documentation. All parameters from this library are compatible and can be passed to the model.
Parameters specific for vmanomaly:
- class (string) - model class name "model.prophet.ProphetModel"
- seasonalities (list[dict], optional) - Extra seasonalities to pass to Prophet. See add_seasonality() Prophet param.
Note: Apart from the standard vmanomaly output, the Prophet model can provide additional metrics.
Additional output metrics produced by FB Prophet
Depending on the chosen seasonality parameter, FB Prophet can return additional metrics such as:
- trend, trend_lower, trend_upper
- additive_terms, additive_terms_lower, additive_terms_upper,
- multiplicative_terms, multiplicative_terms_lower, multiplicative_terms_upper,
- daily, daily_lower, daily_upper,
- hourly, hourly_lower, hourly_upper,
- holidays, holidays_lower, holidays_upper,
- and a number of columns for each holiday if holidays param is set
Config Example
models:
  your_desired_alias_for_a_model:
    class: 'model.prophet.ProphetModel'
    provide_series: ['anomaly_score', 'yhat', 'yhat_lower', 'yhat_upper', 'trend']
    seasonalities:
      - name: 'hourly'
        period: 0.04166666666
        fourier_order: 30
    # Inner model args (key-value pairs) accepted by
    # https://facebook.github.io/prophet/docs/quick_start.html#python-api
    args:
      # See https://facebook.github.io/prophet/docs/uncertainty_intervals.html
      interval_width: 0.98
      country_holidays: 'US'
Resulting metrics of the model are described here
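The period value 0.04166666666 in the config example above is simply one hour expressed in days, since Prophet's add_seasonality() takes the period as a number of days. The hypothetical helper below (period_in_days is not part of vmanomaly or Prophet) shows how such values can be derived for other seasonalities.

```python
from datetime import timedelta


def period_in_days(duration: timedelta) -> float:
    """Express a seasonality length as a (possibly fractional) number of days."""
    return duration / timedelta(days=1)


print(period_in_days(timedelta(hours=1)))  # 1/24 of a day, the 'hourly' seasonality above
print(period_in_days(timedelta(weeks=1)))  # 7.0
```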
Z-score
Parameters specific for vmanomaly:
- class (string) - model class name "model.zscore.ZscoreModel"
- z_threshold (float, optional) - standard score for calculation boundaries and anomaly score. Defaults to 2.5.
Config Example
models:
  your_desired_alias_for_a_model:
    class: "model.zscore.ZscoreModel"
    z_threshold: 2.5
Resulting metrics of the model are described here.
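To make the role of z_threshold more tangible, here is a minimal standard-score sketch. It is not the vmanomaly implementation: the class name ZscoreSketch is hypothetical, and the scaling of anomaly_score (the absolute z-score divided by z_threshold, so that 1.0 lies exactly on the boundary) is an assumption chosen to match the "score above 1 means anomaly" convention.

```python
import math
from statistics import fmean, pstdev


class ZscoreSketch:
    """Toy z-score detector: boundaries are mean +/- z_threshold * std."""

    def __init__(self, z_threshold: float = 2.5):
        self.z_threshold = z_threshold
        self.mean = math.nan
        self.std = math.nan

    def fit(self, values):
        self.mean = fmean(values)
        # Guard against zero deviation on constant data.
        self.std = pstdev(values) or 1e-9

    def infer(self, value):
        margin = self.z_threshold * self.std
        return {
            "yhat": self.mean,
            "yhat_lower": self.mean - margin,
            "yhat_upper": self.mean + margin,
            # Assumed scaling: 1.0 corresponds to a point exactly on the boundary.
            "anomaly_score": abs(value - self.mean) / margin,
        }


model = ZscoreSketch(z_threshold=2.5)
model.fit([10, 12, 11, 9, 13, 10, 12, 11])
print(model.infer(11.5)["anomaly_score"])  # below 1.0, inside the boundaries
print(model.infer(20)["anomaly_score"])    # above 1.0, outside the boundaries
```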
Holt-Winters
Here we use the Holt-Winters Exponential Smoothing implementation from the statsmodels library. All parameters from this library can be passed to the model.
Parameters specific for vmanomaly:
- class (string) - model class name "model.holtwinters.HoltWinters"
- frequency (string) - Must be set equal to sampling_period. As accepted by pandas.Timedelta (e.g. '5m'). The model needs to know the expected data-points frequency (e.g. '10m'). If omitted, frequency is guessed during fitting as the median of intervals between fitting data timestamps. During inference, if incoming data doesn't have the same frequency, it will be interpolated. E.g. data comes at 15 seconds resolution, and our resample_freq is '1m'. Then fitting data will be downsampled to '1m' and the internal model is trained at '1m' intervals. So, during inference, prediction data would be produced at '1m' intervals, but interpolated to '15s' to match the expected output, as output data must have the same timestamps.
- seasonality (string, optional) - As accepted by pandas.Timedelta. Used to compute the seasonal_periods param for the model (e.g. '1D' or '1W'). If seasonal_periods is not specified, it is calculated as seasonality / frequency.
- z_threshold (float, optional) - standard score for calculating boundaries to define anomaly score. Defaults to 2.5.
Default model parameters:
- If parameter seasonal is not specified, default value will be add.
- If parameter initialization_method is not specified, default value will be estimated.
- args (dict, optional) - Inner model args (key-value pairs). See accepted params in model documentation. Defaults to empty (not provided). Example: {"seasonal": "add", "initialization_method": "estimated"}
Config Example
models:
  your_desired_alias_for_a_model:
    class: "model.holtwinters.HoltWinters"
    seasonality: '1d'
    frequency: '1h'
    # Inner model args (key-value pairs) accepted by statsmodels.tsa.holtwinters.ExponentialSmoothing
    args:
      seasonal: 'add'
      initialization_method: 'estimated'
Resulting metrics of the model are described here.
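The seasonal_periods value derived from seasonality / frequency can be checked with a few lines of Python. The sketch below uses the standard library's timedelta instead of pandas.Timedelta to stay dependency-free; the function name and the error raised for a non-integer ratio are assumptions made for illustration, not documented vmanomaly behavior.

```python
from datetime import timedelta


def seasonal_periods(seasonality: timedelta, frequency: timedelta) -> int:
    """Number of datapoints in one season: seasonality / frequency."""
    periods, remainder = divmod(seasonality, frequency)
    if remainder:
        # Assumption of this sketch: a season should hold a whole number of datapoints.
        raise ValueError("seasonality should be a whole multiple of frequency")
    return periods


# seasonality: '1d', frequency: '1h', as in the config example above
print(seasonal_periods(timedelta(days=1), timedelta(hours=1)))      # 24
# seasonality: '1W', frequency: '10m'
print(seasonal_periods(timedelta(weeks=1), timedelta(minutes=10)))  # 1008
```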
MAD (Median Absolute Deviation)
The MAD model is a robust method for anomaly detection that is less sensitive to outliers in data compared to standard deviation-based models. It considers a point as an anomaly if the absolute deviation from the median is significantly large.
Parameters specific for vmanomaly:
- class (string) - model class name "model.mad.MADModel"
- threshold (float, optional) - The threshold multiplier for the MAD to determine anomalies. Defaults to 2.5. Higher values will identify fewer points as anomalies.
Config Example
models:
  your_desired_alias_for_a_model:
    class: "model.mad.MADModel"
    threshold: 2.5
Resulting metrics of the model are described here.
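The robustness of MAD to outliers in the training data can be seen in a small sketch. As with the other examples, this is an illustration rather than the vmanomaly code: MADSketch is a hypothetical name, and scoring a point as its absolute deviation from the median divided by threshold * MAD is an assumption.

```python
from statistics import median


class MADSketch:
    """Toy detector based on the median absolute deviation (MAD)."""

    def __init__(self, threshold: float = 2.5):
        self.threshold = threshold
        self.median = None
        self.mad = None

    def fit(self, values):
        self.median = median(values)
        deviations = [abs(v - self.median) for v in values]
        # Guard against zero MAD on (almost) constant data.
        self.mad = median(deviations) or 1e-9

    def infer(self, value):
        # Assumed scaling: 1.0 corresponds to a deviation of exactly threshold * MAD.
        return abs(value - self.median) / (self.threshold * self.mad)


model = MADSketch(threshold=2.5)
# The single extreme value (100) barely affects the median and the MAD.
model.fit([10, 12, 11, 9, 13, 10, 12, 11, 100])
print(model.median, model.mad)  # 11 1
print(model.infer(12))          # well below 1.0
print(model.infer(100))         # far above 1.0
```

A standard-deviation-based boundary fit on the same data would be stretched considerably by the value 100, while the median and the MAD stay at 11 and 1.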
Rolling Quantile
Parameters specific for vmanomaly:
- class (string) - model class name "model.rolling_quantile.RollingQuantileModel"
- quantile (float) - quantile value, from 0.5 to 1.0. This constraint is implied by 2-sided confidence interval.
- window_steps (integer) - size of the moving window. (see 'sampling_period')
Config Example
models:
  your_desired_alias_for_a_model:
    class: "model.rolling_quantile.RollingQuantileModel"
    quantile: 0.9
    window_steps: 96
Resulting metrics of the model are described here.
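The idea of a 2-sided interval built from a rolling window can be sketched as follows. This is a simplified illustration with assumptions of its own: the lower boundary is taken as the (1 - quantile) quantile of the preceding window_steps points, the upper one as the quantile itself, and quantiles are linearly interpolated. The function names are hypothetical and the actual model may compute boundaries and scores differently.

```python
def interpolated_quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])


def rolling_bounds(values, quantile=0.9, window_steps=4):
    """Return (value, lower, upper) for each point that has a full preceding window."""
    results = []
    for i in range(window_steps, len(values)):
        window = sorted(values[i - window_steps:i])
        lower = interpolated_quantile(window, 1 - quantile)
        upper = interpolated_quantile(window, quantile)
        results.append((values[i], lower, upper))
    return results


for value, lower, upper in rolling_bounds([1, 2, 3, 4, 10, 3]):
    status = "outside" if not lower <= value <= upper else "inside"
    print(f"value={value} bounds=({lower:.2f}, {upper:.2f}) -> {status}")
```

Because the window moves with the data, the boundaries adapt after the spike: the value 10 becomes part of the window used for the next point.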
Seasonal Trend Decomposition
Here we use the Seasonal Decompose implementation from the statsmodels library. Parameters from this library can be passed to the model. Some parameters are specifically predefined in vmanomaly and can't be changed by the user (model='additive', two_sided=False).
Parameters specific for vmanomaly:
- class (string) - model class name "model.std.StdModel"
- period (integer) - Number of datapoints in one season.
- z_threshold (float, optional) - standard score for calculating boundaries to define anomaly score. Defaults to 2.5.
Config Example
models:
  your_desired_alias_for_a_model:
    class: "model.std.StdModel"
    period: 2
Resulting metrics of the model are described here.
Additional output metrics produced by Seasonal Trend Decomposition model
- resid - The residual component of the data series.
- trend - The trend component of the data series.
- seasonal - The seasonal component of the data series.
Isolation forest (Multivariate)
Detects anomalies using binary trees. The algorithm has a linear time complexity and a low memory requirement, which works well with high-volume data. It can be used on both univariate and multivariate data, but it is more effective in the multivariate case.
Important: Be aware of the curse of dimensionality. Don't use a single multivariate model if you expect your queries to return many time series with fewer datapoints than the number of metrics. In such a case it is hard for a model to learn meaningful dependencies from a too sparse data hypercube.
Here we use the Isolation Forest implementation from the scikit-learn library. All parameters from this library can be passed to the model.
Parameters specific for vmanomaly:
- class (string) - model class name "model.isolation_forest.IsolationForestMultivariateModel"
- contamination (float or string, optional) - The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples. Default value - "auto". Should be either "auto" or be in the range (0.0, 0.5].
- seasonal_features (list of string) - List of seasonality to encode through cyclical encoding, i.e. dow (day of week). Introduced in 1.12.0.
  - Empty by default for backward compatibility.
  - Example: seasonal_features: ['dow', 'hod'].
  - Supported seasonalities:
    - "minute" - minute of hour (0-59)
    - "hod" - hour of day (0-23)
    - "dow" - day of week (1-7)
    - "month" - month of year (1-12)
- args (dict, optional) - Inner model args (key-value pairs). See accepted params in model documentation. Defaults to empty (not provided). Example: {"random_state": 42, "n_estimators": 100}
Config Example
models:
  your_desired_alias_for_a_model:
    # To use univariate model, substitute class argument with "model.isolation_forest.IsolationForestModel".
    class: "model.isolation_forest.IsolationForestMultivariateModel"
    contamination: "0.01"
    provide_series: ['anomaly_score']
    seasonal_features: ['dow', 'hod']
    args:
      n_estimators: 100
      # i.e. to assure reproducibility of produced results each time model is fit on the same input
      random_state: 42
Resulting metrics of the model are described here.
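Cyclical encoding, mentioned for seasonal_features above, maps a periodic value onto a circle so that, for example, 23:00 and 00:00 end up close to each other. The sketch below is a generic sine/cosine encoding, not vmanomaly's internal code: the output feature names (hod_sin, hod_cos and so on) are hypothetical, and treating Monday as day 1 is an assumption.

```python
import math
from datetime import datetime

# (value extractor, period) for each supported seasonality, shifted to start at 0
SEASONALITIES = {
    "minute": (lambda ts: ts.minute, 60),
    "hod": (lambda ts: ts.hour, 24),
    "dow": (lambda ts: ts.isoweekday() - 1, 7),  # 1-7 shifted to 0-6
    "month": (lambda ts: ts.month - 1, 12),      # 1-12 shifted to 0-11
}


def encode(ts: datetime, seasonal_features: list[str]) -> dict[str, float]:
    """Encode each requested seasonality as a (sin, cos) pair on the unit circle."""
    features = {}
    for name in seasonal_features:
        extract, period = SEASONALITIES[name]
        angle = 2 * math.pi * extract(ts) / period
        features[f"{name}_sin"] = math.sin(angle)
        features[f"{name}_cos"] = math.cos(angle)
    return features


print(encode(datetime(2024, 1, 1, 6, 0), ["dow", "hod"]))
```

Each seasonality adds two extra input columns, which is worth keeping in mind together with the curse of dimensionality note above.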
vmanomaly output
When vmanomaly is executed, it generates various metrics, the specifics of which depend on the model employed. These metrics can be renamed in the writer's section.
The default metrics produced by vmanomaly include:
- anomaly_score: This is the primary metric.
  - It is designed in such a way that values from 0.0 to 1.0 indicate non-anomalous data.
  - A value greater than 1.0 is generally classified as an anomaly, although this threshold can be adjusted in the alerting configuration.
  - The decision to set the changepoint at 1 was made to ensure consistency across various models and alerting configurations, such that a score above 1 consistently signifies an anomaly.
- yhat: This represents the predicted expected value.
- yhat_lower: This indicates the predicted lower boundary.
- yhat_upper: This refers to the predicted upper boundary.
- y: This is the original value obtained from the query result.
Important: Be aware that if NaN (Not a Number) or Inf (Infinity) values are present in the input data during infer model calls, the model will produce NaN as the anomaly_score for these particular instances.
vmanomaly monitoring metrics
Each model exposes several monitoring metrics to its health_path endpoint:
Custom Model Guide
Apart from vmanomaly predefined models, users can create their own custom models for anomaly detection.
Here in this guide, we will
- Make a file containing our custom model definition
- Define VictoriaMetrics Anomaly Detection config file to use our custom model
- Run service
Note: The file containing the model should be written in Python language (3.11+)
1. Custom model
Note: By default, each custom model is created as a univariate / non-rolling model. If you want to override this behavior, define models inherited from RollingModel (to get a rolling model), or having the is_multivariate class arg set to True (please refer to the code example below).
We'll create a custom_model.py file with a CustomModel class that will inherit from the vmanomaly Model base class.
In the CustomModel class there should be three required methods - __init__, fit and infer:
- The __init__ method should initiate parameters for the model.
  Note: if your model relies on configs that have an args key-value pair argument, do not forget to use Python's **kwargs in the method's signature and to explicitly call super().__init__(**kwargs) to initialize the base class each model derives from.
- The fit method should contain the model training process. Please be aware that for RollingModel defining the fit method is not needed, as the whole fit/infer process should be defined completely in the infer method.
- The infer method should return a pandas.DataFrame object with the model's inferences.
For the sake of simplicity, the model in this example will return one of two values of anomaly_score - 0 or 1, depending on the input parameter percentage.
import numpy as np
import pandas as pd
import scipy.stats as st
import logging

from model.model import Model
# from model.model import RollingModel  # inherit from it for your model to be of rolling type
logger = logging.getLogger(__name__)


class CustomModel(Model):
    """
    Custom model implementation.
    """
    # by default, each `Model` will be created as a univariate one
    # uncomment line below for it to be of multivariate type
    # is_multivariate = True

    def __init__(self, percentage: float = 0.95, **kwargs):
        super().__init__(**kwargs)
        self.percentage = percentage
        self._mean = np.nan
        self._std = np.nan

    def fit(self, df: pd.DataFrame):
        # Model fit process:
        y = df['y']
        self._mean = np.mean(y)
        self._std = np.std(y)
        # avoid division by zero for constant series
        if self._std == 0.0:
            self._std = 1 / 65536

    def infer(self, df: pd.DataFrame) -> pd.DataFrame:
        # Inference process:
        y = df['y']
        zscores = (y - self._mean) / self._std
        anomaly_score_cdf = st.norm.cdf(np.abs(zscores))
        df_pred = df[['timestamp', 'y']].copy()
        # 1 if the point lies beyond the `percentage` quantile, 0 otherwise
        df_pred['anomaly_score'] = anomaly_score_cdf > self.percentage
        df_pred['anomaly_score'] = df_pred['anomaly_score'].astype('int32', errors='ignore')
        return df_pred
2. Configuration file
Next, we need to create a config.yaml file with the VM Anomaly Detection configuration and model input parameters.
In the models section of the config file we need to put our model class model.custom.CustomModel and all parameters used in the __init__ method.
You can find out more about configuration parameters in vmanomaly config docs.
schedulers:
  s1:
    infer_every: "1m"
    fit_every: "1m"
    fit_window: "1d"
models:
  custom_model:
    # note: every custom model should implement this exact path, specified in `class` field
    class: "model.model.CustomModel"
    # custom model params are defined here
    percentage: 0.9
reader:
  datasource_url: "http://localhost:8428/"
  queries:
    ingestion_rate: 'sum(rate(vm_rows_inserted_total)) by (type)'
    churn_rate: 'sum(rate(vm_new_timeseries_created_total[5m]))'
writer:
  datasource_url: "http://localhost:8428/"
  metric_format:
    __name__: "custom_$VAR"
    for: "$QUERY_KEY"
    run: "test-format"
monitoring:
  # /metrics server.
  pull:
    port: 8080
  push:
    url: "http://localhost:8428/"
    extra_labels:
      job: "vmanomaly-develop"
      config: "custom.yaml"
3. Running custom model
Let's pull the docker image for vmanomaly:
docker pull victoriametrics/vmanomaly:latest
Now we can run the docker container, mounting both the config and the model file as volumes:
Note: place the model file to /model/custom.py path when copying
docker run -it \
  --net [YOUR_NETWORK] \
  -v [YOUR_LICENSE_FILE_PATH]:/license.txt \
  -v $(pwd)/custom_model.py:/vmanomaly/src/model/custom.py \
  -v $(pwd)/custom.yaml:/config.yaml \
  victoriametrics/vmanomaly:latest /config.yaml \
  --license-file=/license.txt
Please find more detailed instructions (license, etc.) here
Output
As a result, this model will return metrics with the labels configured previously in config.yaml.
In this particular example, 2 metrics will be produced. Other metrics from the input query result will be added as well.
{__name__="custom_anomaly_score", for="ingestion_rate", model_alias="custom_model", scheduler_alias="s1", run="test-format"},
{__name__="custom_anomaly_score", for="churn_rate", model_alias="custom_model", scheduler_alias="s1", run="test-format"}
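The way the metric_format section of the writer turns into the labels shown above can be mimicked with Python's string.Template, which happens to use the same $NAME placeholder syntax. This is only an illustration of the substitution of $VAR and $QUERY_KEY; render_labels is a hypothetical helper, and the model_alias and scheduler_alias labels that the service adds on its own are not reproduced here.

```python
from string import Template

# `metric_format` section from the writer config above
metric_format = {
    "__name__": "custom_$VAR",
    "for": "$QUERY_KEY",
    "run": "test-format",
}


def render_labels(metric_format: dict, var: str, query_key: str) -> dict:
    """Substitute $VAR and $QUERY_KEY placeholders in every label value."""
    values = {"VAR": var, "QUERY_KEY": query_key}
    return {
        label: Template(value).substitute(values)
        for label, value in metric_format.items()
    }


for query_key in ("ingestion_rate", "churn_rate"):
    print(render_labels(metric_format, var="anomaly_score", query_key=query_key))
```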