vmanomaly - models a bit pretify docs (#5618)

* vmanomaly - models a bit pretify docs

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* typi

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix formatting

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Artem Navoiev 2024-01-15 13:26:38 -08:00 committed by Aliaksandr Valialkin
parent a90c9bf8c9
commit 6370ae2d88
No known key found for this signature in database
GPG key ID: 52C003EE2BCDB9EB


@ -40,20 +40,19 @@ Here we use ARIMA implementation from `statsmodels` [library](https://www.statsm
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.arima.ArimaModel"`
* `class` (string) - model class name `"model.arima.ArimaModel"`
* `z_threshold` (float) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to 2.5.
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to `2.5`.
* `provide_series` (list[string]) - List of columns to be produced and returned by the model. Defaults to `["anomaly_score", "yhat", "yhat_lower" "yhat_upper", "y"]`. Output can be **only a subset** of a given column list.
* `provide_series` (list[string], optional) - List of columns to be produced and returned by the model. Defaults to `["anomaly_score", "yhat", "yhat_lower", "yhat_upper", "y"]`. The output can **only be a subset** of this list.
* `resample_freq` (string) = Frequency to resample input data into, e.g. data comes at 15 seconds resolution, and resample_freq is '1m'. Then fitting data will be downsampled to '1m' and internal model is trained at '1m' intervals. So, during inference, prediction data would be produced at '1m' intervals, but interpolated to "15s" to match with expected output, as output data must have the same timestamps.
* `resample_freq` (string, optional) - Frequency to resample input data into. For example, if data arrives at 15-second resolution and `resample_freq` is `'1m'`, fitting data is downsampled to 1-minute intervals and the internal model is trained on them. During inference, predictions are produced at 1-minute intervals and then interpolated back to 15 seconds, because output data must have the same timestamps as the input.
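The resampling round trip described for `resample_freq` can be sketched with the standard library as below. This is a minimal illustration, not vmanomaly's code: it assumes downsampling averages the points in each bucket and upsampling uses linear interpolation, and the helper names `downsample` and `interpolate` are hypothetical. Timestamps are plain seconds.

```python
from statistics import mean


def downsample(points, step):
    """Average (timestamp, value) points into buckets of `step` seconds.

    Returns a sorted list of (bucket_start, mean_value) pairs.
    """
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % step, []).append(value)
    return sorted((start, mean(vals)) for start, vals in buckets.items())


def interpolate(coarse, timestamps):
    """Linearly interpolate sorted (timestamp, value) pairs onto `timestamps`.

    Timestamps outside the coarse range take the nearest edge value.
    """
    result = []
    for ts in timestamps:
        if ts <= coarse[0][0]:
            result.append((ts, coarse[0][1]))
        elif ts >= coarse[-1][0]:
            result.append((ts, coarse[-1][1]))
        else:
            for (t0, v0), (t1, v1) in zip(coarse, coarse[1:]):
                if t0 <= ts <= t1:
                    result.append((ts, v0 + (v1 - v0) * (ts - t0) / (t1 - t0)))
                    break
    return result


# 15-second data over 3 minutes, modelled at 1-minute resolution.
points = [(t, t // 15) for t in range(0, 180, 15)]
coarse = downsample(points, 60)  # [(0, 1.5), (60, 5.5), (120, 9.5)]
fine = interpolate(coarse, [t for t, _ in points])
```

The interpolated series has one value per original 15-second timestamp, which is the "same timestamps" requirement stated above.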
*Default model parameters*:
* `order`\* (list[int]) - ARIMA's (p,d,q) order of the model for the autoregressive, differences, and moving average components, respectively.
* `order` (list[int]) - ARIMA's (p,d,q) order of the model for the autoregressive, differencing, and moving average components, respectively.
* `args`: (dict) - Inner model args (key-value pairs). See accepted params in [model documentation](https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMA.html). Defaults to empty (not provided). Example: {"trend": "c"}
* `args` (dict, optional) - Inner model args (key-value pairs). See accepted params in [model documentation](https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMA.html). Defaults to empty (not provided). Example: `{"trend": "c"}`
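To make `order: [1, 1, 0]` concrete: `d = 1` differences the series once, `p = 1` fits one autoregressive term on those differences, and `q = 0` adds no moving-average terms. The sketch below estimates the single coefficient by conditional least squares without a constant. This is a simplification: `statsmodels` fits ARIMA by maximum likelihood, so its estimates will generally differ. The function names are hypothetical.

```python
def fit_arima_110(y):
    """Estimate phi of ARIMA(1,1,0) by conditional least squares, no constant.

    d=1: work on first differences d_t = y_t - y_{t-1}.
    p=1, q=0: model them as AR(1), d_t = phi * d_{t-1} + e_t.
    """
    d = [b - a for a, b in zip(y, y[1:])]
    num = sum(prev * cur for prev, cur in zip(d, d[1:]))
    den = sum(prev * prev for prev in d[:-1])
    return num / den


def forecast_next(y, phi):
    """One-step-ahead forecast: y_hat = y_t + phi * (y_t - y_{t-1})."""
    return y[-1] + phi * (y[-1] - y[-2])


y = [0, 8, 12, 14, 15]        # differences 8, 4, 2, 1 halve at each step
phi = fit_arima_110(y)        # 0.5
print(forecast_next(y, phi))  # 15.5
```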
*Config Example*
<div class="with-copy" markdown="1">
@ -62,10 +61,7 @@ Here we use ARIMA implementation from `statsmodels` [library](https://www.statsm
model:
class: "model.arima.ArimaModel"
# ARIMA's (p,d,q) order
order:
- 1
- 1
- 0
order: [1, 1, 0]
z_threshold: 2.7
resample_freq: '1m'
# Inner model args (key-value pairs) accepted by statsmodels.tsa.arima.model.ARIMA
@ -80,21 +76,16 @@ Here we use Holt-Winters Exponential Smoothing implementation from `statsmodels`
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.holtwinters.HoltWinters"`
* `class` (string) - model class name `"model.holtwinters.HoltWinters"`
* `frequency`\* (string) - Must be set equal to sampling_period. Model needs to know expected data-points frequency (e.g. '10m').
If omitted, frequency is guessed during fitting as **the median of intervals between fitting data timestamps**. During inference, if incoming data doesn't have the same frequency, then it will be interpolated.
E.g. data comes at 15 seconds resolution, and our resample_freq is '1m'. Then fitting data will be downsampled to '1m' and internal model is trained at '1m' intervals. So, during inference, prediction data would be produced at '1m' intervals, but interpolated to "15s" to match with expected output, as output data must have the same timestamps.
* `frequency` (string) - Expected frequency of data points, as accepted by `pandas.Timedelta` (e.g. `'5m'` or `'10m'`). Must be set equal to `sampling_period`. If omitted, the frequency is inferred during fitting as **the median of intervals between fitting data timestamps**. During inference, incoming data with a different frequency is interpolated. For example, if data arrives at 15-second resolution and `frequency` is `'1m'`, fitting data is downsampled to 1-minute intervals and the internal model is trained on them; predictions are then produced at 1-minute intervals and interpolated back to 15 seconds, because output data must have the same timestamps as the input.
As accepted by pandas.Timedelta (e.g. '5m').
* `seasonality` (string) - As accepted by pandas.Timedelta.
* `seasonality` (string, optional) - Length of the seasonal cycle, as accepted by `pandas.Timedelta` (e.g. `'1D'` or `'1W'`). Used to compute the `seasonal_periods` parameter of the model: if `seasonal_periods` is not specified, it is calculated as `seasonality` / `frequency`.
* `z_threshold` (float) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to 2.5.
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to `2.5`.
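The frequency inference and the `seasonal_periods` calculation described above can be sketched as follows. `parse_duration` is a hypothetical helper that understands only a small subset of the strings `pandas.Timedelta` accepts (an integer count with `s`, `m`, `h`, `D` or `W`), and truncating `seasonality` / `frequency` to an integer is an assumption of this sketch.

```python
import re
from datetime import datetime, timedelta
from statistics import median

# Hypothetical helper: a tiny subset of what pandas.Timedelta accepts.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "D": "days", "W": "weeks"}


def parse_duration(text: str) -> timedelta:
    match = re.fullmatch(r"(\d+)([smhDW])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    count, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(count)})


def infer_frequency(timestamps: list[datetime]) -> timedelta:
    """Median interval between consecutive fitting timestamps."""
    return median(b - a for a, b in zip(timestamps, timestamps[1:]))


def seasonal_periods(seasonality: str, frequency: timedelta) -> int:
    """Data points per season: seasonality / frequency, truncated (an assumption)."""
    return int(parse_duration(seasonality) / frequency)


# Intervals of 10, 10, 11 and 9 minutes: the median is 10 minutes.
ts = [datetime(2024, 1, 1, 0, m) for m in (0, 10, 20, 31, 40)]
freq = infer_frequency(ts)
print(freq)                          # 0:10:00
print(seasonal_periods("1D", freq))  # 144
```

Using the median rather than the mean keeps a single delayed scrape from skewing the inferred frequency.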
*Default model parameters*:
@ -103,7 +94,7 @@ Used to compute "seasonal_periods" param for the model (e.g. '1D' or '1W').
* If [parameter](https://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html#statsmodels.tsa.holtwinters.ExponentialSmoothing-parameters) `initialization_method` is not specified, default value will be `estimated`.
* `args`: (dict) - Inner model args (key-value pairs). See accepted params in [model documentation](https://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html#statsmodels.tsa.holtwinters.ExponentialSmoothing-parameters). Defaults to empty (not provided). Example: {"seasonal": "add", "initialization_method": "estimated"}
* `args` (dict, optional) - Inner model args (key-value pairs). See accepted params in [model documentation](https://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html#statsmodels.tsa.holtwinters.ExponentialSmoothing-parameters). Defaults to empty (not provided). Example: `{"seasonal": "add", "initialization_method": "estimated"}`
*Config Example*
<div class="with-copy" markdown="1">
@ -128,10 +119,9 @@ Here we utilize the Facebook Prophet implementation, as detailed in their [libra
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.prophet.ProphetModel"`
* `seasonalities` (list[dict]) - Extra seasonalities to pass to Prophet. See [`add_seasonality()`](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#modeling-holidays-and-special-events:~:text=modeling%20the%20cycle-,Specifying,-Custom%20Seasonalities) Prophet param.
* `provide_series` - model resulting metrics. If not specified [standard metrics](#vmanomaly-output) will be provided.
* `class` (string) - model class name `"model.prophet.ProphetModel"`
* `seasonalities` (list[dict], optional) - Extra seasonalities to pass to Prophet. See [`add_seasonality()`](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#modeling-holidays-and-special-events:~:text=modeling%20the%20cycle-,Specifying,-Custom%20Seasonalities) Prophet param.
* `provide_series` (list[string], optional) - List of model output metrics to produce. If not specified, [standard metrics](#vmanomaly-output) are provided.
**Note**: Apart from the standard vmanomaly output, the Prophet model can provide [additional metrics](#additional-output-metrics-produced-by-fb-prophet).
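Each entry in `seasonalities` is presumably forwarded to Prophet's `add_seasonality()`, whose parameters include `name`, `period` (in days) and `fourier_order`. That mapping is an assumption here. Prophet models a seasonality as a partial Fourier sum, so a `fourier_order` of N yields 2N features per timestamp. The sketch below computes those features for one point in plain Python; `fourier_features` is a hypothetical name.

```python
import math


def fourier_features(t_days, period, fourier_order):
    """Partial Fourier basis for one seasonality at time t (in days).

    Returns [sin(2*pi*1*t/P), cos(2*pi*1*t/P), ..., sin(2*pi*N*t/P), cos(2*pi*N*t/P)].
    """
    features = []
    for n in range(1, fourier_order + 1):
        angle = 2.0 * math.pi * n * t_days / period
        features.extend((math.sin(angle), math.cos(angle)))
    return features


# Hypothetical extra seasonality entry, keyed like add_seasonality() arguments.
hourly = {"name": "hourly", "period": 1 / 24, "fourier_order": 3}
x = fourier_features(0.25, hourly["period"], hourly["fourier_order"])
print(len(x))  # 6 features = 2 * fourier_order
```

A higher `fourier_order` lets the seasonal curve bend more sharply, at the cost of more parameters to fit.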
@ -171,11 +161,9 @@ Resulting metrics of the model are described [here](#vmanomaly-output)
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.rolling_quantile.RollingQuantileModel"`
* `quantile`\* (float) - quantile value, from 0.5 to 1.0. This constraint is implied by 2-sided confidence interval.
* `window_steps`\* (integer) - size of the moving window. (see 'sampling_period')
* `class` (string) - model class name `"model.rolling_quantile.RollingQuantileModel"`
* `quantile` (float) - Quantile value, from 0.5 to 1.0. This constraint follows from the two-sided confidence interval the model builds.
* `window_steps` (integer) - Size of the moving window, in data points (see `sampling_period`).
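One way to read the `quantile` constraint is this: if the upper bound is the `quantile` of the trailing window and the lower bound is its `1 - quantile`, any value below 0.5 would swap the bounds. The sketch below assumes exactly that. It uses a trailing window of `window_steps` points and numpy-style linear interpolation between order statistics. The function names are hypothetical, and vmanomaly's windowing may differ.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between order statistics (numpy's default rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def rolling_bounds(values, window_steps, q):
    """(lower, upper) for each point that has a full trailing window of `window_steps` points."""
    if not 0.5 <= q <= 1.0:
        raise ValueError("quantile must be within [0.5, 1.0]")
    bounds = []
    for end in range(window_steps, len(values) + 1):
        window = values[end - window_steps:end]
        bounds.append((quantile(window, 1 - q), quantile(window, q)))
    return bounds


print(rolling_bounds([1, 2, 3, 4, 5, 6], window_steps=5, q=0.75))  # [(2.0, 4.0), (3.0, 5.0)]
```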
*Config Example*
<div class="with-copy" markdown="1">
@ -196,10 +184,9 @@ Here we use Seasonal Decompose implementation from `statsmodels` [library](https
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.std.StdModel"`
* `period`\* (integer) - Number of datapoints in one season.
* `z_threshold` (float) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to 2.5.
* `class` (string) - model class name `"model.std.StdModel"`
* `period` (integer) - Number of datapoints in one season.
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries to define anomaly score. Defaults to `2.5`.
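The sketch below gives a simplified picture of the `period` and `z_threshold` parameters. It estimates a seasonal profile by averaging every `period`-th point, uses that profile as `yhat`, and places the boundaries `z_threshold` residual standard deviations away. Unlike `statsmodels`' `seasonal_decompose`, it omits the trend component entirely. Treat it as an illustration of the parameters, not of the model. The function names are hypothetical.

```python
from statistics import mean, pstdev


def seasonal_profile(values, period):
    """Average value at each position within the season (trend is ignored)."""
    return [mean(values[i::period]) for i in range(period)]


def std_bounds(values, period, z_threshold=2.5):
    """Return yhat, yhat_lower, yhat_upper built from the seasonal profile."""
    profile = seasonal_profile(values, period)
    yhat = [profile[i % period] for i in range(len(values))]
    sigma = pstdev([v - f for v, f in zip(values, yhat)])
    lower = [f - z_threshold * sigma for f in yhat]
    upper = [f + z_threshold * sigma for f in yhat]
    return yhat, lower, upper


# A season of 2 points: values alternate around 1 and 11.
yhat, lower, upper = std_bounds([0, 10, 2, 12], period=2, z_threshold=2)
```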
*Config Example*
@ -225,9 +212,8 @@ The MAD model is a robust method for anomaly detection that is *less sensitive*
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.mad.MADModel"`
* `threshold` (float) - The threshold multiplier for the MAD to determine anomalies. Defaults to 2.5. Higher values will identify fewer points as anomalies.
* `class` (string) - model class name `"model.mad.MADModel"`
* `threshold` (float, optional) - The threshold multiplier for the MAD to determine anomalies. Defaults to `2.5`. Higher values will identify fewer points as anomalies.
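Here is a minimal sketch of the MAD rule: a point is flagged when its distance from the median exceeds `threshold` times the median absolute deviation. It assumes the raw, unscaled MAD. Some implementations multiply the MAD by about 1.4826 to make it comparable to a standard deviation, and whether vmanomaly does so is not stated here. The function name is hypothetical.

```python
from statistics import median


def mad_anomalies(values, threshold=2.5):
    """Flag points farther than threshold * MAD from the median (unscaled MAD)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return [abs(v - center) > threshold * mad for v in values]


print(mad_anomalies([10, 11, 9, 10, 12, 10, 50]))
# [False, False, False, False, False, False, True]
```

Because both the center and the spread are medians, the single outlier `50` barely moves them. This is the "less sensitive" property mentioned for this model.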
*Config Example*
<div class="with-copy" markdown="1">
@ -237,14 +223,15 @@ model:
class: "model.mad.MADModel"
threshold: 2.5
```
Resulting metrics of the model are described [here](#vmanomaly-output).
---
## [Z-score](https://en.wikipedia.org/wiki/Standard_score)
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.zscore.ZscoreModel"`
* `z_threshold` (float) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculation boundaries and anomaly score. Defaults to 2.5.
* `class` (string) - model class name `"model.zscore.ZscoreModel"`
* `z_threshold` (float, optional) - [standard score](https://en.wikipedia.org/wiki/Standard_score) for calculating boundaries and anomaly score. Defaults to `2.5`.
*Config Example*
<div class="with-copy" markdown="1">
@ -254,6 +241,7 @@ model:
class: "model.zscore.ZscoreModel"
z_threshold: 2.5
```
</div>
Resulting metrics of the model are described [here](#vmanomaly-output).
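The Z-score model's outputs can be sketched as follows. The sketch assumes the boundaries are mean ± `z_threshold` × standard deviation of the fitting data, and that `anomaly_score` is |z| / `z_threshold`, so a score above 1 means the point lies outside the boundaries. Both formulas are assumptions for illustration. The keys mirror the output column names, but `zscore_output` itself is hypothetical.

```python
from statistics import mean, pstdev


def zscore_output(values, z_threshold=2.5):
    """Boundaries at mean +/- z_threshold * std; score = |z| / z_threshold (assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant series: standard deviation is zero")
    return {
        "yhat": mu,
        "yhat_lower": mu - z_threshold * sigma,
        "yhat_upper": mu + z_threshold * sigma,
        "anomaly_score": [abs(v - mu) / (z_threshold * sigma) for v in values],
    }


out = zscore_output([2, 4, 4, 4, 5, 5, 7, 9], z_threshold=2)
print(out["yhat_lower"], out["yhat_upper"])  # 1.0 9.0
```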
@ -267,12 +255,11 @@ Here we use Isolation Forest implementation from `scikit-learn` [library](https:
*Parameters specific for vmanomaly*:
\* - mandatory parameters.
* `class`\* (string) - model class name `"model.isolation_forest.IsolationForestMultivariateModel"`
* `class` (string) - model class name `"model.isolation_forest.IsolationForestMultivariateModel"`
* `contamination` - The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples. Default value - "auto". Should be either `"auto"` or be in the range (0.0, 0.5].
* `contamination` (float or string, optional) - The amount of contamination of the data set, i.e. the proportion of outliers in it. Used during fitting to define the threshold on the sample scores. Defaults to `"auto"`. Must be either `"auto"` or a float in the range (0.0, 0.5].
* `args`: (dict) - Inner model args (key-value pairs). See accepted params in [model documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html). Defaults to empty (not provided). Example: {"random_state": 42, "n_estimators": 100}
* `args` (dict, optional) - Inner model args (key-value pairs). See accepted params in [model documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html). Defaults to empty (not provided). Example: `{"random_state": 42, "n_estimators": 100}`
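A numeric `contamination` can be turned into a threshold with a simple rank cut: pick the score that flags the top `contamination` share of the fitting data. This sketch makes three simplifications. It assumes "higher score = more anomalous", whereas scikit-learn's `score_samples` uses the opposite sign. It uses a simple rank rule instead of scikit-learn's percentile computation. It does not cover `"auto"`, for which scikit-learn uses the fixed threshold from the original Isolation Forest paper. The function name is hypothetical.

```python
import math


def contamination_threshold(scores, contamination):
    """Score at or above which roughly `contamination` of the points are flagged.

    Assumes higher score = more anomalous; ties may flag a few extra points.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0.0, 0.5]")
    ordered = sorted(scores, reverse=True)
    n_outliers = max(1, math.ceil(contamination * len(ordered)))
    return ordered[n_outliers - 1]


scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.4, 0.35, 0.15]
cut = contamination_threshold(scores, 0.25)  # 0.8
print([s for s in scores if s >= cut])       # [0.9, 0.8]
```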
*Config Example*
<div class="with-copy" markdown="1">