VictoriaMetrics/docs/anomaly-detection/components/scheduler.md
Fred Navruzov 6fa3283d04
docs/vmanomaly - release 1.18.5 (#7684)
### Describe Your Changes

docs/vmanomaly - release 1.18.5 doc updates

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-11-28 13:34:45 +02:00

16 KiB

title weight menu aliases
Scheduler 3
docs
parent weight
vmanomaly-components 3
/anomaly-detection/components/scheduler.html

Scheduler defines how often to run and make inferences, as well as what timerange to use to train the model. Is specified in scheduler section of a config for VictoriaMetrics Anomaly Detection.

Note: Starting from v1.11.0 scheduler section in config supports multiple schedulers via aliasing.
Also, vmanomaly expects scheduler section to be named schedulers. Using old (flat) format with scheduler key is deprecated and will be removed in future versions.

schedulers:
  scheduler_periodic_1m:
    # class: "periodic" # or class: "scheduler.periodic.PeriodicScheduler" until v1.13.0 with class alias support)
    infer_every: "1m"
    fit_every: "2m"
    fit_window: "3h"
  scheduler_periodic_5m:
    # class: "periodic" # or class: "scheduler.periodic.PeriodicScheduler" until v1.13.0 with class alias support)
    infer_every: "5m"
    fit_every: "10m"
    fit_window: "3h"
...

Old-style configs (< 1.11.0)

scheduler:
  # class: "periodic" # or class: "scheduler.periodic.PeriodicScheduler" until v1.13.0 with class alias support)
  infer_every: "1m"
  fit_every: "2m"
  fit_window: "3h"
...

will be implicitly converted to

schedulers:
  default_scheduler:  # default scheduler alias added, for backward compatibility
    class: "scheduler.periodic.PeriodicScheduler"
    infer_every: "1m"
    fit_every: "2m"
    fit_window: "3h"
...

Parameters

class: str, default="scheduler.periodic.PeriodicScheduler", options={"scheduler.periodic.PeriodicScheduler", "scheduler.oneoff.OneoffScheduler", "scheduler.backtesting.BacktestingScheduler"}

  • "scheduler.periodic.PeriodicScheduler": periodically runs the models on new data. Useful for consecutive re-trainings to counter data drift and model degradation over time.
  • "scheduler.oneoff.OneoffScheduler": runs the process once and exits. Useful for testing.
  • "scheduler.backtesting.BacktestingScheduler": imitates consecutive backtesting runs of OneoffScheduler. Runs the process once and exits. Use to get more granular control over testing on historical data.

Note

: starting from v1.13.0, class aliases are supported, so "scheduler.periodic.PeriodicScheduler" can be substituted to "periodic", "scheduler.oneoff.OneoffScheduler" - to "oneoff", "scheduler.backtesting.BacktestingScheduler" - to "backtesting"

Depending on selected class, different parameters should be used

Periodic scheduler

Parameters

For periodic scheduler parameters are defined as differences in times, expressed in difference units, e.g. days, hours, minutes, seconds.

Examples: "50s", "4m", "3h", "2d", "1w".

Time granularity
s seconds
m minutes
h hours
d days
w weeks

Parameter Type Example Description

fit_window

str

14d

What time range to use for training the models. Must be at least 1 second.

infer_every

str

1m

How often a model produce and write its anomaly scores on new datapoints. Must be at least 1 second.

fit_every

str, Optional

1h

How often to completely retrain the models. If not set, value of infer_every is used and retrain happens on every inference run.

start_from

str, Optional

2024-11-26T01:00:00Z, 01:00

Available since v1.18.5. Specifies when to initiate the first fit_every call. Accepts either an ISO 8601 datetime or a time in HH:MM format. If the specified time is in the past, the next suitable time is calculated based on the fit_every interval. For the HH:MM format, if the time is in the past, it will be scheduled for the same time on the following day, respecting the tz argument if provided. By default, the timezone defaults to UTC.

tz

str, Optional

America/New_York

Available since v1.18.5. Defines the local timezone for the start_from parameter, if specified. Defaults to UTC if no timezone is provided.

Periodic scheduler config example

schedulers:
  periodic_scheduler_alias:
    class: "periodic"
    # (or class: "scheduler.periodic.PeriodicScheduler" for versions before v1.13.0, without class alias support)
    fit_window: "14d" 
    infer_every: "1m" 
    fit_every: "1h"
    start_from: "20:00"  # If launched before 20:00 (local Kyiv time), the first run starts today at 20:00. Otherwise, it starts tomorrow at 20:00.
    tz: "Europe/Kyiv"  # Defaults to 'UTC' if not specified.

This configuration specifies that vmanomaly will calculate a 14-day time window from the time of fit_every call to train the model. Starting at 20:00 Kyiv local time today (or tomorrow if launched after 20:00), the model will be retrained every hour using the most recent 14-day window, which always includes an additional hour of new data. The time window remains strictly 14 days and does not extend with subsequent retrains. Additionally, vmanomaly will perform model inference every minute, processing newly added data points using the most recent model.

Oneoff scheduler

Parameters

For Oneoff scheduler timeframes can be defined in Unix time in seconds or ISO 8601 string format. ISO format supported time zone offset formats are:

  • Z (UTC)
  • ±HH:MM
  • ±HHMM
  • ±HH

If a time zone is omitted, a timezone-naive datetime is used.

Defining fitting timeframe

Format Parameter Type Example Description
ISO 8601

fit_start_iso

str

"2022-04-01T00:00:00Z", "2022-04-01T00:00:00+01:00", "2022-04-01T00:00:00+0100", "2022-04-01T00:00:00+01"

Start datetime to use for training a model. ISO string or UNIX time in seconds.
UNIX time

fit_start_s

float 1648771200
ISO 8601

fit_end_iso

str

"2022-04-10T00:00:00Z", "2022-04-10T00:00:00+01:00", "2022-04-10T00:00:00+0100", "2022-04-10T00:00:00+01"

End datetime to use for training a model. Must be greater than

fit_start_* . ISO string or UNIX time in seconds.

UNIX time

fit_end_s

float 1649548800

Defining inference timeframe

Format Parameter Type Example Description
ISO 8601

infer_start_iso

str

"2022-04-11T00:00:00Z", "2022-04-11T00:00:00+01:00", "2022-04-11T00:00:00+0100", "2022-04-11T00:00:00+01"

Start datetime to use for a model inference. ISO string or UNIX time in seconds.
UNIX time

infer_start_s

float 1649635200
ISO 8601

infer_end_iso

str

"2022-04-14T00:00:00Z", "2022-04-14T00:00:00+01:00", "2022-04-14T00:00:00+0100", "2022-04-14T00:00:00+01"

End datetime to use for a model inference. Must be greater than

infer_start_* . ISO string or UNIX time in seconds.

UNIX time

infer_end_s

float 1649894400

ISO format scheduler config example

schedulers:
  oneoff_scheduler_alias:
    class: "oneoff"
    # (or class: "scheduler.oneoff.OneoffScheduler" until v1.13.0 with class alias support)
    fit_start_iso: "2022-04-01T00:00:00Z"
    fit_end_iso: "2022-04-10T00:00:00Z"
    infer_start_iso: "2022-04-11T00:00:00Z"
    infer_end_iso: "2022-04-14T00:00:00Z"

UNIX time format scheduler config example

schedulers:
  oneoff_scheduler_alias:
    class: "oneoff"
    # (or class: "scheduler.oneoff.OneoffScheduler" until v1.13.0 with class alias support)
    fit_start_s: 1648771200
    fit_end_s: 1649548800
    infer_start_s: 1649635200
    infer_end_s: 1649894400

Backtesting scheduler

Parameters

As for Oneoff scheduler, timeframes can be defined in Unix time in seconds or ISO 8601 string format. ISO format supported time zone offset formats are:

  • Z (UTC)
  • ±HH:MM
  • ±HHMM
  • ±HH

If a time zone is omitted, a timezone-naive datetime is used.

Parallelization

Parameter Type Example Description

n_jobs

int

1

Allows proportionally faster (yet more resource-intensive) evaluations of a config on historical data. Default value is 1, that implies sequential execution. Introduced in v1.13.0

Defining overall timeframe

This timeframe will be used for slicing on intervals (fit_window, infer_window == fit_every), starting from the latest available time point, which is to_* and going back, until no full fit_window + infer_window interval exists within the provided timeframe.

Format Parameter Type Example Description
ISO 8601

from_iso

str

"2022-04-01T00:00:00Z", "2022-04-01T00:00:00+01:00", "2022-04-01T00:00:00+0100", "2022-04-01T00:00:00+01"

Start datetime to use for backtesting.
UNIX time

from_s

float 1648771200
ISO 8601

to_iso

str

"2022-04-10T00:00:00Z", "2022-04-10T00:00:00+01:00", "2022-04-10T00:00:00+0100", "2022-04-10T00:00:00+01"

End datetime to use for backtesting. Must be greater than

from_start_*

UNIX time

to_s

float 1649548800

Defining training timeframe

The same explicit logic as in Periodic scheduler

Format Parameter Type Example Description
ISO 8601

fit_window

str

"PT1M", "P1H"

What time range to use for training the models. Must be at least 1 second.
Prometheus-compatible

"1m", "1h"

Defining inference timeframe

In BacktestingScheduler, the inference window is implicitly defined as a period between 2 consecutive model fit_every runs. The latest inference window starts from to_s - fit_every and ends on the latest available time point, which is to_s. The previous periods for fit/infer are defined the same way, by shifting fit_every seconds backwards until we get the last full fit period of fit_window size, which start is >= from_s.

Format Parameter Type Example Description
ISO 8601

fit_every

str

"PT1M", "P1H"

What time range to use previously trained model to infer on new data until next retrain happens.
Prometheus-compatible

"1m", "1h"

ISO format scheduler config example

schedulers:
  backtesting_scheduler_alias:
    class: "backtesting"
    # (or class: "scheduler.backtesting.BacktestingScheduler" until v1.13.0 with class alias support)
    from_iso: '2021-01-01T00:00:00Z'
    to_iso: '2021-01-14T00:00:00Z'
    fit_window: 'P14D'
    fit_every: 'PT1H'
    n_jobs: 1  # default = 1 (sequential), set it up to # of CPUs for parallel execution

UNIX time format scheduler config example

schedulers:
  backtesting_scheduler_alias:
    class: "backtesting"
    # (or class: "scheduler.backtesting.BacktestingScheduler" until v1.13.0 with class alias support)
    from_s: 167253120
    to_s: 167443200
    fit_window: '14d'
    fit_every: '1h'
    n_jobs: 1  # default = 1 (sequential), set it up to # of CPUs for parallel execution