VictoriaMetrics/docs/anomaly-detection/CHANGELOG.md
Fred Navruzov 5d73b8b866
docs/vmanomaly - release 1.18.0 (#7378)
2024-10-28 17:25:24 +02:00


weight: 5
title: CHANGELOG
menu: docs (identifier: vmanomaly-changelog, parent: anomaly-detection, weight: 5)
aliases: /anomaly-detection/CHANGELOG.html

Please find the changelog for VictoriaMetrics Anomaly Detection below.

v1.18.0

Released: 2024-10-28

  • FEATURE: Introduced timezone-aware support in VmReader for accurate seasonality modeling, especially during DST shifts. A new tz argument enables timezone offset management at both global and query-specific levels.

    • Enhanced ProphetModel with a tz_aware argument (combined with tz_seasonalities and tz_use_cyclical_encoding) for timezone-aware timestamps. This addresses a limitation in Prophet's native design that doesn't allow timezone-aware and DST-aware seasonality.
  • IMPROVEMENT: Enhanced error handling in VmReader to provide clearer diagnostics and broader coverage.

  • FIX: Updated vmanomaly_version_info and vmanomaly_ui_version_info gauges to correctly set the version label value based on image tags.

  • FIX: The n_samples_seen_ attribute now properly resets to 0 with each new fit call in online model classes (OnlineMADModel and OnlineQuantileModel), ensuring accurate tracking of processed sample count.
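
To illustrate why DST shifts matter for seasonality, here is a minimal sketch using only the Python standard library. It is not vmanomaly code: the two fixed offsets (UTC+1 and UTC+2) stand in for the winter and summer offsets of a hypothetical zone, and `utc_hour_of_local_event` is an illustrative name. An event that recurs at the same local hour lands on different UTC hours, which a UTC-only seasonality model would see as a shifted pattern.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zone: UTC+1 in winter, UTC+2 in summer (assumed offsets).
WINTER = timezone(timedelta(hours=1))
SUMMER = timezone(timedelta(hours=2))


def utc_hour_of_local_event(local_hour: int, offset: timezone) -> int:
    """Return the UTC hour at which an event at `local_hour` occurs."""
    local = datetime(2024, 1, 1, local_hour, tzinfo=offset)
    return local.astimezone(timezone.utc).hour


# The same 09:00 local peak maps to two different UTC hours.
winter_utc = utc_hour_of_local_event(9, WINTER)
summer_utc = utc_hour_of_local_event(9, SUMMER)
```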

v1.17.2

Released: 2024-10-22

  • IMPROVEMENT: Added vmanomaly_version_info (service) and vmanomaly_ui_version_info (vmui) gauges to self-monitoring metrics.
  • IMPROVEMENT: Added instance and job labels to pushed metrics so they have the same labels as vmanomaly metrics that are pulled/scraped. Metric labels can be customized via the extra_labels argument. By default, the job label is vmanomaly and the instance label is f'{hostname}:{vmanomaly_port}'. See monitoring.push for examples and details.
  • IMPROVEMENT: Added a subsection to monitoring page with detailed per-component service logs, including reader and writer logs, error handling, metrics updates, and multi-tenancy warnings.
  • IMPROVEMENT: Added a new Command-line arguments subsection to the Quickstart guide, providing details on available options for configuring vmanomaly.

v1.17.1

Released: 2024-10-18

  • FIX: Prophet models no longer fail to train on constant data (data consisting of a single value with no variation across time). The bug prevented the fit stage from completing successfully, so the model instance was not stored in the model registry after automated model cleanup was added in v1.17.0.

v1.17.0

Released: 2024-10-17

  • FEATURE: Added the max_points_per_query (global and query-specific) VmReader arg to control query chunking. This overrides how the search.maxPointsPerTimeseries flag (support introduced in v1.14.1) is used in vmanomaly for splitting long fit_window queries into smaller sub-intervals. This helps users avoid hitting the search.maxQueryDuration limit for individual queries by distributing the initial query across multiple subquery requests with minimal overhead.

  • IMPROVEMENT: Enhanced the self-monitoring metrics for consistency across the components. Key changes include:

    • Converted several self-monitoring metrics from Summary to Histogram to enable quantile calculation. This addresses the limitation of the prometheus_client's Summary implementation, which does not support quantiles. The change ensures metrics are more informative for performance analysis. Affected metrics are:
      • vmanomaly_reader_request_duration_seconds (VmReader)
      • vmanomaly_reader_response_parsing_seconds (VmReader)
      • vmanomaly_writer_request_duration_seconds (VmWriter)
      • vmanomaly_writer_request_serialize_seconds (VmWriter)
    • Added a query_key label to the vmanomaly_reader_response_parsing_seconds metric to provide finer granularity in tracking the performance of individual queries. This metric has also been switched from Summary to Histogram to align with the other metrics and support quantile calculations.
    • Added preset and scheduler_alias keys to VmReader and VmWriter metrics for consistency in multi-scheduler setups.
    • Renamed Counters vmanomaly_reader_response_count to vmanomaly_reader_responses and vmanomaly_writer_response_count to vmanomaly_writer_responses.
    • Updated docs for better clarity.
  • IMPROVEMENT: Accelerated performance of model fitting stages on multicore systems.

  • IMPROVEMENT: Optimized query handling in multi-scheduler setups by filtering queries for each scheduler based on model requirements. This reduces unnecessary data fetching from VictoriaMetrics, ensuring only relevant queries are processed by the VmReader, leading to better performance and efficiency of configs with multiple active schedulers.

  • IMPROVEMENT: Implemented automatic cleanup of files in subdirectories within /tmp (generated by the Stan backend when utilizing Prophet models) after each fit operation. This prevents the accumulation of unused data over time in /tmp, addressing a potential issue where these files would only be deleted upon termination of the current Python session or service, leading to uncontrolled disk growth.

  • FIX: Re-enabled the vmanomaly_reader_response_count (now called vmanomaly_reader_responses) self-monitoring metric for the VmReader. It was unintentionally disabled in previous releases and now updates correctly.
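
The query chunking described for max_points_per_query can be sketched roughly as follows. This is a simplified illustration, not the VmReader implementation: `split_range` and its parameters are hypothetical names, timestamps are plain integer seconds, and the range is assumed to be inclusive on both ends.

```python
def split_range(start: int, end: int, step: int, max_points: int) -> list[tuple[int, int]]:
    """Split the inclusive range [start, end] into sub-intervals that each
    contain at most `max_points` samples at the given `step`."""
    if step <= 0 or max_points <= 0:
        raise ValueError("step and max_points must be positive")
    # Distance between the first and last sample of a full chunk.
    chunk_span = step * (max_points - 1)
    chunks = []
    cur = start
    while cur <= end:
        chunk_end = min(cur + chunk_span, end)
        chunks.append((cur, chunk_end))
        # The next chunk starts one step after the last sample of this one.
        cur = chunk_end + step
    return chunks


# 11 samples (0, 10, ..., 100) with at most 4 samples per sub-query.
chunks = split_range(0, 100, step=10, max_points=4)
```

Each sub-interval can then be requested separately and the results concatenated, so that no single request exceeds the point limit.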

v1.16.3

Released: 2024-10-08

  • IMPROVEMENT: Added tls_cert_file and tls_key_file arguments to support mTLS (mutual TLS) in vmanomaly components. This enhancement applies to the following components: VmReader, VmWriter, and Monitoring/Push. You can also use these arguments in conjunction with verify_tls when it is set as a path to a custom CA certificate file.

v1.16.2

Released: 2024-10-06

  • FEATURE: Added support for multitenant value in tenant_id arg to enable querying across multiple tenants in VictoriaMetrics cluster (option available from v1.104.0):

    • Applied when reading input data from vmselect via the VmReader.
    • Applied when writing generated results through vminsert via the VmWriter.
    • For more details, refer to the tenant_id arg description in the documentation of the mentioned components.
  • FIX: Resolved an issue with handling an empty preset value (e.g., preset: "") that was preventing the default helm chart from being deployed.

v1.16.1

Released: 2024-10-02

  • FIX: This patch release prevents the service from crashing by rolling back the version of a third-party dependency. Affected releases: v1.16.0.

v1.16.0

Released: 2024-10-01

Note: A bug was discovered in this release that causes the service to crash. Please use the patch v1.16.1 to resolve this issue.

  • FEATURE: Introduced data dumps to a host filesystem for VmReader. Resource-intensive setups (multiple queries returning many metrics, bigger fit_window arg) will have RAM consumption reduced during fit calls.

  • IMPROVEMENT: Added a groupby argument for logical grouping in multivariate models. When specified, a separate multivariate model is trained for each unique combination of label values in the groupby columns. For example, to perform multivariate anomaly detection on metrics at the machine level without cross-entity interference, you can use groupby: [host] or groupby: [instance], ensuring one model per entity being trained (e.g., per host). Please find more details here.

  • IMPROVEMENT: Improved performance of VmReader on multicore instances for reading and data processing.

  • IMPROVEMENT: Introduced new CLI argument aliases to enhance compatibility with Helm charts (i.e. using secrets) and better align with VictoriaMetrics flags:

    • --licenseFile as an alias for --license-file
    • --license.forceOffline as an alias for --license-verify-offline
    • --loggerLevel as an alias for --log-level
    • The previous argument format is retained for backward compatibility.
  • FIX: The provide_series common argument now correctly filters the written time series in the IsolationForestMultivariate model.
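
The groupby idea for multivariate models can be pictured as partitioning the input series by label values before training one model per partition. The sketch below is an assumption-laden illustration, not vmanomaly code: series are represented as plain label dictionaries and `group_series` is a hypothetical helper.

```python
from collections import defaultdict


def group_series(series: list[dict], groupby: list[str]) -> dict[tuple, list[dict]]:
    """Partition series (label dicts) by the values of the `groupby` labels."""
    groups = defaultdict(list)
    for labels in series:
        # A missing label is treated as an empty string (an assumption).
        key = tuple(labels.get(name, "") for name in groupby)
        groups[key].append(labels)
    return dict(groups)


series = [
    {"__name__": "cpu_usage", "host": "a"},
    {"__name__": "mem_usage", "host": "a"},
    {"__name__": "cpu_usage", "host": "b"},
]
# One group (and thus one multivariate model) per host.
groups = group_series(series, ["host"])
```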

v1.15.9

Released: 2024-08-27

v1.15.8

Released: 2024-08-27

  • FIX: Made minor adjustments to how the reader and writer handle bearer tokens across different modes.

v1.15.7

Released: 2024-08-27

  • FIX: Made minor adjustments to how the reader and writer handle bearer tokens across different modes.

v1.15.6

Released: 2024-08-26

  • IMPROVEMENT: Introduced the bearer_token_file argument to the reader and writer components to enhance secret management.

v1.15.5

Released: 2024-08-19

  • FIX: Following the v1.15.2 online model enhancement, the data_range parameter is now correctly initialized for online models that are created during infer calls (for new time series returned by a particular query).

v1.15.4

Released: 2024-08-15

v1.15.3

Released: 2024-08-14

  • IMPROVEMENT: Better config handling of the reader section when using vmanomaly with Helm charts.

v1.15.2

Released: 2024-08-13

  • IMPROVEMENT: Enhanced online models (e.g., OnlineQuantileModel) to automatically create model instances for unseen time series during infer calls, eliminating the need to wait for the next fit call. This ensures no inferences are skipped when using online models.
  • FIX: Corrected an issue with the OnlineMADModel to ensure proper functionality when used in combination with on-disk model dump mode.
  • FIX: Addressed numerical instability in the OnlineQuantileModel when use_transform is set to True.
  • FIX: Resolved a logging issue that could cause a RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'> when a termination event was received.

v1.15.1

Released: 2024-08-10

  • FEATURE: Introduced backward-compatible data_range query-specific parameter to the VmReader. It enables the definition of valid data ranges for input per individual query in queries, resulting in:

    • High anomaly scores (>1) when the data falls outside the expected range, indicating a data constraint violation.
    • Lowest anomaly scores (=0) when the model's predictions (yhat) fall outside the expected range, signaling uncertain predictions.
    • For more details, please refer to the documentation.
  • IMPROVEMENT: Added latency_offset argument to the VmReader to override the default -search.latencyOffset flag of VictoriaMetrics (30s). The default value is set to 1ms, which should help in cases where sampling_frequency is low (10-60s) and sampling_frequency equals infer_every in the PeriodicScheduler. This prevents users from receiving service - WARNING - [Scheduler [scheduler_alias]] No data available for inference. warnings in logs and allows for consecutive infer calls without gaps. To restore the backward compatible behavior, set it equal to your -search.latencyOffset value in VmReader config section.

  • FIX: Ensure the use_transform argument of the OnlineQuantileModel functions as intended.

  • FIX: Add a docstring for query_from_last_seen_timestamp arg of VmReader.
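
The effect of data_range on anomaly scores can be sketched as below. This is a hedged illustration only: `apply_data_range`, the `violation_score` value of 1.01, and the precedence of the two checks are assumptions made for the example, not documented vmanomaly behavior.

```python
def apply_data_range(
    y: float,
    yhat: float,
    score: float,
    data_range: tuple[float, float],
    violation_score: float = 1.01,  # assumed value; any score > 1 signals an anomaly
) -> float:
    """Adjust an anomaly score according to a valid [lo, hi] data range."""
    lo, hi = data_range
    if not lo <= y <= hi:
        # Observed value violates the data constraint: force a high score.
        return max(score, violation_score)
    if not lo <= yhat <= hi:
        # Prediction is outside the valid range: treat it as uncertain.
        return 0.0
    return score
```

For example, with data_range (0, 100), an observed value of 120 yields a score above 1 regardless of the model's own score, while a prediction of 130 for an in-range observation yields 0.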

v1.15.0

Released: 2024-08-06

  • FEATURE: Introduced models that support online learning for stream-like input. These models significantly reduce the amount of data required for the initial fit stage. For example, they enable reducing fit_window from weeks to hours and increasing fit_every from hours to weeks in the PeriodicScheduler, significantly reducing the peak amount of data queried from VictoriaMetrics during fit stages. The following models were added:

  • FEATURE: Introduced the optimized_business_params key (list of strings) to the AutoTuned optimization_params. This allows particular business-specific parameters such as detection_direction and min_dev_from_expected to remain unchanged during optimizations, retaining their default values.

  • IMPROVEMENT: Optimized the AutoTuned model logic to minimize deviations between the expected anomaly_percentage specified in the configuration and the detected percentage in the data, while also reducing discrepancies between the actual values (y) and the predictions (yhat).

  • IMPROVEMENT: Allow ProphetModel to fit with multiple seasonalities when used in AutoTuned mode.
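
To give a feel for why online learning needs less data per fit, here is a generic sketch of incremental statistics using Welford's algorithm. It is not the implementation of any vmanomaly model; `RunningStats` and its attribute names are hypothetical. The state is updated one sample at a time, so no long history has to be queried again.

```python
import math


class RunningStats:
    """Incrementally tracked mean and standard deviation (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n_samples_seen = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n_samples_seen += 1
        delta = x - self.mean
        self.mean += delta / self.n_samples_seen
        self._m2 += delta * (x - self.mean)

    def std(self) -> float:
        if self.n_samples_seen < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.n_samples_seen - 1))

    def zscore(self, x: float) -> float:
        s = self.std()
        return 0.0 if s == 0 else (x - self.mean) / s


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```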

v1.14.2

Released: 2024-07-26

  • FIX: Patch a bug introduced in v1.14.1, causing vmanomaly to crash in preset mode.

v1.14.1

Released: 2024-07-26

  • FEATURE: VmReader can now process data chunks that exceed the -search.maxPointsPerTimeseries constraint in VictoriaMetrics by splitting the range and sending multiple requests. A warning is printed in logs, suggesting reducing the range or step, or increasing the search.maxPointsPerTimeseries constraint in VictoriaMetrics, which is still the recommended option.

  • FEATURE: Backward-compatible redesign of the queries arg of VmReader. The old format of {q_alias1: q_expr1, q_alias2: q_expr2, ...} is implicitly converted to the new one, with a warning raised in logs. The new format allows specifying per-query parameters, such as step, to reduce the amount of data read from VictoriaMetrics TSDB and to make configs more flexible. Find out more in the Per-query parameters section of VmReader.

  • IMPROVEMENT: Added multi-platform builds for linux/amd64 and linux/arm64 architectures.

v1.13.3

Released: 2024-07-17

  • FIX: Validation of the args argument for the HoltWinters model now works properly.

v1.13.2

Released: 2024-07-15

v1.13.0

Released: 2024-06-11

  • FEATURE: Introduced preset mode to run vmanomaly service with minimal user input and on widely-known metrics, like those produced by node_exporter.
  • FEATURE: Introduced min_dev_from_expected model common arg, aimed at reducing false positives in scenarios where deviations between the real value y and the expected value yhat are relatively high and may cause models to generate high anomaly scores. However, these deviations are not significant enough in absolute values to be considered anomalies based on domain knowledge.
  • FEATURE: Introduced detection_direction model common arg, enabling domain-driven anomaly detection strategies. Configure models to identify anomalies occurring above, below, or in both directions relative to the expected values.
  • FEATURE: Added the n_jobs arg to BacktestingScheduler to allow proportionally faster (yet more resource-intensive) evaluations of a config on historical data. The default value is 1, which implies sequential execution.
  • FEATURE: allow anomaly detection models to be dumped to a host filesystem after fit stage (instead of in-memory). Resource-intensive setups (many models, many metrics, bigger fit_window arg) and/or 3rd-party models that store fit data (like ProphetModel or HoltWinters) will have RAM consumption greatly reduced at a cost of slightly slower infer stage. Please find how to enable it here
  • IMPROVEMENT: Reduced the resource used for each fitted ProphetModel by up to 6 times. This includes both RAM for in-memory models and disk space for on-disk models storage. For more details, refer to this discussion on Facebook's Prophet.
  • IMPROVEMENT: now config components class can be referenced by a short alias instead of a full class path - i.e. model.zscore.ZscoreModel becomes zscore, reader.vm.VmReader becomes vm, scheduler.periodic.PeriodicScheduler becomes periodic, etc.
  • FIX: In multi-scheduler setups (introduced in v1.11.0), schedulers (and corresponding services) that are not attached to any model (neither listed in the 'schedulers' arg nor implied by leaving it blank in the model section) are no longer spawned. Previously they caused resource overhead and slight interference with existing ones.
  • FIX: Set a random seed for ProphetModel to ensure that uncertainty estimates (like yhat_lower, yhat_upper) and dependent series (like anomaly_score) produced during .infer() calls are always deterministic given the same input. See initial issue for the details.
  • FIX: Orphan queries (not attached to any model or scheduler) found in the queries arg of the Reader config section are no longer fetched from VictoriaMetrics TSDB, avoiding redundant data processing. A warning is logged if such queries exist in a parsed config.
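
The intent behind min_dev_from_expected and detection_direction can be sketched as a post-processing step on the anomaly score. This is an illustration under assumptions, not vmanomaly code: `adjust_score` is a hypothetical helper, and the direction values "above", "below", and "both" are placeholders, so check the model documentation for the actual option names.

```python
def adjust_score(
    y: float,
    yhat: float,
    score: float,
    detection_direction: str = "both",  # placeholder values: "both", "above", "below"
    min_dev_from_expected: float = 0.0,
) -> float:
    """Suppress scores for deviations that are too small or in an ignored direction."""
    if abs(y - yhat) < min_dev_from_expected:
        # Deviation is too small in absolute terms to matter.
        return 0.0
    if detection_direction == "above" and y < yhat:
        return 0.0
    if detection_direction == "below" and y > yhat:
        return 0.0
    return score
```

With min_dev_from_expected=1.0, a value of 10 against an expected 9.5 is not flagged even if the model's raw score is high.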

v1.12.0

Released: 2024-03-31

  • FEATURE: Introduction of AutoTunedModel model class to optimize any built-in model on data during fit phase. Specify as little as anomaly_percentage param from (0, 0.5) interval and tuned_model_class (i.e. model.zscore.ZscoreModel) to get it working with best settings that match your data. See details here.
  • IMPROVEMENT: Better logging of model lifecycle (fit/infer stages).
  • IMPROVEMENT: Introduce provide_series arg to all the built-in models to define what output fields to generate for writing (i.e. provide_series: ['anomaly_score'] means only scores are being produced)
  • FIX: Self-monitoring metrics are now aggregated at the query alias level (not per label set of individual time series) and aligned with the reader, writer and model sections description, so the /metrics endpoint holds only the information necessary for scraping.
  • FIX: Self-monitoring metric vmanomaly_models_active now has additional labels model_alias, scheduler_alias, preset to align with model-centric self-monitoring.
  • IMPROVEMENT: Added the ability to use temporal information in IsolationForest models via cyclical encoding. This is particularly helpful for detecting multivariate seasonality-dependent anomalies.
  • BREAKING CHANGE: ARIMA model is removed from built-in models. For affected users, it is suggested to replace ARIMA by Prophet or Holt-Winters.
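
Cyclical encoding, mentioned above for IsolationForest models, is commonly done by mapping a periodic value onto a circle with sine and cosine. The sketch below shows this general technique and is not taken from vmanomaly; `encode_cyclical` is an illustrative name.

```python
import math


def encode_cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value (e.g. hour of day) to a point on the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


# Hour 23 and hour 0 are neighbors on the circle, unlike in raw numeric form.
late_evening = encode_cyclical(23, 24)
midnight = encode_cyclical(0, 24)
noon = encode_cyclical(12, 24)

near = math.dist(late_evening, midnight)
far = math.dist(midnight, noon)
```

The encoded pair can be used as two extra features, so that a distance-based or tree-based model treats 23:00 and 00:00 as close in time.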

v1.11.0

Released: 2024-02-22

  • FEATURE: Multi-scheduler support. Now users can use multiple model specs in a single config (via aliasing), each spec can be run with its own (even multiple) schedulers.
    • Introduction of schedulers arg in model spec:
      • It allows each model to be managed by 1 (or more) schedulers, so overall resource usage is optimized and flexibility is preserved.
      • Passing an empty list or not specifying this param implies that each model is run in all the schedulers, which is a backward-compatible behavior.
      • Please find more details in docs on Model section
  • DEPRECATION: slight refactor of a scheduler config section
    • Now schedulers are passed as a mapping of scheduler_alias: scheduler_spec under scheduler sections. Using old format (< 1.11.0) will produce warnings for now and will be removed in future versions.
  • DEPRECATION: The --watch CLI option for config file reloads is deprecated and will be ignored in the future.

v1.10.0

Released: 2024-02-15

  • FEATURE: Multi-model support. Users can now specify multiple model specs in a single config (via aliasing) and reference which queries from VmReader each model should run on.

    • Introduction of queries arg in model spec:
      • It allows the model to be executed only on a particular query subset from reader section.
      • Passing an empty list or not specifying this param implies that each model is run on results from all queries, which is a backward-compatible behavior.
      • Please find more details in docs on Model section
  • DEPRECATION: slight refactor of a model config section

    • Now models are passed as a mapping of model_alias: model_spec under model sections. Using old format (<= 1.9.2) will produce warnings for now and will be removed in future versions.
    • Please find more details in docs on Model section
  • IMPROVEMENT: now logs from monitoring.pull GET requests to /metrics endpoint are shown only in DEBUG mode

  • IMPROVEMENT: labelset for multivariate models is deduplicated and cleaned, resulting in better UX

Note: These updates support a more flexible setup and effective resource management in the service, as it is no longer necessary to spawn several instances of vmanomaly to split the queries/models context across them.

v1.9.2

Released: 2024-01-29

v1.9.1

Released: 2024-01-27

  • IMPROVEMENT: Updated the offline license verification backbone to mitigate a critical vulnerability identified in the ecdsa library, ensuring enhanced security despite initial non-impact.
  • IMPROVEMENT: bump 3rd-party dependencies for Python 3.12.1

v1.9.0

Released: 2024-01-26

  • BUGFIX: The query_from_last_seen_timestamp internal logic in VmReader, first introduced in v1.5.1, now functions correctly. This fix ensures that the input data shape remains consistent for subsequent fit-based model calls in the service.
  • BREAKING CHANGE: The sampling_period parameter is now mandatory in VmReader. This change aims to clarify and standardize the frequency of input/output in vmanomaly, thereby reducing uncertainty and aligning with user expectations.

Note: The majority of users, who have been proactively specifying the sampling_period parameter in their configurations, will experience no disruption from this update. This transition formalizes a practice that was already prevalent and expected among our user base.

v1.8.0

Released: 2024-01-15

  • FEATURE: Added Univariate MAD (median absolute deviation) model support.
  • IMPROVEMENT: Update Python to 3.12.1 and all the dependencies.
  • IMPROVEMENT: Don't check /health endpoint, check the real /query_range or /import endpoints directly. Users kept getting problems with /health.
  • DEPRECATION: "health_path" param is deprecated and doesn't do anything in config (reader, writer, monitoring.push).
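
For readers unfamiliar with MAD, here is a minimal sketch of a median-absolute-deviation score using only the standard library. It shows the general statistic, not the vmanomaly MAD model; `mad_score` is a hypothetical helper and applies no scaling constant or threshold.

```python
from statistics import median


def mad_score(values: list[float], x: float) -> float:
    """Return how many MADs the point `x` lies away from the median of `values`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values are (nearly) identical; no spread to compare against.
        return 0.0
    return abs(x - med) / mad


history = [1, 2, 3, 4, 100]
outlier_score = mad_score(history, 100)
typical_score = mad_score(history, 3)
```

Because both the center and the spread are medians, a single extreme value such as 100 barely affects them, which is what makes the statistic robust.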

v1.7.2

Released: 2023-12-21

  • FIX: fit/infer calls are now skipped if we have insufficient valid data to run on.
  • FIX: proper handling of inf and NaN in fit/infer calls.
  • FEATURE: add counter of skipped model runs vmanomaly_model_runs_skipped to healthcheck metrics.
  • FEATURE: add exponential retries wrapper to VmReader's read_metrics().
  • FEATURE: add BacktestingScheduler for consecutive retrospective fit/infer calls.
  • FEATURE: add improved & numerically stable anomaly scores.
  • IMPROVEMENT: add full config validation. The probability of getting errors in later stages (say, model fit) is greatly reduced now. All the config validation errors that need to be fixed are now logged.

    note: this is a backward-incompatible change, as the model config section now expects key-value args for the internal model defined in nested args.

  • IMPROVEMENT: add explicit support of gzip-ed responses from vmselect in VmReader.
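
An exponential retry wrapper of the kind mentioned above can be sketched as follows. This is a generic pattern, not the actual wrapper around read_metrics(); `retry_with_backoff`, its defaults, and the injectable `sleep` argument are assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on any exception with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: propagate the last error
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Passing a custom `sleep` makes the wrapper easy to test without real waiting.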

v1.6.0

Released: 2023-10-30

  • IMPROVEMENT:
    • now all the produced healthcheck metrics have vmanomaly_ prefix for easier accessing.
    • updated docs for monitoring.

    note: this is a backward-incompatible change, as metric names will change, resulting in new metrics being created, e.g. model_datapoints_produced will become vmanomaly_model_datapoints_produced

  • IMPROVEMENT: Set default value for --log_level from DEBUG to INFO to reduce logs verbosity.
  • IMPROVEMENT: Add alias --log-level to --log_level.
  • FEATURE: Added extra_filters parameter to reader. It allows applying global filters to all queries.
  • FEATURE: Added verify_tls parameter to reader and writer. It allows disabling TLS verification for the remote endpoint.
  • FEATURE: Added bearer_token parameter to reader and writer. It allows passing a bearer token to the remote endpoint for authentication.
  • BUGFIX: Fixed passing workers parameter for reader. Previously it would throw a runtime error if workers was specified.

v1.5.1

Released: 2023-09-18

  • IMPROVEMENT: Infer from the latest seen datapoint for each query. This handles the case when datapoints arrive late.

v1.5.0

Released: 2023-08-11

  • FEATURE: add --license and --license-file command-line flags for license code verification.
  • IMPROVEMENT: Updated Python to 3.11.4 and updated dependencies.
  • IMPROVEMENT: Guide documentation for Custom Model usage.

v1.4.2

Released: 2023-06-09

  • FIX: Fix case with received metric labels overriding generated.

v1.4.1

Released: 2023-06-09

  • IMPROVEMENT: Update dependencies.

v1.4.0

Released: 2023-05-06

  • FEATURE: Reworked self-monitoring grafana dashboard for vmanomaly.
  • IMPROVEMENT: Update python version and dependencies.

v1.3.0

Released: 2023-03-21

  • FEATURE: Parallelized queries. See reader.workers param to control parallelism. By default its value is equal to the number of queries (all the queries are sent at once).
  • IMPROVEMENT: Updated self-monitoring dashboard.
  • IMPROVEMENT: Reverted the default bind address for the /metrics server to 0.0.0.0, as vmanomaly is distributed in Docker images.
  • IMPROVEMENT: Silenced Prophet INFO logs about yearly seasonality.

v1.2.2

Released: 2023-03-19

  • FIX: Fix for metric label to pass QUERY_KEY.
  • FEATURE: Added timeout config param to reader, writer, monitoring.push.
  • FIX: Don't hang if scheduler-model thread exits.
  • FEATURE: Now reader, writer and monitoring.push will not halt the process if endpoint is inaccessible or times out, instead they will increment metrics *_response_count{code=~"timeout|connection_error"}.

v1.2.1

Released: 2023-02-18

  • FIX: Fixed scheduler thread starting.
  • FIX: Fix rolling model fit+infer.
  • BREAKING CHANGE: monitoring.pull server now binds by default on 127.0.0.1 instead of 0.0.0.0. Please specify explicitly in monitoring.pull.addr what IP address it should bind to for serving /metrics.

v1.2.0

Released: 2023-02-04

  • FEATURE: With the --watch arg, the service watches for config changes and reloads automatically.
  • IMPROVEMENT: Remove "provide_series" from HoltWinters model. Only Prophet model now has it, because it may produce a lot of series if "holidays" is on.
  • IMPROVEMENT: if Prophet's "provide_series" is omitted, then all series are returned.
  • DEPRECATION: Config monitoring.endpoint_url is deprecated in favor of monitoring.url.
  • DEPRECATION: Remove 'enable' param from config monitoring.pull. Now /metrics server is started whenever monitoring.pull is present.
  • IMPROVEMENT: include example configs into the docker image at /vmanomaly/config/*
  • IMPROVEMENT: include self-monitoring grafana dashboard into the docker image under /vmanomaly/dashboard/vmanomaly_grafana_dashboard.json

v1.1.0

Released: 2023-01-23

  • IMPROVEMENT: update Python dependencies
  • FEATURE: Add multivariate IsolationForest model.

v1.0.1

Released: 2023-01-06

  • FIX: Prophet model incorrectly predicted two points when only one was expected.

v1.0.0-beta

Released: 2022-12-08

  • First public release is available