VictoriaMetrics/docs/anomaly-detection/components/reader.md at 329d9a46eecfc790fcc2d6f163fd093724530766

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-01 14:47:38 +00:00

docs/vmanomaly - release 1.18.0 (#7378 )


(cherry picked from commit 5d73b8b866)

2024-10-29 16:31:59 +01:00


Reader

parent: vmanomaly-components
weight: 2
aliases: /anomaly-detection/components/reader.html

VictoriaMetrics Anomaly Detection (vmanomaly) primarily uses VmReader to ingest data. This reader focuses on fetching time-series data directly from VictoriaMetrics with the help of powerful MetricsQL expressions for aggregating, filtering and grouping your data, ensuring seamless integration and efficient data handling.

Future updates will introduce additional readers, expanding the range of data sources vmanomaly can work with.

VM reader

Note: Starting from v1.13.0, there is a backward-compatible change to the queries arg of VmReader. The new format allows specifying per-query parameters, such as step, to reduce the amount of data read from VictoriaMetrics TSDB and to make the config more flexible. See the per-query parameters section for details.

A config in the old format, like

# other config sections ...
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'  # source victoriametrics/prometheus
  sampling_period: "10s"  # set it <= min(infer_every) in schedulers section
  queries:
    # old format {query_alias: query_expr}, prior to 1.13, will be converted to a new format automatically
    vmb: 'avg(vm_blocks)'

will be converted to the new format automatically, with a warning raised in the logs:

# other config sections ...
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'  # source victoriametrics/prometheus
  sampling_period: '10s'
  queries:
    # new format {query_alias: {expr: ..., step: ..., ...}}, produced automatically from the old {query_alias: query_expr} format
    vmb:
      expr: 'avg(vm_blocks)'  # initial MetricsQL expression
      step: '10s'  # individual step for this query, will be filled with `sampling_period` from the root level
      data_range: ['-inf', 'inf']  # by default, no constraints applied on data range
      tz: 'UTC'  # by default, tz-free data is used throughout the model lifecycle
      # new query-level arguments will be added in backward-compatible way in future releases

Per-query parameters

Starting from v1.13.0, the format of the queries arg has changed. Each query alias now supports the following (sub)fields:

- expr (string): MetricsQL/PromQL expression that defines an input for VmReader, as accepted by /query_range?query=%s, e.g. avg(vm_blocks).
- step (string): query-level frequency of the points returned, e.g. 30s. It is converted to the /query_range?step=%s param (in seconds). Useful for reducing the total amount of data read from VictoriaMetrics, since different queries may need different frequencies for the machine learning models that run on them.
  Note: if not set explicitly (or if the config style prior to v1.13.0 is used), it is set to the reader-level sampling_period arg.
  Note: having different step args for queries (e.g. 30s for q1 and 2m for q2) is not yet supported for a multivariate model that runs on several queries simultaneously (i.e. when the queries arg of a model is set to [q1, q2]).
- data_range (list[float | string]): Introduced in v1.15.1, it allows defining valid data ranges for input per individual query in queries, resulting in:
  - High anomaly scores (>1) when the data falls outside the expected range, indicating a data constraint violation.
  - Lowest anomaly scores (=0) when the model's predictions (yhat) fall outside the expected range, meaning uncertain predictions.
- max_points_per_query (int): Introduced in v1.17.0, this optional arg overrides how the search.maxPointsPerTimeseries flag (available since v1.14.1) affects the way vmanomaly splits long fit_window queries into smaller sub-intervals. This helps avoid hitting the search.maxQueryDuration limit for individual queries by distributing the initial query across multiple subquery requests with minimal overhead. Set it lower than search.maxPointsPerTimeseries if you hit maxQueryDuration limits. If set at the query level, it overrides the global (reader-level) max_points_per_query.
- tz (string): Introduced in v1.18.0, this optional argument enables timezone specification per query, overriding the reader’s default tz. This setting helps to account for local timezone shifts, such as DST, in models that are sensitive to seasonal variations (e.g., ProphetModel or OnlineQuantileModel).
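
As a minimal sketch of the multivariate restriction noted above, the fragment below keeps the same step for two queries that are meant to feed one multivariate model. The aliases q1 and q2 follow the note; the datasource URL and the second expression are reused from other examples in this page purely for illustration.

```yaml
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'
  sampling_period: '1m'
  queries:
    q1:
      expr: 'avg(vm_blocks)'
      step: '30s'  # same step as q2, so both can be listed in one model's `queries` arg
    q2:
      # illustrative second expression; replace with your own
      expr: 'sum(rate(vm_rows_inserted_total[5m]))'
      step: '30s'  # keep equal to q1; mixing e.g. 30s and 2m is not yet supported for multivariate models
```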

Per-query config example

reader:
  class: 'vm'
  sampling_period: '1m'
  max_points_per_query: 10000
  # other reader params ...
  queries:
    ingestion_rate:
      expr: 'sum(rate(vm_rows_inserted_total[5m])) by (type) > 0'
      step: '2m'  # overrides global `sampling_period` of 1m
      data_range: [10, 'inf']  # meaning only positive values > 10 are expected, i.e. a value `y` < 10 will trigger anomaly score > 1
      max_points_per_query: 5000 # overrides reader-level value of 10000 for `ingestion_rate` query
      tz: 'America/New_York'  # to override reader-wise `tz`
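
data_range also accepts two finite bounds. The sketch below assumes a query whose values are expected to stay within [0, 1]; the alias and the expression (including the metric name) are hypothetical and should be replaced with your own.

```yaml
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'
  sampling_period: '1m'
  queries:
    cpu_busy_share:  # hypothetical alias
      # hypothetical expression that yields a ratio between 0 and 1
      expr: 'avg(rate(node_cpu_seconds_total{mode!="idle"}[5m]))'
      data_range: [0, 1]  # observed values outside [0, 1] are expected to yield anomaly scores > 1
      # `step` and `tz` are omitted here, so the reader-level `sampling_period` and `tz` apply
```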

Config parameters

Parameter	Example	Description
`class`	`reader.vm.VmReader` (or `vm` starting from v1.13.0)	Name of the class needed to enable reading from VictoriaMetrics or Prometheus. VmReader is the default option, if not specified.
`queries`	See per-query config example above	See per-query config section above
`datasource_url`	`http://localhost:8481/`	Datasource URL address
`tenant_id`	`0:0`, `multitenant`	For VictoriaMetrics Cluster version only, tenants are identified by `accountID` or `accountID:projectID`. Starting from v1.16.2, `multitenant` endpoint is supported, to execute queries over multiple tenants. See VictoriaMetrics Cluster multitenancy docs
`sampling_period`	`1h`	Frequency of the points returned. Will be converted to `/query_range?step=%s` param (in seconds). Required since v1.9.0.
`query_range_path`	`/api/v1/query_range`	Performs PromQL/MetricsQL range query
`health_path`	`health`	Absolute or relative URL address where to check availability of the datasource.
`user`	`USERNAME`	BasicAuth username
`password`	`PASSWORD`	BasicAuth password
`timeout`	`30s`	Timeout for the requests, passed as a string
`verify_tls`	`false`	Verify TLS certificate. If `False`, it will not verify the TLS certificate. If `True`, it will verify the certificate using the system's CA store. If a path to a CA bundle file (like `ca.crt`), it will verify the certificate using the provided CA bundle.
`tls_cert_file`	`path/to/cert.crt`	Path to a file with the client certificate, i.e. `client.crt`. Available since v1.16.3.
`tls_key_file`	`path/to/key.crt`	Path to a file with the client certificate key, i.e. `client.key`. Available since v1.16.3.
`bearer_token`	`token`	Token is passed in the standard format with header: `Authorization: bearer {token}`
`bearer_token_file`	`path_to_file`	Path to a file containing the token, which is passed in the standard format with header: `Authorization: bearer {token}`. Available since v1.15.9
`extra_filters`	`[]`	List of strings with series selector. See: Prometheus querying API enhancements
`query_from_last_seen_timestamp`	`False`	If True, the query is performed from the last seen timestamp for a given series. If False, the query is performed from the start timestamp, based on the schedule period. Defaults to `False`. Useful for `infer` stages when some `infer` calls prior to the given one were skipped.
`latency_offset`	`1ms`	Introduced in v1.15.1, it allows overriding the default `-search.latencyOffset` flag of VictoriaMetrics (30s). The default value is 1ms, which should help in cases where `sampling_period` is short (10-60s) and equals `infer_every` in the PeriodicScheduler. This prevents users from receiving `service - WARNING - [Scheduler [scheduler_alias]] No data available for inference.` warnings in logs and allows for consecutive `infer` calls without gaps. To restore the old behavior, set it equal to your `-search.latencyOffset` flag value.
`max_points_per_query`	`10000`	Introduced in v1.17.0, this optional arg overrides how the `search.maxPointsPerTimeseries` flag (available since v1.14.1) affects the way `vmanomaly` splits long `fit_window` queries into smaller sub-intervals. This helps avoid hitting the `search.maxQueryDuration` limit for individual queries by distributing the initial query across multiple subquery requests with minimal overhead. Set it lower than `search.maxPointsPerTimeseries` if you hit `maxQueryDuration` limits. It can also be set on a per-query basis to override this global value.
`tz`	`UTC`	Introduced in v1.18.0, this optional argument specifies the IANA timezone to account for local shifts, like DST, in models sensitive to seasonal patterns (e.g., `ProphetModel` or `OnlineQuantileModel`). Defaults to `UTC` if not set and can be overridden on a per-query basis.
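
To make the latency_offset row concrete, here is a minimal sketch that restores the behavior from before v1.15.1. It assumes the datasource runs with the default -search.latencyOffset of 30s; the URL and the query are placeholders reused from earlier examples.

```yaml
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'
  sampling_period: '30s'
  latency_offset: '30s'  # match the datasource's -search.latencyOffset; omit to keep the 1ms default
  queries:
    vmb:
      expr: 'avg(vm_blocks)'
```

Leaving latency_offset unset keeps the 1ms default, which per the table is the setting intended for short sampling periods that equal infer_every.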

Config file example:

reader:
  class: "vm"  # or "reader.vm.VmReader" until v1.13.0
  datasource_url: "https://play.victoriametrics.com/"
  tenant_id: "0:0"
  tz: 'America/New_York'
  queries:
    ingestion_rate:
      expr: 'sum(rate(vm_rows_inserted_total[5m])) by (type) > 0'
      step: '1m' # can override global `sampling_period` on per-query level
      data_range: [0, 'inf']
      tz: 'Australia/Sydney'  # if set, overrides reader-wise tz
  sampling_period: '1m'
  query_from_last_seen_timestamp: True  # false by default
  latency_offset: '1ms'
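
The table also lists authentication and filtering options that the example above does not use. A hedged sketch for a cluster setup follows; the token path and the label used in extra_filters are assumptions made for illustration.

```yaml
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8481/'  # cluster URL, as in the table above
  tenant_id: '0:0'  # or 'multitenant' (supported since v1.16.2) to query across tenants
  bearer_token_file: 'path/to/token'  # hypothetical path; available since v1.15.9
  timeout: '30s'
  extra_filters:
    - '{env="prod"}'  # hypothetical series selector
  sampling_period: '1m'
  queries:
    ingestion_rate:
      expr: 'sum(rate(vm_rows_inserted_total[5m])) by (type) > 0'
```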

mTLS protection

As of v1.16.3, vmanomaly supports mutual TLS (mTLS) for secure communication across its components, including VmReader, VmWriter, and Monitoring/Push. This allows for mutual authentication between the client and server when querying data from, or writing data to, a VictoriaMetrics Enterprise instance configured for mTLS.

mTLS ensures that both the client and server verify each other's identity using certificates, which enhances security by preventing unauthorized access.

To configure mTLS, the following parameters can be set in the config:

- verify_tls: If set to a string, it functions like the -mtlsCAFile command-line argument of VictoriaMetrics, specifying the CA bundle to use. Set to True to use the system's default certificate store.
- tls_cert_file: Specifies the path to the client certificate, analogous to the -tlsCertFile argument of VictoriaMetrics.
- tls_key_file: Specifies the path to the client certificate key, similar to the -tlsKeyFile argument of VictoriaMetrics.

These options allow you to securely interact with mTLS-enabled VictoriaMetrics endpoints.

Example configuration to enable mTLS with custom certificates:

reader:
  class: "vm"
  datasource_url: "https://your-victoriametrics-instance-with-mtls"
  # tenant_id: "0:0"  # uncomment and set for cluster version
  queries:
    vm_blocks_example:
      expr: 'avg(rate(vm_blocks[5m]))'
      step: 30s
  sampling_period: 30s
  verify_tls: "path/to/ca.crt"  # path to CA bundle for TLS verification
  tls_cert_file: "path/to/client.crt"  # path to the client certificate
  tls_key_file:  "path/to/client.key"  # path to the client certificate key
  # additional reader parameters ...

# other config sections, like models, schedulers, writer, ...
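
If the server certificate is signed by a CA that is already in the system store, verify_tls can be set to True instead of a CA bundle path, as described above. A minimal variant of the same config under that assumption (the URL and file paths remain placeholders):

```yaml
reader:
  class: "vm"
  datasource_url: "https://your-victoriametrics-instance-with-mtls"
  queries:
    vm_blocks_example:
      expr: 'avg(rate(vm_blocks[5m]))'
  sampling_period: 30s
  verify_tls: True  # verify the server certificate against the system's CA store
  tls_cert_file: "path/to/client.crt"  # client certificate presented to the server
  tls_key_file: "path/to/client.key"  # key for the client certificate
```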

Healthcheck metrics

VmReader exposes several healthcheck metrics.
