github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-11 14:53:49 +00:00

Author	SHA1	Message	Date
Hui Wang	282f13cf11	app/vmalert: improve performance when rules produce large volumes of results 1. Avoid storing the last evaluation results outside of rules, check for stale time series as soon as possible; 2. remove duplicated template `Clone()`. This pull request primarily reduces memory usage when rules produce large volumes of results, as seen in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6894. The CPU time spent on garbage collection remains high and may be addressed in a separate PR.	2024-11-14 18:21:20 +01:00
Hui Wang	9616814728	vmalert: integrate with victorialogs (#7255 ) address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md. Related fix https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254. Note: in this pull request, vmalert doesn't support [backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md#rules-backfilling) for rules with a customized time filter. It might be added in the future, see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7289) for details. Feature can be tested with image `victoriametrics/vmalert:heads-vmalert-support-vlog-ds-0-g420629c-scratch`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `68bad22fd2`)	2024-10-29 16:32:00 +01:00
Haleygo	b52f1d1f0a	vmalert: add `evalAlignment` for rule group and fix evaluation timestamp (#5066 ) * vmalert: add `query_time_alignment` for rule group 1. add `eval_alignment` attribute for group which by default is true. So group rule query timestamp will be aligned with interval and propagated to ALERT metrics and the messages for alertmanager; 2. deprecate `datasource.queryTimeAlignment` flag. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049 (cherry picked from commit `2aa0f5fc41`)	2023-10-10 12:45:37 +02:00
Haleygo	0212219f6c	vmalert: add `eval_offset` for group (#4693 ) Adds `eval_offset` attribute for Groups. If specified, Group will be evaluated at the exact time offset on the range of [0...evaluationInterval]. The setting might be useful for cron-like rules which must be evaluated at specific moments of time. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3409 Signed-off-by: Haley Wang <pipilong.25@gmail.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `45c0e4bb31`)	2023-09-07 10:59:14 +02:00
Aliaksandr Valialkin	2d88ebd7cb	app/vmalert/datasource: substitute golang.org/x/exp/slices.SortFunc with sort.Slice This removes unnecessary third-party dependency on golang.org/x/exp. This is a follow-up for `da60a68d09` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945	2023-07-24 19:17:19 -07:00
Haleygo	939c8b8372	vmalert: init unit test (#4596 ) vmalert: support unit tests See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-07-20 21:19:45 -07:00
Roman Khavronenko	4edb97f4da	app/vmalert: detect alerting rules which don't match any series at all (#4198 ) app/vmalert: detect alerting rules which don't match any series at all vmalert starts to understand /query responses which contain object: ``` "stats":{"seriesFetched": "42"} ``` If object is present, vmalert parses it and populates a new field `SeriesFetched`. This field is then used to populate the new metric `vmalert_alerting_rules_last_evaluation_series_fetched` and to display warnings in the vmalert's UI. If response doesn't contain the new object (Prometheus or VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`. In this case, UI will contain no additional warnings. And `vmalert_alerting_rules_last_evaluation_series_fetched` will be set to `-1`. Negative value of the metric will help to compile correct alerting rule in follow-up. Thanks for the initial implementation to @Haleygo See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4056 See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-05-09 21:48:59 -07:00
Roman Khavronenko	96db7ac52c	vmalert: speed up state restore procedure on start (#3758 ) * vmalert: speed up state restore procedure on start Alerts state restore procedure has been changed to become asynchronous. It doesn't block groups start anymore which significantly improves vmalert's startup time. Instead, state restore is called by each group in their goroutines after the first rules evaluation. While previously state restore attempt was made for all loaded alerting rules, now it is called only for alerts which became active after the first evaluation. This reduces the amount of API calls to the configured remote read URL. This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now and will have no effect if configured. See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608 Signed-off-by: hagen1778 <roman@victoriametrics.com> * make lint happy Signed-off-by: hagen1778 <roman@victoriametrics.com> * Apply suggestions from code review --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-02-03 19:46:41 -08:00
Roman Khavronenko	a922308438	vmalert: reduce allocations for Prometheus resp parse (#3435 ) Method `metrics()` now pre-allocates slices for labels and results from query responses. This reduces the number of allocations on the hot path for instant requests. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-12-05 00:18:11 -08:00
Roman Khavronenko	09e211a05f	vmalert: print example of `curl` command for rule's state (#3112 ) The change adds an example of `curl` command to the Rule's page. The command is generated for each recorded state. It is supposed user can just copy&execute the command to see what was returned to vmalert. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-09-19 15:04:37 +03:00
Roman Khavronenko	a887c1bc07	vmalert: add `debug` mode for alerting rules (#3055 ) * vmalert: add `debug` mode for alerting rules Debug information includes alerts state changes and requests sent to the datasource. Debug can be enabled only on rule's level. It might be useful for debugging unexpected behaviour of alerting rule. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3025 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: review fixes Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update app/vmalert/alerting.go Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com> * vmalert: go fmt Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-09-13 16:36:30 +03:00
Roman Khavronenko	01755fac38	vmalert: remove dependency on datasource pkg from config (#2905 ) * vmalert: remove dependency on datasource pkg from config Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-07-22 13:38:25 +03:00
Roman Khavronenko	d0abdc2b5b	vmalert: allow configuring custom headers per group (#2901 ) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2860 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-07-21 20:48:05 +03:00
Roman Khavronenko	ab10178c85	Vmalert compliance 2 (#2340 ) * vmalert: split alert's `Start` field into `ActiveAt` and `Start` The `ActiveAt` field identifies when alert becomes active for rules with `for > 0`. Previously, this value was stored in field `Start`. The field `Start` now identifies the moment alert became `FIRING`. The split is needed in order to distinguish these two moments in the API responses for alerts. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: support specific moment of time for rules evaluation The Querier interface was extended to accept a new argument used as a timestamp at which evaluation should be made. It is needed to align rules execution time within the group. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: mark disappeared series as stale Series generated by alerting rules, which were sent to remote write, now will be marked as stale if they disappear on the next evaluation. This would make ALERTS and ALERTS_FOR_TIME series more precise. Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: evaluate rules at fixed timestamp Before, time at which rules were evaluated was calculated right before rule execution. The change makes sure that timestamp is calculated only once per evaluation round and all rules are using the same timestamp. It also updates the logic of resending of already resolved alert notification. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: allow overriding `alertname` label value if it is present in response Previously, `alertname` was always equal to the Alerting Rule name. Now, its value can be overridden if series in response contains a different value for this label. The change is needed for improving compatibility with Prometheus. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: align rules evaluation in time Now, evaluation timestamp for rules evaluates as if there was no delay in rules evaluation. It means that rules will be evaluated at fixed timestamps+group_interval. This way provides more consistent evaluation results and improves compatibility with Prometheus. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: add metric for missed iterations New metric `vmalert_iteration_missed_total` will show whether rules evaluation round was missed. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: reduce delay before the initial rule evaluation in group Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: rollback alertname override According to the spec: ``` The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label. ``` https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: throw err immediately on dedup detection ``` The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts or add samples to samples receiver if there is more than one alert with the same labels ``` https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: cleanup Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: use strings builder to reduce allocs Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-04-01 12:03:41 +03:00
Roman Khavronenko	582c063698	vmalert: introduce additional HTTP URL params per-group configuration (#1892 ) * vmalert: introduce additional HTTP URL params per-group configuration The new group field `params` allows to configure custom HTTP URL params per each group. These params will be applied to every request before executing rule's expression. Hot config reload is also supported. Field `extra_filter_labels` was deprecated in favour of `params` field. vmalert will print deprecation log message if config file contains the deprecated field. `params` fields are supported by both Prometheus and Graphite datasource types. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: provide more examples for `params` field Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: set higher priority for `params` setting If there would be a conflict between URL params set in `datasource.url` flag and params in group definition the latter will have higher priority. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2021-12-02 14:51:54 +02:00
Roman Khavronenko	5aa7846900	vmalert: support rules backfilling (aka `replay`) (#1358 ) * vmalert: support rules backfilling (aka `replay`) vmalert can `replay` configured rules in the past and backfill results via remote write protocol. It supports MetricsQL/PromQL storage as data source, and can backfill data to remote write compatible storage. Supports recording and alerting rules `replay`. See more details in README. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836 * vmalert: review fixes * vmalert: readme fixes	2021-06-09 12:30:54 +03:00
Nikolay	2eb8ef7b2b	changes vmalert Querier with per rule querier (#1249 ) * changes vmalert Querier with per rule querier it allows changing some parameters based on rule settings, for instance alert type, tenant for cluster version, or even endpoint URL.	2021-04-29 11:31:07 +03:00
Nikolay	b8bc1c2e0f	Graphite vmalert wip (#112 ) * init implementation for graphite alerts * adds graphite support for vmalert * small fix * changes vmalert graphite api with type * updates tests * small fix * fixes graphite parse * Fixes graphite from time	2021-02-01 15:28:30 +02:00
Roman Khavronenko	9f578e389c	vmalert: add functions "query", "first" and "value" to alert template functions (#960 ) The commit adds support for template functions `query`, `first` and `value`. The function `query` executes a MetricsQL query for active alerts. In vmalert we update templates on every evaluation for active alerts to keep them up to date. With `query` func it may become a perf issue since it will fire a query on every execution. We should keep it in mind for now. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/539	2020-12-14 20:12:16 +02:00
Roman Khavronenko	4fd2b6cd16	vmalert: explicitly set extra labels to alert entities (#886 ) The previous implementation treated extra labels (global and rule labels) as a label set separate from the returned time series labels. Hence, time series always contained only original labels and alert ID was generated from sorted labels key-values. Extra labels didn't affect the generated ID and were applied on the following actions: - templating for Summary and Annotations; - persisting state via remote write; - restoring state via remote read. Such behaviour caused difficulties on restore procedure because extra labels had to be dropped before checking the alert ID, but that did not always work. Consider the case when expression returns the following time series `up{job="foo"}` and rule has extra label `job=bar`. This would mean that restored alert ID will always be different from the real time series because of collision. To solve the situation extra labels are now always applied beforehand and `vmalert` doesn't store original labels anymore. However, this could result in a new error situation. Consider the case when expression returns two time series `up{job="foo"}` and `up{job="baz"}`, while rule has extra label `job=bar`. In such case, applying extra labels will result in two identical time series and `vmalert` will return error: `result contains metrics with the same labelset after applying rule labels` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870	2020-11-10 00:27:56 +02:00
Aliaksandr Valialkin	e3db2c73a6	app/vmalert: sync with master branch	2020-04-28 00:19:42 +03:00

21 commits
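
The label collision described in commit `4fd2b6cd16` can be illustrated with a minimal Go sketch. It is not vmalert's actual code: the helpers `labelsKey` and `applyExtraLabels` are hypothetical names introduced here, and series are modelled as plain `map[string]string` values. Only the error text is taken from the commit message. The sketch applies a rule's extra labels to every returned series and fails as soon as two series end up with the same label set.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errDuplicate reuses the error text quoted in the commit message.
var errDuplicate = errors.New("result contains metrics with the same labelset after applying rule labels")

// labelsKey (hypothetical helper) builds a deterministic key from sorted
// label key-value pairs. It assumes labels contain no '=' or ',' characters.
func labelsKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var sb strings.Builder
	for _, k := range keys {
		sb.WriteString(k)
		sb.WriteByte('=')
		sb.WriteString(labels[k])
		sb.WriteByte(',')
	}
	return sb.String()
}

// applyExtraLabels (hypothetical helper) overrides series labels with the
// extra labels and returns an error on the first duplicated label set.
func applyExtraLabels(series []map[string]string, extra map[string]string) ([]map[string]string, error) {
	seen := make(map[string]struct{}, len(series))
	out := make([]map[string]string, 0, len(series))
	for _, s := range series {
		merged := make(map[string]string, len(s)+len(extra))
		for k, v := range s {
			merged[k] = v
		}
		for k, v := range extra {
			merged[k] = v // extra labels take priority over original labels
		}
		key := labelsKey(merged)
		if _, ok := seen[key]; ok {
			return nil, errDuplicate
		}
		seen[key] = struct{}{}
		out = append(out, merged)
	}
	return out, nil
}

func main() {
	// up{job="foo"} and up{job="baz"} both become up{job="bar"}.
	series := []map[string]string{
		{"__name__": "up", "job": "foo"},
		{"__name__": "up", "job": "baz"},
	}
	_, err := applyExtraLabels(series, map[string]string{"job": "bar"})
	fmt.Println(err)
}
```

With a single series `up{job="foo"}` and the same extra label, the call succeeds and returns one series labelled `job="bar"`, which matches the first case described in the commit message.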