github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-21 15:45:01 +00:00

Author	SHA1	Message	Date
Hui Wang	dc28491771	app/vmalert: properly register group and rules metrics Commit `9ca74d1fff` introduced an issue with metrics registration. Because the metrics.Summary type is always registered in the global state of the metrics package, vmalert had increased memory and CPU usage after multiple configuration reloads. This commit addresses this issue and properly registers the metrics.Summary metric. Now metrics for group and rules must be explicitly registered before group.Start with the group.Init method. It simplifies metrics usage and ensures that all needed metrics were registered and the group is ready to start. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8532	2025-03-19 14:04:49 +01:00
Hui Wang	bcf02fb5f8	app/vmalert: fix possible data race on group checksum 1. fix possible data race on group checksum when reload is called concurrently. Before, it didn't affect much but might update the group one more time. 2. remove the unnecessary g.mu.RLock() and compute group.id at newGroup creation. Changes to group.ID() indicate that type and interval have changed, and the group is new. Related PR: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8540	2025-03-19 14:04:48 +01:00
Guillem Jover	1d8b7faf71	spelling and grammar fixes via codespell (#8497 ) ### Describe Your Changes Fix many spelling errors and some grammar, including misspellings in filenames. The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`. While this is a breaking change, this metric isn't used in alerts or dashboards. So it seems to have low impact on users. The change also deprecates `cspell` as it is much heavier and less usable. --------- Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com> Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com> (cherry picked from commit `76d205feae`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2025-03-17 16:38:11 +01:00
Hui Wang	9435d02102	vmalert: allow chaining groups with `eval_offset` (#8402 ) address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/860, see https://github.com/VictoriaMetrics/VictoriaMetrics/blob/change-evaloffset-behavior/docs/vmalert.md#chaining-groups Also related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8154 (cherry picked from commit `e8e2ef54a0`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2025-03-07 10:00:26 +01:00
Zakhar Bessarab	068772ec0e	app/vmalert: properly unregister exposed metrics for alerting rules Previously if rule group parameters were changed, alerting rules related metrics could be deleted due to bug at `utils/metrics` package. This commit introduces `metrics.Set` per rule group. It holds group and alerting rules metrics. It properly unregister alerting rules metrics and addresses issue. In addition: - expose group metrics only once group is started - this helps to avoid exposing metrics for groups which are created during YAML unmarshaling and only used to update existing group. - properly close rules which are discarded after updating existing rules so that metrics are also correctly closed. - detect file renames and properly recreate groups "moved" between files. Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8229	2025-02-21 12:43:27 +01:00
Roman Khavronenko	9de0b8a165	make: bump golangci-lint to v1.63.4 ( New version has additional checks and reduced resource consumption, so it doesn't timeout for our internal repos. To make linter happy, I addressed "redefinition of the built-in function" lint error. ---- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2025-01-13 07:23:21 +01:00
Hui Wang	53fc2e95df	app/vmalert: fix the auto-generated metrics `ALERTS` and `ALERTS_FOR_STATE` Previously, since the labels slice was reused for both `ALERTS` and `ALERTS_FOR_STATE`, metrics might have incorrect labels and affect the restore process. Tested the fix under `TestAlertingRule_Exec: "for-pending=>empty"`. The bug was introduced in `282f13cf11`. Affected versions: v1.106.1, v1.107...v1.108.x Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7796	2025-01-02 17:46:25 +01:00
Hui Wang	120993ea42	app/vmalert: fixes reload of external templates Previously, after a configuration reload call, the `externalURL` templating function defined at external templates could be lost, since it was added only at the initial `Load` call and never copied during the template reload process. External templates for vmalert could be defined via `-rule.templates` flag. This commit properly reloads external templates. It no longer copies mutated templates and instead fully reloads them each time there are any changes.	2024-12-13 12:10:31 +01:00
Hui Wang	af4c6f3a29	vmalert: fix alert states restoration (#7624 ) Previously, when the alert got resolved shortly before the vmalert process shuts down, this could result in false alerts. This change switches vmalert to use MetricsQL function during alerts state restore, which makes it incompatible for state restoration with PromQL. --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-11-22 17:38:14 +01:00
Hui Wang	db6b612b68	app/vmalert: fix flaky ut `TestRecordingRule_Exec` The order of stale metrics can't be controlled in recording rule, only use two time series then.	2024-11-14 18:21:20 +01:00
Hui Wang	282f13cf11	app/vmalert: improve performance when rules produce large volumes of results 1. Avoid storing the last evaluation results outside of rules, check for stale time series as soon as possible; 2. remove duplicated template `Clone()`. This pull request primarily reduces memory usage when rules produce large volumes of results, as seen in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6894. The CPU time spent on garbage collection remains high and may be addressed in a separate PR.	2024-11-14 18:21:20 +01:00
Aliaksandr Valialkin	a02d26e853	lib/logstorage: properly take into account the `end` query arg when calculating time range for _time:duration filters (cherry picked from commit `e5537bc64d`)	2024-11-08 17:07:57 +01:00
Hui Wang	9616814728	vmalert: integrate with victorialogs (#7255 ) address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6706. See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md. Related fix https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7254. Note: in this pull request, vmalert doesn't support [backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/vmalert-support-vlog-ds/docs/VictoriaLogs/vmalert.md#rules-backfilling) for rules with a customized time filter. It might be added in the future, see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7289) for details. Feature can be tested with image `victoriametrics/vmalert:heads-vmalert-support-vlog-ds-0-g420629c-scratch`. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `68bad22fd2`)	2024-10-29 16:32:00 +01:00
Hui Wang	abd2f34833	vmalert: fix blocking hot-reload process if the old rule group hasn't started yet (#7258) Group [sleeps](`daa7183749/app/vmalert/rule/group.go (L320)`) for a random duration before starting the evaluation, and during the sleep, `g.updateCh <- new` will be blocked since there is no `<-g.updateCh` waiting. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `c4fe23794a`) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-10-18 11:42:47 +02:00
Hui Wang	d2b98245ea	vmalert: fix variable `$activeAt` value when templating rule annotation in replay mode Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-09-20 17:34:54 +02:00
Dima Lazerka	465c7ad045	docs: fixes misspelled typos Also tried to make it catch "Authorisation" in the future, fixed a lot of other misspells along the way, but didn't make it catch "Authorisation" anyway. - Fix misspelled "Authorization" header name - Fix misspelled "organization" - Fix more misspells	2024-09-13 13:19:03 +02:00
dufucun	1aa9f7be4e	tests: fix slice init length (#6897 ) ### Describe Your Changes fix slice init length ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: dufucun <dufuchun@sohu.com> (cherry picked from commit `95bafc8caf`)	2024-08-30 11:18:21 +02:00
hagen1778	ec05e70742	app/vmalert: rm unnecessary err check The error check was needed before `a84491324d`. It was kept by mistake and makes no sense to have right now. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `9726e6c1a2`)	2024-08-07 09:57:48 +02:00
Aliaksandr Valialkin	43fc1183b9	app/vmalert: switch from table-driven tests to f-tests This makes test code more clear and reduces the number of code lines by 500. This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error* requires more boilerplate code, which can result in additional bugs inside tests. While t.Error* allows writing logging errors for the same, this doesn't simplify fixing broken tests most of the time. This is a follow-up for `a9525da8a4`	2024-07-12 22:45:50 +02:00
Aliaksandr Valialkin	d6415b2572	all: consistently use 'any' instead of 'interface{}' 'any' type is supported starting from Go1.18. Let's consistently use it instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.	2024-07-10 00:23:26 +02:00
Roman Khavronenko	955d36357c	app/vmalert/rule: reduce number of allocations for getStaleSeries fn (#6269 ) Allocations are reduced by re-using the byte buffer when converting labels to string keys. ``` name old allocs/op new allocs/op delta GetStaleSeries-10 703 ± 0% 203 ± 0% ~ (p=1.000 n=1+1) ``` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `b0c1f3d819`)	2024-05-16 09:35:51 +02:00
Hui Wang	abd29c15ab	docs: update vmalert and vmagent docs (#6207 ) * restore and actualize doc section explaining duplicated labels error * rm misleading comment about post-aggregation in stream aggregation (cherry picked from commit `e3c226cf92`)	2024-04-30 10:30:19 +02:00
Hui Wang	e0d47ab6af	vmalert: avoid blocking APIs when alerting rule uses template functio… (#6129 ) * vmalert: avoid blocking APIs when alerting rule uses template function `query` * app/vmalert: small refactoring * simplify labels and templates expanding * simplify `newAlert` interface * fix `TestGroupStart` which mistakenly skipped annotations and response labels check Signed-off-by: hagen1778 <roman@victoriametrics.com> * reduce alerts lock time when restore --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-04-19 11:30:40 +02:00
Roman Khavronenko	95b0f82c9b	app/vmalert: make `TestGroupStart` more reliable (#6130) There was a sleep statement in the test, waiting for Group to perform a couple of evaluations. But it looks like it worked unreliably for some CI tests like the one below https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/8718213844/job/23915007958?pr=6115 This commit replaces the sleep statement with a function that waits for a specific number of evaluations. It should make this test faster in the general case, and more reliable for slow environments.	2024-04-19 11:28:30 +02:00
Aliaksandr Valialkin	a99005eff6	all: replace old https://docs.victoriametrics.com/vmalert.html url with the new one - https://docs.victoriametrics.com/vmalert/	2024-04-18 01:44:54 +02:00
wanshuangcheng	52a4ae0b28	chore: fix function names in comment (#6076 ) Signed-off-by: wanshuangcheng <wanshuangcheng@outlook.com>	2024-04-08 15:38:51 +02:00
Aliaksandr Valialkin	00f59d6ddf	all: fix golangci-lint(revive) warnings after `0c0ed61ce7` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6001	2024-04-03 03:00:45 +03:00
Hui Wang	fdb6eb1071	vmalert: fix sending alert messages (#6028 ) * vmalert: fix sending alert messages 1. fix `endsAt` field in messages that send to alertmanager, previously rule with small interval could never be triggered; 2. fix behavior of `-rule.resendDelay`, before it could prevent sending firing message when rule state is volatile. * docs: update changelog notes Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-04-02 23:51:06 +03:00
Roman Khavronenko	a3e198588f	vmalert: set `ActiveAt` to evaluation timestamp in `newAlert` fn (#5657 ) The change fixes flaky test `TestAlertingRule_Exec` which has dependency on the actual timestamps, which resulted into inaccurate test states: https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/7608452967/job/20717699688 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-29 17:30:14 +01:00
Roman Khavronenko	a2f83115ae	app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0` (#5680 ) * app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0` Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`. This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE time series for alerting rules with `for: 0` as well. Such time series can be useful for tracking the moment when alerting rule became active. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056 Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: support ALERTS_FOR_STATE in `replay` mode Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 20:51:50 +01:00
Hui Wang	c14e229b20	vmalert: automatically add `exported_` prefix for original evaluation… (#5398 ) automatically add `exported_` prefix for original evaluation result label if it's conflicted with external or reserved one, previously it was overridden. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5161 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `1f477aba41`)	2023-12-22 16:10:33 +01:00
Dmytro Kozlov	6a41e1ec0c	app/vmalert: replace error metrics for gauges with counter metrics (#5217 ) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5160 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `935bec447b`)	2023-12-06 19:41:34 +01:00
Dmytro Kozlov	6770bad207	app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format (#5397 ) * app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format * app/vmalert: hide updates if query param not set * app/vmalert: fix panic (recursion call) * app/vmalert: add needed group name and file name * app/vmalert: fix comment, update behavior * app/vmalert: fix description * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-12-04 22:49:39 +02:00
Roman Khavronenko	becf7bf8df	app/vmalert: update remote-write process (#5284) * app/vmalert: update remote-write process * automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers. * increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded. * increment `vmalert_remotewrite_dropped_rows_total` and `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update docs/CHANGELOG.md --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Hui Wang <haley@victoriametrics.com>	2023-11-13 09:25:29 +01:00
Aliaksandr Valialkin	3d6f4da3b3	docs: update -help output after recent changes to VictoriaMetrics components	2023-11-02 20:27:16 +01:00
Roman Khavronenko	4e8c762fd9	app/vmalert: add label `file` pointing to the group's filename to metrics (#5281 ) The filename should help identifying alerting rules belonging to specific groups with identical names but different filenames. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5267 Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `b5254199c6`)	2023-11-02 16:02:29 +01:00
hagen1778	3773510e8f	app/vmalert: verify alert name correctness in restore test Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `6eb205f8b0`)	2023-11-02 16:02:29 +01:00
Hui Wang	44fcdf0cf0	vmalert: reduce restore query request for each alerting rule (#5265 ) reduce the number of queries for restoring alerts state on start-up. The change should speed up the restore process and reduce pressure on `remoteRead.url`. (cherry picked from commit `90d45574bf`)	2023-11-02 16:02:28 +01:00
Hui Wang	8a786e5df4	vmalert: fix alert firing state in replay mode (#5192) fix possible missing firing states for alerting rules in replay mode. Before, if one firing stage was bigger than a single query request range, like a rule with a big `for`, the alerting rule couldn't be detected as firing. Co-authored-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `abcb21aa5e`)	2023-10-30 13:55:48 +01:00
hagen1778	ddedeb1d42	app/vmalert: remove unclear comment The timestamp alignment should be applied as a last step to keep the timestamp consistent. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-27 14:09:01 +02:00
Aliaksandr Valialkin	36a1fdca6c	all: consistently use %w instead of %s when an error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-26 09:44:40 +02:00
hagen1778	f00729ee24	app/vmalert: fix typo in tests Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `c07909a20b`)	2023-10-26 08:55:20 +02:00
hagen1778	cf541c757a	app/vmalert: fix tests after `a216fe6728` `a216fe6728` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `eed0c3c6b0`)	2023-10-26 08:55:06 +02:00
hagen1778	6c63ca18f5	app/vmalert: follow-up after `c9375cac5e` `c9375cac5e` Descriptions were updated in an attempt to make them clearer for readers, re-phrasing and linking missing docs. `eval_delay` was added to tests to verify it can be unmarshalled. `eval_delay` is now applied before timestamp alignment to make it more predictable. Before, if delay < interval the timestamp wouldn't be aligned. `eval_delay` and `eval_offset` were added to API output. `PreviouslySentSeriesToRW` converted to private `previouslySentSeriesToRW`. Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `a216fe6728`)	2023-10-25 14:39:49 +02:00
Hui Wang	86d861ec55	vmalert: add `-rule.evalDelay` flag and `eval_delay` as group attribute (#5185) Also mark `-datasource.lookback` as to be deprecated, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155. (cherry picked from commit `c9375cac5e`)	2023-10-25 14:39:49 +02:00
Haleygo	130e0ea5f0	vmalert-tool: implement unittest (#4789 ) 1. split package rule under /app/vmalert, expose needed objects 2. add vmalert-tool with unittest subcmd https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945	2023-10-16 14:12:06 +02:00

46 commits