github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Jiekun	623d257faf	app/vmalert: respect batch size limit for remote write on shutdown (#6039 ) During the shutdown period of vmalert, the remotewrite client retrieves all pending time series from the buffer queue, composes them into 1 batch and executes remote write. This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others). This change ensures that even during shutdown vmalert won't exceed the max batch size limit for remote write destination. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6025	2024-03-29 14:27:50 +01:00
Hui Wang	d7224b2d1c	vmalert: fix sending alert messages (#6028 ) * vmalert: fix sending alert messages 1. fix `endsAt` field in messages sent to alertmanager, previously rule with small interval could never be triggered; 2. fix behavior of `-rule.resendDelay`, before it could prevent sending firing message when rule state is volatile. * docs: update changelog notes Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-03-28 08:55:10 +01:00
Hui Wang	e80b44f19d	vmalert: deprecate cmd-line flag `-datasource.lookback` (#5877 ) * vmalert: deprecate cmd-line flag `-datasource.lookback` * fix lint * review fixes Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2024-03-12 16:16:50 +01:00
Tien M. Nguyen	f5115c8f1b	feat: include cluster info in alert CPUThrottlingHigh (#5956 )	2024-03-12 14:51:32 +04:00
Aliaksandr Valialkin	e22836c636	app/{vmalert,vmctl}: consistently use http.NewRequestWithContext() instead of http.NewRequest() + req.WithContext()	2024-02-29 15:25:43 +02:00
Aliaksandr Valialkin	6697da73e5	app: consistently use atomic.* types instead of atomic.* functions See `ea9e2b19a5`	2024-02-24 02:44:24 +02:00
hagen1778	e2dad3a2ac	app/vmalert: consistently sort groups by name and filename on `/groups` page This should prevent non-deterministic sorting for groups with identical names. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-20 13:50:57 +01:00
hagen1778	11b03d9fc8	app/vmalert: follow-up after `b60dcbe11f` * support case-insensitive search * reflect search condition in URL, so link can be sharable * support filtering on /alerts page * fix collapseAll/expandAll logic to respect only shown entries * add changelog `b60dcbe11f` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-20 13:07:05 +01:00
Victor Amorim dos Santos	b60dcbe11f	vmalert: add filter by group or rule name to UI (#5791 ) Co-authored-by: Yury Molodov <yurymolodov@gmail.com>	2024-02-20 12:31:41 +01:00
Roman Khavronenko	8850c7431d	app/vmalert: support filtering for /api/v1/rule like Prometheus does (#5787 ) Follow-up after `62e5e2a4c8` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-02-09 14:35:31 +01:00
Victor Amorim dos Santos	62e5e2a4c8	app/vmalert: support `type` param for filtering /api/v1/rules response by rule type (#5749 ) Co-authored-by: Hui Wang <haley@victoriametrics.com>	2024-02-09 09:02:35 +01:00
Aliaksandr Valialkin	ae8a867924	all: add support for specifying multiple -httpListenAddr options	2024-02-09 03:15:04 +02:00
Khushi Jain	83e55456e2	app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738 )	2024-02-08 15:52:00 +01:00
Roman Khavronenko	24eb1ad0c8	vmalert: set `ActiveAt` to evaluation timestamp in `newAlert` fn (#5657 ) The change fixes flaky test `TestAlertingRule_Exec` which has dependency on the actual timestamps, which resulted in inaccurate test states: https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/7608452967/job/20717699688 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-29 12:02:02 +01:00
Roman Khavronenko	df59ac7f0e	app/vmalert: fix data race during hot-config reload (#5698 ) * app/vmalert: fix data race during hot-config reload During hot-reload, the logic evokes the group update and rules evaluation interruption simultaneously. Falsely assuming that interruption happens before the update. However, it could happen that group will be updated first and only after the rules evaluation will be cancelled. Which will result in permanent interruption for all rules within the group. The fix caches the cancel context function into local variable first. And only after performs the group update. With cached cancel function we can safely call it without worrying that we cancel the evaluation for already updated group. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Revert "app/vmalert: fix data race during hot-config reload" This reverts commit `a4bb7e8932`. * app/vmalert: fix data race during hot-config reload During hot-reload, the logic evokes the group update and rules evaluation interruption simultaneously. Falsely assuming that interruption happens before the update. However, it could happen that group will be updated first and only after the rules evaluation will be cancelled. Which will result in permanent interruption for all rules within the group. The fix cancels the evaluation context before applying the update, making sure that the context will be cancelled for old group always. Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-26 22:42:21 +01:00
Roman Khavronenko	b11f4ef5ea	app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0` (#5680 ) * app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0` Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`. This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE time series for alerting rules with `for: 0` as well. Such time series can be useful for tracking the moment when alerting rule became active. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056 Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: support ALERTS_FOR_STATE in `replay` mode Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-25 15:42:57 +01:00
hagen1778	da556cc329	docs: fix Grafana link example for vmalert Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-22 09:35:18 +01:00
Aliaksandr Valialkin	1f105dde98	all: allow dynamically reading *AuthKey flag values from files and urls Examples: 1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath 2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath 3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url The flag value is automatically updated when the file contents changes.	2024-01-21 22:03:38 +02:00
Aliaksandr Valialkin	be509b3995	lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop() Previously there was a race condition when the background goroutine still could try collecting metrics from already stopped resources after returning from pushmetrics.Stop(). Now the pushmetrics.Stop() waits until the background goroutine is stopped before returning. This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5549 and the commit `fe2d9f6646` . Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548	2024-01-15 13:50:36 +02:00
Aliaksandr Valialkin	d2c94a0663	lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto	2024-01-14 23:04:45 +02:00
Aliaksandr Valialkin	a47127c1a6	app/vmalert/remotewrite: properly calculate vmalert_remotewrite_dropped_rows_total It was calculating the number of dropped time series instead of the number of dropped samples. While at it, drop vmalert_remotewrite_dropped_bytes_total metric, since it was inconsistently calculated - at one place it was calculating raw protobuf-encoded sample sizes, while at another place it was calculating the size of snappy-compressed prompbmarshal.WriteRequest protobuf message. Additionally, this metric has zero practical sense, so just drop it in order to reduce the level of confusion.	2024-01-14 22:55:11 +02:00
Aliaksandr Valialkin	c005245741	lib/prompb: switch to github.com/VictoriaMetrics/easyproto	2024-01-14 22:46:06 +02:00
hagen1778	91ccea236f	app/all: follow-up after `84d710beab` https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-09 13:34:54 +01:00
Hui Wang	1f477aba41	vmalert: automatically add `exported_` prefix for original evaluation… (#5398 ) automatically add `exported_` prefix for original evaluation result label if it's conflicted with external or reserved one, previously it was overridden. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5161 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-12-22 16:07:47 +01:00
Aliaksandr Valialkin	160cc9debd	app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags This is a follow-up for `5ebd5a0d7b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427	2023-12-20 21:35:28 +02:00
Aliaksandr Valialkin	5a88bc973f	all: use Gauge instead of Counter for `*_config_last_reload_successful` metrics This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag. See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details. This is follow-up for `326a77c697`	2023-12-20 14:23:42 +02:00
Hui Wang	9253c24dd6	vmalert: validate schema for `-external.url` (#5450 ) Requests with wrong or no schema in `-external.url` could be rejected by alertmanager. So we validate schema on start up.	2023-12-15 11:13:56 +01:00
Aliaksandr Valialkin	b1fed78e0b	app: make more clear that -tls enables https at -httpListenAddr	2023-12-10 00:25:01 +02:00
Roman Khavronenko	74b09ab4de	app/vmalert: sanitize label names before sending to Alertmanager (#5442 ) Before, vmalert would send notifications with labels containing characters not supported by Alertmanager validator, resulting in validation errors like `msg="Failed to validate alerts" err="invalid label set: invalid name "foo.bar"` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-12-08 16:53:35 +03:00
Dmytro Kozlov	935bec447b	app/vmalert: replace error metrics for gauges with counter metrics (#5217 ) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5160 Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-12-06 19:39:35 +01:00
Dmytro Kozlov	a28cc6ebec	app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format (#5397 ) * app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format * app/vmalert: hide updates if query param not set * app/vmalert: fix panic (recursion call) * app/vmalert: add needed group name and file name * app/vmalert: fix comment, update behavior * app/vmalert: fix description * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> * app/vmalert: simplify API for /api/v1/rule Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-12-04 18:40:33 +03:00
Aliaksandr Valialkin	348482c575	app/vmalert/notifier: remove backticks from the description for -notifier.blackhole command-line flag Backticks in flag description are automatically converted to flag type. See https://pkg.go.dev/flag#PrintDefaults This is a follow-up for `20025d4fd6` and `25317b4e70`	2023-11-22 20:17:01 +02:00
Aliaksandr Valialkin	334a739ff6	docs: convert png images to webp in all the docs except of docs/operator/* This reduces the size of docs/* folder from 33MB to 18MB Images inside docs/operator/* must be converted at the https://github.com/VictoriaMetrics/operator/tree/master/docs and then the updated images must be automatically propagated to the docs/operator/* This is a follow-up for `d3f919df3e` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5206	2023-11-22 19:21:00 +02:00
hagen1778	20025d4fd6	docs: typo after `3f5a41e35e` Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-20 17:05:15 +01:00
Roman Khavronenko	8dfc874be3	docs/vmalert: clarify deduplication recommendations for HA setup (#5336 ) Please see discussion here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5279 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-16 16:26:57 +01:00
hagen1778	feff13851c	docs: clarify vmalert flag changes Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-14 21:18:58 +01:00
Roman Khavronenko	bffd30b57a	app/vmalert: update remote-write process (#5284 ) * app/vmalert: update remote-write process * automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers. * increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded. * increment `vmalert_remotewrite_dropped_rows_total` and `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update docs/CHANGELOG.md --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Hui Wang <haley@victoriametrics.com>	2023-11-08 14:53:07 +08:00
hagen1778	c07dc45786	app/vmalert: fix typo in `remoteWrite.concurrency` description Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-03 22:04:50 +01:00
Aliaksandr Valialkin	815fda8995	docs: update -help output after recent changes to VictoriaMetrics components	2023-11-02 20:27:10 +01:00
Roman Khavronenko	b5254199c6	app/vmalert: add label `file` pointing to the group's filename to metrics (#5281 ) The filename should help identifying alerting rules belonging to specific groups with identical names but different filenames. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5267 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-02 16:01:31 +01:00
hagen1778	6eb205f8b0	app/vmalert: verify alert name correctness in restore test Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-02 15:28:39 +01:00
Hui Wang	90d45574bf	vmalert: reduce restore query request for each alerting rule (#5265 ) reduce the number of queries for restoring alerts state on start-up. The change should speed up the restore process and reduce pressure on `remoteRead.url`.	2023-11-02 15:22:13 +01:00
Hui Wang	e482eeff58	vmalert: support specifying full http url in notifier static_configs target (#5261 ) * vmalert: support specifying full http or https urls in notifier static_configs target address * show right label results in ui	2023-11-01 19:53:50 +08:00
Hui Wang	abcb21aa5e	vmalert: fix alert firing state in replay mode (#5192 ) fix possible missing firing states for alerting rules in replay mode Before, if one firing stage was bigger than a single query request range, like a rule with a big `for`, the alerting rule couldn't be detected as firing. Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-10-30 13:54:18 +01:00
Dima Lazerka	ad839aa492	lib/httpserver: add flags to specify HSTS / Frame-Options / CSP headers for httpserver (#5111 ) support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options` HTTP headers in all VictoriaMetrics components. The values for headers can be specified by users via the following flags: `-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`. Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-10-30 11:33:38 +01:00
hagen1778	3aec7eb44f	app/vmalert: remove unclear comment The timestamp alignment should be applied as a last step to keep the timestamp consistent. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-26 15:41:35 +02:00
Aliaksandr Valialkin	d5a599badc	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. 
- Allow loading TLS certificates from HTTP and HTTPS urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-25 23:19:37 +02:00
Aliaksandr Valialkin	cb34d4440c	app/vmalert/config: fix flaky test TestParseBad It could return either `failed to read` or `failed to parse` errors depending on whether the given url can be loaded or not under the current environment	2023-10-25 21:30:55 +02:00
Aliaksandr Valialkin	42dd71bb63	all: consistently use %w instead of %s when an error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-25 21:24:03 +02:00
hagen1778	c07909a20b	app/vmalert: fix typo in tests Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-25 16:28:27 +02:00
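
Commit `623d257faf` above describes capping the final flush on shutdown at `-remoteWrite.maxBatchSize`. The idea can be sketched as follows. This is a minimal illustration, not vmalert's actual code: the `timeSeries` type and the `splitIntoBatches` helper are hypothetical names introduced here.

```go
package main

import "fmt"

// timeSeries is a hypothetical stand-in for a pending remote-write series.
type timeSeries struct {
	Name  string
	Value float64
}

// splitIntoBatches cuts pending into consecutive chunks holding at most
// maxBatchSize items each. A non-positive maxBatchSize is treated as
// "no limit" in this sketch (an assumption, not documented behavior).
func splitIntoBatches(pending []timeSeries, maxBatchSize int) [][]timeSeries {
	if len(pending) == 0 {
		return nil
	}
	if maxBatchSize <= 0 {
		return [][]timeSeries{pending}
	}
	var batches [][]timeSeries
	for len(pending) > maxBatchSize {
		batches = append(batches, pending[:maxBatchSize])
		pending = pending[maxBatchSize:]
	}
	// The remainder is never empty here and never exceeds maxBatchSize.
	return append(batches, pending)
}

func main() {
	pending := make([]timeSeries, 7)
	for i := range pending {
		pending[i] = timeSeries{Name: fmt.Sprintf("series_%d", i), Value: float64(i)}
	}
	// Each batch would be sent as a separate remote-write request.
	for i, b := range splitIntoBatches(pending, 3) {
		fmt.Printf("batch %d: %d series\n", i, len(b))
	}
}
```

With 7 pending series and a limit of 3, the sketch yields batches of 3, 3 and 1 series, so no single request exceeds the limit.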

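Commit `42dd71bb63` above switches `fmt.Errorf()` calls from `%s` to `%w` so that `errors.Is()` keeps working on wrapped errors. The difference can be shown with a small sketch; the sentinel `errNotFound` and the helper functions are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error.
var errNotFound = errors.New("rule group not found")

// lookupWrapped wraps the sentinel with %w, preserving the error chain.
func lookupWrapped(name string) error {
	return fmt.Errorf("cannot load group %q: %w", name, errNotFound)
}

// lookupFlattened formats the sentinel with %s, which keeps only its text.
func lookupFlattened(name string) error {
	return fmt.Errorf("cannot load group %q: %s", name, errNotFound)
}

// isNotFound reports whether err wraps errNotFound.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	fmt.Println(isNotFound(lookupWrapped("cpu")))   // true
	fmt.Println(isNotFound(lookupFlattened("cpu"))) // false
}
```

Both functions produce the same message text, but only the `%w` variant lets callers detect the underlying cause.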

565 commits
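
Commit `74b09ab4de` above mentions sanitizing label names such as `foo.bar` before sending alerts to Alertmanager. A minimal sketch of one way to do this is shown below, assuming the classic Prometheus label-name pattern `[a-zA-Z_][a-zA-Z0-9_]*`. The `sanitizeLabelName` helper is hypothetical and is not taken from the vmalert source.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabelName replaces every character outside [a-zA-Z0-9_] with '_'
// and prefixes the result with '_' if it would start with a digit.
// This is an illustrative sketch, not vmalert's implementation.
func sanitizeLabelName(name string) string {
	out := strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			return r
		}
		return '_'
	}, name)
	if out != "" && out[0] >= '0' && out[0] <= '9' {
		out = "_" + out
	}
	return out
}

func main() {
	for _, name := range []string{"foo.bar", "job", "9lives", "k8s/pod-name"} {
		fmt.Printf("%s -> %s\n", name, sanitizeLabelName(name))
	}
}
```

Note that such a mapping is lossy: `foo.bar` and `foo-bar` both become `foo_bar`, so distinct source labels may collide after sanitizing.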