github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Hui Wang	71f521fc0c	vmalert: revert the default value of `-remoteWrite.maxQueueSize` from… (#7570 ) … `1_000_000` to `100_000` It was bumped in [v1.104.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.104.0), which increases memory usage and is not needed for most setups. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7471).	2024-11-20 16:20:51 +01:00
Roman Khavronenko	0204ce942d	app/vmalert: update `-remoteWrite.concurrency` and `-remoteWrite.flushInterval` (#7272 ) Auto-adjust `-remoteWrite.concurrency` cmd-line flags with the number of available CPU cores in the same way as vmagent does. With this change the default behavior of vmalert in high-loaded installation should become more resilient. This change also reduces `-remoteWrite.flushInterval` from `5s` to `2s` to provide better data freshness. --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2024-10-22 14:43:55 +02:00
Artem Fetishev	e2c73dc89f	app/(vmagent,vmalert)/remotewrite/client: Fix flag docs (#7198 ) ### Describe Your Changes The flags docs mention the flag that does not exist (and never existed). Perhaps that was a typo. `s/retryMaxInterval/retryMaxTime/g` ### Checklist The following checks are mandatory: - [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>	2024-10-08 13:14:38 +02:00
Roman Khavronenko	6b1b47df54	app/vmalert: bump default values for sending data to `remoteWrite.url` (#7084 ) * `remoteWrite.maxQueueSize` from `100_000` to `1_000_000`, this should improve resiliency of recording rules that produce many series; * `remoteWrite.maxBatchSize` from `1_000` to `10_000`, this should be more efficient to send from a networking perspective; * `remoteWrite.concurrency` from `1` to `4`, this should improve the speed of sending the generated series. The new settings should improve remote write performance of vmalert with default settings. ### Checklist The following checks are mandatory: - [ ] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/). --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Hui Wang <haley@victoriametrics.com>	2024-09-25 15:01:39 +02:00
Roman Khavronenko	e58dde6925	lib/httputils: parse URL before creating HTTP transport (#6820 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6740 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-08-16 11:32:04 +02:00
Aliaksandr Valialkin	a468a6e985	lib/{httputils,netutil}: move httputils.GetStatDialFunc to netutil.NewStatDialFunc - Rename GetStatDialFunc to NewStatDialFunc, since it returns new function with every call - NewStatDialFunc isn't related to http in any way, so it must be moved from lib/httputils to lib/netutil - Simplify the implementation of NewStatDialFunc by removing sync.Map from there. - Use netutil.NewStatDialFunc at app/vmauth and lib/promscrape/discoveryutils - Use gauge instead of counter type for *_conns metric This is a follow-up for `d7b5062917` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6299	2024-07-15 23:02:34 +02:00
Aliaksandr Valialkin	0078399788	app/vmalert: switch from table-driven tests to f-tests This makes test code more clear and reduces the number of code lines by 500. This also simplifies debugging tests. See https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e While at it, consistently use t.Fatal* instead of t.Error* across tests, since t.Error* requires more boilerplate code, which can result in additional bugs inside tests. While t.Error* allows writing logging errors for the same, this doesn't simplify fixing broken tests most of the time. This is a follow-up for `a9525da8a4`	2024-07-12 22:41:11 +02:00
hagen1778	4ef76eed7b	app/vmalert: follow-up `bc37b279aa` * rm extra interface method for rw Client, as it has low applicability and doesn't fit multitenancy well * add `GetDroppedRows` method instead Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-06-20 15:12:53 +02:00
Hui Wang	bc37b279aa	vmalert: exit replay mode with non-zero code if generated samples are… (#6513 ) … not successfully written into remoteWrite url address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6512	2024-06-20 13:20:40 +02:00
Nikolay	b97916276f	app/vmalert: adds idleConnTimeout flags and retry trivial network errors (#6382 ) * ".idleConnTimeout" flags must reduce the probability of `write: broken pipe` and `read: connection reset by peer` errors. Those errors may occur if the remote server closes the TCP socket for a connection while it still exists at the client. * Single retries for `write: broken pipe` and `read: connection reset by peer` must handle the case of incorrectly configured timeouts at middleware proxies and mitigate minor network issues. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5661 --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2024-05-30 17:54:42 +02:00
Hui Wang	d7b5062917	app/vmalert: support DNS SRV record in `-remoteWrite.url` (#6299 ) part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053, supports [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address in `-remoteWrite.url` command-line option.	2024-05-22 10:52:51 +02:00
Jiekun	623d257faf	app/vmalert: respect batch size limit for remote write on shutdown (#6039 ) During the shutdown period of vmalert, the remotewrite client retrieves all pending time series from the buffer queue, composes them into 1 batch and executes remote write. This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others). This change ensures that even during shutdown vmalert won't exceed the max batch size limit for remote write destination. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6025	2024-03-29 14:27:50 +01:00
Aliaksandr Valialkin	e22836c636	app/{vmalert,vmctl}: consistently use http.NewRequestWithContext() instead of http.NewRequest() + req.WithContext()	2024-02-29 15:25:43 +02:00
Aliaksandr Valialkin	6697da73e5	app: consistently use atomic.* types instead of atomic.* functions See `ea9e2b19a5`	2024-02-24 02:44:24 +02:00
Khushi Jain	83e55456e2	app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738 )	2024-02-08 15:52:00 +01:00
Aliaksandr Valialkin	d2c94a0663	lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto	2024-01-14 23:04:45 +02:00
Aliaksandr Valialkin	a47127c1a6	app/vmalert/remotewrite: properly calculate vmalert_remotewrite_dropped_rows_total It was calculating the number of dropped time series instead of the number of dropped samples. While at it, drop vmalert_remotewrite_dropped_bytes_total metric, since it was inconsistently calculated - at one place it was calculating raw protobuf-encoded sample sizes, while at another place it was calculating the size of snappy-compressed prompbmarshal.WriteRequest protobuf message. Additionally, this metric has zero practical sense, so just drop it in order to reduce the level of confusion.	2024-01-14 22:55:11 +02:00
Aliaksandr Valialkin	c005245741	lib/prompb: switch to github.com/VictoriaMetrics/easyproto	2024-01-14 22:46:06 +02:00
Aliaksandr Valialkin	160cc9debd	app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags This is a follow-up for `5ebd5a0d7b` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427	2023-12-20 21:35:28 +02:00
Roman Khavronenko	bffd30b57a	app/vmalert: update remote-write process (#5284 ) * app/vmalert: update remote-write process * automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers. * increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded. * increment `vmalert_remotewrite_dropped_rows_total` and `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls. Signed-off-by: hagen1778 <roman@victoriametrics.com> * Update docs/CHANGELOG.md --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Hui Wang <haley@victoriametrics.com>	2023-11-08 14:53:07 +08:00
hagen1778	c07dc45786	app/vmalert: fix typo in `remoteWrite.concurrency` description Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-03 22:04:50 +01:00
Aliaksandr Valialkin	d5a599badc	lib/promauth: follow-up for `e16d3f5639` - Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup don't prevent from processing the corresponding scrape targets after the file becomes correct, without the need to restart vmagent. Previously scrape targets with invalid TLS CA file or TLS client certificate files were permanently dropped after the first attempt to initialize them, and they didn't appear until the next vmagent reload or the next change in other places of the loaded scrape configs. - Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent. Previously the old TLS CA was used until vmagent restart. - Properly handle errors during http request creation for the second attempt to send data to remote system at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing, since the returned request is nil on error. - Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent. Previously it could miss details on the source of the request. - Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header of every http request issued by vmagent during service discovery or target scraping. Re-use the HTTP client instead until the corresponding scrape config changes. - Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached, e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server when auth header cannot be obtained because of temporary error. - Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config. Cache the loaded certificate and the error for one second. This should significantly reduce CPU load when scraping big number of targets with the same tls_config. 
- Allow loading TLS certificates from HTTP and HTTPS URLs by specifying these URLs at `tls_config->cert_file` and `tls_config->key_file`. - Improve test coverage at lib/promauth - Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later. Previously vmagent was exiting in this case. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959	2023-10-25 23:19:37 +02:00
Hui Wang	e16d3f5639	fix inconsistent behaviors with prometheus when scraping (#5153 ) * fix inconsistent behaviors with prometheus when scraping 1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting; 2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header; 3. don't send requests if there are wrong auth configs in: 1. vmagent remoteWrite; 2. vmalert datasource/remoteRead/remoteWrite/notifier. * add changelogs * address review comments * fix ut	2023-10-17 17:58:19 +08:00
Haleygo	dc28196237	vmalert-tool: implement unittest (#4789 ) 1. split package rule under /app/vmalert, expose needed objects 2. add vmalert-tool with unittest subcmd https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945	2023-10-13 13:54:33 +02:00
Abirdcfly	835c03fb47	vmalert: fix `vmalert_remotewrite_send_duration_seconds_total` metric value (#4801 ) The deferred call's arguments are evaluated immediately, but the function call is not executed until the surrounding function returns. Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2023-08-10 10:51:44 +08:00
Roman Khavronenko	9f1b9b86cc	vmalert: revert unittest feature (#4734 ) * Revert "vmalert: unittest support stale datapoint (#4696)" This reverts commit `0b44df7ec8`. * Revert "docs: specify min version and limitations for vmalert's unit tests" This reverts commit `a24541bd` Signed-off-by: hagen1778 <roman@victoriametrics.com> * Revert "vmalert: init unit test (#4596)" This reverts commit `da60a68d` Signed-off-by: hagen1778 <roman@victoriametrics.com> * docs: mention unittest revert in changelog Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-28 10:42:02 +02:00
Haleygo	da60a68d09	vmalert: init unit test (#4596 ) vmalert: support unit tests See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945 --------- Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: hagen1778 <roman@victoriametrics.com>	2023-07-20 15:07:10 +02:00
Haleygo	a97887a2d9	vmalert: add `vmalert_remotewrite_sent_duration_seconds_total` metric (#4517 ) add `vmalert_remotewrite_sent_duration_seconds_total` metric	2023-06-26 07:34:51 +02:00
Roman Khavronenko	37c9a631ca	vmalert: make linter happy (#4509 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-22 17:46:12 +02:00
Roman Khavronenko	5f9ad22884	vmalert: update retry policy for pushing data to `-remoteWrite.url` (#4504 ) By default, vmalert will make multiple retry attempts with exponential delay. The total time spent during retry attempts shouldn't exceed `-remoteWrite.retryMaxTime` (default is 30s). When retry time is exceeded vmalert drops the data dedicated for `-remoteWrite.url`. Before, vmalert dropped data after 5 retry attempts with 1s delay between attempts (not configurable). See `-remoteWrite.retryMinInterval` and `-remoteWrite.retryMaxTime` cmd-line flags. Signed-off-by: hagen1778 <roman@victoriametrics.com> Co-authored-by: Nikolay <nik@victoriametrics.com>	2023-06-22 15:14:23 +02:00
Roman Khavronenko	4aad7a43df	vmalert: properly interrupt remotewrite retries on shutdown (#4505 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-22 15:07:32 +02:00
Roman Khavronenko	79a5499cb2	vmalert: retry all errors except 4XX status codes (#4461 ) vmalert: retry all errors except 4XX status codes Retry all errors except 4XX status codes while pushing via remote-write to the remote storage. Previously, errors like broken connection could prevent vmalert from retrying the request. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-20 13:24:45 +02:00
Roman Khavronenko	f68d93cca2	vmalert: follow-up after `669becd011` (#4318 ) * vmalert: follow-up after `669becd011` Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: follow-up after `669becd011` Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: follow-up after `669becd011` Signed-off-by: hagen1778 <roman@victoriametrics.com> --------- Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-05-16 18:51:38 +02:00
Michael Hoffmann	3a65f4a733	vmalert: improve retry logic for remote write (#4134 ) vmalert should not retry on 4xx status codes according to https://prometheus.io/docs/concepts/remote_write_spec/	2023-05-16 16:30:03 +02:00
Alexander Marshalov	2e494e2375	fixed typos in documentation and commandline flags descriptions (#4275 )	2023-05-10 09:50:41 +02:00
Roman Khavronenko	8fdd613f25	Vmalert tests (#3975 ) * vmalert: add tests for notifier pkg * vmalert: add tests for remotewrite pkg * vmalert: add tests for template functions * vmalert: add tests for web pages * vmalert: fix int overflow in tests Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-03-17 15:57:24 +01:00
my-git9	9dec3c8f80	chore: Use http constants to replace numbers (#3846 ) Signed-off-by: xin.li <xin.li@daocloud.io>	2023-02-22 18:53:05 -08:00
Aliaksandr Valialkin	f2b40dbe9a	app/vmalert: use consistent randomizer in tests Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683	2023-01-23 19:25:10 -08:00
Zakhar Bessarab	30fea30685	app/vmalert: add `remoteWrite.sendTimeout` command-line flag to configure timeout for sending data to `remoteWrite.url` (#3423 ) * app/vmalert: add `remoteWrite.sendTimeout` command-line flag to configure timeout for sending data to `remoteWrite.url` * vmalert: remove WriteTimeout from clients Cfg No need to have it as a part of configuration struct: * the client isn't used by other packages; * there are no internal tests to check the WriteTimeout. * vmalert: remove DisablePathAppend from clients Cfg No need to have it as a part of configuration struct: * the client isn't used by other packages; * there are no internal tests to check the DisablePathAppend. Co-authored-by: hagen1778 <roman@victoriametrics.com>	2022-12-01 09:57:19 +01:00
Roman Khavronenko	4e0ea95f26	vmalert: lower severity level for RW retries (#3237 ) The message about dropped data still remains at `error` level. The change is supposed to make the log message clearer about how serious it is. Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-10-18 14:27:20 +02:00
Aliaksandr Valialkin	1f89278d88	all: substitute ioutil.ReadAll with io.ReadAll ioutil.ReadAll is deprecated since Go1.16 - see https://tip.golang.org/doc/go1.16#ioutil VictoriaMetrics requires at least Go1.18, so it is OK to switch from ioutil.ReadAll to io.ReadAll. This is a follow-up for `02ca2342ab`	2022-08-22 00:16:37 +03:00
Roman Khavronenko	fa51c76ef9	vmalert: follow-up after `28441711e6` (#2972 ) Signed-off-by: hagen1778 <roman@victoriametrics.com> Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-08-11 13:30:32 +02:00
Matthew Blewitt	28441711e6	vmalert: mark some url flags as sensitive (#2965 ) Other components, such as `vmagent`, mark these flags as sensitive and hide them from the `/metrics` endpoint by default. This commit adds similar handling to the `vmalert` component, hiding them by default, to prevent logging of secrets inappropriately. Showing of these values is controlled by an additional flag. Follow up to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2947	2022-08-11 09:56:40 +02:00
Roman Khavronenko	70a822f3a0	vmalert: allow configuring custom headers for URLs (#2897 ) See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2860 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-07-21 13:57:53 +02:00
Aliaksandr Valialkin	1c4f67c5d2	lib/promauth: add ability to send additional http headers in requests to scrape targets This solves https://stackoverflow.com/questions/66032498/prometheus-scrape-metric-with-custom-header	2022-06-22 20:39:43 +03:00
Wataru Manji	99dbe7f9d4	Add remote-write headers (#2701 ) Co-authored-by: Wataru Manji <wataru.manji@linecorp.com>	2022-06-13 09:59:03 +03:00
Aliaksandr Valialkin	a93deb307f	docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2685	2022-06-07 15:39:13 +03:00
Wataru Manji	6564dc6c16	add Content-Encoding Header (#2685 ) Co-authored-by: Wataru Manji <wataru.manji@linecorp.com>	2022-06-07 15:33:21 +03:00
Aliaksandr Valialkin	c448d2fcbb	app/vmalert: apply `-remoteRead.disablePathAppend` to `-datasource.url` in the same way as for the `-remoteRead.url` This is a follow-up for `0e2486df56` The related pull requests: - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1536 - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1712	2022-05-13 16:44:43 +03:00
Roman Khavronenko	0989649ad0	Vmalert compliance 2 (#2340 ) * vmalert: split alert's `Start` field into `ActiveAt` and `Start` The `ActiveAt` field identifies when alert becomes active for rules with `for > 0`. Previously, this value was stored in field `Start`. The field `Start` now identifies the moment alert became `FIRING`. The split is needed in order to distinguish these two moments in the API responses for alerts. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: support specific moment of time for rules evaluation The Querier interface was extended to accept a new argument used as a timestamp at which evaluation should be made. It is needed to align rules execution time within the group. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: mark disappeared series as stale Series generated by alerting rules, which were sent to remote write now will be marked as stale if they disappear on the next evaluation. This would make ALERTS and ALERTS_FOR_TIME series more precise. Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: evaluate rules at fixed timestamp Before, time at which rules were evaluated was calculated right before rule execution. The change makes sure that timestamp is calculated only once per evaluation round and all rules are using the same timestamp. It also updates the logic of resending of already resolved alert notification. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: allow overriding `alertname` label value if it is present in response Previously, `alertname` was always equal to the Alerting Rule name. Now, its value can be overridden if series in response contain a different value for this label. The change is needed for improving compatibility with Prometheus.
Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: align rules evaluation in time Now, evaluation timestamp for rules evaluates as if there was no delay in rules evaluation. It means, that rules will be evaluated at fixed timestamps+group_interval. This way provides more consistent evaluation results and improves compatibility with Prometheus, Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: add metric for missed iterations New metric `vmalert_iteration_missed_total` will show whether rules evaluation round was missed. Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: reduce delay before the initial rule evaluation in group Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: rollback alertname override According to the spec: ``` The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label. ``` https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: throw err immediately on dedup detection ``` The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts or add samples to samples receiver if there is more than one alert with the same labels ``` https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: cleanup Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmalert: use strings builder to reduce allocs Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-03-29 15:09:07 +02:00

71 commits