Commit graph

582 commits

Author SHA1 Message Date
Roman Khavronenko
562edb72ea
app/vmalert: fix data race during hot-config reload (#5698)
* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "app/vmalert: fix data race during hot-config reload"

This reverts commit a4bb7e8932.

* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:43:02 +01:00
Roman Khavronenko
a2f83115ae
app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680)
* app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0`

 Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`.
 This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE
 time series for alerting rules with `for: 0` as well. Such time series can
 be useful for tracking the moment when alerting rule became active.

 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: support ALERTS_FOR_STATE in `replay` mode

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 20:51:50 +01:00
hagen1778
ede466be56
docs: fix Grafana link example for vmalert
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 18:41:38 +02:00
Aliaksandr Valialkin
885ee160c2
all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-22 01:23:23 +02:00
Aliaksandr Valialkin
9e5e514faf
lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop()
Previously the was a race condition when the background goroutine still could try collecting metrics
from already stopped resources after returning from pushmetrics.Stop().
Now the pushmetrics.Stop() waits until the background goroutine is stopped before returning.

This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5549
and the commit fe2d9f6646 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
2024-01-16 21:18:22 +02:00
Aliaksandr Valialkin
d566aa7d78
lib/prompbmarshal: switch to github.com/VictoriaMetrics/easyproto 2024-01-16 20:48:30 +02:00
Aliaksandr Valialkin
063ea8c773
app/vmalert/remotewrite: properly calculate vmalert_remotewrite_dropped_rows_total
It was calculating the number of dropped time series instead of the number of dropped samples.

While at it, drop vmalert_remotewrite_dropped_bytes_total metric, since it was inconsistently calculated -
at one place it was calculating raw protobuf-encoded sample sizes, while at another place it was calculating
the size of snappy-compressed prompbmarshal.WriteRequest protobuf message.
Additionally, this metric has zero practical sense, so just drop it in order to reduce the level of confusion.
2024-01-16 20:47:13 +02:00
Aliaksandr Valialkin
f7b589e38a
lib/prompb: switch to github.com/VictoriaMetrics/easyproto 2024-01-16 20:43:09 +02:00
hagen1778
2a7207f38a
app/all: follow-up after 84d710beab
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5548
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-09 13:17:09 +01:00
Hui Wang
c14e229b20
vmalert: automatically add exported_ prefix for original evaluation… (#5398)
automatically add `exported_` prefix for original evaluation result label if it's conflicted with external or reserved one,
previously it was overridden.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5161

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 1f477aba41)
2023-12-22 16:10:33 +01:00
Aliaksandr Valialkin
3a9cf13aaa
app/{vmagent,vmalert}: add the ability to set OAuth2 endpoint params via the corresponding *.oauth2.endpointParams command-line flags
This is a follow-up for 5ebd5a0d7b

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5427
2023-12-20 21:38:16 +02:00
Aliaksandr Valialkin
261c173f4b
all: use Gauge instead of Counter for *_config_last_reload_successful metrics
This allows exposing the correct TYPE metadata for these labels when the app runs with -metrics.exposeMetadata command-line flag.
See https://github.com/VictoriaMetrics/metrics/pull/61#issuecomment-1860085508 for more details.

This is follow-up for 326a77c697
2023-12-20 14:25:44 +02:00
Hui Wang
ed4f77575f
vmalert: validate schema for -external.url (#5450)
Requests with wrong or no schema in  `-external.url` could be rejected by alertmanager.
So we validate schema on start up.

(cherry picked from commit 9253c24dd6)
2023-12-15 11:54:07 +01:00
Aliaksandr Valialkin
55eb48f5ee
app: make more clear that -tls enables https at -httpListenAddr 2023-12-10 00:25:23 +02:00
Roman Khavronenko
276e9301f4
app/vmalert: sanitize label names before sending to Alertmanager (#5442)
Before, vmalert would send notifications with labels containing characters
  not supported by Alertmanager validator, resulting into validation errors
  like `msg="Failed to validate alerts" err="invalid label set: invalid name "foo.bar"`

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-08 18:09:07 +02:00
Dmytro Kozlov
6a41e1ec0c
app/vmalert: replace error metrics for gauges with counter metrics (#5217)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5160

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 935bec447b)
2023-12-06 19:41:34 +01:00
Dmytro Kozlov
6770bad207
app/vmalert: expose /vmalert/api/v1/rule and /api/v1/rule API which returns rule status in JSON format (#5397)
* app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format

* app/vmalert: hide updates if query param not set

* app/vmalert: fix panic (recursion call)

* app/vmalert: add needed group name and file name

* app/vmalert: fix comment, update behavior

* app/vmalert: fix description

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-12-04 22:49:39 +02:00
Aliaksandr Valialkin
10b4dfbbf9
app/vmalert/notifier: remove backticks from the description for -notifier.blackhole command-line flag
Backticks in flag description are automatically converted to flag type. See https://pkg.go.dev/flag#PrintDefaults

This is a follow-up for 20025d4fd6 and 25317b4e70
2023-11-22 20:17:45 +02:00
Aliaksandr Valialkin
db6dadf1f7
docs: convert png images to webp in all the docs except of docs/operator/*
This reduces the size of docs/* folder from 33MB to 18MB

Images inside docs/operator/* must be converted at the https://github.com/VictoriaMetrics/operator/tree/master/docs
and then the updated images must be automatically propagated to the docs/operator/*

This is a follow-up for d3f919df3e

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5206
2023-11-22 19:29:47 +02:00
hagen1778
0dbbffbdd5
docs: typo after 3f5a41e35e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 20025d4fd6)
2023-11-20 17:06:21 +01:00
Roman Khavronenko
c0039ce7a3
docs/vmalert: clarify deduplication recommendations for HA setup (#5336)
Please see discussion here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5279

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-16 16:27:47 +01:00
hagen1778
cfc58dd932
docs: clarify vmalert flag changes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 21:44:46 +01:00
Roman Khavronenko
becf7bf8df
app/vmalert: update remote-write process (#5284)
* app/vmalert: update remote-write process

* automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers.
* increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded.
* increment `vmalert_remotewrite_dropped_rows_total` amd `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2023-11-13 09:25:29 +01:00
hagen1778
10da9e6e01
app/vmalert: fix typo in remoteWrite.concurrency description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit c07dc45786)
2023-11-03 22:05:00 +01:00
Aliaksandr Valialkin
3d6f4da3b3
docs: update -help output after recent changes to VictoriaMetrics components 2023-11-02 20:27:16 +01:00
Roman Khavronenko
4e8c762fd9
app/vmalert: add label file pointing to the group's filename to metrics (#5281)
The filename should help identifying alerting rules belonging to specific groups
with identical names but different filenames.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5267

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit b5254199c6)
2023-11-02 16:02:29 +01:00
hagen1778
3773510e8f
app/vmalert: verify alert name correctness in restore test
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6eb205f8b0)
2023-11-02 16:02:29 +01:00
Hui Wang
44fcdf0cf0
vmalert: reduce restore query request for each alerting rule (#5265)
reduce the number of queries for restoring alerts state on start-up.
The change should speed up the restore process and reduce pressure on `remoteRead.url`.

(cherry picked from commit 90d45574bf)
2023-11-02 16:02:28 +01:00
Hui Wang
4fafdda13e
vmalert: support specifying full http url in notifier static_configs target (#5261)
* vmalert: support specifying full http or https urls in notifier static_configs target address
* show right label results in ui
2023-11-01 16:44:54 +01:00
Hui Wang
8a786e5df4
vmalert: fix alert firing state in replay mode (#5192)
fix possible missing firing states for alerting rules in replay mode
Before if one firing stage is bigger than single query request range, like rule with a big `for`, alerting rule won't able to be detected as firing.

Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit abcb21aa5e)
2023-10-30 13:55:48 +01:00
Dima Lazerka
ed8fc04898
lib/httpserver: add flags to specify HSTS / Frame-Options / CSP headers for httpserver (#5111)
support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options`
HTTP headers in all VictoriaMetrics components.
The values for headers can be specified by users via the following flags:
`-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`.

Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit ad839aa492)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 11:41:38 +01:00
hagen1778
ddedeb1d42
app/vmalert: remove unclear comment
The timestamp alignment should be applied as a last step
to keep the timestamp consistent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-27 14:09:01 +02:00
Aliaksandr Valialkin
f03e81c693
lib/promauth: follow-up for e16d3f5639
- Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup
  don't prevent from processing the corresponding scrape targets after the file becomes correct,
  without the need to restart vmagent.
  Previously scrape targets with invalid TLS CA file or TLS client certificate files
  were permanently dropped after the first attempt to initialize them, and they didn't
  appear until the next vmagent reload or the next change in other places of the loaded scrape configs.

- Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent.
  Previously the old TLS CA was used until vmagent restart.

- Properly handle errors during http request creation for the second attempt to send data to remote system
  at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing,
  since the returned request is nil on error.

- Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent.
  Previously it could miss details on the source of the request.

- Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header
  of every http request issued by vmagent during service discovery or target scraping.
  Re-use the HTTP client instead until the corresponding scrape config changes.

- Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached,
  e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server
  when auth header cannot be obtained because of temporary error.

- Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config.
  Cache the loaded certificate and the error for one second. This should significantly reduce CPU load
  when scraping big number of targets with the same tls_config.

- Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`.

- Improve test coverage at lib/promauth

- Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later.
  Previously vmagent was exitting in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959
2023-10-26 09:55:47 +02:00
Aliaksandr Valialkin
19940b5629
app/vmalert/config: fix flacky test TestParseBad
It could return either `failed to read` or `failed to parse` errors depending
on whether the given url can be loaded or not under the current environment
2023-10-26 09:53:40 +02:00
Aliaksandr Valialkin
36a1fdca6c
all: consistently use %w instead of %s in when error is passed to fmt.Errorf()
This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.
2023-10-26 09:44:40 +02:00
Aliaksandr Valialkin
94e061087f
docs: use https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest instead of https://github.com/VictoriaMetrics/VictoriaMetrics/releases link where needed
The https://github.com/VictoriaMetrics/VictoriaMetrics/releases link may show non-latest
releases at the top, such as LTS releases or VictoriaLogs releases.
So it is better to use https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest link,
which always redirect to the latest available release of VictoriaMetrics.
2023-10-26 09:23:17 +02:00
Hui Wang
d7dd7614eb
fix inconsistent behaviors with prometheus when scraping (#5153)
* fix inconsistent behaviors with prometheus when scraping

1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting;
2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header;
3. don't send requests if there are wrong auth configs in:
    1. vmagent remoteWrite;
    2. vmalert datasource/remoteRead/remoteWrite/notifier.

* add changelogs

* address review comments

* fix ut
2023-10-26 08:56:54 +02:00
hagen1778
f00729ee24
app/vmalert: fix typo in tests
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit c07909a20b)
2023-10-26 08:55:20 +02:00
hagen1778
cf541c757a
app/vmalert: fix tests after a216fe6728
a216fe6728
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit eed0c3c6b0)
2023-10-26 08:55:06 +02:00
hagen1778
6c63ca18f5
app/vmalert: follow-up after c9375cac5e
c9375cac5e

Descriptions were updated in attempt to make it more clear for readers,
re-phrasing and linking missing docs.

`eval_delay` was added to tests to verify it can be unmarshalled.

`eval_delay` is now applied before timestamp alignment to make it more predictable.
Before, if delay < interval the timestamp won't be aligned.

`eval_delay` and `eval_offset` was added to API output.

`PreviouslySentSeriesToRW` converted to private `previouslySentSeriesToRW`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit a216fe6728)
2023-10-25 14:39:49 +02:00
Hui Wang
86d861ec55
vmalert: add -rule.evalDelay flag and eval_delay as group attribute (#5185)
Also mark `-datasource.lookback` as will be deprecated, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.

(cherry picked from commit c9375cac5e)
2023-10-25 14:39:49 +02:00
Haleygo
130e0ea5f0
vmalert-tool: implement unittest (#4789)
1. split package rule under /app/vmalert, expose needed objects
2. add vmalert-tool with unittest subcmd

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945
2023-10-16 14:12:06 +02:00
Aliaksandr Valialkin
dfc67aedcb
lib/license: cleanups and prettifications for log messages and docs related to licensing
- Make more clear the docs at docs/enterprise.md, so readers could figure out faster
  on how to obtain enterprise key and how to pass it to VictoriaMetrics Enterprise components.

- Fix examples at docs/enterprise.md, which were referring to non-existing `-license-file` command-line flag.
  The `-licenseFile` command-line flag must be used instead.

- Improve the description of `-license*` command-line flags, so users could understand
  faster how to use them.

- Improve the warning message, which is emitted when the deprecated -eula command-line flag is passed,
  so the user could figure out how to switch faster to -license* command-line flags.

- Disallow running VictoriaMetrics components with both -license and -licenseFile command-line flags.

- Disallow running VictoriaMetrics components when -licensFile points to an empty file.

- Consistently use the phrase "This flag is available only in Enterprise binaries" across
  all the enterprise-specific command-line flags.

- Remove unneeded level of indirection for `noLicenseMessage` and `expiredMessage` string contants
  in order to improve code readability and maintainability.

- Remove unneded `return` statements after `logger.Fatalf()` calls, since these calls exit the app and never return.

- Make sure that the info log message about successful license verification is emitted
  when the license is verified successfully. Previously the error message could be logged
  when the license payload is invalid or if it misses some required features.
2023-10-16 12:51:37 +02:00
Haleygo
b52f1d1f0a
vmalert: add evalAlignment for rule group and fix evalutaion timstamp (#5066)
* vmalert: add `query_time_alignment` for rule group

1. add `eval_alignment` attribute for group which by default is true. So group rule query stamp will be aligned with interval and propagated to ALERT metrics and the messages for alertmanager;
2. deprecate `datasource.queryTimeAlignment` flag.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049
(cherry picked from commit 2aa0f5fc41)
2023-10-10 12:45:37 +02:00
Dmytro Kozlov
1cc6cd3d4f
app/vmalert: hide sensetive info in the vmalert (#5059)
Strip sensitive information such as auth headers or passwords from datasource, remote-read,
remote-write or notifier URLs in log messages or UI. This behavior is by default and is controlled via
`-datasource.showURL`, `-remoteRead.showURL`, `remoteWrite.showURL` or `-notifier.showURL` cmd-line flags.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5044
(cherry picked from commit 244c887825)
2023-10-10 12:45:36 +02:00
Artem Navoiev
65b2a0ce60
docs: update the license flags description
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

(cherry picked from commit b3cc22b159)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-10 12:45:06 +02:00
Aliaksandr Valialkin
5f1492d978
docs/vmalert.md: refer to -evaluationInterval command-line flag instead of evaluation_interval option, which isnt supported by vmalert
This is follow-up for 5c42c1218a
2023-10-02 20:32:02 +02:00
Zakhar Bessarab
b842a0cd25
docs: sync description for license flags (#4977)
- update eula flag to add deprecation notice
- add new license flags description

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-08 23:26:25 +02:00
Roman Khavronenko
548acce6b3
vmalert: correctly add duplicated params to the query (#4955)
Fix the bug when Group's `params` fields with multiple values were
overriding each other instead of adding up.
The bug was introduced in this commit eccecdf177
 starting from v1.91.1 https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.91.1

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4908

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6351d07da8)
2023-09-08 09:33:45 +02:00
Aliaksandr Valialkin
0bc0d2610c
go.mod: increase the minimum supported Go version from Go1.19 to Go1.20 2023-09-07 12:18:12 +02:00
Haleygo
0212219f6c
vmalert: add eval_offset for group (#4693)
Adds `eval_offset` attribute for Groups.
If specified, Group will be evaluated at the exact time offset on the range of [0...evaluationInterval].
The setting might be useful for cron-like rules which must be evaluated at specific moments of time.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3409

Signed-off-by: Haley Wang <pipilong.25@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 45c0e4bb31)
2023-09-07 10:59:14 +02:00
Artem Navoiev
696f143eb5
use correct abbriviation for ESA legal doc
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-09-05 16:59:47 +02:00
Artem Navoiev
40c795a1e3
change link to the enterprise legal doc
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-09-05 16:58:51 +02:00
Aliaksandr Valialkin
d8afd7fe98
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:25:49 +02:00
Roman Khavronenko
68150655d2
vmalert: correctly re-instantinate HTTP req on retries (#4864)
* vmalert: correctly re-instantinate HTTP req on retries

Previosly, request retry to datasource re-used existing HTTP request.
But if request object was already partially processed (body was read),
then retry will be unsuccessful.

The change re-instantinates HTTP request object before retry.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: review fix

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit ddf87b32ed)
2023-08-27 09:04:42 +02:00
Abirdcfly
70780306b3
vmalert: fix vmalert_remotewrite_send_duration_seconds_total metric value (#4801)
The deferred call's arguments are evaluated immediately, but the function call is not executed until the surrounding function returns.

Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2023-08-11 04:58:00 -07:00
hagen1778
1786a703fd
vmalert: mention vmalert_iteration_duration_seconds metric in README
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-11 04:57:24 -07:00
Haleygo
74d5622606
vmalert: fix redundant clean up move (#4803)
Follow-up after 55ae2c2d57
2023-08-11 04:56:57 -07:00
Roman Khavronenko
4c91773a15
vmalert: cleanup config reload metrics handling (#4790)
* rename `configErr` to `lastConfigErr` to reduce confusion
* add tests to verify metrics and msg are set properly
* fix mistake when config success metric wasn't restored after an error

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-08-11 04:56:21 -07:00
Haleygo
c0fdd73313
vmalert: fix uncleaned tmp files in tests (#4788) 2023-08-11 04:55:42 -07:00
Zakhar Bessarab
0747ca0595
docs: make phrase about dedup and evaluation interval relation less obscure (#4781)
Value of `-dedup.minScrapeInterval` comand-line flag must be higher than `evaluation_interval` in order to make sure that only one sample on each evaluation will be left after deduplication.
Moreover, value of `-dedup.minScrapeInterval` must be a multiple of vmalert's `evaluation_interval` in order to make sure that samples will be aligned between deduplication window periods.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4774#issuecomment-1663940811

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-08-11 04:54:43 -07:00
Roman Khavronenko
02dba5f728
vmalert: remove deprecated in v1.79.0 web links with */status suffix (#4747)
Links of form `/api/v1/<groupID>/<alertID>/status` were deprecated
in favour of `/api/v1/alerts?group_id=<>&alert_id=<>` links in
v1.79.0. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825

This change removes code responsible for deprecated functionality.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-31 07:58:37 -07:00
hagen1778
c6acb5b6bc
docs: rm typo of naming vmalert as a stateless service
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-31 07:57:42 -07:00
Roman Khavronenko
85eb62a2ec
vmalert: remove deprecated in v1.61.0 -rule.configCheckInterval (#4745)
Use `-configCheckInterval` command-line flag instead.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-31 07:55:54 -07:00
Aliaksandr Valialkin
7b01e28371
app/vmalert: use proper timestamp in setConfigSuccess() 2023-07-28 11:16:14 -07:00
Roman Khavronenko
303d3616ec
vmalert: revert unittest feature (#4734)
* Revert "vmalert: unittest support stale datapoint (#4696)"

This reverts commit 0b44df7ec8.

* Revert "docs: specify min version and limitations for vmalert's unit tests"

This reverts commit a24541bd

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "vmalert: init unit test (#4596)"

This reverts commit da60a68d

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docs: mention unittest revert in changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 9f1b9b86cc)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-28 11:09:52 +02:00
Aliaksandr Valialkin
63d9a92d3d
docs/vmalert.md: fix broken links to Web chapter 2023-07-27 18:05:25 -07:00
Aliaksandr Valialkin
ce18e9b2c4
app/vmalert: make golangci-lint happy after ae0e4a8c90 2023-07-27 13:27:36 -07:00
Haleygo
3c297e0253
vmalert: add keep_firing_for field for alerting rule (#4669)
vmalert: support `keep_firing_for` field for alerting rule

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4529

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-27 13:00:45 -07:00
hagen1778
1250ebc063
vmalert: clarify docs for state restore with additional details
The important change is to highlight that restore procedure happens
only once and only for already loaded rules. Config hot-reload
doesn't trigger the restore procedure.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-27 12:58:56 -07:00
hagen1778
91eddeca52
vmalert: revert accidental changes to Makefile rule
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-27 12:58:26 -07:00
Aliaksandr Valialkin
1b7d97787a
docs: use 1. instead of N. in numbered bullets, so they are automatically adjusted by Github Markdown engine
See https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#lists
2023-07-26 14:40:06 -07:00
Aliaksandr Valialkin
2d88ebd7cb
app/vmalert/datasource: substitute golang.org/x/exp/slices.SortFunc with sort.Slice
This removes unnecessary third-party dependency on golang.org/x/exp.

This is a follow-up for da60a68d09
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945
2023-07-24 19:17:19 -07:00
Haleygo
ee1b3a48e9
vmalert: unittest support stale datapoint (#4696)
* vmalert: unittest support stale datapoint

* add stale ut case
2023-07-24 16:15:27 -07:00
Zakhar Bessarab
2fe33b3d97
app/vmalert/datasource/graphite: allow overriding "from" parameter for datasource queries (#4687)
* app/vmalert/datasource/graphite: allow overriding "from" parameter for datasource queries

Fixes construction of URL parameters for graphite render to allow overriding "from" parameter.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4685
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmalert/datasource/graphite: update flow for building URL parameters

Makes flow of building URL parameters same as Prometheus datasource has:
1) Setting all default values
2) Merging those values with provided `extraParams`

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/CHANGELOG.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-22 14:18:52 -07:00
hagen1778
1d2a0e0e10
docs: fix the next release version for vmalert
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-20 21:49:34 -07:00
hagen1778
f7d60613a9
docs: specify min version and limitations for vmalert's unit tests
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-20 21:27:50 -07:00
Haleygo
939c8b8372
vmalert: init unit test (#4596)
vmalert: support unit tests

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945
---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-20 21:19:45 -07:00
Aliaksandr Valialkin
443c266406
app/vmalert/README.md: sync with docs/vmalert.md after 54b7bd4564 2023-07-19 16:31:30 -07:00
Roman Khavronenko
80768d53dd
docs: follow-up after aec4b5db81 (#4638)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-19 14:48:17 -07:00
Roman Khavronenko
debe1793bb
vmalert: follow-up after d4ac4b7813 (#4659)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-18 16:03:28 -07:00
venkatbvc
bd2a37429c
vmalert: allow to blackhole alerting notifications (#4639)
vmalert: support option to blackhole alerting notifications

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4122

---------

Co-authored-by: Rao, B V Chalapathi <b_v_chalapathi.rao@nokia.com>
2023-07-18 16:02:48 -07:00
Haleygo
5e5c805599
vmalert: fix evalTS after modify group interval (#4629) 2023-07-14 10:47:29 -07:00
Aliaksandr Valialkin
650af7c5ca
app/vmalert: silence golagci-lint at TestAlertingRule_Template
Add a break if gotAlert is nil

This removes the following golangci-lint warning:

app/vmalert/alerting_test.go:868:8: SA5011(related information): this check suggests that the pointer can be nil (staticcheck)
				if gotAlert == nil {
				   ^
2023-07-13 12:16:00 -07:00
Roman Khavronenko
fdccb56620
vmalert: check for negative offset for missed rounds (#4628)
It could happen for low evaluation intervals and irregular
delays during execution that evaluation time would get
a negative offset. This could result into cumulative
discrepancy between the actual time and evaluation time for rules.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-13 12:05:52 -07:00
Zakhar Bessarab
ddd918b93c
docs: make httpAuth.* flags description less ambiguous (#4588)
* docs: make `httpAuth.*` flags description less ambiguous

Currently, it may confuse users whether `httpAuth.*` flags are used by HTTP client or server configuration(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4586 for example).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: fix a typo

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-09 12:36:14 -07:00
Haleygo
3c2308fd52
vmalert:fix query request using rfc3339 format (#4577)
vmalert: consistently use time.RFC3339 format for time in queries

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-09 11:03:10 -07:00
Roman Khavronenko
109e55f865
vmalert: allow disabling of step param attached to instant queries (#4574)
vmalert: allow disabling of `step` param attached to instant queries

This might be useful for using vmalert with datasources that to not support this param,
unlike VictoriaMetrics.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4573

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 23:13:56 -07:00
Aliaksandr Valialkin
316abe550d
docs/vmalert.md: update -help output 2023-07-06 22:50:47 -07:00
Dmytro Kozlov
dd412a3757
app/vmalert: show on UI groups error after reload config (#4543)
show on UI groups error after reload config

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4076

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:11:36 -07:00
Haleygo
9e49a9e924
vmalert: add vmalert_remotewrite_sent_duration_seconds_total metric (#4517)
add `vmalert_remotewrite_sent_duration_seconds_total` metric
2023-07-06 21:51:31 -07:00
Roman Khavronenko
a677509b38
vmalert: make linter happy (#4509)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 21:46:22 -07:00
Roman Khavronenko
d5e7ea5ef3
vmalert: update retry policy for pushing data to -remoteWrite.url (#4504)
By default, vmalert will make multiple retry attempts with exponential delay.
The total time spent during retry attempts shouldn't exceed `-remoteWrite.retryMaxTime` (default is 30s).
When retry time is exceeded vmalert drops the data dedicated for `-remoteWrite.url`.
Before, vmalert dropped data after 5 retry attempts with 1s delay between attempts (not configurable).

See `-remoteWrite.retryMinInterval` and `-remoteWrite.retryMaxTime` cmd-line flags.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-07-06 21:44:18 -07:00
Roman Khavronenko
311a81c7b0
vmalert: properly interrupt remotewrite retries on shutdown (#4505)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 21:43:04 -07:00
Roman Khavronenko
4e99bf8c9e
docs/vmalert: specify version requirements for new features (#4480)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 21:24:38 -07:00
Roman Khavronenko
d4ee505f6f
vmalert: retry all errors except 4XX status codes (#4461)
vmalert: retry all errors except 4XX status codes

Retry all errors except 4XX status codes while pushing via remote-write
to the remote storage. Previously, errors like broken connection could
prevent vmalert from retrying the request.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 17:34:32 -07:00
Roman Khavronenko
b76c0d182c
docs/vmalert: mention same labelset error in docs (#4443)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 16:56:42 -07:00
Roman Khavronenko
91612b38cd
docs: mention stream aggregation as more efficient approach for aggregation (#4429)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 16:46:17 -07:00
Roman Khavronenko
a1b6a9317d
vmalert: fix nil map assignment (#4392)
* vmalert: fix nil map assignment

The storage instance with nil map params was created for remote-read purposes.
And before change 7a9ae9de0d this map was ignored in ApplyParams.
Now, it started to be used and vmalert panics in runtime.

The fix properly inits map for at `NewVMStorage` and verifies it is not nil
on assignment in `ApplyParams`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: add to changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly clone Storage params

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly clone Storage params

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly clone Storage params

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit de94812088)
2023-06-02 13:29:51 +02:00
Roman Khavronenko
fde5a59726
app/vmalert: follow-up after 7a9ae9de0d (#4381)
7a9ae9de0d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit eccecdf177)
2023-06-02 13:29:47 +02:00