Commit graph

450 commits

Author SHA1 Message Date
Roman Khavronenko
1b24f4729b
vmalert: mention default value for external.url flag (#4365)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-30 12:33:17 +02:00
Roman Khavronenko
51cea6cad4
vmalert: properly form assets address if httpPrefix set (#4351)
Properly form path to static assets in WEB UI
 if `http.pathPrefix` set.

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4349

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-29 07:38:13 +02:00
Roman Khavronenko
66ed6fe62f
vmalert: do not return nil rules for /api/v1/rules (#4344)
The fix addresses a case when vmalert is configured with a group
which has `name`, but doesn't have `rules` configured. In this
case it still returns a `nil` instead of `[]` slice.

Fixing this via current commit.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-25 16:56:54 +02:00
Roman Khavronenko
76f7e66d8e
vmalert: fix the typo in popup (#4331)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-19 12:45:15 +02:00
Roman Khavronenko
f68d93cca2
vmalert: follow-up after 669becd011 (#4318)
* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-16 18:51:38 +02:00
Michael Hoffmann
3a65f4a733
vmalert: improve retry logic for remote write (#4134)
vmalert should not retry on 4xx status codes
according to https://prometheus.io/docs/concepts/remote_write_spec/
2023-05-16 16:30:03 +02:00
Roman Khavronenko
adc5635a07
vmalert: add hints to filter buttons (#4296)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-11 16:38:08 +02:00
Roman Khavronenko
2caf0b05c6
vmalert: correctly update seriesFetched metric for const exprs (#4287)
Previously, metric `vmalert_alerting_rules_last_evaluation_series_fetched`
would be set to 0 for const expressions, because const expression do not match
any series. This may result into a confusion: no series were matched but response isn't empty.
The change updates the logic behind metric: if no series were matched but there are samples
in response - use amount of samples as number of series.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-10 15:04:05 +02:00
Roman Khavronenko
51196739af
Vmalert UI updates (#4276)
* vmalert: expand rule groups on anchor click

before, anchor click was only updating the URL.
To expand the group, user had to click on rule's block.
Now, group will toggle automatically.

* vmalert: allow filtering group in web UI

The new filter allows to filter groups and rules within
groups by: errors only or noMatch only.

The filtering supposed to help navigating big numbers of groups/rules.
Filtering is reflected in URL, so can be shared as a link.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-10 14:38:13 +02:00
Alexander Marshalov
2e494e2375
fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 09:50:41 +02:00
Aliaksandr Valialkin
3c0470f91e
docs/vmalert.md: clarify docs regarding the support of recursive globs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4041
2023-05-08 16:21:44 -07:00
Roman Khavronenko
fa3a17938e
vmalert: follow-up after cae87da (#4269)
* vmalert: follow-up after cae87da

cae87da4bb
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: update struct comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rm typo

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 13:31:54 +02:00
Haleygo
cae87da4bb
vmalert: support reading rule from http url (#4212)
vmalert: support reading rule's config from HTTP URL
2023-05-08 09:52:57 +02:00
Roman Khavronenko
5b8450fc1b
app/vmalert: detect alerting rules which don't match any series at all (#4198)
app/vmalert: detect alerting rules which don't match any series at all

vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4056

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 09:36:39 +02:00
Roman Khavronenko
3383f12a4b
vmalert: fix API to return non-nil values (#4222)
Properly return empty slices instead of nil for `/api/v1/rules` and `/api/v1/alerts` API handlers.
This improves compatibility with Grafana.

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-28 10:08:29 +02:00
Roman Khavronenko
eb746a4dab
Revert "http server: limit max concurrent requests (#4185)" (#4215)
This reverts commit 77f76371

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-27 13:02:47 +02:00
Roman Khavronenko
29e059e49c
app/vmalert: follow-up after 6c322b4a00 (#4214)
6c322b4a00

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-27 13:02:21 +02:00
Haleygo
6c322b4a00
vmalert: allow configuring custom notifier headers per group (#4088)
vmalert: allow configuring custom notifier headers per group
2023-04-27 12:17:26 +02:00
Zakhar Bessarab
b21a55febf
app/vmalert: add support of recursive path globs for rules and templates (#4148)
Supports using `**` for `-rule` and `-rule.templates`: `dir/**/*.tpl` loads contents of dir and all subdirectories recursively.

See: #4041

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-04-26 19:20:22 +02:00
Zakhar Bessarab
89a1c941c2
app/vmalert: return an error when using query function in -external.alert.source flag (#4191)
Templating of `-external.alert.source` is not expected to have access to the query which was causing runtime error when query function was passed as nil.
See: #4181

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-26 15:31:14 +02:00
Roman Khavronenko
77f76371d0
http server: limit max concurrent requests (#4185)
* lib/httpserver: introduce `-http.maxConcurrentRequests` command-line flag

Introduce `-http.maxConcurrentRequests` command-line flag to protect
VM components from resource exhaustion during unexpected spikes of HTTP requests.
By default, the new flag's value is set to 0 which means no limits are applied.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/httpserver: mention http.maxConcurrentRequests in docs

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-24 14:52:06 +02:00
Roman Khavronenko
de61a73c63
vmalert: retry datasource requests with EOF or unexpected EOF errors (#4146)
* vmalert: retry datasource requests with EOF or unexpected EOF errors

Retry failed read request on the closed connection one more time.
This may improve rules execution reliability when connection
between vmalert and datasource closes unexpectedly.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: fix old tests

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-19 10:18:32 +02:00
Zakhar Bessarab
a8d1497024
app/vmalert: update Grafana URLs to match latest format (#4061)
See: #4019

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-04 15:25:29 +04:00
Roman Khavronenko
4a49577028
vmalert: use missingkey=zero for templating (#4040)
Replace empty labels with "" instead of "<no value>"
during templating, as Prometheus does.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4012

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-30 16:57:00 +04:00
Roman Khavronenko
4021aa11b5
app/vmselect: export seriesFetched stat for /query responses (#3925)
The change adds a new field `seriesFetched` to EvalConfig object.
Since EvalConfig object can be copied inside `Exec`,
`seriesFetched` is a pointer which can be updated by all copied
objects.

The reason for having stats is that other components, like vmalert,
could benefit from this information.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 15:18:25 -07:00
Roman Khavronenko
365d2ff0bf
vmalert: add anchor char to Group's link (#4006)
This should help users to see that Group's name is clickable
and used for anchoring.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 09:48:43 +01:00
Roman Khavronenko
8db6a71f83
vmalert: mention VMUI example for alert's source (#4005)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 09:40:55 +01:00
Roman Khavronenko
3283f0dae4
vmalert: support logs suppressing during config reloads (#3973)
* vmalert: support logs suppressing during config reloads

The change is mostly required for ENT version of vmalert,
since it supports object-storage for config files.
Reading data from object storage could be time-consuming,
so vmalert emits logs to track the progress.

However, these logs are mostly needed on start or on
manual config reload. Printing these logs each time
`rule.configCheckInterval` is triggered would too verbose.
So the change allows to control logs emitting during
config reloads.

Now, logs are emitted during start up or when SIGHUP is receieved.
For periodicall config checks logs emitted by config pkg are suppressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: review fixes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-20 16:08:30 +01:00
Roman Khavronenko
8fdd613f25
Vmalert tests (#3975)
* vmalert: add tests for notifier pkg

* vmalert: add tests for remotewrite pkg

* vmalert: add tests for template functions

* vmalert: add tests for web pages

* vmalert: fix int overflow in tests

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-17 15:57:24 +01:00
Zakhar Bessarab
f7a7eb8f3e
docs: add a note about cache reset for vmalert backfilling docs (#3940)
docs: add a note about cache reset for vmalert backfilling docs
2023-03-10 13:45:11 +01:00
Roman Khavronenko
d66bae212b
app/vmalert: log number of configration files found for each specified -rule (#3936)
The change also introduces `List` method to `FS` interface.
The `List` method can be used for wildcard support in object storage FS.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-03-09 14:46:19 +01:00
Aliaksandr Valialkin
1b5dc9f91d
all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-08 01:26:55 -08:00
Roman Khavronenko
2472baa934
app/vmalert: do not wait for group start on removal (#3891)
Each group in vmalert starts with an artifical delay to avoid
thundering herd problem. For some groups with high evaluation
intervals, the delay could be significant.
If during this delay user will remove the group from the config
and hot-reload it - vmalert will have to wait until the delay
ends. This results into slow config reloading and UI hang.

The change moves the start-delay logic back to the group's
`start` method. Now, group can immediately exit from the
delay when `group.close()` method is called.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-06 14:04:43 +01:00
Roman Khavronenko
d6fa4da712
vmalert: cancel in-flight requests on group's update or close (#3886)
When group's update() or close() method is called, the group
still need to wait for its current evaluation to finish.
Sometimes, evaluation could take a significant amount of time
which slows configuration update or vmalert's graceful shutdown.

The change interrupts current evaluation in order to speed up
the graceful shutdown or config update procedures.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-01 15:48:20 +01:00
Aliaksandr Valialkin
255a0cf635
all: add makefile rules for GOARCH=s390x for all the VictoriaMetrics components
This is a follow-up for 007530f882
2023-02-26 12:36:51 -08:00
Aliaksandr Valialkin
510f78a96b
all: consistently use http.Method{Get,Post,Put} across the codebase
This is a follow-up after 9dec3c8f80
2023-02-22 18:58:46 -08:00
my-git9
9dec3c8f80
chore: Use http constants to replace numbers (#3846)
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-02-22 18:53:05 -08:00
Roman Khavronenko
5446ce0018
docs: mention rules replay blogpost in vmalert docs (#3851)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-21 22:16:15 +01:00
Roman Khavronenko
74f122294d
docs: update vmalert docs (#3843)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-20 16:39:31 +01:00
Aliaksandr Valialkin
8bd2eee8e0
app/vmalert/README.md: sync with docs/vmalert.md after 6ef6f3a771 2023-02-18 15:21:24 -08:00
Haleygo
6ef6f3a771
vmalert: fix maxResolveDuration flag note (#3827)
Signed-off-by: Haleygo <hui.wang@daocloud.io>
2023-02-16 19:26:17 +01:00
Roman Khavronenko
a645a95bd6
docs: improve troubleshooting docs for vmalert (#3812)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 17:29:30 +01:00
Oleksandr Redko
9fff48c3e3
app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
73358571ee
app/vmalert: follow-up after d3c64aae8768d58781ee7e358bd7f3d8e0eb836d
- Document the change at docs/CHANGELOG.md
- Add `Reading rules from object storage` section to docs/vmalert.md
- Add `s3` prefix to command-line flags related to the configuration of s3 and gcs clients
- Explicitly mention that reading rules from object storage is supported only in enterprise version
2023-02-09 18:52:00 -08:00
Roman Khavronenko
14c20c1843
vmalert: support object storage for rules (#519)
* vmalert: support object storage for rules

Support loading of alerting and recording rules from object
storages `gcs://`, `gs://`, `s3://`.

* review fixes
2023-02-09 18:50:48 -08:00
Aliaksandr Valialkin
0e0095d350
all: run apk update && apk upgrade in base Alpine Docker image in order to get all the recent security fixes 2023-02-09 14:01:32 -08:00
Roman Khavronenko
e83f14210d
Vmalert fixes (#3788)
* vmalert: use group's ID in UI to avoid collisions

Identical group names are allowed. So we should used IDs
for various groupings and aggregations in UI.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: prevent disabling state updates tracking

The minimum number of update states to track is now set to 1.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly update `debug` and `update_entries_limit` params on hot-reload

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: display `debug` field for rule in UI

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: exclude `updates` field from json marhsaling

This field isn't correctly marshaled right now.
And implementing the correct marshaling for it doesn't
seem right, since json representation is mostly used
by systems like Grafana. And Grafana doesn't expect this
field to be present.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 14:34:03 +01:00
Roman Khavronenko
c32d8ea29e
vmalert: update docs (#3770)
vmalert: update flags description

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-06 09:51:30 +01:00
Roman Khavronenko
6fd10e8871
vmalert: speed up state restore procedure on start (#3758)
* vmalert: speed up state restore procedure on start

Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.

While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.

This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.

See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* make lint happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-03 19:46:13 -08:00
Roman Khavronenko
1cbdcd391c
docs: mention -vmalert.proxyURL in vmalert docs (#3730)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-30 16:28:33 +01:00