272 commits

Author SHA1 Message Date
Howie
7c3d43fa7f
fix: docs (#2658)
Signed-off-by: lihaowei <haoweili35@gmail.com>
2022-05-30 08:16:07 +02:00
spectvtor
9e343faa41
fix alert relabeling (#2633) 2022-05-25 09:36:04 +02:00
Roman Khavronenko
113301308a
vmalert: mention how to build a custom image (#2626)
Thanks to @f41gh7

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-23 00:59:34 +02:00
Roman Khavronenko
2cf586da78
vmalert: add new metric vmalert_iteration_interval_seconds (#2623)
The new metric shows the configured evaluation interval per group.
The metric updates its value when the group's interval changes during
hot reload.
The new metric can be used to estimate how close a group
is to missing evaluation rounds. The following query
shows the percentage of the interval that the group spends evaluating all rules
before the next round:
```
(max(vmalert_iteration_duration_seconds{quantile="0.99"}) / vmalert_iteration_interval_seconds) * 100
```

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2618
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 17:31:16 +02:00
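The ratio in the query from commit 2cf586da78 is plain arithmetic. A minimal Go sketch of it follows; the function name `usedIntervalPercent` and the sample numbers are hypothetical and are not part of vmalert.

```go
package main

import "fmt"

// usedIntervalPercent mirrors the query above: the share of the group's
// evaluation interval consumed by one evaluation round, in percent.
// Both arguments are in seconds. This helper is illustrative only.
func usedIntervalPercent(durationSeconds, intervalSeconds float64) float64 {
	if intervalSeconds <= 0 {
		// Avoid dividing by zero for an unset interval.
		return 0
	}
	return durationSeconds / intervalSeconds * 100
}

func main() {
	// A group with a 60s interval whose p99 evaluation takes 45s.
	fmt.Printf("%.0f%%\n", usedIntervalPercent(45, 60))
}
```

A value approaching 100 suggests the group is close to missing evaluation rounds.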
Aliaksandr Valialkin
1731fe2ada
docs: update the description for command-line flags according to recent changes 2022-05-20 15:09:43 +03:00
Roman Khavronenko
5111d850e2
vmalert: remove a line added for debug (#2611)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-18 14:57:58 +02:00
Roman Khavronenko
34116882b4
vmalert: support scalar type in response (#2610)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2607

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-18 09:50:46 +02:00
Roman Khavronenko
1fad4dc919
vmalert: support strings in humanize.* templates (#2606)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2569

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-17 15:38:54 +02:00
Yurii Kravets
5c42c1218a
Update vmalert.md (#2580)
docs: update vmalert/README.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2022-05-17 14:14:18 +02:00
Roman Khavronenko
b74c001c92
vmalert: support /rules path for Grafana's ngalert requests (#2593)
Unexpectedly, Grafana makes an extra request to the `/rules`
handler in addition to `/api/v1/rules` calls in the alerts UI.
This happens only for Grafana versions older than 8.5.*.
Apparently, this is related to support for other monitoring
systems.
Prometheus responds to such requests with `text/html` content for the UI page `/rules`.
In practice, returning just a blank page with
status code 200 works as well.

Returning the actual response of `/api/v1/rules`
results in an error in Grafana, since it expects a `yaml` (?) in response.
So we add a placeholder to `vmalert`.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2583
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-16 10:00:24 +02:00
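The placeholder described in commit b74c001c92 could look roughly like the sketch below: a handler that answers `/rules` with status 200 and an empty body. The names `rulesPlaceholder` and `probe` are hypothetical, and the real vmalert routing is likely different.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// rulesPlaceholder is a hypothetical handler: it replies with an empty
// HTML page and status 200, which per the commit message is enough
// for Grafana's extra request.
func rulesPlaceholder(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	w.WriteHeader(http.StatusOK)
}

// probe sends a GET request for path to an in-memory mux and returns
// the response status code and body.
func probe(path string) (int, string) {
	mux := http.NewServeMux()
	mux.HandleFunc("/rules", rulesPlaceholder)
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe("/rules")
	fmt.Println(code, len(body))
}
```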
Roman Khavronenko
284bda8746
docs: fix liquid syntax errors (#2592)
For the liquid text processor, double braces `{{` `}}`
are special chars for templating.
Since we use them in some of our docs for a different purpose,
we must escape them to avoid syntax errors from liquid.

To escape curly braces we use a built-in plugin which
encloses sections of text in `{% raw %}` and `{% endraw %}`.
This approach prevents liquid syntax errors and makes the docs render correctly.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-16 09:27:19 +02:00
Roman Khavronenko
0d07166eed
vmalert: fix readme formatting (#2587)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-14 19:29:09 +02:00
Roman Khavronenko
9bc03f6b04
vmalert: follow-up after 0ac1cdfff5 (#2586)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-14 18:56:31 +02:00
Andrii Chubatiuk
a531a96193
added reusable templates support (#2532)
Signed-off-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2022-05-14 11:38:44 +02:00
Aliaksandr Valialkin
c448d2fcbb
app/vmalert: apply -remoteRead.disablePathAppend to -datasource.url in the same way as for the -remoteRead.url
This is a follow-up for 0e2486df56

The related pull requests:
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1536
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1712
2022-05-13 16:44:43 +03:00
Roman Khavronenko
3f0ecee128
vmalert: properly cleanup stale series tracker on rules update (#2577)
The rules executor within a group tracks series sent to remote write
in order to mark them as stale if they disappear in the next
evaluation round.
The executor uses the rule ID as a key to identify series which belong to a rule.
On config reload, the executor remains active but the set of rules could change.
Hence, we need to properly clean up the tracker for rules which have disappeared
on config reload.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-13 10:04:49 +02:00
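The cleanup described in commit 3f0ecee128 can be sketched as a map keyed by rule ID that is purged on reload. This is a simplified illustration: the `tracker` type, its field and the `purge` method are hypothetical names, not vmalert's actual data structures.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the executor's state:
// it remembers which series each rule sent on the previous round.
type tracker struct {
	seriesByRule map[uint64][]string
}

// purge drops tracked series for every rule ID that is absent from
// the reloaded configuration.
func (t *tracker) purge(activeRuleIDs []uint64) {
	active := make(map[uint64]struct{}, len(activeRuleIDs))
	for _, id := range activeRuleIDs {
		active[id] = struct{}{}
	}
	for id := range t.seriesByRule {
		if _, ok := active[id]; !ok {
			// Deleting during iteration is safe for Go maps.
			delete(t.seriesByRule, id)
		}
	}
}

func main() {
	tr := &tracker{seriesByRule: map[uint64][]string{
		1: {`ALERTS{alertname="A"}`},
		2: {`ALERTS{alertname="B"}`},
	}}
	// After reload only rule 2 remains in the group.
	tr.purge([]uint64{2})
	fmt.Println(len(tr.seriesByRule))
}
```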
Dmytro Kozlov
c8af625bcc
vmctl: fix build for solaris os (#2555)
* vmctl: fix build for solaris os

* vmctl: updated dependency (using Syscall instead of Syscall6)

* vmctl: updated dependency

* vmctl: updated dependency
2022-05-09 21:36:18 +02:00
Roman Khavronenko
331a5d9a17
Code check (#2558)
* vmstorage: make gofmt happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-09 10:11:56 +02:00
Roman Khavronenko
e9fa363480
Vmalert fix bugs in alerting evaluation (#2557)
* vmalert: calculate time for firing alert based on the given timestamp

Previously, the current time was used for checking the `firing` threshold.
This is not correct, since alerts are evaluated at specific timestamps.
Hence, this specific timestamp is supposed to be used in the calculation.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly calculate evaluation timestamp for rules

The timestamp for rules evaluation should be calculated after
the artificial delay for group start. Otherwise, the evaluation
timestamp can fall too far back in time.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-09 10:11:06 +02:00
Aliaksandr Valialkin
381e2de59c
app/vmalert: run make quicktemplate-gen from the root directory after the commit f6dcfbcdd6 2022-05-04 20:27:36 +03:00
Dmytro Kozlov
f6dcfbcdd6
vmalert/tpl: fixed truncating alerts expression in table (#2494)
vmalert: improve `/groups` UI visual 

The change also fixes truncated rules expressions in UI
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2484
2022-05-04 18:02:18 +02:00
Aliaksandr Valialkin
58390192c1
app/vmalert: run make quicktemplate-gen from the repository root
This is a follow-up after b2294d1cf1
2022-05-02 15:17:03 +03:00
Roman Khavronenko
3616337812
vmalert: do not execute templates during validation (#2528)
The function `ValidateTemplates`, used on vmalert startup,
is supposed to check whether the templates and functions used
in loaded rules are correct. The function was parsing
and executing loaded templates.
However, rules may contain functions which can't be executed
without values (label values or query results), like `slice`.
Because of this, validation of the completely valid expression
`{{ slice $labels.job 9 }}` will fail since `$labels.job`
is empty during validation.

This PR updates `ValidateTemplates` function to only parse
templates without executing them.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2514
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-02 10:16:16 +02:00
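The difference between parsing and executing a template, as described in commit 3616337812, can be reproduced with Go's standard `text/template` package. The sketch below is an assumption-laden illustration: `parseOnly`, `parseAndExecute` and `alertData` are hypothetical names, and `.Labels.job` stands in for vmalert's `$labels.job`.

```go
package main

import (
	"fmt"
	"io"
	"text/template"
)

// alertData is a hypothetical stand-in for the values a template
// receives at evaluation time.
type alertData struct {
	Labels map[string]string
}

// parseOnly checks syntax and function names without running the template.
func parseOnly(text string) error {
	_, err := template.New("check").Parse(text)
	return err
}

// parseAndExecute also runs the template against empty labels,
// which is what validation has to work with at startup.
func parseAndExecute(text string) error {
	t, err := template.New("check").Parse(text)
	if err != nil {
		return err
	}
	return t.Execute(io.Discard, alertData{Labels: map[string]string{}})
}

func main() {
	const expr = `{{ slice .Labels.job 9 }}`
	fmt.Println("parse only:", parseOnly(expr))
	fmt.Println("parse and execute:", parseAndExecute(expr))
	fmt.Println("unknown function:", parseOnly(`{{ nosuchfunc 1 }}`))
}
```

Parsing alone still rejects unknown functions and syntax errors, while slicing an empty label value fails only at execution time.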
Dmytro Kozlov
32a6b67e6c
vmalert: added disableProgressBar flag which disables the progress bar (#2506)
vmalert: added disableProgressBar flag which disables the progress bar

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1761
2022-05-02 10:08:24 +02:00
Dmytro Kozlov
b2294d1cf1
vmctl/vm: added datapoints collection bar (#2486)
add progress bars to the VM importer

The new progress bars are supposed to display the processing speed of each
VM importer worker. This info should help to identify whether there is a bottleneck
on the VM side during the import process, without waiting for it to finish.
The new progress bars can be disabled by passing the `vm-disable-progress-bar` flag.

Plotting multiple progress bars requires using the experimental progress bar pool
from github.com/cheggaaa/pb/v3. Switching to the progress bar pool required changes
in all import modes.

The openTSDB mode wasn't changed due to its implementation, which implies individual progress
bars for each series. Because of this, using the pool wasn't possible.

Signed-off-by: dmitryk-dk <kozlovdmitriyy@gmail.com>

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-05-02 09:06:34 +02:00
Aliaksandr Valialkin
1097ebebe6
lib/httpserver: clarify that -tls flag enables TLS for http requests to -httpListenAddr 2022-04-16 16:59:26 +03:00
Aliaksandr Valialkin
b49b8020d6
docs: sync docs with the latest changes 2022-04-16 15:59:53 +03:00
Aliaksandr Valialkin
ebaa1c7ad5
lib/promscrape: follow-up after baa1c24b36 2022-04-16 14:25:54 +03:00
Roman Khavronenko
45fcaa33e8
vmalert: add DNS service discovery (#2465)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2460
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-13 11:50:26 +03:00
Aliaksandr Valialkin
b89e846ce3
docs/CHANGELOG.md: document ed364a42e3 2022-04-11 12:11:32 +03:00
hagen1778
ed364a42e3
vmalert: support relabeling for alert labels sent via notifier
Before, relabeling for a notifier configured via file was supported
only for target labels discovered via SD.
With this change, a new config field `alert_relabel_configs` is introduced
for applying relabeling to the labels of sent alerts.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-11 11:09:14 +03:00
Roman Khavronenko
2b59fff526
vmalert: fix labels and annotations processing for alerts (#2403)
To improve compatibility with Prometheus alerting, the order of
template processing has changed.
Before, vmalert did all label processing beforehand. It meant
all extra labels (such as `alertname`, `alertgroup` or rule labels)
were available in templating. All collisions were resolved in favour
of extra labels.
In Prometheus, only labels from the received metric are available in
templating, so no collisions are possible.
This change makes vmalert's behaviour similar to Prometheus.

For example, consider an alerting rule which is triggered by a time series
with an `alertname` label. In vmalert, this label would be overridden
by the alerting rule's name everywhere: for alert labels, for annotations, etc.
In Prometheus, it would be overridden for the alert's labels only, but in annotations
the original label value would be available.

See more details here https://github.com/prometheus/compliance/issues/80

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-06 20:24:45 +02:00
Roman Khavronenko
70bb0d2708
vmalert: add flag for disabling long-lived connections (keepalive) (#2395)
The new flag `datasource.disableKeepAlive` allows disabling keepalive
connections. This may be useful if there are multiple datasource
replicas (e.g. vmselects) behind the HTTP balancer to avoid uneven
load spread because of long-lived connections.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-04 12:59:04 +03:00
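In Go's standard library, keepalive connections are controlled by the `DisableKeepAlives` field of `http.Transport`. The sketch below shows how a flag such as `datasource.disableKeepAlive` from commit 70bb0d2708 might be wired to it; `newDatasourceClient`, `keepAliveDisabled` and the 30-second timeout are hypothetical, and vmalert's actual client setup may differ.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newDatasourceClient builds an HTTP client whose transport opens a new
// connection per request when disableKeepAlive is true. With keepalive
// disabled, an HTTP balancer can spread requests across replicas.
func newDatasourceClient(disableKeepAlive bool) *http.Client {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.DisableKeepAlives = disableKeepAlive
	return &http.Client{Transport: tr, Timeout: 30 * time.Second}
}

// keepAliveDisabled reports whether the client's transport has
// keepalive connections switched off.
func keepAliveDisabled(c *http.Client) bool {
	tr, ok := c.Transport.(*http.Transport)
	return ok && tr.DisableKeepAlives
}

func main() {
	fmt.Println(keepAliveDisabled(newDatasourceClient(true)))
	fmt.Println(keepAliveDisabled(newDatasourceClient(false)))
}
```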
Roman Khavronenko
1354e6d712
vmalert: protect executor's field from concurrent access (#2387)
The executor recently gained a field for storing previously sent series.
Since the same executor object can be used in multiple goroutines,
access to this field should be serialized.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-30 12:37:27 +02:00
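Serializing access to a shared field, as commit 1354e6d712 describes, is commonly done with a `sync.Mutex`. The following is a minimal sketch under that assumption; the `executor` type here, its fields and the helper `runConcurrently` are hypothetical and only loosely modelled on the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// executor is a hypothetical, stripped-down executor that remembers
// which series were sent on the previous round.
type executor struct {
	mu             sync.Mutex
	previouslySent map[string]struct{}
}

// remember records a sent series; the mutex serializes map writes
// coming from different goroutines.
func (e *executor) remember(series string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.previouslySent[series] = struct{}{}
}

// sentCount returns the number of remembered series.
func (e *executor) sentCount() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return len(e.previouslySent)
}

// runConcurrently records n distinct series from n goroutines and
// returns how many were remembered.
func runConcurrently(n int) int {
	e := &executor{previouslySent: make(map[string]struct{})}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			e.remember(fmt.Sprintf("series-%d", i))
		}(i)
	}
	wg.Wait()
	return e.sentCount()
}

func main() {
	fmt.Println(runConcurrently(100))
}
```

Without the mutex, concurrent writes to the map would be a data race and could crash the program.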
Roman Khavronenko
0989649ad0
Vmalert compliance 2 (#2340)
* vmalert: split alert's `Start` field into `ActiveAt` and `Start`

The `ActiveAt` field identifies when alert becomes active for rules
with `for > 0`. Previously, this value was stored in field `Start`.

The field `Start` now identifies the moment alert became `FIRING`.

The split is needed in order to distinguish these two moments
in the API responses for alerts.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: support specific moment of time for rules evaluation

The Querier interface was extended to accept a new argument
used as a timestamp at which evaluation should be made.

It is needed to align rules execution time within the group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: mark disappeared series as stale

Series generated by alerting rules and sent to remote write
will now be marked as stale if they disappear on the next
evaluation. This makes the ALERTS and ALERTS_FOR_TIME series
more precise.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: evaluate rules at fixed timestamp

Before, the time at which rules were evaluated was calculated
right before rule execution. The change makes sure
that the timestamp is calculated only once per evaluation round
and all rules use the same timestamp.

It also updates the logic for resending already resolved
alert notifications.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: allow overriding `alertname` label value if it is present in response

Previously, `alertname` was always equal to the Alerting Rule name. Now,
its value can be overridden if a series in the response contains a different value
for this label.

The change is needed for improving compatibility with Prometheus.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: align rules evaluation in time

Now, the evaluation timestamp for rules is calculated as if
there was no delay in rules evaluation. It means that
rules will be evaluated at fixed timestamps+group_interval.
This provides more consistent evaluation results and
improves compatibility with Prometheus.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: add metric for missed iterations

New metric `vmalert_iteration_missed_total` will show
whether rules evaluation round was missed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: reduce delay before the initial rule evaluation in group

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rollback alertname override

According to the spec:
```
The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label.
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: throw err immediately on dedup detection

```
The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts
or add samples to samples receiver if there is more than one alert with the same labels
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: cleanup

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: use strings builder to reduce allocs

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-29 15:09:07 +02:00
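The duplicate check quoted from the specification in commit 0989649ad0 (error out immediately if more than one alert has the same labels) can be sketched as follows. The names `labelsKey`, `checkDuplicates` and `errDuplicate` are hypothetical; this is not vmalert's implementation. The key is built with a `strings.Builder`, in the spirit of the last item of that commit.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errDuplicate is a hypothetical sentinel error for repeated label sets.
var errDuplicate = errors.New("more than one alert with the same labels")

// labelsKey builds a deterministic key from a label set by sorting
// label names; quoting keeps names and values unambiguous.
func labelsKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%q=%q,", name, labels[name])
	}
	return b.String()
}

// checkDuplicates returns an error as soon as two alerts share
// the same label set, before anything is sent.
func checkDuplicates(alerts []map[string]string) error {
	seen := make(map[string]struct{}, len(alerts))
	for _, labels := range alerts {
		key := labelsKey(labels)
		if _, ok := seen[key]; ok {
			return fmt.Errorf("%w: %s", errDuplicate, key)
		}
		seen[key] = struct{}{}
	}
	return nil
}

func main() {
	alerts := []map[string]string{
		{"alertname": "HighLatency", "job": "api"},
		{"job": "api", "alertname": "HighLatency"},
	}
	fmt.Println(checkDuplicates(alerts))
}
```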
Roman Khavronenko
56de8f0356
docs: fix typo in vmalert's API (#2380)
The API handler was changed in 1.75 but docs
still contain the old address.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2366
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-28 12:07:02 +02:00
Aliaksandr Valialkin
c8f356a6a8
app: sync Markdown changes from a8de1ab000 2022-03-22 14:11:18 +02:00
Aliaksandr Valialkin
09c6c7350b
docs/vmalert.md: sync after 11ae1ae924 2022-03-17 20:17:40 +02:00
Dmytro Kozlov
11ae1ae924
Added resendDelay for alerts (#2296)
* vmalert: add support of `resendDelay` flag for alerts

Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 15:26:33 +00:00
Roman Khavronenko
fb6eab03a2
Vmalert compliance improvements (#2320)
* vmalert: add support for `sortByLabel` template function

* vmalert: update API according to Prometheus conformance program

The changes to the API, field names and URL path have been made
according to the Prometheus specification for `alert_generator`
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

* vmalert: fix the timestamp of the evaluated rules

The timestamp used for the alert's `EndsAt` was calculated
before sending the notification, while the correct way
is to use the timestamp taken right before rules evaluation.

* vmalert: add `-datasource.queryTimeAlignment` flag

The flag provides the ability to disable `time`
param alignment when executing rules. By default, this flag
is enabled, so it remains backward compatible.

The flag was introduced to achieve better compatibility
with Prometheus behaviour according to https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-15 11:54:53 +00:00
Roman Khavronenko
0fa7effc4b
docs: fix broken links (#2303)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-13 15:56:01 +02:00
Dmytro Kozlov
565bd08c43
Issue-1824: added flags and different auth types support (#2287)
* vmalert/notifier: added flags and different auth types support

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-10 13:09:12 +02:00
Bastien Dronneau
8b21f40217
docs(vmalert): typo in path (#2278) 2022-03-05 22:35:10 +02:00
Denys Holius
1685e181ae
Added minimal supported version of AlertManager (#2237)
* added minimal supported version of AlertManager

* docs: `make docs-sync`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-22 20:07:04 +02:00
Roman Khavronenko
69d1893f4c
Consul SD - update services on the watcher's start (#2202)
* lib/discovery/consul: update services on the watcher's start

Previously, the watcher's start only initialized goroutines for discovery
but did not wait for the first iteration to end. It means the first Consul
discovery wasn't returning discovered targets until the next iteration.

The change makes the watcher's start block until the first discovery
iteration is done and all registries are updated.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: remove workarounds for consul SD

Now that the consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/discovery/consul: update after review

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 15:32:45 +02:00
Aliaksandr Valialkin
ee5da826e9
docs: update -help output for VictoriaMetrics components 2022-02-15 21:08:22 +02:00
hagen1778
2efa46a11c
vmalert: support $externalLabels and $externalURL in templates
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2193
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-15 17:33:52 +03:00
Nikolay
75e84144c7
adds release build for macos darwin amd64 and arm64 (#2185)
* adds release build for macos darwin amd64 and arm64

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1896
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1851

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:28:56 +02:00
Roman Khavronenko
e3adcbec6e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows specifying duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:00 +02:00
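Go's `time.ParseDuration` does not accept the `d` and `w` units, so supporting values like `1d` and `1w` (commit e3adcbec6e) needs a custom parser. The sketch below is a simplified illustration, not the parser used by lib/promscrape: `parseDuration`, `seconds` and the `units` table are hypothetical, and the `y` unit is assumed to be 365 days.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// units maps Prometheus-style suffixes to durations (assumed values).
var units = map[string]time.Duration{
	"ms": time.Millisecond,
	"s":  time.Second,
	"m":  time.Minute,
	"h":  time.Hour,
	"d":  24 * time.Hour,
	"w":  7 * 24 * time.Hour,
	"y":  365 * 24 * time.Hour,
}

// parseDuration parses one or more <integer><unit> pairs, e.g. "1d" or "1h30m".
func parseDuration(s string) (time.Duration, error) {
	if s == "" {
		return 0, errors.New("empty duration")
	}
	var total time.Duration
	for len(s) > 0 {
		// Read the leading run of digits.
		i := 0
		for i < len(s) && s[i] >= '0' && s[i] <= '9' {
			i++
		}
		if i == 0 {
			return 0, fmt.Errorf("expected a number in %q", s)
		}
		n, err := strconv.ParseInt(s[:i], 10, 64)
		if err != nil {
			return 0, err
		}
		s = s[i:]
		// Read the unit: everything up to the next digit.
		j := 0
		for j < len(s) && (s[j] < '0' || s[j] > '9') {
			j++
		}
		unit, ok := units[s[:j]]
		if !ok {
			return 0, fmt.Errorf("unknown unit %q", s[:j])
		}
		total += time.Duration(n) * unit
		s = s[j:]
	}
	return total, nil
}

// seconds returns the parsed duration in whole seconds, or -1 on error.
func seconds(s string) int64 {
	d, err := parseDuration(s)
	if err != nil {
		return -1
	}
	return int64(d / time.Second)
}

func main() {
	for _, v := range []string{"1d", "1w", "1h30m", "15"} {
		fmt.Println(v, seconds(v))
	}
}
```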
hagen1778
4e722c459b
vmalert: fix bug with relative links in UI
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2167
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-09 12:18:39 +03:00