Commit graph

487 commits

Author SHA1 Message Date
Roman Khavronenko
d0abdc2b5b
vmalert: allow configuring custom headers per group (#2901)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2860

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-21 20:48:05 +03:00
Roman Khavronenko
356d8b99e0
vmalert: allow configuring custom headers for URLs (#2897)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2860

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-21 20:45:12 +03:00
Aliaksandr Valialkin
fe68bb3ba7
all: follow-up after 46f803fa7a
Add -pushmetrics.* command-line flags to all the VictoriaMetrics apps
2022-07-21 20:18:25 +03:00
cui fliter
e0d30f6ec5
fix some typos (#2882)
Signed-off-by: cui fliter <imcusg@gmail.com>
2022-07-18 12:10:40 +03:00
Roman Khavronenko
80de6face8
vmalert: drop support of deprecated extra_filter_labels param (#2870)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-14 16:10:46 +03:00
Aliaksandr Valialkin
70b9925bf7
app: fix make publish-* after ed93330e66
Add missing `-linux` substring to built binary names for copying into Docker images
2022-07-14 11:01:34 +03:00
Aliaksandr Valialkin
da6c85a2f6
all: follow-up for d99ba3481b 2022-07-13 17:17:08 +03:00
Dmytro Kozlov
4e4def9df8
Rename release packages (#2810)
* makefile: add os to each release file

* makefile: update vmutils arm64

* makefile: update victoria-metrics release process

* makefile: update publish with os

* makefile: update publish with os

* makefile: change tar library

* update release logic

* copy all releases

* sort command by GOOS

* rollback commands

* rollback OSARCH

* fix commands

* cleanup

* fix windows build

* sort build by GOOS, update README.md
2022-07-13 17:11:01 +03:00
Aliaksandr Valialkin
eab8ebbe11
all: make fmt via the upcoming Go1.19 2022-07-11 19:23:25 +03:00
Aliaksandr Valialkin
64419aa97c
app/vmalert/utils/links.go: document Prefix function, which has been added in b29fafa86b 2022-07-08 13:25:23 +03:00
Roman Khavronenko
b29fafa86b
vmalert: deprecate alert's status link (#2840)
* vmalert: deprecate alert's status link

Deprecate alert's status link `/api/v1/<groupID>/<alertID>/status` in favour of
`api/v1/alerts?group_id=<group_id>&alert_id=<alert_id>"`.

The change was needed for simplifying logic in vmselect for proxying vmalert's requests.

The old alert's status link will be still supported for a few versions but will be removed in the future.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: fix review comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-08 13:05:11 +03:00
Roman Khavronenko
218dfe7956
docs: mention deduplication issue for HA vmalert topology (#2838)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-07 01:13:56 +03:00
Roman Khavronenko
56f4058fe3
vmalert: make UI and assets links relative (#2831)
* make all links in vmalert relative, so links continue to work even if vmalert sits behind the proxy;
* update vmalert's routing to always have component-unique path prefix, e.g. /vmalert;

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2825

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-07-06 12:47:53 +03:00
Roman Khavronenko
50c0eb4c4e
vmalert: make __name__ available for templating in alerts (#2783)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-27 13:53:55 +03:00
Aliaksandr Valialkin
4b41a05ca7
app/vmalert: load static js and css from proper paths if -http.pathPrefix command-line flag is set
This is a follow-up for b104f67beb
2022-06-27 13:12:57 +03:00
Roman Khavronenko
572db17857
vmalert: use absolute path for assets (#2784)
Using relative path breaks assets loading on alert view page.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-27 00:47:36 +03:00
Aliaksandr Valialkin
3ae6300497
lib/promauth: add ability to send additional http headers in requests to scrape targets
This solves https://stackoverflow.com/questions/66032498/prometheus-scrape-metric-with-custom-header
2022-06-22 20:40:50 +03:00
Roman Khavronenko
54f0f2d384
docs: follow-up for 197d3cdd74 (#2766)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-22 13:18:03 +03:00
云原生驿站
67e5833ced
docs: supplement vmalert downsampling docs (#2765)
Co-authored-by: 吴典秋 <muti_kube@163.com>
2022-06-22 13:18:03 +03:00
Aliaksandr Valialkin
597bce4f55
docs: update docs after e4d6b750f6
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2753
2022-06-21 14:01:25 +03:00
Roman Khavronenko
3ada676879
docs: reference links from key concepts (#2745)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-06-19 23:14:30 +03:00
Aliaksandr Valialkin
fe9f59fcd6
all: replace bash with console blocks in all the *.md files
This is a follow-up for 954a7a6fc6
2022-06-19 23:02:02 +03:00
Roman Khavronenko
3e45e1ff63
Vmalert notifiers (#2744)
* vmalert: remove head of line blocking for sending alerts

This change makes sending alerts to notifiers concurrent instead
of sequential. This eliminates head of line blocking, where first
faulty notifier address prevents the rest of notifiers from
receiving notifications.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: make default timeout for sending alerts 10s

Previous value of 1m was too high and was inconsistent
with default timeout defined for notifiers via
configuration file.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: linter checks fix

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-19 22:49:10 +03:00
Roman Khavronenko
ba7ece02c4
docs: add multiple-remote-writes topology to vmalert (#2738)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-16 20:21:12 +03:00
Wataru Manji
0cd750fa5e
Add remote-write headers (#2701)
Co-authored-by: Wataru Manji <wataru.manji@linecorp.com>
2022-06-13 10:07:19 +03:00
Roman Khavronenko
3c09d25039
vmalert: followup for 76f05f8670 (#2706)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-09 13:15:35 +03:00
Howie
4afd7aa695
feat: rule limit (#2676)
vmalert: support `limit` param in groups definition

`limit` param limits number of time series samples produced by a single rule
during execution.
On reaching the limit rule will return an err.

Signed-off-by: lihaowei <haoweili35@gmail.com>
2022-06-09 13:15:33 +03:00
Aliaksandr Valialkin
c07f99cd4f
docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2685 2022-06-07 15:39:53 +03:00
Wataru Manji
64f7095c3a
add Content-Encoding Header (#2685)
Co-authored-by: Wataru Manji <wataru.manji@linecorp.com>
2022-06-07 15:34:25 +03:00
Aliaksandr Valialkin
68b6ddfb14
all: follow-up after 8edb390e21
- Remove unused js bloatware from /targets page. This strips down binary size by more than 100Kb
- Add /service-discovery page for API compatibility with Prometheus
- Properly load bootstrap.min.css from /prometheus/targets
- Serve static contents for /targets page from app/vminsert instead of app/vmselect, because /targets page is served from there
2022-06-07 01:05:53 +03:00
Aliaksandr Valialkin
afced37c0b
all: add initial support for query tracing
See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403
2022-06-01 02:31:44 +03:00
Aliaksandr Valialkin
e83b96366f
docs/CHANGELOG.md: follow-up after 11f91532c5
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2594
2022-05-31 12:42:48 +03:00
Dmytro Kozlov
cd1fa2e4cd
issue-2594: use embedded for static files (#2650)
embed static js and css files from CDN into vmalert, vmagent and vmsingle binaries.

Co-authored-by: f41gh7 <nik@victoriametrics.com>

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2594
2022-05-31 12:42:48 +03:00
Howie
09fed19ba5
chore: remove duplicated code (#2657)
Signed-off-by: lihaowei <haoweili35@gmail.com>
2022-05-30 12:27:47 +03:00
Howie
9c5f998438
fix: docs (#2658)
Signed-off-by: lihaowei <haoweili35@gmail.com>
2022-05-30 12:27:19 +03:00
spectvtor
c5df5c9a95
fix alert relabeling (#2633) 2022-05-25 15:05:10 +03:00
Roman Khavronenko
d597cdd06f
vmalert: mention how to build a custom image (#2626)
Thanks to @f41gh7

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-23 10:55:42 +03:00
Roman Khavronenko
88c4c6f465
vmalert: add new metric vmalert_iteration_interval_seconds (#2623)
The new metric shows the configured evaluation interval per group.
Metric updates its value when group's interval is changed during
hot reload.
The new metric can be used to estimate how close group
is to start missing evaluation rounds. The following query
will show the % of used time by the group to evaluate all rules
before the next round:
```
(max(vmalert_iteration_duration_seconds{quantile="0.99"}) / vmalert_iteration_interval_seconds) * 100
```

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2618
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-21 01:13:01 +03:00
Aliaksandr Valialkin
aba00b8cb2
docs: update the description for command-line flags according to recent changes 2022-05-20 15:11:37 +03:00
Roman Khavronenko
d814c83b21
vmalert: remove a line added for debug (#2611)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 14:08:57 +03:00
Roman Khavronenko
2aeb00f98f
vmalert: support scalar type in response (#2610)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2607

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 14:08:19 +03:00
Roman Khavronenko
a07ddf9b65
vmalert: support strings in humanize.* templates (#2606)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2569

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 14:05:53 +03:00
Yurii Kravets
40c614dc4e
Update vmalert.md (#2580)
docs: update vmalert/README.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2022-05-20 14:02:34 +03:00
Roman Khavronenko
87e4e76537
vmalert: support /rules path for Grafana's ngalert requests (#2593)
Unexpectedly, Grafana makes an extra request to `/rules`
handler in addition to `/api/v1/rules` calls in alerts UI.
This happens only for Grafana versions older than 8.5.*.
Apparently, this is related to support of other monitoring
systems.
Prometheus responds with `text/html` content for UI page `/rules`
to such requests. Actually, returning just a blank page with
SC=200 works as well.

Returning actual response of `/api/v1/rules`
results in error in Grafana since it expects a `yaml` (?) in response.
So we add a placeholder to `vmalert`.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2583
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 13:53:55 +03:00
Roman Khavronenko
ee1f8bae2e
docs: fix liquid syntax errors (#2592)
For liquid text processor double braces `{{` `}}`
are special chars for templating.
Since we use them in some of our docs with different purpose,
we must escape them to avoid syntax errors from liquid.

For escaping curly braces we use bult-in plugin which helps
to enclose sections of text via `{% raw %}` and `{% endraw %}`.
This approach prevents liquid syntax errors and makes render correct.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 13:42:47 +03:00
Roman Khavronenko
6e5ba1921d
vmalert: fix readme formatting (#2587)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 13:40:41 +03:00
Roman Khavronenko
0790ba5e26
vmalert: follow-up after 0ac1cdfff5 (#2586)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-20 12:25:13 +03:00
Andrii Chubatiuk
7789c47e41
added reusable templates support (#2532)
Signed-off-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2022-05-20 12:25:11 +03:00
Aliaksandr Valialkin
83ff4c411d
app/vmalert: apply -remoteRead.disablePathAppend to -datasource.url in the same way as for the -remoteRead.url
This is a follow-up for 0e2486df56

The related pull requests:
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1536
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1712
2022-05-13 16:59:16 +03:00
Roman Khavronenko
2ea625d5bf
vmalert: properly cleanup stale series tracker on rules update (#2577)
Rules executor within group tracks series sent to remote write
in order to mark them as stale if they had disappeared in next
evaluation round.
The executor uses rules ID as a key to identifies series which belong to rule.
On config reload, executor remains active but the set of rules could change.
Hence, we need to properly cleanup the tracker for rules which has been disappeared
on config reload.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-13 16:59:16 +03:00
Dmytro Kozlov
1a8a24bcb3
vmctl: fix build for solaris os (#2555)
* vmctl: fix build for solaris os

* vmctl: updated dependency (using Syscall instead of Syscall6)

* vmctl: updated dependency

* vmctl: updated dependency
2022-05-11 14:30:45 +03:00
Roman Khavronenko
c9b5e6e8ca
Code check (#2558)
* vmstorage: make gofmt happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-09 15:29:25 +03:00
Roman Khavronenko
ee93d003d3
Vmalert fix bugs in alerting evaluation (#2557)
* vmalert: calculate time for firing alert based on the given timestamp

Previously, current time was used for checking the `firing` threshold.
This is not correct, since alerts are evaluated at specific timestamps.
Hence, this specific timestamp supposed to be used in the calculation.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly calculate evaluation timestamp for rules

Timestamp for rules evaluation should be calculated after
the artifical delay for groups start. Otherwise, evaluation
timestamp can fall back too far in time.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-09 15:28:30 +03:00
Aliaksandr Valialkin
358fa99af2
app/vmalert: run make quicktemplate-gen from the root directory after the commit f6dcfbcdd6 2022-05-04 20:28:37 +03:00
Dmytro Kozlov
0aeefeb5f1
vmalert/tpl: fixed truncating alerts expression in table (#2494)
vmalert: improve `/groups` UI visual 

The change also fixes truncated rules expressions in UI
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2484
2022-05-04 20:28:37 +03:00
Aliaksandr Valialkin
725cb64e81
app/vmalert: run make quicktemplate-gen from the repository root
This is a follow-up after b2294d1cf1
2022-05-02 15:37:54 +03:00
Dmytro Kozlov
4764f6e522
vmalert: added disableProgressBar flag which disable progressbar (#2506)
vmalert: added disableProgressBar flag which disable progressbar

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1761
2022-05-02 15:37:54 +03:00
Roman Khavronenko
d0d0be9031
vmalert: do not execute templates during validation (#2528)
Function `ValidateTemplates`, used on the vmalert startup,
is supposed to check whether used templates and functions
in loaded rules are correct. The function was parsing
and executing loaded templates.
However, rules may contain functions which can't be executed
without values (label values or query results), like `slice`.
Because of this, validation for completely valid expression
`{{ slice $labels.job 9 }}` will fail since `$labels.job`
is empty during validation.

This PR updates `ValidateTemplates` function to only parse
templates without executing them.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2514
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-05-02 15:37:54 +03:00
Dmytro Kozlov
25e54d2b50
vmctl/vm: added datapoints collection bar (#2486)
add progress bars to the VM importer

The new progress bars supposed to display the processing speed per each
VM importer worker. This info should help to identify if there is a bottleneck
on the VM side during the import process, without waiting for its finish.
The new progress bars can be disabled by passing `vm-disable-progress-bar` flag.

Plotting multiple progress bars requires using experimental progress bar pool
from github.com/cheggaaa/pb/v3. Switch to progress bar pool required changes
in all import modes.

The openTSDB mode wasn't changed due to its implementation, which implies individual progress
bars per each series. Because of this, using the pool wasn't possible.

Signed-off-by: dmitryk-dk <kozlovdmitriyy@gmail.com>

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-05-02 10:58:06 +03:00
Aliaksandr Valialkin
7debf57ca6
lib/httpserver: clarify that -tls flag enables TLS for http requests to -httpListenAddr 2022-04-16 16:59:41 +03:00
Aliaksandr Valialkin
6bd032a6d3
docs: sync docs with the latest changes 2022-04-16 16:00:27 +03:00
Aliaksandr Valialkin
c50e48a74c
lib/promscrape: follow-up after baa1c24b36 2022-04-16 14:26:38 +03:00
Roman Khavronenko
56cd2b918a
vmalert: add DNS service discovery (#2465)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2460
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-13 14:14:25 +03:00
Aliaksandr Valialkin
3c27bde77e
docs/CHANGELOG.md: document ed364a42e3 2022-04-11 12:12:07 +03:00
hagen1778
fa9601a0f1
vmalert: support relabeling for alert labels sent via notifier
Before, relabeling for notifier configured via file was supported
only for target labels discovered via SD.
With this change, new config field `alert_relabel_configs` is introduced
for applying relabeling to labels of sent alerts.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-11 12:12:04 +03:00
Roman Khavronenko
4de1b2b74a
vmalert: fix labels and annotations processing for alerts (#2403)
To improve compatibility with Prometheus alerting the order of
templates processing has changed.
Before, vmalert did all labels processing beforehand. It meant
all extra labels (such as `alertname`, `alertgroup` or rule labels)
were available in templating. All collisions were resolved in favour
of extra labels.
In Prometheus, only labels from the received metric are available in
templating, so no collisions are possible.
This change makes vmalert's behaviour similar to Prometheus.

For example, consider alerting rule which is triggered by time series
with `alertname` label. In vmalert, this label would be overriden
by alerting rule's name everywhere: for alert labels, for annotations, etc.
In Prometheus, it would be overriden for alert's labels only, but in annotations
the original label value would be available.

See more details here https://github.com/prometheus/compliance/issues/80

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-07 15:24:06 +03:00
Roman Khavronenko
ce1629b70a
vmalert: add flag for disabling long-lived connections (keepalive) (#2395)
The new flag `datasource.disableKeepAlive` allows disabling keepalive
connections. This may be useful if there are multiple datasource
replicas (e.g. vmselects) behind the HTTP balancer to avoid uneven
load spread because of long-lived connections.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-04 13:08:06 +03:00
Roman Khavronenko
7aa9d0f5f6
vmalert: protect executor's field from concurrent access (#2387)
Executor recently gain field for storing previously sent series.
Since the same executor object can be used in multiple goroutines,
the access to this field should be serialized.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-01 12:03:41 +03:00
Roman Khavronenko
ab10178c85
Vmalert compliance 2 (#2340)
* vmalert: split alert's `Start` field into `ActiveAt` and `Start`

The `ActiveAt` field identifies when alert becomes active for rules
with `for > 0`. Previously, this value was stored in field `Start`.

The field `Start` now identifies the moment alert became `FIRING`.

The split is needed in order to distinguish these two moments
in the API responses for alerts.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: support specific moment of time for rules evaluation

The Querier interface was extended to accept a new argument
used as a timestamp at which evaluation should be made.

It is needed to align rules execution time within the group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: mark disappeared series as stale

Series generated by alerting rules, which were sent to remote write
now will be marked as stale if they will disappear on the next
evaluation. This would make ALERTS and ALERTS_FOR_TIME series
more precise.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: evaluate rules at fixed timestamp

Before, time at which rules were evaluated was calculated
right before rule execution. The change makes sure
that timestamp is calculated only once per evalution round
and all rules are using the same timestamp.

It also updates the logic of resending of already resolved
alert notification.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: allow overridin `alertname` label value if it is present in response

Previously, `alertname` was always equal to the Alerting Rule name. Now,
its value can be overriden if series in response containt the different value
for this label.

The change is needed for improving compatibility with Prometheus.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: align rules evaluation in time

Now, evaluation timestamp for rules evaluates as if
there was no delay in rules evaluation. It means, that
rules will be evaluated at fixed timestamps+group_interval.
This way provides more consistent evaluation results and
improves compatibility with Prometheus,

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: add metric for missed iterations

New metric `vmalert_iteration_missed_total` will show
whether rules evaluation round was missed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: reduce delay before the initial rule evaluation in group

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rollback alertname override

According to the spec:
```
The alert name from the alerting rule (HighRequestLatency from the example above) MUST be added to the labels of the alert with the label name as alertname. It MUST override any existing alertname label.
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-3
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: throw err immediately on dedup detection

```
The execution of an alerting rule MUST error out immediately and MUST NOT send any alerts
or add samples to samples receiver if there is more than one alert with the same labels
```

https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md#step-4
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: cleanup

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: use strings builder to reduce allocs

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-01 12:03:41 +03:00
Roman Khavronenko
d907e0d9f0
docs: fix typo in vmalert's API (#2380)
The API handler was changed in 1.75 but docs
still contain the old address.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2366
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-04-01 12:03:41 +03:00
Aliaksandr Valialkin
ba7cfd7b25
app: sync Markdown changes from a8de1ab000 2022-03-22 14:12:03 +02:00
Aliaksandr Valialkin
3b7aefade2
docs/vmalert.md: sync after 11ae1ae924 2022-03-17 20:18:07 +02:00
Dmytro Kozlov
2f350c200a
Added resendDelay for alerts (#2296)
* vmalert: add support of `resendDelay` flag for alerts

Co-authored-by: dmitryk-dk <dmitry.kozlov@brightlocal.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-17 20:09:18 +02:00
Roman Khavronenko
35bf5bf688
Vmalert compliance improvements (#2320)
* vmalert: add support for `sortByLabel` template function

* vmalert: update API according to Prometheus conformance program

The changes to the API, field names and URL path has been made
according to the Prometheus specification for `alert_generator`
https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

* vmalert: fix the timestamp of the evaluated rules

The timestamp used for alert's `EndsAt` was calculated
before sending the notification. While the correct way
is to use the timestamp taken right before rules evaluation.

* vmalert: add `-datasource.queryTimeAlignment` flag

The flag is supposed to provide ability to disable `time`
param alignment when executing rules. By default, this flag
is enabled, so it remains backward compatible.

The flag was introduced to achieve better compatibility
with Prometheus behaviour according to https://github.com/prometheus/compliance/blob/main/alert_generator/specification.md

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 13:22:26 +02:00
Roman Khavronenko
ec48d022df
docs: fix broken links (#2303)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 13:01:46 +02:00
Dmytro Kozlov
7f0212f964
Issue-1824: added flags and different auth types support (#2287)
* vmalert/notifier: added flags and different auth types support

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2022-03-16 12:52:01 +02:00
Bastien Dronneau
da722e462e
docs(vmalert): typo in path (#2278) 2022-03-16 12:24:12 +02:00
Denys Holius
fc6f71fba8
Added minimal supported version of AlertManager (#2237)
* added minimal supported version of supported AlertManager

* docs: `make docs-sync`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-22 20:09:26 +02:00
Roman Khavronenko
5a4b16794d
Consul SD - update services on the watcher's start (#2202)
* lib/discovery/consul: update services on the watcher's start

Previously, watcher's start was only initing goroutines for discovery
but not waiting for the first iteration to end. It means first Consul
discovery wasn't returning discovered targets until the next iteration.

The change makes the watcher's start blocking until we get first discovery
iteration done and all registries updated.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: remove workarounds for consul SD

Now when consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/discovery/consul: update after review

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 15:33:33 +02:00
hagen1778
11345c3fd1
vmalert: support $externalLabels and $externalURL in templates
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2193
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-15 21:13:32 +02:00
Aliaksandr Valialkin
7e8596edb8
docs: update -help output for VictoriaMetrics components 2022-02-15 21:02:36 +02:00
Nikolay
48a9e068be
adds release build for macos darwin amd64 and arm64 (#2185)
* adds release build for macos darwin amd64 and arm64

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1896
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1851

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:42:33 +02:00
Roman Khavronenko
791cad8c2e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:51 +02:00
hagen1778
1662b5bcec
vmalert: fix bug with relative links in UI
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2167
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-10 12:21:05 +02:00
Aliaksandr Valialkin
eed66b6640
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-living silent errors in -promscrape.config
2022-02-08 15:42:33 +02:00
Aliaksandr Valialkin
5da7f08be6
docs: cross-link downsampling docs from deduplication and vmalert docs 2022-02-07 14:53:50 +02:00
Aliaksandr Valialkin
dbead4813e
docs: updates after 5da71eb685
* Mention about the ability to configure vmalert notifiers via files in docs/CHANGELOG.md
* Mention about the ability to use Consul service discovery for vmalert notifiers in docs/CHANGELOG.md
* Run `make docs-sync` in order to sync app/vmalert/README.md to docs/vmalert.md
2022-02-02 23:42:25 +02:00
hagen1778
a67e843257
vmalert: add support of -notifier.basicAuth.passwordFile flag for notifiers
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1567

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-02 23:42:25 +02:00
hagen1778
605de7bdd6
vmalert: remove trailing slash for static notifier addresses
This would make addresses `http://localhost:9093` and `http://localhost:9093/`
both to result into `http://localhost:9093/api/v2/alerts`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-02-02 23:42:25 +02:00
Roman Khavronenko
2a3a62dc41
vmalert: support configuration file for notifiers (#2127)
vmalert: support configuration file for notifiers

* vmalert notifiers now can be configured via file
see https://docs.victoriametrics.com/vmalert.html#notifier-configuration-file
* add support of Consul service discovery for notifiers config
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1947
* add UI section for currently loaded/discovered notifiers
* deprecate `-rule.configCheckInterval` in favour of `-configCheckInterval`
* add ability to suppress logs for duplicated targets for notifiers discovery
* change behaviour of `vmalert_alerts_send_errors_total` - it now accounts
for failed alerts, not HTTP calls.
2022-02-02 23:42:25 +02:00
Aliaksandr Valialkin
755779adb7
app/vmalert: add parseDuration function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/8817
2022-01-13 23:30:56 +02:00
Aliaksandr Valialkin
0f81ecd7df
app/vmalert: add stripPort template function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/10002
2022-01-13 22:54:46 +02:00
Andrey Afoninsky
2e8c23e0f6
chore: add vmalert_remotewrite_total metric (#2040)
Co-authored-by: Andrey Afoninsky <andrey.afoninsky@booking.com>
2022-01-11 08:56:02 +02:00
Denys Holius
3a522c591a
Old links replaced for newest (#2033)
* replaced old links to the website

* fixed deletion main README.md file

* fix: added docs files after docs-sync
2022-01-05 16:32:54 +02:00
Roman Khavronenko
e92c987dc3
vmalert: check if remoteWrite is configured for replay mode (#1990)
* vmalert: check if remoteWrite is configured for replay mode

The purpose of `replay` mode is to backfill results of recording
or alerting rules. So `remoteWrite.url` should be required.
Otherwise, process can fail on attempt to send data.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update app/vmalert/main.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-21 20:31:10 +02:00
Roman Khavronenko
1f0301c809
vmalert: always convert step value to seconds for better compatibility (#1955)
When using `vmalert` with older Prometheus versions, the passed
`step=2m` may be parsed by Prometheus with an err: "cannot parse \"2m0s\" to a valid duration".
In order to improve compatibility vmalert will always convert step duration to seconds.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1943
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-17 20:17:09 +02:00
Roman Khavronenko
e7d7f6c94e
vmalert: update the order of service labels attaching (#1922)
Service labels like `alertname` or `alertgroup` were attached
after template expanding for `labels` section. Because of this,
labels `alertname` or `alertgroup` weren't available for templating
in `labels` section of alert's definition.
This commit changes the order of labels attaching and adds a test
for verifying these labels availability.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1921
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-10 12:14:16 +02:00
Aliaksandr Valialkin
7d8c481960
app/vmalert/config: sort extra_filter labels before passing them to query args in order to get consistent order of query args across runs
This fixes TestGroupParams test - see https://github.com/VictoriaMetrics/VictoriaMetrics/runs/4432510244?check_suite_focus=true#step:5:288
2021-12-08 13:04:08 +02:00
Roman Khavronenko
582c063698
vmalert: introduce additional HTTP URL params per-group configuration (#1892)
* vmalert: introduce additional HTTP URL params per-group configuration

The new group field `params` allows to configure custom HTTP URL params
per each group. These params will be applied to every request before
executing rule's expression. Hot config reload is also supported.

Field `extra_filter_labels` was deprecated in favour of `params` field.
vmalert will print deprecation log message if config file contains
the deprecated field.

`params` fields are supported by both Prometheus and Graphite datasource types.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: provide more examples for `params` field

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: set higher priority for `params` setting

If there would be a conflict between URL params set in `datasource.url` flag
and params in group definition the latter will have higher priority.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 14:51:54 +02:00
Roman Khavronenko
8b67168609
ci: bump go version to 1.17 (#1895)
The bump was required for `vmalert` package.
`vmalert` docs now also contain an updated description.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 14:51:45 +02:00
Roman Khavronenko
fc4e59ec1d
vmalert: adjust topologies docs in README (#1893)
Commit changes images width and order in topologies section
for better readability.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-12-02 10:33:47 +02:00
Roman Khavronenko
6823818843
Vmalert docs upd (#1890)
* vmalert: add topology examples in docs

* vmalert: docs typo fix
2021-12-02 10:33:47 +02:00
Roman Khavronenko
83c46eaf1a
vmalert: continue to print errors for bad config during hot reload (#1871)
Previously, vmalert would print an err message and set vmalert_config_last_reload_successful=0
only once during a hot reload of a bad config. Such behaviour may result into non noticed
event of a bad config reload attempt

Now, it continues to print error messages and keep vmalert_config_last_reload_successful state
until successful attempt will be made or config state will be rolled back to prev state.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-11-30 01:24:20 +02:00
Roman Khavronenko
48fb972ddc
vmalert: make notifier.Addr optional (#1870)
For a long time notifier.Addr flag was required. The assumption was that vmalert will
be always used for alerting. However, practice shows that some users need only
recording rules. In this case, requirement of notifier.Addr is ambigious.

The change verifies if loaded config contains recording or alerting rules and
if there are corresponding flags set. This is true for initial config load
and hot reload.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-11-30 01:23:00 +02:00
Aliaksandr Valialkin
7b88daf2ec
docs/CHANGELOG.md: document 852a895b70 2021-11-30 01:22:59 +02:00
Aliaksandr Valialkin
aa01bb9349
app/vmalert/README.md: sync with docs/vmalert.md
This is a follow-up after d8c70903ec
2021-11-17 00:56:53 +02:00
Aliaksandr Valialkin
4fb19fe34b
all: consistently return application/json content-type without charset=utf-8
The `application/json` content-type has utf-8 encoding by default.
See https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2021-11-09 18:07:22 +02:00
Aliaksandr Valialkin
bc357d7132
docs/vmalert.md: improve wording in Multitenancy chapter 2021-11-09 14:20:30 +02:00
Aliaksandr Valialkin
57aaf914ac
docs: mention that graphs on the official dashboards contain useful hints 2021-11-08 19:54:25 +02:00
Aliaksandr Valialkin
e58630401e
app/{vmalert,vmbackup}/README.md: sync with docs after the commit 47d1612bf8 2021-11-05 20:46:07 +02:00
Aliaksandr Valialkin
9aa40fc0c3
docs/vmalert.md: document the addition of -defaultTenant.prometheus and -defaultTenant.graphite command-line options to enterprise version of vmalert 2021-11-05 20:04:53 +02:00
Aliaksandr Valialkin
56a25146cb
app/vmalert/datasource: use plain string literals instead of constants
This removes the unneeded level of indirection and improves code readability.

The "prometheus" and "graphite" constants aren't going to change in the future, so there is no sense in hiding them behind constants.
2021-11-05 19:58:15 +02:00
Aliaksandr Valialkin
31ac507bee
app/vmalert: remove rule.type config, since it doesnt play well with the upcoming default tenants for -clusterMode
It is better from the consistency point of view to set up rule types at group level where tenant config is set up.
2021-11-05 19:52:41 +02:00
Aliaksandr Valialkin
847004fa77
app/{vminsert,vmagent}: hide passwords and auth tokens by default at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764
2021-11-05 14:42:13 +02:00
Aliaksandr Valialkin
2ebee4e741
app/{vmalert,vmagent}: improve the distribution of scrape offsets among targets / rules
Previously only the lower part of 64-bit hash was used for calculating the offset.
This may give uneven distribution in some cases. So let's use all the available 64 bits from the hash
for calculating the offset.
2021-10-27 20:04:02 +03:00
Roman Khavronenko
5321127add
vmalert: allow groups with empty rules for compatibility reasons (#1742)
Prometheus allows to have groups with no rules, so we should support
it in vmalert as well for compatibility reasons.
It is also allowed to hot-reload empty groups by adding or removing rules.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-25 14:46:51 +03:00
Roman Khavronenko
e0f21d6000
vmalert: correctly calculate alert ID including extra labels (#1734)
Previously, ID for alert entity was generated without alertname or groupname.
This led to collision, when multiple alerting rules within the same group
producing same labelsets. E.g. expr: `sum(metric1) by (job) > 0` and
expr: `sum(metric2) by (job) > 0` could result into same labelset `job: "job"`.

The issue affects only UI and Web API parts of vmalert, because alert ID is used
only for displaying and finding active alerts. It does not affect state restore
procedure, since this label was added right before pushing to remote storage.

The change now adds all extra labels right after receiving response from the datasource.
And removes adding extra labels before pushing to remote storage.

Additionally, change introduces a new flag `Restored` which will be displayed in UI
for alerts which have been restored from remote storage on restart.
2021-10-22 12:31:35 +03:00
Aliaksandr Valialkin
5705f4b6d1
lib/httpserver: expose command-line flags at /flags page
This should simplify debugging.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1695
2021-10-20 00:46:54 +03:00
Roman Khavronenko
ca49853664
vmalert: make group.ID() thread-safe (#1726)
Commit fixes potential race condition when group update
and generating of ID() happens simultaneously.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-19 16:45:52 +03:00
Roman Khavronenko
627224d493
vmalert: properly init SIGHUP listener before starting group manager (#1725)
Regression was introduced during code refactoring. It potentially
could lead to situation when SIGHUP signals were ignored while
vmalert was still busy with initing group manager.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-19 16:36:08 +03:00
Roman Khavronenko
44a88b7458
vmalert: remove extra / from path in WEB interface (#1717)
The extra `/` may cause issues when additional path prefixes
are configured. Also, removing it makes it consistent
with the rest of declarations.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-18 15:19:01 +03:00
Aliaksandr Valialkin
9273b83cd0
app/vmalert: follow-up after 0e2486df56 2021-10-18 15:08:42 +03:00
Alexander Rickardsson
0e1dbcd039
vmalert: add disablePathAppend to remote read (#1712)
* vmalert: add disablePathAppend to remoteRead

* docs: add docs for remoteRead.disablePathAppend
2021-10-18 14:59:17 +03:00
Alexander Rickardsson
63571e1334
vmalert: Redact passwords from error messages (#1713) 2021-10-18 14:59:17 +03:00
Roman Khavronenko
f393145843
Adjust http.Transport.MaxIdleConns setting for vmauth/vmalert services (#1704)
* vmalert: adjust `http.Transport.MaxIdleConns` value accordingly to `http.Transport.MaxIdleConnsPerHost`

`http.Transport.MaxIdleConnsPerHost` setting is controlled by `datasource.maxIdleConnections` flag,
while `http.Transport.MaxIdleConns` is inherited from DefaultTransport and is equal to `100`.
The fix adjusts `http.Transport.MaxIdleConns` value if it is lower than `http.Transport.MaxIdleConnsPerHost`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmauth: adjust `http.Transport.MaxIdleConns` value accordingly to `http.Transport.MaxIdleConnsPerHost`

`http.Transport.MaxIdleConnsPerHost` setting is controlled by `maxIdleConnsPerBackend` flag,
while `http.Transport.MaxIdleConns` is inherited from DefaultTransport and is equal to `100`.
The fix adjusts `http.Transport.MaxIdleConns` value if it is lower than `http.Transport.MaxIdleConnsPerHost`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-10-13 19:23:52 +03:00
Roman Khavronenko
14995eece6
vmalert: add Source link to alerts UI (#1701)
The source link is controlled by `external.url` and `external.alert.source`
flags, in the same way as for alertmanager notifications.
The source link is added to Alerts list view, and specific Alert view.
2021-10-13 15:26:57 +03:00
Roman Khavronenko
1bdccfc2bf
app/vmalert: remove unnecessary omitempty tag for interval param (#1649)
`omitempty` tag resulted into skipping this param on marshaling,
which was used as a checksum for groups configuration. Since on
config reload checksums are compared before applying changes,
any change to `interval` only didn't trigger config reload.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1641
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-09-23 20:47:56 +03:00
Roman Khavronenko
aa052097cf
app/vmalert: support http.pathPrefix flag in UI (#1636)
The change makes UI to respect `http.pathPrefix` flag
for API or navigation items links.
2021-09-21 18:15:08 +03:00
Roman Khavronenko
a07fa92ef2 vmalert: add new metric vmalert_remotewrite_flush_duration_seconds (#1622) 2021-09-16 14:01:24 +03:00
Roman Khavronenko
286b79dc48 vmalert: create basic auth config only if args aren't empty (#1618)
* vmalert: create basic auth config only if args aren't empty

follow-up after 68721f6

* vmalert: make lint happy
2021-09-15 09:35:38 +03:00
Aliaksandr Valialkin
63fb5a5ca5 docs/vmalert.md: follow-up after 68721f6e7d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:48:45 +03:00
Roman Khavronenko
eea6317d40 vmalert: support bearer token for datasource, remotewrite and remoteread (#1614)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:48:43 +03:00
Aliaksandr Valialkin
2932f495f3 docs/CHANGELOG.md: document 5494bc02a6 2021-09-13 17:14:45 +03:00
Roman Khavronenko
7df731ea91 vmalert: add flag to limit the max value for auto-resovle duration for alerts (#1609)
* vmalert: add flag to limit the max value for auto-resovle duration for alerts

The new flag `rule.maxResolveDuration` suppose to limit max value for
alert.End param, which is used by notifiers like Alertmanager for alerts auto resolve.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1586
2021-09-13 17:14:45 +03:00
Roman Khavronenko
9dd121db36 vmalert: display extra filter labels in UI (#1613) 2021-09-13 14:17:51 +03:00
Aliaksandr Valialkin
e1632dc0df docs/vmalert.md: typo fix in Multitenancy chapter 2021-09-10 17:57:43 +03:00
Aliaksandr Valialkin
d72593cca2 app/vmalert: document GroupAlerts
This makes golint happy
2021-09-07 22:50:48 +03:00
Aliaksandr Valialkin
608ed19655 app/vmalert: follow-up after 21f022e5f0 2021-09-07 22:45:47 +03:00
Roman Khavronenko
bb26aa9b1f vmalert: add initial UI implementation (#1602)
New UI pages:
/ - welcome page with API handlers list;
/groups - list of all rules per group;
/alerts - list of all active alerts;
/groupID/alertID/status - status of the active alert;
2021-09-07 22:45:45 +03:00
Roman Khavronenko
1cf4f5a715 Vmalert extra params (#1587)
* vmalert: allow extra GET params in datasource package

ExtraParams will be added as GET params to every HTTP request made by datasource.
The `roundDigits` param, for example, was substituted by corresponding extra param.

* vmalert: add nocache=1 param for replay process

The `nocache=1` param is VictoriaMetrics specific parameter which prevents it
from caching and boundaries aligning for queries. We set it to avoid cache
pollution in `replay` mode and also to avoid unnecessary time range boundaries
alignment.

* vmalert: mention nocache=1 in replay description

* vmalert: fix bug with unused param
2021-09-01 12:20:01 +03:00
Nikolay
210c76b51e adds external_labels per group for vmalert (#1485)
* adds external_label per group for vmalert
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1471
2021-09-01 12:19:49 +03:00
Roman Khavronenko
1cb7037fc8 Vmalert metrics update (#1580)
* vmalert: remove `vmalert_execution_duration_seconds` metric

The summary for `vmalert_execution_duration_seconds` metric gives no additional
value comparing to `vmalert_iteration_duration_seconds` metric.

* vmalert: update config reload success metric properly

Previously, if there was unsuccessfull attempt to reload config and then
rollback to previous version - the metric remained set to 0.

* vmalert: add Grafana dashboard to overview application metrics

* docker: include vmalert target into list for scraping

* vmalert: extend notifier metrics with addr label

The change adds an `addr` label to metrics for alerts_sent and alerts_send_errors
to identify which exact address is having issues.
The according change was made to vmalert dashboard.

* vmalert: update documentation and docker environment for vmalert's dashboard

Mention Grafana's dashboard in vmalert's README in a new section #Monitoring.

Update docker-compose env to automatically add vmalert's dashboard.
Update docker-compose README with additional info about services.
2021-09-01 12:19:34 +03:00
Aliaksandr Valialkin
ff4c7c1a3d docs/vmalert.md: run make docs-sync after 9ee3d0378f 2021-08-21 20:25:26 +03:00
Roman Khavronenko
0c2284b95f vmalert: add flag disableAlertgroupLabel for disabling extra label added to series (#1534)
The new label added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611
may negatively impact deduplication in Alertmanager. The new flag supposed to give
an option to disable adding this label.

To enable flag just add `-disableAlertgroupLabel` to binary execution command.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532
2021-08-21 20:23:22 +03:00
Alexander Rickardsson
9e2e9d83a5 vmalert: accept http.StatusOK for remotewrite (#1550) 2021-08-21 20:23:22 +03:00
Aliaksandr Valialkin
fe8c462044 app/vmalert: mention -remoteWrite.disablePathAppend in the description for -remoteWrite.url 2021-08-16 15:23:39 +03:00
Aliaksandr Valialkin
21974cb571 app/vmalert: follow-up for 2400f85761 2021-08-16 15:20:35 +03:00
Alexander Rickardsson
d27dc3721b vmalert: enable configuring explicit path (#1536)
* vmalert: allow to disable automatically added path to remote write address via disablePathAppend flag
* docs: update docs to include remoteWrite.disablePathAppend
2021-08-16 14:58:05 +03:00
Aliaksandr Valialkin
90efb5831b lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:32:40 +03:00
Roman Khavronenko
d5ba8248cc vmalert: expose new metrics for tracking number of produced samples during last evaluation (#1518)
* vmalert: expose new metrics for tracking number of produced samples during last evaluation

Two new metrics were added to track the number of samples produced during the last evaluation:
* vmalert_recording_rules_last_evaluation_samples
* vmalert_alerting_rules_last_evaluation_samples

The gauge type is used to remain consistent with Prometheus metric
`prometheus_rule_group_last_evaluation_samples` which is on the group level.
However, the counter type was considered as well.

Two metrics instead of one are used to make it easier to separate recording and
alerting rules. It is likely, number of samples produced by recording rules is
more important so people will refer to it more frequently.

The expected usage of the new metric is the following:
```
   - alert: RecordingRuleReturnsEmptyResults
        expr: sum(vmalert_recording_rules_last_evaluation_samples) by(recording) < 1
        annotations:
          summary: Recording rule {{$labels.recording}} returns empty results.
            Please verify expression correctness.
```

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494

* vmalert: rename `vmalert_alerts_error` to `vmalert_alerting_rules_error` to remain consistent with recording rules metrics
2021-08-05 10:02:35 +03:00
Qifei Wan
095bb90879 app/vmalert: update config state metrics if config parsed failed (#1507) 2021-08-03 16:12:48 +03:00
assassins
6ab0001a1f Performance optimization (#1481)
There are redundant steps
2021-07-28 19:29:22 +03:00
Aliaksandr Valialkin
d98e22fe50 app/vmalert: accept Prometheus-like durations in interval config option inside group section 2021-07-12 12:36:22 +03:00
Aliaksandr Valialkin
acb7a95c64 app/vmselect: follow-up after aa11ef6d3b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4
2021-07-07 17:45:09 +03:00
Aliaksandr Valialkin
ceda2b1df4 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:11:29 +03:00
Roman Khavronenko
c5f493db8e Vmalert docs (#1372)
* vmalert: mention what happens if `for` is set to 0 or omitted

* vmalert: add more context to docs
2021-06-14 11:43:01 +03:00
Roman Khavronenko
f3cb2158a3 vmalert: fix mistake with object reuse while parsing response (#1370)
* vmalert: fix mistake with object reuse while parsing response

During the refactoring, the wrong optimisations was applied in
parse function which caused metric fields reset. The change removes
optimisation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1369

* vmalert: add test to cover multiple metrics in one response
2021-06-11 11:30:07 +03:00
Aliaksandr Valialkin
f3749dedba docs: document rules replay feature for vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

This is a follow-up for 2a259ef5e7
2021-06-09 12:30:54 +03:00
Roman Khavronenko
5aa7846900 vmalert: support rules backfilling (aka replay) (#1358)
* vmalert: support rules backfilling (aka `replay`)

vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.

Supports recording and alerting rules `replay`. See more
details in README.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

* vmalert: review fixes

* vmalert: readme fixes
2021-06-09 12:30:54 +03:00
Roman Khavronenko
e183a5c532 vmalert: automatically reload configuration on file change (#1326)
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
2021-05-26 12:24:27 +03:00
Roman Khavronenko
beee24ecee vmalert: support extra_filter_labels setting per-group (#1319)
The new setting `extra_filter_labels` may be assigned to group.
If it is, then all rules within a group will automatically filter
for configured labels. The feature is well-described here
https://docs.victoriametrics.com#prometheus-querying-api-enhancements

New setting is compatible only with VM datasource.
2021-05-23 14:15:49 +03:00
Nikolay
23a6c9c016 changes vmalert query function (#1307)
* changes vmalert query function
for prometheus rules compatibility its better to use labels as map.
it simplifies template evaluation and allow to ignore can't evaluate field error
because map will return default value.
fixes https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 16:38:20 +03:00
Aliaksandr Valialkin
d77db9d813 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:38:20 +03:00
Aliaksandr Valialkin
69e365cd48 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:30:24 +03:00
Aliaksandr Valialkin
74ef40034c lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:25:27 +03:00
Aliaksandr Valialkin
1668280e67 docs/vmalert.md: document multitenant support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740
2021-05-18 16:25:21 +03:00
Aliaksandr Valialkin
6ea191d196 docs: dealay -> delay 2021-05-18 01:07:32 +03:00
Roman Khavronenko
3428df6f15 vmalert: use stringified label keys for duplicates map in recroding rules (#1301)
duplicates map helps to determine wheter extra labels has overriden
labels which make time series unique. It was using a sorted hashed
labels sequence as a key. But hashing algorithm could have collisions,
so it is more convenient to not use hashing at all.

Log message for recording rules duplicates was improved as well.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-17 01:51:48 +03:00
Aliaksandr Valialkin
a6cb4f10a7 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:13:34 +03:00
Aliaksandr Valialkin
cca9670573 docs/CHANGELOG.md: document -datasource.roundDigits added at 5c448126dc 2021-05-10 11:18:58 +03:00
Roman Khavronenko
a7f00101f5 vmalert: add support for round_digits param in datasource package (#1278)
Starting from v1.56.0 VM supports `round_digits` which allows to limit
the number of digits after the decimal point in response value. The feature
can be used to reduce entropy of produced by recording rules values
and significantly improve the compression. See more details in link below.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525
2021-05-10 11:18:56 +03:00
Roman Khavronenko
35237fe1f5 vmalert: fix error when rule didn't start if restore failed (#1279)
Previously, `startGroup` could exit on restore errors despite the
`remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the
flag value before deciding whether to return error or just log it.
2021-05-10 11:10:32 +03:00
Aliaksandr Valialkin
07bc021f58 app/vmalert: add missing comment for ErrStateRestore 2021-05-08 19:53:45 +03:00
Roman Khavronenko
bb7e113dd4 vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 12:24:32 +03:00
Aliaksandr Valialkin
0a2e746175 docs/vmalert.md: update docs after afca7b430c 2021-04-30 11:49:40 +03:00
Roman Khavronenko
7394967841 vmalert: fix the typo in ApplyParams func (#1259) 2021-04-30 11:47:11 +03:00
Roman Khavronenko
6fbedd62b8 vmalert: use rule's evaluationInterval as step param by default (#1258)
User still can override param by specifying `datasource.queryStep` flag.
2021-04-30 10:03:50 +03:00
Aliaksandr Valialkin
daf2778025 docs/CHANGELOG.md: document the change from f3a048288e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
2021-04-30 09:56:47 +03:00
Roman Khavronenko
b55677e93d Vmalert: adjust time param for datasource queries according to evaluationInterval (#1257)
* Simplify arguments list for fn `queryDataSource` to improve readbility

* vmalert: adjust `time` param according to rule evaluation interval

With this change, vmalert will start to use rule's evaluation interval
for truncating the `time` param. This is mostly needed to produce consistent
time series with timestamps unaffected by vmalert start time. Now, timestamp
becomes predictable.
Additionally, adjustment is similar to what Grafana does for plotting range graphs.
Hence, recording rule series and recording rule expression plotted in grafana
suppose to become similar in most of cases.
2021-04-30 09:56:46 +03:00
Aliaksandr Valialkin
8be1cb297b app/vmagent: list user-visible endpoints at http://vmagent:8429/
While at it, use common WriteAPIHelp function for the listing in vmagent, vmalert and victoria-metrics
2021-04-30 09:38:23 +03:00
Nikolay
2eb8ef7b2b changes vmalert Querier with per rule querier (#1249)
* changes vmalert Querier with per rule querier
it allows to changes some parametrs based on rule setting
for instance - alert type, tenant for cluster version or event endpoint url.
2021-04-29 11:31:07 +03:00
Roman Khavronenko
0ceb4f7565 vmalert: keep the returned timestamp when persisting recording rule (#1245)
Previously, vmalert used `lastExecTime` timestamp when writing recording rules
to the remote storage. This may be incorrect, if vmalert uses `datasource.lookback` flag,
which means rule's expression will be executed at some moment in the past.
To avoid such situations, vmalert now will use returned timestamp instead of `lastExecTime`.
2021-04-27 00:16:45 +03:00
Aliaksandr Valialkin
6dc5d3b357 all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:20:01 +03:00
Aliaksandr Valialkin
64f1ddefe5 all: consistency renaming Victoria Metrics -> VictoriaMetrics
VMInsert -> vminsert
VMSelect -> vmselect
VMStorage -> vmstorage
2021-04-20 11:45:02 +03:00
Aliaksandr Valialkin
c872ba45b9 docs: update -help output after the commit 77be3e3a82 2021-04-12 12:35:39 +03:00
Artem Navoiev
c3dcfdef8c improve docs for cli flags (#1202)
* improve docs for cli flags

* improve docs for cli flags.2
2021-04-12 12:28:36 +03:00
Roman Khavronenko
712725b4a5 vmalert: document template functions and mention them in README (#1197) 2021-04-08 18:20:57 +03:00
Roman Khavronenko
ff3711eea2 docs: update docs ordering and formatting (#1192)
The major change is adding `sort` directive to docs. For those docs which are copied
from internal packages `sort` is added via makefile command. For the rest it is added
manually since they're updated manually as well.

The rest of changes is connected with markdown formatting. For example, changing headers
in some files (`##` => `#`) makes navigation on .github.io to look better. This especially
useful for `changelog` docs.

Table of contents for `vmctl` is dropped, since we already have it autogenerated on .github.io.

No link changes expected. The corresponding PR to `cluster` branch will be made in follow-up PR.
2021-04-07 13:43:01 +03:00
Aliaksandr Valialkin
4028d692f5 app: do not process non-GET requests on at / handler 2021-04-02 22:56:38 +03:00
Aliaksandr Valialkin
0a8f0a4e2f all: increase minimum supported Go version for building VictoriaMetrics components from v1.14 to v1.15
This is needed after the commit c0ac740f93, which uses URL.Redacted() method,
which has been added in v1.15.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:06:36 +03:00
Aliaksandr Valialkin
82047be90b docs: add a link to the repository from build instruction for all the VictoriaMetrics components 2021-03-25 17:16:55 +02:00
Aliaksandr Valialkin
450e23533d docs/vmalert.md: remove misleading -evaluationInterval=3s from example config args
3s evaluation interval is too small for practical setups. It can result in increased load on datasource.
So it is better to remove it from example config args, which are usually copy-pasted by novice users.
2021-03-25 15:31:10 +02:00
Aliaksandr Valialkin
3caac5edd4 Makefile: prepare vmutils-windows-*.zip archive on make release-vmutils command
The archive contains the following executables for Windows:

* vmagent
* vmalert
* vmauth
* vmctl

Other components - vmbackup, vmrestore, victoria-metrics - aren't supported for Windows yet
2021-03-16 20:54:10 +02:00
Aliaksandr Valialkin
e2717d84c0 all: various fixes in command-line flag descriptions 2021-03-15 22:03:49 +02:00
Ihor Borodin
933de6b9b1 Fixing examples of external.alert.source in documentation (#1120)
* Fixing examples of external.alert.source in documentation
2021-03-10 12:08:22 +02:00
Aliaksandr Valialkin
d109e17f46 all: bump minimum supported Go version from 1.13 to 1.14 2021-03-03 15:58:17 +02:00
Aliaksandr Valialkin
2ecee0515a app/vmalert/README.md: sync with docs/vmalert.md 2021-03-03 10:42:54 +02:00
Aliaksandr Valialkin
d9e8af0e8f docs: actualize -help output 2021-03-01 17:02:05 +02:00
Nikolay
b52d1e4f19 adds query params for vmalert (#1094)
remoteWrite.url now accepts query params at provided url
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1087
2021-02-28 14:12:34 +02:00
Aliaksandr Valialkin
901710b9e2 app/vmalert: add missing multiarch Dockerfile 2021-02-18 15:23:57 +02:00
Roman Khavronenko
2aa37b0450 vmalert: mention -datasource.appendTypePrefix in README (#1052) 2021-02-03 23:48:45 +02:00
Dmitry Shevchuk
007fd6ce9c Adds ability to query right vmselect endpoint based on the query type (#1050)
* Adds ability to query right vmselect endpoint based on the query type

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2021-02-03 23:48:44 +02:00
Aliaksandr Valialkin
88ee836d0c docs/vmalert.md: mention that type option can be set at group level additionally to rule level 2021-02-03 21:12:39 +02:00
Aliaksandr Valialkin
6811445b64 docs: document ability to query Graphite datasource from vmalert 2021-02-01 15:28:31 +02:00
Nikolay
b8bc1c2e0f Graphite vmalert wip (#112)
* init implementation for graphite alerts

* adds graphite support for vmalert

* small fix

* changes vmalert graphite api with type

* updates tests

* small fix

* fixes graphite parse

* Fixes graphite from time
2021-02-01 15:28:30 +02:00
weng zhao
2a8a34ea05 vmalert: add option datasource.queryStep to allow user to address the inconsistency between grafana dashboards(query_range with step 15s usually) and ALERTS (#1027)
Co-authored-by: zhao.weng <zhao.weng@shopee.com>
2021-01-26 16:38:20 +02:00
Roman Khavronenko
304512b668 vmalert-989: return non-empty result in template func query stub to pass validation (#1002)
On templates validation stage vmalert does not acutally send queries, so for complex
chained expression validation may fail. To avoid this, we add a blank sample in response
so validation can pass successfully. Later, during the rule execution, stub will be replaced
with real `query` function.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/989
2021-01-11 12:59:33 +02:00
Nikolay
14915071d6 adds escape for CRLF (#984)
at external.alert.source - \n and \r symbols was url encoded, instead of direct usage.
replace it from "\n" to `\n`  allows to skip url encoding.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
2020-12-25 11:06:47 +02:00
Aliaksandr Valialkin
b480585905 app/vmalert: typo fix in descriptions for notifier.basicAuth.username and notifier.basicAuth.password command-line flags 2020-12-24 12:49:40 +02:00
Aliaksandr Valialkin
d8511b6651 docs: mention that it is possible to set multiple -notifier.tlsInsecureSkipVerify command-line flags for vmalert
See c3a92968343c2b3619f1ab935702d0e9b3a46733
2020-12-22 22:32:56 +02:00
Nikolay
67e470e598 changes vmalert notifier flag, (#978)
fixes issue with notifier insecure setting, now its possible to use multiple notifier.tlsInsecureSkipVerify multiple time.
2020-12-22 22:27:03 +02:00
Roman Khavronenko
9ce8b36d2a vmalert-974: fix order for labels templating (#975)
The change fixes bug caused by 3adf8c5a6f.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/974
2020-12-19 14:21:27 +02:00
Roman Khavronenko
9f578e389c vmalert: add function "query", "first" and "value" to alert templates functions (#960)
The commit adds a support for template function `query`,
`first` and `value`. The function `query` executes
a MetricsQL query for active alerts. In vmalert we
update templates on every evaluation for active alerts
to keep them up to date. With `query` func it may become
a perf issue since it will fire a query on every execution.
We should keep it in mind for now.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/539
2020-12-14 20:12:16 +02:00
Aliaksandr Valialkin
fc82c22e50 docs: consistently use links to https://victoriametrics.github.io for documentation references 2020-12-11 21:09:17 +02:00
Aliaksandr Valialkin
1a237c6903 all: properly handle CPU limits set on the host system/container
This can reduce memory usage on systems with enabled CPU limits.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946
2020-12-08 21:07:03 +02:00
Aliaksandr Valialkin
7bdf07883b app/{vmalert,vmagent}: skip empty values in -remoteWrite.label and -label lists 2020-12-08 14:54:02 +02:00
Aliaksandr Valialkin
bdac2171f1 all: do not print usage info for all the flags when incorrect command-line flag is passed
This should improve usability for VictoriaMetrics apps that have big number of command-line flags,
i.e. all the apps.
2020-12-03 21:46:19 +02:00
Nikolay
e4e33cb757 fixes checksum calculation (#928)
* fixes checksum calculation,
'for' rule param wasnt marshal properly during checksum calculation

* fixes error
2020-11-29 09:50:57 +02:00
Aliaksandr Valialkin
7ceaf4ba8f all: consistently return text-based HTTP responses with charset=utf-8
This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2020-11-13 10:30:21 +02:00
Roman Khavronenko
4fd2b6cd16 vmalert: explicitly set extra labels to alert entities (#886)
The previous implementation treated extra labels (global and rule labels) as
separate label set to returned time series labels. Hence, time series always contained
only original labels and alert ID was generated from sorted labels key-values.
Extra labels didn't affect the generated ID and were applied on the following actions:
- templating for Summary and Annotations;
- persisting state via remote write;
- restoring state via remote read.

Such behaviour caused difficulties on restore procedure because extra labels had to be dropped
before checking the alert ID, but that not always worked. Consider the case when expression
returns the following time series `up{job="foo"}` and rule has extra label `job=bar`.
This would mean that restored alert ID will be always different to the real time series because
of collision.

To solve the situation extra labels are now always applied beforehand and `vmalert` doesn't
store original labels anymore. However, this could result into a new error situation.
Consider the case when expression returns two time series `up{job="foo"}` and `up{job="baz"}`,
while rule has extra label `job=bar`. In such case, applying extra labels will result into
two identical time series and `vmalert` will return error:
 `result contains metrics with the same labelset after applying rule labels`

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
2020-11-10 00:27:56 +02:00
Roman Khavronenko
333675875f vmalert: skip automatically added labels on alerts restore (#871)
Label `alertgroup` was introduced in #611 and automatically added to generated
time series. By mistake, this new label wasn't correctly purged on restore event
and affected alert's ID uniqueness. This commit removes `alertgroup` label
in restore function.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
2020-11-01 23:26:00 +02:00
kreedom
4526cf92d3 vmalert - add dryRun (#842)
vmalert: add `dryRun` flag for rules validation without running the service
2020-10-20 10:49:22 +03:00
Roman Khavronenko
d6155a3f33 vmalert: update docs to highlight the state restore requirements; (#833)
Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/830
2020-10-13 18:34:00 +03:00
Aliaksandr Valialkin
4b1c401790 app/vmalert: accept days, weeks and years in for: part of config like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817
2020-10-08 20:13:20 +03:00
Aliaksandr Valialkin
f9f8e4a39c app/vmalert: do not pring description for all the flags on config errors
The description is too big to consume by human and it just distracts humans.
2020-10-08 13:35:46 +03:00
Dmitry Shihovtsev
aec863e70b Fix typos in the vmalert datasource (#814)
* Fix typos in the vmalert datasource

* Fix typo in the vmalert datasource test
2020-10-07 18:00:29 +03:00
Roman Khavronenko
368b890e11 vmalert: make maxIdleConnections configurable for datasource HTTP client (#797)
Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/795
2020-09-30 09:51:14 +03:00
Aliaksandr Valialkin
543f3aea97 all: consistently use "%w" formatting in fmt.Errorf for wrapped errors 2020-09-23 22:48:21 +03:00
Aliaksandr Valialkin
8cd89cb847 app/vmalert: remove unneeded UTC() call
UTC() doesn't change the underlying timestamp, so the call isn't needed here
2020-09-21 15:56:48 +03:00
Roman Khavronenko
d111969d39 vmalert: add support for datasource.lookback flag (#779)
New datasource flag `datasource.lookback` defines how far to look into
past when evaluating queries.

Address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/668
2020-09-21 15:56:47 +03:00
Roman Khavronenko
0042b0f307 vmalert: fix the typo in error message (#782)
The error will be always nil so no sense in printing it.
2020-09-21 11:36:09 +03:00
Roman Khavronenko
e2b31590e6 vmalert: add Group name as label to generated alerts and timeseries (#761)
Solves #611
2020-09-11 23:41:12 +03:00
Roman Khavronenko
16e0bb496e vmalert: update groups on config reload only if changes detected (#759)
On config reload event `vmalert` reloads configuration for every group. While
it works for simple configurations, the more complex and heavy installations may
suffer from frequent config reloads.
The change introduces the `checksum` field for every group and is set to md5 hash
of yaml configuration. The checksum will change if on any change to group
definition like rules order or annotation change. Comparing the `checksum` field
on config reload event helps to detect if group should be updated.
The groups update is now done concurrently, so reload duration will be limited by
the slowest group now.

Partially solves #691 by improving config reload speed.
2020-09-11 23:41:12 +03:00
Aliaksandr Valialkin
475698d2ad docs: sync docs for vmalert, vmauth, vmbackup and vmrestore 2020-09-09 21:10:48 +03:00
Nikolay Khramchikhin
80a9dc79fe changed vmalert behaviour (#738)
* VMAlert start with empty rules dir

There are some applications (operator for instance), that generates alerts configuration at runtime
and vmalert must start correctly without rules to support this behaviour.
Later application will add rules files and send SIGHUP to vmalert,
which will trigger reading rules files and start rules exectuion.

Removing rules files with SIGHUP signal must stop rules execution and
vmalert will wait for new rules.

* imports sorted

* added test cases for empty rules, removed blank line

* fixed imports conflict

* updated tests
2020-09-03 11:07:40 +03:00
Aliaksandr Valialkin
7ac10ee978 app/vmalert: imrovements over 3f932c2db1 2020-09-03 01:14:30 +03:00
DexterZhang
85f49ad439 feat: spread load of rule evaluation by group when starting new groups (#724)
* feat: spread load of rule evaluation by group when starting new groups

* review: reduce the resulting diff.

* Update app/vmalert/group.go

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2020-09-03 01:14:26 +03:00
Roman Khavronenko
08b76cb26f vmalert: update -rule flag description to enforce quotes using (#709)
Description for `-rule` flag uses as example specific chars like asterisks
which could be interpreted wrong by different shells. To avoid this, description
now contains quoted flag values.

See also #708
2020-08-28 09:46:35 +03:00
Aliaksandr Valialkin
e7c0b2ca56 docs: update docs 2020-08-14 19:14:46 +03:00
Aliaksandr Valialkin
60c7397be5 all: support %{ENV_VAR} placeholders in yaml configs in all the vm* components
Such placeholders are substituted by the corresponding environment variable values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/583
2020-08-13 17:17:06 +03:00
Aliaksandr Valialkin
6721e47ae9 app: respect CPU limits set via cgroups
Update GOMAXPROCS to limits set via cgroups. This should reduce CPU trashing and reduce memory usage
for cases when VictoriaMetrics components run in containers with CPU limits.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/685
2020-08-11 23:01:03 +03:00
Roman Khavronenko
78afc61896 app/vmalert: extend metrics set exported by vmalert #573 (#654)
* app/vmalert: extend metrics set exported by `vmalert` #573

New metrics were added to improve observability:
+ vmalert_alerts_pending{alertname, group} - number of pending alerts per group
per alert;
+ vmalert_alerts_acitve{alertname, group} - number of active alerts per group
per alert;
+ vmalert_alerts_error{alertname, group} - is 1 if alertname ended up with error
during prev execution, is 0 if no errors happened;
+ vmalert_recording_rules_error{recording, group} - is 1 if recording rule
 ended up with error during prev execution, is 0 if no errors happened;
* vmalert_iteration_total{group, file} - now contains group and file name labels.
This should improve control over specific groups;
* vmalert_iteration_duration_seconds{group, file} - now contains group and file name labels. This should improve control over specific groups;

Some collisions for alerts and recording rules are possible, because neither
group name nor alert/recording rule name are unique for compatibility reasons.

Commit contains list of TODOs for Unregistering metrics since groups and rules
are ephemeral and could be removed without application restart. In order to
unlock Unregistering feature corresponding PR was filed - https://github.com/VictoriaMetrics/metrics/pull/13

* app/vmalert: extend metrics set exported by `vmalert` #573

The changes are following:
* add an ID label to rules metrics, since `name` collisions within one group is
a common case - see the k8s example alerts;
* supports metrics unregistering on rule updates. Consider the case when one rule
was added or removed from the group, or the whole group was added or removed.

The change depends on https://github.com/VictoriaMetrics/metrics/pull/16
where race condition for Unregister method was fixed.
2020-08-09 09:42:05 +03:00
Aliaksandr Valialkin
67cacb22ac lib/httpserver: add -tls, -tlsCertFile and -tlsKeyFile command-line flags in every vm binary
This makes such binaries compatible with binaries from `master` branch (aka single-node version)

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/677
2020-08-07 10:57:32 +03:00
Aliaksandr Valialkin
106e302d7a all: add mssing APP_NAME to vm*-GOARCH builds 2020-07-31 13:45:32 +03:00
Aliaksandr Valialkin
945645f38f docs/{vmagent,vmalert}: add instruction on how to build for ARM 2020-07-31 09:25:41 +03:00
Roman Khavronenko
ec6ed467c6 app/vmalert: support external.label to specify global labelset for all rules #622 (#652)
`external.label` flag supposed to help to distinguish alert or recording rules
source in situations when more than one `vmalert` runs for the same datasource
or AlertManager.
2020-07-28 14:23:04 +03:00
Aliaksandr Valialkin
31ef39e8da lib/httpserver: log remote address in error message from httpserver.Errorf
This should improve detection of the root cause of errors.
Thanks to Anant for the idea.
2020-07-20 14:06:29 +03:00
Aliaksandr Valialkin
ce381b3868 app/vmalert: consistently use "%w" instead of "%s" in fmt.Errorf when wrapping errors 2020-07-15 13:55:13 +03:00
Roman Khavronenko
9afd19d375 app/vmalert: add retries to remotewrite (#605)
* app/vmalert: add retries to remotewrite

Remotewrite pkg now does limited number of retries if write request failed.
This suppose to make vmalert state persisting more reliable.

New metrics were added to remotewrite in order to track rows/bytes sent/dropped.

defaultFlushInterval was increased from 1s to 5s for sanity reasons.

* fix

* wip

* wip

* wip

* fix bits alignment bug for 32-bit systems

* fix mistakenly dropped field
2020-07-05 18:47:38 +03:00
Ween
d28fb0baf9 [VMAlert] Fix error log when remoteWrite queue size is full (#602)
* Fix Auto metrics relabeled errors

* Finalize auto-genenated  Labels

* Fix Test Errors

* fix error logs when queue is full

Co-authored-by: xinyulong <xinyulong@kuaishou.com>
2020-07-03 16:50:43 +03:00