Commit graph

1160 commits

Author SHA1 Message Date
Roman Khavronenko
bb26aa9b1f vmalert: add initial UI implementation (#1602)
New UI pages:
/ - welcome page with API handlers list;
/groups - list of all rules per group;
/alerts - list of all active alerts;
/groupID/alertID/status - status of the active alert;
2021-09-07 22:45:45 +03:00
Aliaksandr Valialkin
304661c547 docs/vmagent.md: add API path for Prometheus text exposition format 2021-09-07 16:15:28 +03:00
Aliaksandr Valialkin
5c58c8d55e docs: update -help output for victoria-metrics and vmagent after f77dde837a 2021-09-01 16:35:18 +03:00
Aliaksandr Valialkin
c4df601f43 lib/promscrape: add the ability to limit the number of unique series per each scrape target
The number of series per target can be limited with the following options:

* Global limit with `-promscrape.maxSeriesPerTarget` command-line option.
* Per-target limit with `max_series: N` option in `scrape_config` section.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561
2021-09-01 16:08:12 +03:00
Roman Khavronenko
8962a50478 vmctl: update README and flags description (#1588)
The purpose of update is to make README and flags description more
clear to the reader. Especially, show that vm-account-id flag is required
for clustered version of VM.
2021-09-01 12:22:58 +03:00
Roman Khavronenko
1cf4f5a715 Vmalert extra params (#1587)
* vmalert: allow extra GET params in datasource package

ExtraParams will be added as GET params to every HTTP request made by datasource.
The `roundDigits` param, for example, was substituted by corresponding extra param.

* vmalert: add nocache=1 param for replay process

The `nocache=1` param is VictoriaMetrics specific parameter which prevents it
from caching and boundaries aligning for queries. We set it to avoid cache
pollution in `replay` mode and also to avoid unnecessary time range boundaries
alignment.

* vmalert: mention nocache=1 in replay description

* vmalert: fix bug with unused param
2021-09-01 12:20:01 +03:00
Nikolay
210c76b51e adds external_labels per group for vmalert (#1485)
* adds external_label per group for vmalert
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1471
2021-09-01 12:19:49 +03:00
Roman Khavronenko
1cb7037fc8 Vmalert metrics update (#1580)
* vmalert: remove `vmalert_execution_duration_seconds` metric

The summary for `vmalert_execution_duration_seconds` metric gives no additional
value comparing to `vmalert_iteration_duration_seconds` metric.

* vmalert: update config reload success metric properly

Previously, if there was unsuccessfull attempt to reload config and then
rollback to previous version - the metric remained set to 0.

* vmalert: add Grafana dashboard to overview application metrics

* docker: include vmalert target into list for scraping

* vmalert: extend notifier metrics with addr label

The change adds an `addr` label to metrics for alerts_sent and alerts_send_errors
to identify which exact address is having issues.
The according change was made to vmalert dashboard.

* vmalert: update documentation and docker environment for vmalert's dashboard

Mention Grafana's dashboard in vmalert's README in a new section #Monitoring.

Update docker-compose env to automatically add vmalert's dashboard.
Update docker-compose README with additional info about services.
2021-09-01 12:19:34 +03:00
Aliaksandr Valialkin
3788d4f4eb app/vmselect: show useful endpoints when requested /select/<accountID>/ page 2021-08-29 12:05:14 +03:00
Aliaksandr Valialkin
39bb6bdd79 app/vmselect/promql: add quantile("phiLabel", phi1, ..., phiN, q) aggregate function to MetricsQL
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1573
2021-08-27 18:40:25 +03:00
Aliaksandr Valialkin
6e5864b014 docs/{vmgateway,vmbackupmanager}: mention that enterprise binaries are free for download and evaluation 2021-08-27 14:53:48 +03:00
Aliaksandr Valialkin
102ab795f8 docs/vmagent.md: document the ability to load scrape configs from multiple files
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1559
2021-08-26 09:13:54 +03:00
benclive
76816a0193 Remove trailing slash for URLPrefixes with specific path (#1554) 2021-08-25 13:32:04 +03:00
Aliaksandr Valialkin
335de30083 app/vmselect/promql: make fmt after 0078486ea7 2021-08-23 23:05:34 +03:00
Aliaksandr Valialkin
40b06e84f8 app/vmselect/promql: rename sign() function to sgn() in order to be consistent with Prometheus
See https://github.com/prometheus/prometheus/pull/8457 for details.
2021-08-23 11:46:29 +03:00
Aliaksandr Valialkin
ff4c7c1a3d docs/vmalert.md: run make docs-sync after 9ee3d0378f 2021-08-21 20:25:26 +03:00
Roman Khavronenko
0c2284b95f vmalert: add flag disableAlertgroupLabel for disabling extra label added to series (#1534)
The new label added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611
may negatively impact deduplication in Alertmanager. The new flag supposed to give
an option to disable adding this label.

To enable flag just add `-disableAlertgroupLabel` to binary execution command.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532
2021-08-21 20:23:22 +03:00
Alexander Rickardsson
9e2e9d83a5 vmalert: accept http.StatusOK for remotewrite (#1550) 2021-08-21 20:23:22 +03:00
Aliaksandr Valialkin
91534057a3 app/vmselect/prometheus: do not extend [d] to the detected interval between samples for first_over_time(m[d])
This is for the sake of consistency with similar change for the last_over_time(m[d]) at a724229b5d
2021-08-21 19:56:56 +03:00
Roman Khavronenko
1ccb77904b vmselect: update vm_request_duration_seconds value when request fails (#1537)
Before, metric `vm_request_duration_seconds` was update only on successful
attempts which could be misleading. For example, timeout errors on netstorage
request may be not accounted in the metric and won't be visible on dashboards.
Using `defer` statement to update the metric after query arguments validation
may improve the situation.
2021-08-19 14:07:00 +03:00
Aliaksandr Valialkin
ee1f3414d1 app/vmselect/promql: do not override [d] at last_over_time(m[d]) if [d] is smaller than scrape_interval
Since most users do not expect the overriding of explicitly set `[d]`.
2021-08-19 10:33:10 +03:00
Aliaksandr Valialkin
5d92fafc40 app/vmselect: add -search.noStaleMarkers command-line flag for disabling stale markers handling in queries
This option allows reducing CPU usage a bit when VictoriaMetrics is used
for collecting and processing non-Prometheus data. For example, InfluxDB line protocol, Graphite, OpenTSDB, CSV, etc.
2021-08-18 13:58:06 +03:00
Aliaksandr Valialkin
f21fad53b4 lib/promscrape: add ability to disable sending Prometheus staleness markers with -promscrape.disableStaleMarkers command-line flag
This option can be useful when vmagent consumes too much additional memory
for staleness markers functionality and when staleness markers aren't needed.
2021-08-18 13:58:05 +03:00
Aliaksandr Valialkin
49886ecbc8 app/vmselect/promql: add bitmap_and(), bitmap_or() and bitmap_xor() functions to MetricsQL
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1541
2021-08-17 13:22:15 +03:00
Aliaksandr Valialkin
38065bec7b app/vmselect/promql: move common condition to dropStaleNaNs in order to improve code maintainability 2021-08-17 11:00:58 +03:00
Aliaksandr Valialkin
fe8c462044 app/vmalert: mention -remoteWrite.disablePathAppend in the description for -remoteWrite.url 2021-08-16 15:23:39 +03:00
Aliaksandr Valialkin
21974cb571 app/vmalert: follow-up for 2400f85761 2021-08-16 15:20:35 +03:00
Alexander Rickardsson
d27dc3721b vmalert: enable configuring explicit path (#1536)
* vmalert: allow to disable automatically added path to remote write address via disablePathAppend flag
* docs: update docs to include remoteWrite.disablePathAppend
2021-08-16 14:58:05 +03:00
Aliaksandr Valialkin
48920bdef8 app/vmagent/remotewrite: expose vmagent_remotewrite_send_duration_seconds_total metric
This metric can be used for determining high saturation of every connection to remote storage with
an alerting query `rate(vmagent_remotewrite_send_duration_seconds_total) > 0.9s`.
This query triggers when a connection is satureated by more than 90%
2021-08-15 13:34:07 +03:00
Aliaksandr Valialkin
5420c3d967 app/vmselect/promql: drop staleness marks before calling rollupConfig.Do
This allows dropping staleness marks only once and then calculate multiple rollup functions on the result.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-15 13:22:26 +03:00
Aliaksandr Valialkin
6c4c54eaad Revert "app/vmselect/promql: properly handle Prometheus staleness marks in removeCounterResets functions"
This reverts commit 94dfcb6747a3b29a11d14e71bea21a2312bb6346.

It is better to remove staleness marks (decimal.StaleNaN) before calling rollupConfig.Do, e.g. in preFunc
2021-08-15 13:22:24 +03:00
Aliaksandr Valialkin
af4a306d7b app/vmselect/promql: properly handle Prometheus staleness marks in removeCounterResets functions
Prometheus stalenss marks shouldn't be changed in removeCounterResets. Otherwise they will be converted to an ordinary NaN values,
which couldn't be removed in dropStaleNaNs() function later. This may result in incorrect calculations for rollup functions.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-14 12:45:31 +03:00
Aliaksandr Valialkin
c1f81f08d4 all: add support for Prometheus staleness markers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:13:15 +03:00
Aliaksandr Valialkin
b35ae791f1 app/vmselect: make vmui-update after the commit 4ae14df864a7e327955f44941295a286175423b3 2021-08-11 13:42:53 +03:00
Aliaksandr Valialkin
f60ff85dbe app/vmui: actualize Dockerfiles 2021-08-11 13:42:53 +03:00
Aliaksandr Valialkin
9eb828b2c2 app/vminsert: add vm_rpc_send_duration_seconds_total metric per each vminsert->vmstorage link
This metric is useful for determining high link saturation with the following alerting rule:

rate(vm_rpc_send_duration_seconds_total) > 0.9s
2021-08-11 11:42:33 +03:00
Aliaksandr Valialkin
90efb5831b lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:32:40 +03:00
Yury Molodov
aca2cb245e vmui: fix layout and add server url by default (#1519)
* fix: change layout for correctly display big query

* fix: set default server from url

* fix: change get default server url
2021-08-06 12:16:53 +03:00
Roman Khavronenko
d5ba8248cc vmalert: expose new metrics for tracking number of produced samples during last evaluation (#1518)
* vmalert: expose new metrics for tracking number of produced samples during last evaluation

Two new metrics were added to track the number of samples produced during the last evaluation:
* vmalert_recording_rules_last_evaluation_samples
* vmalert_alerting_rules_last_evaluation_samples

The gauge type is used to remain consistent with Prometheus metric
`prometheus_rule_group_last_evaluation_samples` which is on the group level.
However, the counter type was considered as well.

Two metrics instead of one are used to make it easier to separate recording and
alerting rules. It is likely, number of samples produced by recording rules is
more important so people will refer to it more frequently.

The expected usage of the new metric is the following:
```
   - alert: RecordingRuleReturnsEmptyResults
        expr: sum(vmalert_recording_rules_last_evaluation_samples) by(recording) < 1
        annotations:
          summary: Recording rule {{$labels.recording}} returns empty results.
            Please verify expression correctness.
```

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494

* vmalert: rename `vmalert_alerts_error` to `vmalert_alerting_rules_error` to remain consistent with recording rules metrics
2021-08-05 10:02:35 +03:00
Aliaksandr Valialkin
13d438d808 app/vmagent: typo fix in the description for -remoteWrite.queues 2021-08-05 10:00:58 +03:00
Aliaksandr Valialkin
b877538622 app/vmagent: follow-up after fe445f753b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491
2021-08-05 09:51:00 +03:00
Omar Ghader
fe445f753b
feature: Add multitenant for vmagent (#1505)
* feature: Add multitenant for vmagent

* Minor fix

* Fix rcs index out of range

* Minor fix

* Fix multi Init

* Fix multi Init

* Fix multi Init

* Add default multi

* Adjust naming

* Add TenantInserted metrics

* Add TenantInserted metrics

* fix: remove unused metrics for vmagent

* fix: remove unused metrics for vmagent

Co-authored-by: mghader <marc.ghader@ubisoft.com>
Co-authored-by: Sebastian YEPES <syepes@gmail.com>
2021-08-05 09:44:29 +03:00
Qifei Wan
095bb90879 app/vmalert: update config state metrics if config parsed failed (#1507) 2021-08-03 16:12:48 +03:00
Aliaksandr Valialkin
60cfa5f100 app/vmselect/promql: add present_over_time(m[d]) function, which will be available starting from Prometheus 2.29.0
See https://github.com/prometheus/prometheus/releases/tag/v2.29.0-rc.0 and https://github.com/prometheus/prometheus/pull/9097
2021-08-03 12:21:53 +03:00
wusphinx
511e5c2e68 Update TimeSelector.tsx (#1515)
delete garbled code
2021-08-03 11:14:56 +03:00
Nikolay
3f3ad13753 adds /rules and /alerts api for grafana (#1504)
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-08-02 17:29:49 +03:00
Aliaksandr Valialkin
99004a6a40 app/vmselect/netstorage: unpack time series data in mostly local big chunks
This should improve performance on multi-CPU systems for queries selecting time series with big number of raw samples
2021-07-30 12:26:33 +03:00
Aliaksandr Valialkin
c473d8ffe1 li/storage: re-use the per-day inverted index search code for searching in global index
This allows removing a big pile of outdated code for global index search.

This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486
2021-07-30 10:28:20 +03:00
Aliaksandr Valialkin
cbb81c2ce9 app/vmselect/netstorage: do not query Go maps with unsafe string keys, since this breaks in Go 1.17 2021-07-30 10:28:19 +03:00
Aliaksandr Valialkin
b709fa387a app/vmselect: follow-up for ed95bc9531
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1493
2021-07-29 09:48:47 +03:00