Commit graph

2758 commits

Author SHA1 Message Date
Roman Khavronenko
bffd30b57a
app/vmalert: update remote-write process (#5284)
* app/vmalert: update remote-write process

* automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers.
* increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded.
* increment `vmalert_remotewrite_dropped_rows_total` amd `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2023-11-08 14:53:07 +08:00
hagen1778
c07dc45786
app/vmalert: fix typo in remoteWrite.concurrency description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-03 22:04:50 +01:00
Yury Molodov
f90d2ec843
vmui: display query error on Explore metrics page (#5272)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5202
2023-11-03 16:23:19 +01:00
Zakhar Bessarab
323f3720ed
app/vmauth: add option to skip TLS verification (#5256)
Add `tls_insecure_skip_verify` option on per-user basis which allows to disable TLS verification for all requests to backend on behalf of this user.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-11-03 12:04:17 +01:00
Aliaksandr Valialkin
815fda8995
docs: update -help output after recent changes to VictoriaMetrics components 2023-11-02 20:27:10 +01:00
Aliaksandr Valialkin
65db6609eb
docs/CHANGELOG.md: update the description of the optimization for SLO/SLI-like queries according to latest changes
See commits 4497a08e3d and 92826b0b4a
2023-11-02 20:05:05 +01:00
Roman Khavronenko
b5254199c6
app/vmalert: add label file pointing to the group's filename to metrics (#5281)
The filename should help identifying alerting rules belonging to specific groups
with identical names but different filenames.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5267

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-02 16:01:31 +01:00
hagen1778
6eb205f8b0
app/vmalert: verify alert name correctness in restore test
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-02 15:28:39 +01:00
Hui Wang
90d45574bf
vmalert: reduce restore query request for each alerting rule (#5265)
reduce the number of queries for restoring alerts state on start-up. 
The change should speed up the restore process and reduce pressure on `remoteRead.url`.
2023-11-02 15:22:13 +01:00
Aliaksandr Valialkin
eeb25bc4a5
app/vmselect/promql: add missing trace message in rollupResultCache.GetSeries() 2023-11-02 09:22:48 +01:00
Aliaksandr Valialkin
ed70a40669
app/vmagent/remotewrite: add -remoteWrite.shardByURL.labels command-line flag
This command-line flag can be used for specifying a list of labels used for sharding
among -remoteWrite.url entries when -remoteWrite.shardByURL command-line flag is set.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4942
2023-11-01 23:08:54 +01:00
Alexander Marshalov
828ddd4e4f
vmauth: add browser authorization request for http requests without… (#5234)
* vmauth: add browser authorization request for http requests without credentials to a route that is not in the `unauthorized_user` section (when `unauthorized_user` is specified).

* add link to issue in CHANGELOG

* Extend vmauth docs

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-01 20:59:46 +01:00
Aliaksandr Valialkin
4497a08e3d
app/vmselect/promql: reduce the minimum lookbehind window for enabling SLO/SLI optimizations from 24 hours to 6 hours
This reduction is based on production testing.

Also expose -search.minWindowForInstantRollupOptimization command-line flag, so users could fine-tune this arg for their needs
2023-11-01 20:18:44 +01:00
Aliaksandr Valialkin
f59dda3223
app/vmselect: run make quicktemplate-gen after b8739bc00b 2023-11-01 17:52:56 +01:00
Aliaksandr Valialkin
b8739bc00b
app/vmselect: return stats.seriesFetched as string instead of number
vmalert expects string value for stats.seriesFetched, so it is impossible
switching to number without breaking compatibility with old vmalert releases :(

It is still unclear why stats.seriesFetched has string type in the first place...
2023-11-01 17:49:59 +01:00
Aliaksandr Valialkin
da887b49e7
app/vmui: show query execution duration in the header of query input field
This should simplify the process of query optimization
2023-11-01 16:43:51 +01:00
Hui Wang
e482eeff58
vmalert: support specifying full http url in notifier static_configs target (#5261)
* vmalert: support specifying full http or https urls in notifier static_configs target address
* show right label results in ui
2023-11-01 19:53:50 +08:00
Aliaksandr Valialkin
92826b0b4a
app/vmselect/promql: apply SLO-like optimization to all the count_*_over_time() functions
This is a follow-up for 41a0fdaf39
2023-11-01 09:58:16 +01:00
Aliaksandr Valialkin
0189081490
app/vmselect/promql: typo fix, which could lead to panic during range query execution
The panic is:

  BUG: unexpected values after merging new values

This is a follow-up for 41a0fdaf39
2023-11-01 09:58:15 +01:00
Aliaksandr Valialkin
c4c6ee9485
app/vmui: fix non-working Disable cache checkbox at JSON and Table views 2023-10-31 22:58:06 +01:00
Aliaksandr Valialkin
a70818f72f
app/vmselect/promql: properly calculate rollup result if lookbehind window isn't set
This is a follow-up for 41a0fdaf39
2023-10-31 22:22:37 +01:00
Aliaksandr Valialkin
ea81f6fc36
app/vmselect/promql: add outliers_iqr(q) and outlier_iqr_over_time(m[d]) functions
These functions allow detecting anomalies in series and samples using Interquartile range method.
See Outliers section at https://en.wikipedia.org/wiki/Interquartile_range for more details.
2023-10-31 22:10:31 +01:00
Aliaksandr Valialkin
41a0fdaf39
app/vmselect/promql: optimize repeated SLI-like instant queries with lookbehind windows >= 1d
Repeated instant queries with long lookbehind windows, which contain one of the following rollup functions,
are optimized via partial result caching:

- sum_over_time()
- count_over_time()
- avg_over_time()
- increase()
- rate()

The basic idea of optimization is to calculate

  rf(m[d] @ t)

as

  rf(m[offset] @ t) + rf(m[d] @ (t-offset)) - rf(m[offset] @ (t-d))

where rf(m[d] @ (t-offset)) is cached query result, which was calculated previously

The offset may be in the range of up to 1 hour.
2023-10-31 19:25:23 +01:00
Aliaksandr Valialkin
51aab7bb17
app/vmselect/promql: wrap too long line after a950873fff 2023-10-31 18:59:10 +01:00
Roman Khavronenko
a950873fff
app/vmselect: expose vm_memory_intensive_queries_total counter metric (#5208)
The new metric gets increased each time `-search.logQueryMemoryUsage` memory limit
is exceeded by a query. This metric should help to identify expensive and heavy queries
without inspecting the logs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 13:31:09 +01:00
Hui Wang
abcb21aa5e
vmalert: fix alert firing state in replay mode (#5192)
fix possible missing firing states for alerting rules in replay mode
Before if one firing stage is bigger than single query request range, like rule with a big `for`, alerting rule won't able to be detected as firing.

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 13:54:18 +01:00
Dima Lazerka
ad839aa492
lib/httpserver: add flags to specify HSTS / Frame-Options / CSP headers for httpserver (#5111)
support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options`
HTTP headers in all VictoriaMetrics components. 
The values for headers can be specified by users via the following flags: 
`-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`.

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 11:33:38 +01:00
Aliaksandr Valialkin
9149353a36
app/vmui: change the order of tables at Top queries tab
Move the most interesting table - queries with the most summary time to execute - to the top
2023-10-28 11:56:16 +02:00
hagen1778
3aec7eb44f
app/vmalert: remove unclear comment
The timestamp alignment should be applied as a last step
to keep the timestamp consistent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-26 15:41:35 +02:00
Aliaksandr Valialkin
d5a599badc
lib/promauth: follow-up for e16d3f5639
- Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup
  don't prevent from processing the corresponding scrape targets after the file becomes correct,
  without the need to restart vmagent.
  Previously scrape targets with invalid TLS CA file or TLS client certificate files
  were permanently dropped after the first attempt to initialize them, and they didn't
  appear until the next vmagent reload or the next change in other places of the loaded scrape configs.

- Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent.
  Previously the old TLS CA was used until vmagent restart.

- Properly handle errors during http request creation for the second attempt to send data to remote system
  at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing,
  since the returned request is nil on error.

- Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent.
  Previously it could miss details on the source of the request.

- Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header
  of every http request issued by vmagent during service discovery or target scraping.
  Re-use the HTTP client instead until the corresponding scrape config changes.

- Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached,
  e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server
  when auth header cannot be obtained because of temporary error.

- Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config.
  Cache the loaded certificate and the error for one second. This should significantly reduce CPU load
  when scraping big number of targets with the same tls_config.

- Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`.

- Improve test coverage at lib/promauth

- Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later.
  Previously vmagent was exitting in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959
2023-10-25 23:19:37 +02:00
Aliaksandr Valialkin
cb34d4440c
app/vmalert/config: fix flacky test TestParseBad
It could return either `failed to read` or `failed to parse` errors depending
on whether the given url can be loaded or not under the current environment
2023-10-25 21:30:55 +02:00
Aliaksandr Valialkin
42dd71bb63
all: consistently use %w instead of %s in when error is passed to fmt.Errorf()
This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.
2023-10-25 21:24:03 +02:00
hagen1778
c07909a20b
app/vmalert: fix typo in tests
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-25 16:28:27 +02:00
hagen1778
eed0c3c6b0
app/vmalert: fix tests after a216fe6728
a216fe6728
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-25 16:25:26 +02:00
hagen1778
a216fe6728
app/vmalert: follow-up after c9375cac5e
c9375cac5e

Descriptions were updated in attempt to make it more clear for readers,
re-phrasing and linking missing docs.

`eval_delay` was added to tests to verify it can be unmarshalled.

`eval_delay` is now applied before timestamp alignment to make it more predictable.
Before, if delay < interval the timestamp won't be aligned.

`eval_delay` and `eval_offset` was added to API output.

`PreviouslySentSeriesToRW` converted to private `previouslySentSeriesToRW`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-25 13:07:13 +02:00
Hui Wang
c9375cac5e
vmalert: add -rule.evalDelay flag and eval_delay as group attribute (#5185)
Also mark `-datasource.lookback` as will be deprecated, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
2023-10-25 11:54:18 +02:00
Aliaksandr Valialkin
2f21c0c119
docs: use https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest instead of https://github.com/VictoriaMetrics/VictoriaMetrics/releases link where needed
The https://github.com/VictoriaMetrics/VictoriaMetrics/releases link may show non-latest
releases at the top, such as LTS releases or VictoriaLogs releases.
So it is better to use https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest link,
which always redirect to the latest available release of VictoriaMetrics.
2023-10-18 20:06:25 +02:00
Roman Khavronenko
b8b6e120ff
app/vmselect: limit the number of parallel workers by 32 (#5195)
* app/vmselect: limit the number of parallel workers by 32

The change should improve performance and memory usage during query processing
on machines with big number of CPU cores. The number of parallel workers for
query processing is controlled via `-search.maxWorkersPerQuery` command-line flag.
By default, the number of workers is limited by the number of available CPU cores,
but not more than 32. The limit can be increased via `-search.maxWorkersPerQuery`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

- The `-search.maxWorkersPerQuery` command-line flag doesn't limit resource usage,
  so move it from the `resource usage limits` to `troubleshooting` chapter at docs/Single-server-VictoriaMetrics.md

- Make more clear the description for the `-search.maxWorkersPerQuery` command-line flag

- Add the description of `-search.maxWorkersPerQuery` to docs/Cluster-VictoriaMetrics.md

- Limit the maximum value, which can be passed to `-search.maxWorkersPerQuery`, to GOMAXPROCS,
  because bigger values may worsen query performance and increase CPU usage

- Improve the the description of the change at docs/CHANGELOG.md. Mark it as FEATURE instead of BUGFIX,
  since it is closer to a feature than to a bugfix.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-10-18 19:51:37 +02:00
Yury Molodov
ad9e2b21ee
vmui: update dependencies (#5194) 2023-10-18 18:38:32 +02:00
Hui Wang
e16d3f5639
fix inconsistent behaviors with prometheus when scraping (#5153)
* fix inconsistent behaviors with prometheus when scraping

1. address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959. skip job with wrong syntax in `scrape_configs` with error logs instead of exiting;
2. show error messages on vmagent /targets ui if there are wrong auth configs in `scrape_configs`, previously will print error logs and do scrape without auth header;
3. don't send requests if there are wrong auth configs in:
    1. vmagent remoteWrite;
    2. vmalert datasource/remoteRead/remoteWrite/notifier.

* add changelogs

* address review comments

* fix ut
2023-10-17 17:58:19 +08:00
Aliaksandr Valialkin
da77f4deeb
app/vmselect/promql: add labels_equal(q, "label1", "label2", ...) function
This function returns q series, which have identical values for the listed labels
"label1", "label2", ...

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5148
2023-10-16 21:50:11 +02:00
Aliaksandr Valialkin
6c3dd16a16
app/vmagent/remotewrite: move sas var initialization closer to the place where it is used
This makes the code sligthtly easier to understand.

This is a follow-up for 1d3d989be5

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5170
2023-10-16 20:52:56 +02:00
Aliaksandr Valialkin
bdb743c88d
app/vmselect/promql: add drop_empty_series() function for dropping empty series before performing additional calculations
This can be useful in the following queries:

   drop_empty_series(temperature <= 30) default 40

This query drops temperature series with all the values bigger than 30 on the selected time range,
while replacing gaps in the remaining series with 40.

The query without drop_empty_series:

  (temperature <= 30) default 40

would leave all the temperature series with all the values bigger than 30 on the selected time range,
and replace all their values with 40. This is not what could be epxected in some cases
like here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5071
2023-10-16 20:44:56 +02:00
hagen1778
1d3d989be5
app/vmagent/remotewrite: follow-up after 4f102ff945
4f102ff945
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-16 16:00:24 +02:00
luosjde
4f102ff945
vmagent: fix streamaggr config reload bug
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5170

Authored-by: luoshaojun01 <luoshaojun01@baidu.com>
2023-10-16 15:57:24 +02:00
Aliaksandr Valialkin
b8c267075e
lib/promscrape: add a link to https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets in descriptions for -promscrape.cluster.* command-line flags
This should help users figuring out the purpose of -promscrape.cluster.* command-line flags
2023-10-16 14:46:22 +02:00
Aliaksandr Valialkin
fc98b62760
lib/promutils, app/vmalert-tool/unittest: move promutils.Duration.ParseTime() to app/vmalert-tool/unittest.durationToTime()
The ParseTime() function looks strange, since it converts relative duration to absolute time since Unix Epoch.
In most scenarios such a conversion is used by mistake.

It is better to do not expose such a function for public use and hide it inside the package where it is needed,
e.g. inside app/vmalert-tool/unittest.

This is a follow-up for dc28196237
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2945
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4789
2023-10-16 14:19:31 +02:00
Aliaksandr Valialkin
07150359b2
docs/vmbackup.md: clarify documentation about -deleteAllObjectVersions command-line flag
Updates 2fc7e9f47e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5121
2023-10-16 12:09:46 +02:00
Aliaksandr Valialkin
97a7128e4b
app/vmselect/promql: do not use unsafe conversion from bytes slice to string when storing a value by map key
The assigned map key shouldn't change over time, otherwise the map won't work properly.

This is a follow-up for 1f91f22b5f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-16 01:56:57 +02:00
Aliaksandr Valialkin
2c334ed953
app/{vmagent,vminsert}: follow-up for NewRelic data ingestion protocol support
This is a follow-up for f60c08a7bd

Changes:

- Make sure all the urls related to NewRelic protocol start from /newrelic . Previously some urls were started from /api/v1/newrelic

- Remove /api/v1 part from NewRelic urls, since it has no sense

- Remove automatic transformation from CamelCase to snake_case for NewRelic labels and metric names,
  since it may complicate the transition from NewRelic to VictoriaMetrics. Preserve all the metric names and label names,
  so users could query metrics and labels by the same names which are used in NewRelic.
  The automatic transformation from CamelCase to snake_case can be added later as a special action for relabeling rules if needed.

- Properly update per-tenant data ingestion stats at app/vmagent/newrelic/request_handler.go . Previously it was always zero.

- Fix NewRelic urls in vmagent when multitenant data ingestion is enabled. Previously they were mistakenly started from `/`.

- Document NewRelic data ingestion url at docs/Cluster-VictoriaMetrics.md

- Remove superflouos memory allocations at lib/protoparser/newrelic

- Improve tests at lib/protoparser/newrelic/*

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3520
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4712
2023-10-16 00:25:25 +02:00