Commit graph

2943 commits

Author SHA1 Message Date
Roman Khavronenko
276e9301f4
app/vmalert: sanitize label names before sending to Alertmanager (#5442)
Before, vmalert would send notifications with labels containing characters
  not supported by Alertmanager validator, resulting into validation errors
  like `msg="Failed to validate alerts" err="invalid label set: invalid name "foo.bar"`

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-08 18:09:07 +02:00
Alexander Marshalov
e9cf39f519
added field version to the response for /api/v1/status/buildinfo API for using more efficient API in Grafana for receiving label values, added additional info about setup Grafana datasource (#5370) (#5437) 2023-12-07 16:41:56 +02:00
Aliaksandr Valialkin
32aea90847
app/vmselect/prometheus: go fmt after b39e9257eb 2023-12-07 16:05:01 +02:00
Aliaksandr Valialkin
9f79342e6a
app/vmselect/prometheus: properly encode Prometheus label values at /federate endpoint
Prometheus spec says that only \, \n and " must be escaped inside label values.
See 995743836e/content/docs/instrumenting/exposition_formats.md (L90)

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5431
2023-12-07 15:36:50 +02:00
Aliaksandr Valialkin
12e94f10cc
deployment/docker: update Go builder from Go1.21.4 to Go1.21.5
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.5+label%3ACherryPickApproved
2023-12-06 22:33:27 +02:00
Dmytro Kozlov
6a41e1ec0c
app/vmalert: replace error metrics for gauges with counter metrics (#5217)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5160

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 935bec447b)
2023-12-06 19:41:34 +01:00
Aliaksandr Valialkin
509339bf63
app/vmselect: properly adjust the lower bound for the time range where raw samples must be selected for default_rollup() function
Previously the lower bound could be too small, which could result in missing values at the beginning of the graph
for default_rollup() function. This function is automatically applied to all the series selectors if they aren't
explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions

While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting
the lower bound for the selected time range.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388
2023-12-06 14:46:18 +02:00
Aliaksandr Valialkin
559e4db512
Revert "add datadog /api/v2/series and /api/beta/sketches support (#5094)"
This reverts commit d6b4c8e4ef.

Reason for revert: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5094#issuecomment-1839789080
2023-12-05 02:30:40 +02:00
Aliaksandr Valialkin
bf187b2dc9
app/vmagent: add -enableMultitenantHandlers command-line flag
This flag allows converting tenant id to (vm_account_id, vm_project_id) labels.
this flag deprecates `-remoteWrite.multitenantURL` command-line flag,
because `-enableMultitenantHandlers` is easier to use and combine with multitenant url
at vminsert - https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy-via-labels

See https://docs.victoriametrics.com/vmagent.html#multitenancy

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1505
2023-12-05 01:35:59 +02:00
Aliaksandr Valialkin
85fcefaa34
app/vmagent: code cleanup for Kafka and Google PubSub consumers / producers
- Add links to relevant docs into descriptions for every -kafka.* and -gcp.pubsub.* command-line flags.
- Wait until message processing goroutines are stopped before returning from gcppubsub.Stop().
- Prevent from multiple calls to Init() without Stop().
- Drop message if tenantID cannot be parsed properly.
- Take into account tenantID for all the supported message formats.
- Support gzip-compressed messages for graphite format.
- Use exponential backoff sleep when the message cannot be pushed to remote storage systems
  because of disabled on-disk persistence - https://docs.victoriametrics.com/vmagent.html#disabling-on-disk-persistence
- Unblock from sleep as soon as Stop() is called. Previously the sleep could take up to 2 seconds after Stop() is called.
- Remove unused globalCtx and initContext from app/vmagent/remotewrite/gcppubsub
- Mention Google PubSub support at docs/enterprise.md
- Make Google PubSub docs more clear at docs/vmagent.md

This is a follow-up for commits 115245924a5f096c5a3383d6cc8e8b6fbd421984
and e6eab781ce42285a6a1750dc01eba6801dd35516 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/717
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/713
2023-12-04 22:51:04 +02:00
Dmytro Kozlov
6770bad207
app/vmalert: expose /vmalert/api/v1/rule and /api/v1/rule API which returns rule status in JSON format (#5397)
* app/vmalert: expose `/vmalert/api/v1/rule` and `/api/v1/rule` API which returns rule status in JSON format

* app/vmalert: hide updates if query param not set

* app/vmalert: fix panic (recursion call)

* app/vmalert: add needed group name and file name

* app/vmalert: fix comment, update behavior

* app/vmalert: fix description

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: simplify API for /api/v1/rule

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-12-04 22:49:39 +02:00
Aliaksandr Valialkin
d868155751
app/vmselect: do not limit concurrency for static and fast queries
Previously concurrency for static and fast queries was limited with the -search.maxConcurrentRequests
command-line flag. This could complicate identifying heavy queries via `vmui` at `Top queries` and `Active queries` pages,
since `vmui` and these pages couldn't be opened on overloaded vmselect.

Thanks to @f41gh7 for the idea.
2023-12-04 18:14:29 +02:00
Aliaksandr Valialkin
9f352f1b93
app/vminsert/newrelic: simplify the code a bit after 1fb8dc0092
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5416
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5421
2023-12-04 16:26:52 +02:00
Dmytro Kozlov
1fb8dc0092
app/vminsert: fix newrelic ingestion in cluster version (#5421)
Properly pass tenant ID to ingested data from newrelic.
Before tenant ID was mistakenly skipped.
2023-12-04 09:38:32 +01:00
Aliaksandr Valialkin
e017176f45
docs/vmauth.md: add typical use cases
(cherry picked from commit 837f6f0975)
2023-12-01 14:00:23 +01:00
Andrii Chubatiuk
d6b4c8e4ef
add datadog /api/v2/series and /api/beta/sketches support (#5094)
Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>
Co-authored-by: Nikolay <https://github.com/f41gh7>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

(cherry picked from commit 543f218fe9)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-12-01 13:55:32 +01:00
luckyxiaoqiang
8ce82c5400
app/vmselect/promql: add day_of_year() function (#5368)
Co-authored-by: dingxiaoqiang <dingxiaoqiang@bytedance.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
(cherry picked from commit d7897e0d70)
2023-11-28 12:49:48 +01:00
Aliaksandr Valialkin
5ccc22d66d
app/vmagent: properly increase vmagent_remotewrite_samples_dropped_total when scraped samples cannot be sent to the remote storage and -remoteWrite.dropSamplesOnOverload is set
This is a follow-up for 5034aa0773
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
2023-11-25 14:44:42 +02:00
Aliaksandr Valialkin
2f14394335
app/vmagent: follow-up for 090cb2c9de
- Add Try* prefix to functions, which return bool result in order to improve readability and reduce the probability of missing check
  for the result returned from these functions.

- Call the adjustSampleValues() only once on input samples. Previously it was called on every attempt to flush data to peristent queue.

- Properly restore the initial state of WriteRequest passed to tryPushWriteRequest() before returning from this function
  after unsuccessful push to persistent queue. Previously a part of WriteRequest samples may be lost in such case.

- Add -remoteWrite.dropSamplesOnOverload command-line flag, which can be used for dropping incoming samples instead
  of returning 429 Too Many Requests error to the client when -remoteWrite.disableOnDiskQueue is set and the remote storage
  cannot keep up with the data ingestion rate.

- Add vmagent_remotewrite_samples_dropped_total metric, which counts the number of dropped samples.

- Add vmagent_remotewrite_push_failures_total metric, which counts the number of unsuccessful attempts to push
  data to persistent queue when -remoteWrite.disableOnDiskQueue is set.

- Remove vmagent_remotewrite_aggregation_metrics_dropped_total and vm_promscrape_push_samples_dropped_total metrics,
  because they are replaced with vmagent_remotewrite_samples_dropped_total metric.

- Update 'Disabling on-disk persistence' docs at docs/vmagent.md

- Update stale comments in the code

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5088
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110
2023-11-25 12:13:39 +02:00
Nikolay
25ac2aac31
app/vmagent: allow to disabled on-disk persistence (#5088)
* app/vmagent: allow to disabled on-disk queue
Previously, it wasn't possible to build data processing pipeline with a
chain of vmagents. In case when remoteWrite for the last vmagent in the
chain wasn't accessible, it persisted data only when it has enough disk
capacity. If disk queue is full, it started to silently drop ingested
metrics.

New flags allows to disable on-disk persistent and immediatly return an
error if remoteWrite is not accessible anymore. It blocks any writes and
notify client, that data ingestion isn't possible.

Main use case for this feature - use external queue such as kafka for
data persistence.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2110

* adds test, updates readme

* apply review suggestions

* update docs for vmagent

* makes linter happy

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-25 12:12:29 +02:00
Roman Khavronenko
26242f526e
lib/protoparser: decrease import.maxLineLen from 100MB to 10MB (#5364)
Tests showed that importing a single line with 70MB size takes 5.3GiB
RSS memory for VictoriaMetrics single-node.
In the scenario when user exports and imports data from one VM to another,
it could possibly lead to OOM exception for destination VM.

Importing a single line with 16MB size taks 1.3GiB RSS memory.
Hence, the limit for `import.maxLineLen` was decreased from 100MB to 10MB
to improve reliability of VictoriaMetrics during imports.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-24 13:13:33 +02:00
Aliaksandr Valialkin
a906a7d85c
app/vmagent/remotewrite: do not drop persistent queues when -remoteWrite.multitenantURL is set
It is unsafe to drop persistent queues when -remoteWrite.multitenantURL command-line flag is set,
since these queues are created on demand when a new sample for the given tenant is pushed
to the remote storage.

This addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5357
The issue has been appeared in the commit f3a51e8b1d
when implementing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-11-23 20:43:21 +02:00
Aliaksandr Valialkin
10b4dfbbf9
app/vmalert/notifier: remove backticks from the description for -notifier.blackhole command-line flag
Backticks in flag description are automatically converted to flag type. See https://pkg.go.dev/flag#PrintDefaults

This is a follow-up for 20025d4fd6 and 25317b4e70
2023-11-22 20:17:45 +02:00
Aliaksandr Valialkin
db6dadf1f7
docs: convert png images to webp in all the docs except of docs/operator/*
This reduces the size of docs/* folder from 33MB to 18MB

Images inside docs/operator/* must be converted at the https://github.com/VictoriaMetrics/operator/tree/master/docs
and then the updated images must be automatically propagated to the docs/operator/*

This is a follow-up for d3f919df3e

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5206
2023-11-22 19:29:47 +02:00
Aliaksandr Valialkin
46e58f3669
app/vmagent/README.md: sync with docs/vmagent.md after cbe4a5c251 , so make docs-sync properly works 2023-11-20 22:43:28 +02:00
Nikolay
c06044ef52
app/vmagent: adds google pubsub as remoteWrite dst and ingest consumer (#713)
it allows to push and receive metrics from google pubsub queue
Adds needed documentation and examples for it
2023-11-20 22:43:26 +02:00
hagen1778
0dbbffbdd5
docs: typo after 3f5a41e35e
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 20025d4fd6)
2023-11-20 17:06:21 +01:00
Khanh Quoc Le
03e5ebaea9
Add _stream fields log (#5068) 2023-11-17 16:04:13 +01:00
Aliaksandr Valialkin
5492ccf0d5
app/vmselect/promql: reduce the number of memory allocations inside copyTimeseriesShallow()
Previously the number of memory allocations inside copyTimeseriesShallow() was equal to 1+len(tss)
Reduce this number to 2 by pre-allocating a slice of timeseries structs with len(tss) length.
2023-11-17 15:41:38 +01:00
Aliaksandr Valialkin
8723c8546a
vendor: run make vendor-update 2023-11-16 20:21:16 +01:00
Aliaksandr Valialkin
994b3da361
app/vmselect: simplify code a bit after 63e0f16062
Use only a single call to prometheus.WriteErrorResponse() inside sendPrometheusError
2023-11-16 18:15:08 +01:00
Aliaksandr Valialkin
633ec37022
app/vmselect/promql: typo fix after 7ca8ebef20
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
2023-11-16 17:01:19 +01:00
Roman Khavronenko
c0039ce7a3
docs/vmalert: clarify deduplication recommendations for HA setup (#5336)
Please see discussion here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5279

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-16 16:27:47 +01:00
Aliaksandr Valialkin
7ca8ebef20
app/vmselect/promql: properly handle duplicate series when merging cached results with the results obtained from the database
evalRollupFuncNoCache() may return time series with identical labels (aka duplicate series)
when performing queries satisfying all the following conditions:

- It must select time series with multiple metric names. For example, {__name__=~"foo|bar"}
- The series selector must be wrapped into rollup function, which drops metric names. For example, rate({__name__=~"foo|bar"})
- The rollup function must be wrapped into aggregate function, which has no streaming optimization.
  For example, quantile(0.9, rate({__name__=~"foo|bar"})

In this case VictoriaMetrics shouldn't return `cannot merge series: duplicate series found` error.
Instead, it should fall back to query execution with disabled cache.

Also properly store the merged results. Previously they were incorrectly stored because of a typo
introduced in the commit 41a0fdaf39

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5332
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5337
2023-11-16 16:16:17 +01:00
Yury Molodov
cc5f1745ca
vmui: change autocomplete hotkey to Alt/Option + A (#5328) 2023-11-15 23:33:33 +01:00
Aliaksandr Valialkin
68c0038a5d
docs/vmbackup.md: fix links to https://docs.victoriametrics.com/vmbackup.html#permanent-deletion-of-objects-in-s3-compatible-storages
This is a follow-up for 2fc7e9f47e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5121
2023-11-15 23:27:00 +01:00
Aliaksandr Valialkin
9a1354e8a9
docs/vmagent.md: refer to proper command-line flag: -remoteWrite.shardByURL.labels instead of -remoteWrite.shardByURLLabels
This is a follow-up for ed70a40669

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4942
2023-11-15 23:03:30 +01:00
Aliaksandr Valialkin
f9355d34be
docs: mention that VictoriaMetrics and vmagent support data ingestion via New Relic protocol now
This is a follow-up for f60c08a7bd
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3520
2023-11-15 22:56:34 +01:00
Aliaksandr Valialkin
8b01c6caf4
app/vmalert-tool: add missing multiarch directory
This is needed for 'make publish-vmalert-tool'
2023-11-15 18:13:05 +01:00
Aliaksandr Valialkin
de3d5943eb
docs/stream-aggregation.md: clarify that stream aggregation is applied after all the configured relabeling
This is a follow-up after 68d2cb203d
2023-11-15 15:54:57 +01:00
Aliaksandr Valialkin
9d3f1ec0d0
app/vmctl/README.md: sync with docs/vmctl.md after 7b2e2a23c2 2023-11-15 12:58:31 +01:00
John Belmonte
e94ec36ef6
vmctl README.md typo (#5326) 2023-11-15 12:57:49 +01:00
hagen1778
cfc58dd932
docs: clarify vmalert flag changes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 21:44:46 +01:00
Aliaksandr Valialkin
5b7f40907e
app/vmselect/netstorage: do not retry request when deadline is exceeded 2023-11-14 19:57:29 +01:00
Aliaksandr Valialkin
2f885d8e57
app/vmselect/promql: typo fixes after 7cf7740d18 2023-11-14 03:34:25 +01:00
Aliaksandr Valialkin
9ff1ee333f
app/vmselect/promql: properly handle instant query optimization conrner cases for min_over_time() and max_over_time()
- If min_over_time(m[offset] @ timestamp) <= min_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.

- If max_over_time(m[offset] @ timestamp) >= max_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.
2023-11-14 02:58:18 +01:00
Yury Molodov
0fe02e8d9d
vmui: reduced the number of server requests (#5253)
* vmui: reduced the number of server requests

* run `make vmui-update vmui-logs-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 01:50:57 +01:00
Yury Molodov
33e65e2cab
vmui: fix trailing slash in serverURL (#5271)
* vmui: add function to autoremove slash at the end of serverURL (#5203)

* vmui: change removeTrailingSlash func
2023-11-14 01:24:29 +01:00
Noah Labrecque
fbb572a180
fix: apply correct bounds to sf and tf (#5274) 2023-11-14 01:19:47 +01:00
Zakhar Bessarab
f7834767c1
vmcluster: re-routing enhancement (#5293)
* app/vmstorage: close vminsert connections gradually before stopping storage

Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878

Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: update graceful shutdown logic

- close connections from vminsert in determenistic order
- update flag description
- lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s(default value in Kubernetes and ansible-playbooks).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add information about re-routing enhancement during restart

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/changelog: add entry for new command-line flag

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* {app/vmstorage,lib/ingestserver}: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add note to update workload scheduler timeout

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 01:00:42 +01:00