Commit graph

10443 commits

Author SHA1 Message Date
Aliaksandr Valialkin
27999ed356
docs/victorialogs/cluster.md: add a link to the changelog for the latest available release 2025-04-10 17:09:42 +02:00
Aliaksandr Valialkin
5ccd24f7e0
docs/victorialogs/CHANGELOG.md: add release date for v1.18.0-victorialogs 2025-04-10 17:08:04 +02:00
Aliaksandr Valialkin
311468ff59
deployment: update VictoriaLogs Docker image tag from v1.17.0-victorialogs to v1.18.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.18.0-victorialogs
2025-04-10 17:05:35 +02:00
Aliaksandr Valialkin
04fb337752
victorialogs: add cluster mode
Cluster mode is enabled when -storageNode command-line flag is passed to VictoriaLogs.
In this mode it spreads the ingested logs among storage nodes specified in the -storageNode flag.
It also queries storage nodes during `select` queries.

Cluster mode allows building multi-level cluster setup when top-level select node can query multiple lower-level clusters
and get global querying view.

See https://docs.victoriametrics.com/victorialogs/cluster/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5077
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7950
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8223
2025-04-10 16:57:15 +02:00
Aliaksandr Valialkin
754f69fb73
docs/anomaly-detection: sync with master branch after some commits were missed for cherry-picking from the master branch
Missing cherry-picked commits:

- 30b61c6d8a
- a335ed23c7

These commits cannot be cherry-picked now because of too big churn in the docs/anomaly-detection after these commits.
2025-04-10 14:01:46 +02:00
Aliaksandr Valialkin
b61d8059b8
lib/protoparser: support for identity encoding in a generic way inside protoparserutil.GetUncompressedReader
This should help avoiding future issues when `identity` encoding isn't replaced to `` encoding
by the caller of protoparserutil.GetUncompressedReader().

This is a follow-up for 303b425fa3

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8652
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8649
2025-04-10 13:52:31 +02:00
Artem Navoiev
15d91c03ce
docs: changelog fix the link to cluster version in 114 release.2
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-10 11:45:10 +02:00
Artem Navoiev
2621fe4fb5
docs: changelog fix the link to cluster version in 114 release
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-10 11:45:09 +02:00
Andrii Chubatiuk
61ab2e14b4
lib/protoparser/datadog*: support Content-Encoding: identity value
introduction of common decompression logic in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8416 removed
ability to treat unsupported compression algorithms as uncompressed data
for datadog v1 endpoint. This PR adds support of `identity`
Content-Encoding header value, though according to RFC 2616 this value
is only expected in `Accept-Encoding` header

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8649
2025-04-08 17:45:22 +02:00
Nikolay
d5522e7c15
lib/httpserver: mask authKey at PostFrom
'authKey' is well-known url and form param for VictoriaMetrics
components authorization. Previously, it could be printed into stdout
via httpserver error logger. It makes this authKey insecure and hard to
use.

This commit prevents from logging authKey defined at PostForm or as part
of url.Query.

It's recommneded to transfer authKey via PostForm and it should be
implemented at separate PRs.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973

---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-04-08 17:45:22 +02:00
Nikolay
ebe15e0c7b
lib/backup/s3: properly set ProfileName
Previously, if ProfileName is set to empty value (as default). AWS s3
lib ignored any profile config defined with `-configProfilePath`.

This commit correctly configure client options and set profile name only
if it's set to non-empty value.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8668
2025-04-08 17:45:22 +02:00
nemobis
f2cd4ceb3a
docs: Fix typo in changelog for v113
Fix a typo `scrapped` for `scraped`.
2025-04-08 17:45:21 +02:00
Zakhar Bessarab
9f89eee091
docs/guides/vmgateway-grafana-oidc: update guide for recent versions of components
- update grafana & keycloak to latest versions
- update UI images with the latest screenshots
- update wording to reflect UI changes
2025-04-08 17:45:21 +02:00
Zakhar Bessarab
47e7a8b2e2
make: fix make package for vmalert-tool
`make package` relies on presence of `APP_NAME/deployment/Dockerfile`
which was missing for vmalert-tool.
2025-04-08 17:45:21 +02:00
nemobis
cecaeecd09
docs: fix typo in pull request template
The verb is _adhere to_, see https://en.wiktionary.org/wiki/adhere .
2025-04-08 17:45:21 +02:00
Max Kotliar
79254126f1
vmagent/remotewrite: set content encoding header based on actual body
Improve remote write handling in vmagent by setting the
`Content-Encoding` header based on the actual request body, rather than
relying on configuration.

- Detects Zstd compression via the Zstd magic number.
- Falls back to Snappy if Zstd is not detected.
- Persistent queue may now contain mixed-encoding content.
- Add basic vmagent integration tests

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5344 and
12cd32fd75.

Extracted from
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
2025-04-08 17:45:20 +02:00
f41gh7
c2db691c88
docs: release follow-up
* mention lts release changes
* update vm apps versions at docs and deployment examples

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-04-07 13:00:26 +02:00
f41gh7
7f124053cb
CHANGELOG.md: cut v1.115.0 release 2025-04-04 14:30:22 +02:00
f41gh7
8f0a8825ae
make docs-update-version 2025-04-04 14:23:48 +02:00
f41gh7
e6ccb307c5
make vmui-update 2025-04-04 14:23:48 +02:00
Andrii Chubatiuk
d339f75159
lib/streamaggr: fix panic in rate output
This commit properly reset aggregator state. Previously, it was not checked for `nil` and it lead to the panic on access.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8634
2025-04-04 14:17:17 +02:00
hansemschnokeloch
9ae124d7e0
docs/vlogs: fix typo in README 2025-04-04 14:17:16 +02:00
Zakhar Bessarab
ec3f11451a
docs/changelog: correct entry location after 298f862f
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-04 12:25:17 +04:00
Aliaksandr Valialkin
0bcb5194b7
lib/logstorage: pad pipeStatsProcessorShard.groupMapShards in order to avoid false sharing when merging these shards in parallel on many CPU cores 2025-04-03 22:21:35 +02:00
Aliaksandr Valialkin
1525a93f21
lib/logstorage: add padding between hitsMap items at hitsMapAdaptive.shards in order to avoid false sharing when processing the hitsMapAdaptive.shards on multiple CPU cores 2025-04-03 20:15:19 +02:00
Zakhar Bessarab
09a15d5239
deps: downgrade AWS dependencies
Pin AWS libraries to version before 2025-01-15 (see
https://github.com/aws/aws-sdk-go-v2/releases/tag/release-2025-01-15).

This version enabled request and response checksum verification by
default which breaks compatibility with non-AWS S3-compatible storage
providers.

See: https://github.com/victoriaMetrics/victoriaMetrics/issues/8622

Supersedes https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8630

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-03 18:06:57 +04:00
Zakhar Bessarab
dfc24db513
app/vmauth: return non-OK response for timeouts and request cancellation
Currently, requests failing due to network timeout would receive "200
OK" while producing a warning log message about the timeout. This
behaviour is confusing and might produce unexpected issues as it is not
possible to retry errors properly.

Change this to return "502 Bad Gateway" response so that error can be
handled by the client.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8621

Config for testing:
```
unauthorized_user:
  url_prefix: "http://example.com:9800"
```

Before the change:
```
*   Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
/* NOTE: 30 seconds timeout passes */
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 08:54:05 GMT
Date: Tue, 01 Apr 2025 08:54:05 GMT
<

* Connection  to host 127.0.0.1 left intact
```

After:
```
*   Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 502 Bad Gateway
HTTP/1.1 502 Bad Gateway
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 09:13:57 GMT
Date: Tue, 01 Apr 2025 09:13:57 GMT
< Content-Length: 109
Content-Length: 109
<

* Connection  to host 127.0.0.1 left intact
```

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-03 13:45:56 +04:00
hagen1778
3cbc3eb19f
docs: improve wording for recent vmalert changes
follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8522

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-04-03 09:47:06 +01:00
Emre Yazıcı
56f60e8be9
app/vmalert: show partial responses in debug logs ()
### Describe Your Changes

Log when the data response from vmselect is partial during
rule(recording, alertingrule) evaluations.

vmselect returns `isPartial: true` in case data is not fully fetched
from scattered vmstorages. At the time of rule evals, it may be drifting
apart from real values due to missing points. This is an important event
that should be logged to inform users to see how often that happens as
it may lead to false positive alerts.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: emreya <emre.yazici@adyen.com>
Signed-off-by: emreya <e.yazici1990@gmail.com>
Signed-off-by: Emre Yazici <e.yazici1990@gmail.com>
2025-04-03 09:42:39 +01:00
Artem Fetishev
18c97b827b
Update series count docs ()
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-03 10:39:43 +02:00
Aliaksandr Valialkin
abd5167874
app/vlselect: run /select/logsql/tail queries without concurrency limit
The concurrency limit is intended for short-running queries. If it is applied to tail queries,
then this can affect short-running queries.
2025-04-02 20:22:42 +02:00
Aliaksandr Valialkin
b1c9aa9ec8
app/vlselect: do not log canceled requests, since they are expected and legal 2025-04-02 19:16:27 +02:00
Aliaksandr Valialkin
19dfb6d0cf
deployment: update Go builder from Go1.24.1 to Go1.24.2
See https://github.com/golang/go/issues?q=milestone%3AGo1.24.2+label%3ACherryPickApproved
2025-04-02 18:01:44 +02:00
Artem Fetishev
4b3d7627f7
lib/storage: When creating and listing snapshots, panic instead of returning an error ()
When creating and listing snapshots, panic instead of returning an error
since errors are not recoverable anyway.
Also do not cleanup the filesystem on panic. Leave as is for further
manual inspection.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 16:00:25 +02:00
Artem Fetishev
cf340f6e76
lib/storage: Pass the partition time range during the partition creation and opening ()
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 15:04:07 +02:00
Aliaksandr Valialkin
f705ea6100
app/vmui: replace old-style links to https://docs.victoriametrics.com/MetricsQL.html with https://docs.victoriametrics.com/metricsql/
Replace also https://docs.victoriametrics.com/keyConcepts.html with https://docs.victoriametrics.com/keyconcepts/

This is the follow-up for the commit ee1da35071
2025-04-02 13:24:38 +02:00
Artem Fetishev
bc0e651fd2
lib/storage: mergeBlockStreams(): replace the dependency on Storage with dependency on the set of deleted metricIDs ()
This should narrow down the function dependencies and simplify testing.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 13:20:32 +02:00
Aliaksandr Valialkin
c48db5e171
docs/victoriametrics/vmagent.md: mention that increasing scrape_interval can reduce CPU usage 2025-04-02 12:42:11 +02:00
Aliaksandr Valialkin
d66656ab19
docs/victoriametrics/vmagent.md: mention that -promscrape.disableKeepAlive option can reduce RAM usage when scraping thousands of targets 2025-04-01 23:22:42 +02:00
Aliaksandr Valialkin
984d294b9c
lib/promscrape: do not clutter logs with cannot scrape target ...: context canceled errors when vmagent is stopped 2025-04-01 23:22:42 +02:00
Aliaksandr Valialkin
713e6aab4a
docs/victoriametrics/vmagent.md: change GOGC from 50 to 100 in the example of optimized config for vmagent
This is a follow-up after bf024d3dce,
2025-04-01 21:36:22 +02:00
Aliaksandr Valialkin
36a380f762
docs/victoriametrics/vmagent.md: remove the recommendation to set GOGC to 50 at vmagent in order to reduce CPU usage
The default GOGC is set to 50 at vmagent after bf024d3dce,
so this recommendation makes no sense. Leave the recommendation to increase GOGC to 100.
2025-04-01 21:15:41 +02:00
Aliaksandr Valialkin
15e1f03940
app/vmagent: increase the default GOGC from 30 to 50
This reduces CPU usage by up to 30% in exchange of the increased RAM usage by 10%
when scraping thousands of targets, which expose millions of metrics in summary.

This looks like a good tradeoff after the commit edac875179 ,
which reduced RAM usage by more than 10%, so the final RAM usage for vmagent
is still lower than the RAM usage at v1.114.0 by ~15%, while CPU usage drops by 30%.
2025-04-01 21:08:55 +02:00
Aliaksandr Valialkin
f632ab8763
lib/promscrape: use chunkedbuffer.Buffer instead of bytesutil.ByteBuffer for reading response body from scrape targets
This reduces memory usage when reading large response bodies because the underlying buffer
doesn't need to be re-allocated during the read of large response body in the buffer.

Also decompress response body under the processScrapedDataConcurrencyLimitCh .
This reduces CPU usage and RAM usage a bit when scraping thousands of targets.
2025-04-01 20:48:14 +02:00
Aliaksandr Valialkin
53fe13ce72
docs/victoriametrics/vmagent.md: add Performance optimizations chapter
Enumerate the most commonly used options for reducing CPU usage and RAM usage
for vmagent, which scrapes thousands of targets.

See https://docs.victoriametrics.com/vmagent/#performance-optimizations
2025-04-01 18:36:51 +02:00
Max Kotliar
e6b00253a4
vmagent/remotewrite: fix golangci-lint code style issue
### Describe Your Changes

Fixes golangci-lint issues introduced in
98f1e32e39

```
--- a/app/vmagent/remotewrite/pendingseries.go
+++ b/app/vmagent/remotewrite/pendingseries.go
@@ -202,7 +202,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {

 	// Pre-allocate memory for labels.
 	labelsLen := len(wr.labels)
-	wr.labels = slicesutil.SetLength(wr.labels, labelsLen + len(labelsSrc))
+	wr.labels = slicesutil.SetLength(wr.labels, labelsLen+len(labelsSrc))
 	labelsDst := wr.labels[labelsLen:]

 	// Pre-allocate memory for byte slice needed for storing label names and values.
@@ -212,7 +212,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
 		neededBufLen += len(label.Name) + len(label.Value)
 	}
 	bufLen := len(wr.buf)
-	wr.buf = slicesutil.SetLength(wr.buf, bufLen + neededBufLen)
+	wr.buf = slicesutil.SetLength(wr.buf, bufLen+neededBufLen) buf := wr.buf[:bufLen]

 	// Copy labels

```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-01 18:38:30 +04:00
Aliaksandr Valialkin
810a4f55d4
app/vmagent/remotewrite: optimize writeRequest.copyTimeSeries a bit
Pre-allocate memory for labels and for the needed byte buffer used
for holding the copied label names and values.
2025-04-01 16:00:30 +02:00
Aliaksandr Valialkin
f6bb26cd08
lib/promscrape: always store the last response per every scrape target in compressed form
This reduces memory usage for vmagent when scraping big number of targets at the cost of slightly higher CPU usage.

The increased CPU usage can be decreased by disabling tracking of stale markers either via -promscrape.noStaleMarkers
command-line flag or via `no_stale_markers: true` option at the scrape config pointed by -promscrape.config command-line flag.
See https://docs.victoriametrics.com/vmagent/#prometheus-staleness-markers
2025-04-01 16:00:30 +02:00
Aliaksandr Valialkin
94f89b7898
lib/leveledbytebufferpool: start with the pools[0] for byte slices up to 256 bytes
The pool is used mostly for obtaining byte buffers for responses from scrape targets.
There are no responses smaller than 256 bytes in practice, so there is no sense in maintaining
pools for byte slices up to 64 and 128 bytes.
2025-04-01 12:05:19 +02:00
Aliaksandr Valialkin
c80025bbfd
lib/promscrape: make sure that the maxLabelsLen contains really the maximum len(wc.labels) among concurrently running callbacks at stream.Parse
Previously the maxLabelsLen could be updated with smaller value after it is updated to bigger value by concurrently running goroutines.
Prevent this by loading the latest maxLabelsLen value and updating it only if it is smaller than the current len(wc.labels)
before the exit from callback passed to stream.Parse.

While at it, return early from the callback on the sample_limit exceeding error,
since the rest of the code in the callback becomes no-op after wc.reset().
This simplifies following the logic in the code a bit.

Also remove outdated misleading comment in front of sw.pushData() call inside callbacks passed to stream.Parse.
This comment has no sense after every callback start working with its own goroutine-local wc.
2025-04-01 11:53:45 +02:00