Commit graph

10458 commits

Author SHA1 Message Date
Aliaksandr Valialkin
5d61122fd5
docs/victorialogs/cluster.md: add a link to the changelog for the latest available release 2025-04-10 17:09:23 +02:00
Aliaksandr Valialkin
e9c04879ce
docs/victorialogs/CHANGELOG.md: add release date for v1.18.0-victorialogs 2025-04-10 17:07:58 +02:00
Aliaksandr Valialkin
5f4205a050
deployment: update VictoriaLogs Docker image tag from v1.17.0-victorialogs to v1.18.0-victorialogs
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.18.0-victorialogs
2025-04-10 17:04:30 +02:00
Aliaksandr Valialkin
7a46af3920
victorialogs: add cluster mode
Cluster mode is enabled when -storageNode command-line flag is passed to VictoriaLogs.
In this mode it spreads the ingested logs among storage nodes specified in the -storageNode flag.
It also queries storage nodes during `select` queries.

Cluster mode allows building multi-level cluster setup when top-level select node can query multiple lower-level clusters
and get global querying view.

See https://docs.victoriametrics.com/victorialogs/cluster/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5077
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7950
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8223
2025-04-10 16:55:23 +02:00
Aliaksandr Valialkin
ff967a8e65
lib/protoparser: support for identity encoding in a generic way inside protoparserutil.GetUncompressedReader
This should help avoiding future issues when `identity` encoding isn't replaced to `` encoding
by the caller of protoparserutil.GetUncompressedReader().

This is a follow-up for 303b425fa3

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8652
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8649
2025-04-10 13:52:45 +02:00
Artem Navoiev
3108376d95
docs: changelog fix the link to cluster version in 114 release.2
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-10 09:39:15 +02:00
Artem Navoiev
494fe4403a
docs: changelog fix the link to cluster version in 114 release
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2025-04-09 21:28:40 +02:00
Andrii Chubatiuk
303b425fa3
lib/protoparser/datadog*: support Content-Encoding: identity value
introduction of common decompression logic in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8416 removed
ability to treat unsupported compression algorithms as uncompressed data
for datadog v1 endpoint. This PR adds support of `identity`
Content-Encoding header value, though according to RFC 2616 this value
is only expected in `Accept-Encoding` header

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8649
2025-04-08 16:17:19 +02:00
Nikolay
8f3efde55d
lib/httpserver: mask authKey at PostFrom
'authKey' is well-known url and form param for VictoriaMetrics
components authorization. Previously, it could be printed into stdout
via httpserver error logger. It makes this authKey insecure and hard to
use.

This commit prevents from logging authKey defined at PostForm or as part
of url.Query.

It's recommneded to transfer authKey via PostForm and it should be
implemented at separate PRs.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5973

---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-04-08 16:15:48 +02:00
Nikolay
f16938bba9
lib/backup/s3: properly set ProfileName
Previously, if ProfileName is set to empty value (as default). AWS s3
lib ignored any profile config defined with `-configProfilePath`.

This commit correctly configure client options and set profile name only
if it's set to non-empty value.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8668
2025-04-08 16:15:07 +02:00
nemobis
65ff04bc09
docs: Fix typo in changelog for v113
Fix a typo `scrapped` for `scraped`.
2025-04-08 16:14:50 +02:00
Zakhar Bessarab
e2715f94af
docs/guides/vmgateway-grafana-oidc: update guide for recent versions of components
- update grafana & keycloak to latest versions
- update UI images with the latest screenshots
- update wording to reflect UI changes
2025-04-08 16:13:58 +02:00
Zakhar Bessarab
582160f566
make: fix make package for vmalert-tool
`make package` relies on presence of `APP_NAME/deployment/Dockerfile`
which was missing for vmalert-tool.
2025-04-08 16:13:28 +02:00
nemobis
638f9839d5
docs: fix typo in pull request template
The verb is _adhere to_, see https://en.wiktionary.org/wiki/adhere .
2025-04-08 16:12:51 +02:00
Max Kotliar
3f5bf4bd03
vmagent/remotewrite: set content encoding header based on actual body
Improve remote write handling in vmagent by setting the
`Content-Encoding` header based on the actual request body, rather than
relying on configuration.

- Detects Zstd compression via the Zstd magic number.
- Falls back to Snappy if Zstd is not detected.
- Persistent queue may now contain mixed-encoding content.
- Add basic vmagent integration tests

Follow up on
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5344 and
12cd32fd75.

Extracted from
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8462

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
2025-04-08 16:12:06 +02:00
f41gh7
038419663b
docs: release follow-up
* mention lts release changes
* update vm apps versions at docs and deployment examples

Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-04-07 12:59:53 +02:00
f41gh7
123f373537
CHANGELOG.md: cut v1.115.0 release 2025-04-04 14:30:16 +02:00
f41gh7
57121c828f
make docs-update-version 2025-04-04 14:23:08 +02:00
f41gh7
aa5edbc706
make vmui-update 2025-04-04 14:20:37 +02:00
Andrii Chubatiuk
f9d8c86b0a
lib/streamaggr: fix panic in rate output
This commit properly reset aggregator state. Previously, it was not checked for `nil` and it lead to the panic on access.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8634
2025-04-04 14:14:52 +02:00
hansemschnokeloch
b733fc5b83
docs/vlogs: fix typo in README 2025-04-04 14:10:59 +02:00
Zakhar Bessarab
f2eaad62dc
docs/changelog: correct entry location after 298f862f
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-04 12:24:16 +04:00
Aliaksandr Valialkin
adae788b18
lib/logstorage: pad pipeStatsProcessorShard.groupMapShards in order to avoid false sharing when merging these shards in parallel on many CPU cores 2025-04-03 22:21:18 +02:00
Aliaksandr Valialkin
a65d10fcce
lib/logstorage: add padding between hitsMap items at hitsMapAdaptive.shards in order to avoid false sharing when processing the hitsMapAdaptive.shards on multiple CPU cores 2025-04-03 20:14:20 +02:00
Zakhar Bessarab
298f862fc0
deps: downgrade AWS dependencies
Pin AWS libraries to version before 2025-01-15 (see
https://github.com/aws/aws-sdk-go-v2/releases/tag/release-2025-01-15).

This version enabled request and response checksum verification by
default which breaks compatibility with non-AWS S3-compatible storage
providers.

See: https://github.com/victoriaMetrics/victoriaMetrics/issues/8622

Supersedes https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8630

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-03 18:05:07 +04:00
Zakhar Bessarab
aff1580a1d
app/vmauth: return non-OK response for timeouts and request cancellation
Currently, requests failing due to network timeout would receive "200
OK" while producing a warning log message about the timeout. This
behaviour is confusing and might produce unexpected issues as it is not
possible to retry errors properly.

Change this to return "502 Bad Gateway" response so that error can be
handled by the client.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8621

Config for testing:
```
unauthorized_user:
  url_prefix: "http://example.com:9800"
```

Before the change:
```
*   Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
/* NOTE: 30 seconds timeout passes */
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 08:54:05 GMT
Date: Tue, 01 Apr 2025 08:54:05 GMT
<

* Connection  to host 127.0.0.1 left intact
```

After:
```
*   Trying 127.0.0.1:8427...
* Connected to 127.0.0.1 (127.0.0.1) port 8427
* using HTTP/1.x
> HEAD /api/v1/query HTTP/1.1
> Host: 127.0.0.1:8427
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 502 Bad Gateway
HTTP/1.1 502 Bad Gateway
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
< Vary: Accept-Encoding
Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Server-Hostname: pc
X-Server-Hostname: pc
< Date: Tue, 01 Apr 2025 09:13:57 GMT
Date: Tue, 01 Apr 2025 09:13:57 GMT
< Content-Length: 109
Content-Length: 109
<

* Connection  to host 127.0.0.1 left intact
```

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-04-03 13:44:51 +04:00
hagen1778
d4c0a42c1b
docs: improve wording for recent vmalert changes
follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8522

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3cbc3eb19f)
2025-04-03 09:47:34 +01:00
Emre Yazıcı
a9736a5bfb
app/vmalert: show partial responses in debug logs ()
### Describe Your Changes

Log when the data response from vmselect is partial during
rule(recording, alertingrule) evaluations.

vmselect returns `isPartial: true` in case data is not fully fetched
from scattered vmstorages. At the time of rule evals, it may be drifting
apart from real values due to missing points. This is an important event
that should be logged to inform users to see how often that happens as
it may lead to false positive alerts.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: emreya <emre.yazici@adyen.com>
Signed-off-by: emreya <e.yazici1990@gmail.com>
Signed-off-by: Emre Yazici <e.yazici1990@gmail.com>
(cherry picked from commit 56f60e8be9)
2025-04-03 09:47:34 +01:00
Artem Fetishev
2e4beeefb1
Update series count docs ()
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-03 10:37:35 +02:00
Aliaksandr Valialkin
635c5c9feb
app/vlselect: run /select/logsql/tail queries without concurrency limit
The concurrency limit is intended for short-running queries. If it is applied to tail queries,
then this can affect short-running queries.
2025-04-02 20:22:27 +02:00
Aliaksandr Valialkin
4e1260e189
app/vlselect: do not log canceled requests, since they are expected and legal 2025-04-02 19:14:44 +02:00
Aliaksandr Valialkin
ca3910748f
deployment: update Go builder from Go1.24.1 to Go1.24.2
See https://github.com/golang/go/issues?q=milestone%3AGo1.24.2+label%3ACherryPickApproved
2025-04-02 18:01:21 +02:00
Artem Fetishev
2f0796ff40
lib/storage: When creating and listing snapshots, panic instead of returning an error ()
When creating and listing snapshots, panic instead of returning an error
since errors are not recoverable anyway.
Also do not cleanup the filesystem on panic. Leave as is for further
manual inspection.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 15:47:23 +02:00
Artem Fetishev
cdba6dbc0e
lib/storage: Pass the partition time range during the partition creation and opening ()
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 14:57:59 +02:00
Aliaksandr Valialkin
f18daaeac5
app/vmui: replace old-style links to https://docs.victoriametrics.com/MetricsQL.html with https://docs.victoriametrics.com/metricsql/
Replace also https://docs.victoriametrics.com/keyConcepts.html with https://docs.victoriametrics.com/keyconcepts/

This is the follow-up for the commit ee1da35071
2025-04-02 13:22:58 +02:00
Artem Fetishev
a9f124388f
lib/storage: mergeBlockStreams(): replace the dependency on Storage with dependency on the set of deleted metricIDs ()
This should narrow down the function dependencies and simplify testing.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-04-02 13:16:26 +02:00
Aliaksandr Valialkin
4b2276608b
docs/victoriametrics/vmagent.md: mention that increasing scrape_interval can reduce CPU usage 2025-04-02 12:41:57 +02:00
Aliaksandr Valialkin
b352470ae1
docs/victoriametrics/vmagent.md: mention that -promscrape.disableKeepAlive option can reduce RAM usage when scraping thousands of targets 2025-04-01 23:22:13 +02:00
Aliaksandr Valialkin
b3261a1b87
lib/promscrape: do not clutter logs with cannot scrape target ...: context canceled errors when vmagent is stopped 2025-04-01 23:20:43 +02:00
Aliaksandr Valialkin
6d5973dcb0
docs/victoriametrics/vmagent.md: change GOGC from 50 to 100 in the example of optimized config for vmagent
This is a follow-up after bf024d3dce,
2025-04-01 21:36:04 +02:00
Aliaksandr Valialkin
74f17bb67e
docs/victoriametrics/vmagent.md: remove the recommendation to set GOGC to 50 at vmagent in order to reduce CPU usage
The default GOGC is set to 50 at vmagent after bf024d3dce,
so this recommendation makes no sense. Leave the recommendation to increase GOGC to 100.
2025-04-01 21:14:33 +02:00
Aliaksandr Valialkin
bf024d3dce
app/vmagent: increase the default GOGC from 30 to 50
This reduces CPU usage by up to 30% in exchange of the increased RAM usage by 10%
when scraping thousands of targets, which expose millions of metrics in summary.

This looks like a good tradeoff after the commit edac875179 ,
which reduced RAM usage by more than 10%, so the final RAM usage for vmagent
is still lower than the RAM usage at v1.114.0 by ~15%, while CPU usage drops by 30%.
2025-04-01 21:04:28 +02:00
Aliaksandr Valialkin
5b87aff830
lib/promscrape: use chunkedbuffer.Buffer instead of bytesutil.ByteBuffer for reading response body from scrape targets
This reduces memory usage when reading large response bodies because the underlying buffer
doesn't need to be re-allocated during the read of large response body in the buffer.

Also decompress response body under the processScrapedDataConcurrencyLimitCh .
This reduces CPU usage and RAM usage a bit when scraping thousands of targets.
2025-04-01 20:30:39 +02:00
Aliaksandr Valialkin
34d35869fa
docs/victoriametrics/vmagent.md: add Performance optimizations chapter
Enumerate the most commonly used options for reducing CPU usage and RAM usage
for vmagent, which scrapes thousands of targets.

See https://docs.victoriametrics.com/vmagent/#performance-optimizations
2025-04-01 18:35:43 +02:00
Max Kotliar
b1d1f1f461
vmagent/remotewrite: fix golangci-lint code style issue
### Describe Your Changes

Fixes golangci-lint issues introduced in
98f1e32e39

```
--- a/app/vmagent/remotewrite/pendingseries.go
+++ b/app/vmagent/remotewrite/pendingseries.go
@@ -202,7 +202,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {

 	// Pre-allocate memory for labels.
 	labelsLen := len(wr.labels)
-	wr.labels = slicesutil.SetLength(wr.labels, labelsLen + len(labelsSrc))
+	wr.labels = slicesutil.SetLength(wr.labels, labelsLen+len(labelsSrc))
 	labelsDst := wr.labels[labelsLen:]

 	// Pre-allocate memory for byte slice needed for storing label names and values.
@@ -212,7 +212,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
 		neededBufLen += len(label.Name) + len(label.Value)
 	}
 	bufLen := len(wr.buf)
-	wr.buf = slicesutil.SetLength(wr.buf, bufLen + neededBufLen)
+	wr.buf = slicesutil.SetLength(wr.buf, bufLen+neededBufLen) buf := wr.buf[:bufLen]

 	// Copy labels

```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-04-01 18:37:02 +04:00
Aliaksandr Valialkin
98f1e32e39
app/vmagent/remotewrite: optimize writeRequest.copyTimeSeries a bit
Pre-allocate memory for labels and for the needed byte buffer used
for holding the copied label names and values.
2025-04-01 15:58:04 +02:00
Aliaksandr Valialkin
edac875179
lib/promscrape: always store the last response per every scrape target in compressed form
This reduces memory usage for vmagent when scraping big number of targets at the cost of slightly higher CPU usage.

The increased CPU usage can be decreased by disabling tracking of stale markers either via -promscrape.noStaleMarkers
command-line flag or via `no_stale_markers: true` option at the scrape config pointed by -promscrape.config command-line flag.
See https://docs.victoriametrics.com/vmagent/#prometheus-staleness-markers
2025-04-01 15:27:11 +02:00
Aliaksandr Valialkin
0ff1a3b154
lib/leveledbytebufferpool: start with the pools[0] for byte slices up to 256 bytes
The pool is used mostly for obtaining byte buffers for responses from scrape targets.
There are no responses smaller than 256 bytes in practice, so there is no sense in maintaining
pools for byte slices up to 64 and 128 bytes.
2025-04-01 12:01:21 +02:00
Aliaksandr Valialkin
bbe58cc37b
lib/promscrape: make sure that the maxLabelsLen contains really the maximum len(wc.labels) among concurrently running callbacks at stream.Parse
Previously the maxLabelsLen could be updated with smaller value after it is updated to bigger value by concurrently running goroutines.
Prevent this by loading the latest maxLabelsLen value and updating it only if it is smaller than the current len(wc.labels)
before the exit from callback passed to stream.Parse.

While at it, return early from the callback on the sample_limit exceeding error,
since the rest of the code in the callback becomes no-op after wc.reset().
This simplifies following the logic in the code a bit.

Also remove outdated misleading comment in front of sw.pushData() call inside callbacks passed to stream.Parse.
This comment has no sense after every callback start working with its own goroutine-local wc.
2025-04-01 11:49:35 +02:00
Aliaksandr Valialkin
78dca6ee6e
lib/promscrape: tune leveledWriteRequestCtxPool a bit
Start with writeRequestCtx containing up to 256 labels instead of 8 labels,
since a typical response from scrape target contains much more than 8 labels across all the exposed metrics.

Do not pre-allocate labels at writeRequestCtx, since they are pre-allocated inside writeRequestCtx.addRows(),
together with the pre-allocation of samples and writeRequest.Timeseries.
2025-04-01 02:11:14 +02:00