Commit graph

2584 commits

Author SHA1 Message Date
Andrii Chubatiuk
937ae2ca90
lib/streamaggr: added stale samples metric, added metrics labels (#6462)
### Describe Your Changes

- added stale metrics counters for input and output samples
- added labels for aggregator metrics =>
`name="{rwctx}:{aggrId}:{aggrSuffix}"`
   - rwctx - global or number starting from 1
   - aggrid - aggregator id starting from 1
   - aggrSuffix - <interval>_(by|without)_label1_label2_labeln
   e.g: `name="global:1:1m_without_instance_pod"`

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 861852f262)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-07-01 15:01:49 +02:00
Aliaksandr Valialkin
208a624d4d
lib/logstorage: properly search for the surrounding logs in stream_context pipe
The set of log fields in the found logs may differ from the set of log fields present in the log stream.
So compare only the log fields in the found logs when searching for the matching log entry in the log stream.

While at it, return _stream field in the delimiter log entry, since this field is used by VictoriaLogs Web UI
for grouping logs by log streams.
2024-07-01 02:33:00 +02:00
Aliaksandr Valialkin
76a58ae08d
lib/logstorage: add ability to store sorted log position into a separate field with sort ... rank <fieldName> syntax 2024-07-01 01:46:03 +02:00
Aliaksandr Valialkin
d0dca7b8c5
lib/logstorage: add delimiter between log chunks returned from | stream_context pipe 2024-07-01 01:46:02 +02:00
Aliaksandr Valialkin
4b3477e62b
lib/logstorage: add stream_context pipe, which allows selecting surrounding logs for the matching logs 2024-06-28 19:15:19 +02:00
Aliaksandr Valialkin
2f28819bb1
lib/logstorage: it is safe using | unroll pipe in live tailing
`| unroll` pipe can make multiple copies of rows from the input row.
This doesn't break live tailing, so allow `| unroll` pipe in live tailing.
2024-06-27 19:45:12 +02:00
Aliaksandr Valialkin
b26acec9a8
app/vlselect: properly return live tailing results 2024-06-27 15:06:15 +02:00
Aliaksandr Valialkin
dd62a2b9d6
lib/logstorage: work-in-progress 2024-06-27 14:21:03 +02:00
Andrii Chubatiuk
580d02c3f8
added IMDSv2 for YC SD (#6524)
### Describe Your Changes

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5513

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-06-26 19:12:35 +02:00
rtm0
48a5c4cb01
Fix Date metricid cache consistency under concurrent use (#6534)
### Describe Your Changes

Fix Date metricid cache consistency under concurrent use.
When one goroutine calls Has() and does not find the cache entry in the
immutable map it will acquire a lock and check the mutable map. And it
is possible that before that lock is acquired, the entry is moved from
the mutable map to the immutable map by another goroutine causing a
cache miss.

The fix is to check the immutable map again once the lock is acquired. 

### Checklist

The following checks are **mandatory**:

- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <wwctrsrx@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2024-06-26 19:12:34 +02:00
Aliaksandr Valialkin
d5cbda3424
app/vlstorage: add -retention.maxDiskSpaceUsageBytes command-line flag for limiting the retention at VictoriaLogs by disk space usage 2024-06-25 17:30:46 +02:00
Aliaksandr Valialkin
f24123a776
lib/logstorage: parse syslog structured data into separate fields in order to simplify further querying of this data 2024-06-25 14:54:25 +02:00
Aliaksandr Valialkin
1716c4e609
lib/logstorage: properly parse timezone offset at TryParseTimestampRFC3339Nano()
The TryParseTimestampRFC3339Nano() must properly parse RFC3339 timestamps with timezone offsets.

While at it, make tryParseTimestampISO8601 function private in order to prevent
from improper usage of this function from outside the lib/logstorage package.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6508
2024-06-25 14:54:24 +02:00
Aliaksandr Valialkin
2a7fcba330
lib/logstorage: make golangci-lint happy 2024-06-25 03:06:28 +02:00
Aliaksandr Valialkin
7026498359
lib/httpserver: revert 9b7e532172
Reason for revert: this commit doesn't resolve real security issues,
while it complicates the resulting code in subtle ways (aka security circus).

Comparison of two strings (passwords, auth keys) takes a few nanoseconds.
This comparison is performed in non-trivial http handler, which takes thousands
of nanoseconds, and the request handler timing is non-deterministic because of Go runtime,
Go GC and other concurrently executed goroutines. The request handler timing is even
more non-deterministic when the application is executed in shared environments
such as Kubernetes, where many other applications may run on the same host and use
shared resources of this host (CPU, RAM bandwidth, network bandwidth).

Additionally, it is expected that the passwords and auth keys are passed via TLS-encrypted connections.
Establishing TLS connections takes additional non-trivial time (millions of nanoseconds),
which depends on many factors such as network latency, network congestion, etc.

This makes impossible to conduct timing attack on passwords and auth keys in VictoriaMetrics components.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6423/files
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392
2024-06-25 01:51:06 +02:00
Aliaksandr Valialkin
7de6f5b4ce
lib/logstorage: work-in-progress 2024-06-25 00:44:57 +02:00
Andrii Chubatiuk
50783fca4d
app/vmagent: add max_scrape_size to scrape config (#6434)
Related to
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6429

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 1e83598be3)
2024-06-20 14:00:22 +02:00
Slava Bobik
a7266785ce
Fixed a typo in the FastQueue mutex comment (#6514)
### Describe Your Changes

Fixed a small typo in a comment about the mutex inside the FastQueue
struct

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit d236604d39)
2024-06-20 14:00:08 +02:00
Aliaksandr Valialkin
d5224f3363
lib/logstorage: work-in-progress 2024-06-20 03:10:37 +02:00
Zakhar Bessarab
886f545f81
lib/fs/fscore: do not trim content from path (#6503)
### Describe Your Changes

Trimming content which is loaded from an external pass leads to obscure
issues in case user-defined input contained trimmed chars. For example.
user-defined password "foo\n" will become "foo" while user will expect
it to contain a new line.

---
For example, a user defines a password which ends with `\n`. This often
happens when user Kubernetes secrets and manually encodes value as
base64-encoded string.

In this case vmauth configuration might look like:
```
users:
  - url_prefix:
      - http://vminsert:8480/insert/0/prometheus/api/v1/write
    name: foo
    username: foo
    password: "foobar\n"
```

vmagent configuration for this setup will use the following flags:
```
-remoteWrite.url=http://vmauth:8427/
-remoteWrite.basicAuth.passwordFile=/tmp/vmagent-password
-remoteWrite.basicAuth.username="foo"
```
Where `/tmp/vmagent-password` is a file with `foobar\n` password.

Before this change such configuration will result in `401 Unauthorized`
response received by vmagent since after file content will become
`foobar`.

---
An example with Kubernetes operator which uses a secret to reference the
same password in multiple configurations.

<details>
  <summary>See full manifests</summary>

`Secret`:
```
apiVersion: v1
data:
  name: Zm9v # foo
  password: Zm9vYmFy # foobar\n
  username: Zm9v= # foo
kind: Secret
metadata:
  name: vmuser
```

`VMUser`:
```
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMUser
metadata:
  name: vmagents
spec:
  generatePassword: false
  name: vmagents
  targetRefs:
  - crd:
      kind: VMAgent
      name: some-other-agent
      namespace: example
  username: foo
  # note - the secret above is referenced to provide password
  passwordRef:
    name: vmagent
    key: password
```

`VMAgent`:
```
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: example
spec:
  selectAllByDefault: true
  scrapeInterval: 5s
  replicaCount: 1
  remoteWrite:
    - url: "http://vmauth-vmauth-example:8427/api/v1/write"
      # note - the secret above is referenced as well
      basicAuth:
        username:
          name: vmagent
          key: username
        password:
          name: vmagent
          key: password
```

</details>

Since both config target exactly the same `Secret` object it is expected
to work, but apparently the result will be `401 Unauthrized` error.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 201fd6de1e)
2024-06-19 10:37:12 +02:00
Nihal
8fd46caa22
victoria-metrics: constant-time comparison of credentials like authkeys and basic auth credentials (#6423)
Changes for constant-time comparison of credentials like authkeys and
basic auth credentials.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6392

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
(cherry picked from commit 9b7e532172)
2024-06-19 10:37:09 +02:00
Aliaksandr Valialkin
c10a646d19
app/vlinsert/syslog: allow accepting syslog messages with different configs at different ports 2024-06-17 23:16:58 +02:00
hagen1778
863f1c2513
lib/streamaggr: remove accidentally committed changes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 34771ab293)
2024-06-17 14:25:45 +02:00
Roman Khavronenko
df7e300071
app/vmselect/promql: check for ranged vectors in aggr funcs if implicit conversions are disabled (#6450)
Check for ranged vector arguments in aggregate expressions when
`-search.disableImplicitConversion` or `-search.logImplicitConversion`
are enabled.
 For example, `sum(up[5m])` will fail to execute if these flags are set.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [*] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6149adbe10)
2024-06-17 14:25:43 +02:00
Aliaksandr Valialkin
1750991119
lib/logstorage: work-in-progress 2024-06-17 12:13:25 +02:00
Andrii Chubatiuk
8ca1813bd2
lib/flagutil: use month limit for duration flag for parsed duration assessment (#6486)
use maxMonths limit for parsed duration flag value

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6330

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit faf67aa8b5)
2024-06-14 15:21:32 +02:00
Andrii Chubatiuk
abc233a902
lib/backup/s3remote: fixed credsFilePath flag (#6488)
properly use credsFilePath flag value

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6353

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit e678a9aa51)
2024-06-14 14:14:58 +02:00
Roman Khavronenko
5df50e5645
lib/streamaggr: prevent rate_sum and rate_avg from producing NaNs (#6482)
### Describe Your Changes

* check if `lastValue` was seen at least twice with different
timestamps. Otherwise, the difference between last timestamp and
previous timestamp could be `0` and will result into `NaN` calculation
* check if there items left in lastValue map after staleness cleanup.
Otherwise, `rate_avg` could have produce `NaN` result.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 51d19485bb)
2024-06-14 13:26:42 +02:00
Aliaksandr Valialkin
2bbf62b6f6
lib/leveledbytebufferpool: do not pool byte slices bigger than 2^18 bytes
Previously byte slices up to 2^20 bytes (e.g. 1Mb) were cached because of a typo in the commit c14dafce43 .

This could result in increased memory usage when vmagent scrapes many regular targets, which expose
relatively small number of metrics (e.g. up to a few thousand per target) and a few large targets such as kube-state-metrics,
which expose more than 10 thousand metrics. This is common case for Kubernetes monitoring.

While at it, remove pools for very small byte slices, since they are rarely used during scraping.
2024-06-13 17:02:05 +02:00
Aliaksandr Valialkin
faf07fbc67
lib/bytesutil: optimize internStringMap cleanup
- Make it in a separate goroutine, so it doesn't slow down regular intern() calls.

- Do not lock internStringMap.mutableLock during the cleanup routine, since now
  it is called from a single goroutine and reads only the readonly part of the internStringMap.
  This should prevent from locking regular intern() calls for new strings during cleanups.

- Add jitter to the cleanup interval in order to prevent from synchornous increase in resource usage
  during cleanups.

- Run the cleanup twice per -internStringCacheExpireDuration . This should save 30% CPU time spent
  on cleanup comparing to the previous code, which was running the cleanup 3 times per -internStringCacheExpireDuration .
2024-06-13 15:09:42 +02:00
Zakhar Bessarab
ac16d1dc1b
lib/promscrape: increase default value for promscrape.maxDroppedTargets to 10_000 (#6459)
### Describe Your Changes
This limit can be increased since after
4513893ead
tracking of dropped targets uses much less memory per entry.

See:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6381#issuecomment-2156708228

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

(cherry picked from commit 34071ac660)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-13 09:28:16 +02:00
LHHDZ
41e4135371
app/vmauth: fix discovering backend IPs when url_prefix contains hostname with srv+ prefix (#6401)
This change fixes the following panic:
```
2024-06-04T11:16:52.899Z        warn    app/vmauth/auth_config.go:353   cannot discover backend SRV records for http://srv+localhost:8080: lookup localhost on 10.100.10.4:53: server misbehaving; use it literally
panic: runtime error: integer divide by zero

goroutine 9 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper.func1()
        /Users/lhhdz/wd/projects/go/VictoriaMetrics/lib/httpserver/httpserver.go:291 +0x58
panic({0x103115100?, 0x10338d700?})
        /Users/lhhdz/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-arm64/src/runtime/panic.go:770 +0x124
main.getLeastLoadedBackendURL({0x0?, 0x22?, 0x1400014757b?}, 0x1400013c120?)
        /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:473 +0x210
main.(*URLPrefix).getBackendURL(0x140000aa080)
        /Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:312 +0xb8
```

---------

Co-authored-by: Haley Wang <haley@victoriametrics.com>
2024-06-12 11:47:44 +02:00
Aliaksandr Valialkin
9135b404d9
lib/logstorage: work-in-progress 2024-06-11 17:51:01 +02:00
Aliaksandr Valialkin
9bd16790c0
lib/streamaggr: prevent from data race inside dedupAggrShard when samplesBuf can be updated in pushSamples() while their values are read in the flush() loop without das.mu lock
This issue has been introduced in the commit 253c0cffbe
2024-06-11 17:31:38 +02:00
Aliaksandr Valialkin
37a8cc0b12
lib/logstorage: work-in-progress 2024-06-10 18:42:31 +02:00
Aliaksandr Valialkin
7e24bf99de
lib/streamaggr: return back string interning to dedupAggr after 78953723200f15ffc417064d1912bdbb7551505c
It should reduce memory allocation rate during stream deduplication
2024-06-10 18:06:25 +02:00
Aliaksandr Valialkin
6470eac7dc
lib/bytesutil: reduce the number of memory allocations per each interned string in bytesutil.InternString() from 5 to 1
This should reduce GC overhead when tens of millions of strings are interned (for example, during stream deduplication
of millions of active time series).
2024-06-10 18:06:24 +02:00
Roman Khavronenko
8c8d84e30a
lib/protoparser/opentelemetry/firehose: escape requestID before returning it to user (#6451)
All user input should be sanitized before rendering. This should prevent
possible attacks. See
https://github.com/VictoriaMetrics/VictoriaMetrics/security/code-scanning/203

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-10 18:06:24 +02:00
Aliaksandr Valialkin
883c0e6221
lib/streamaggr: reduce memory allocations by using dedupAggrSample buffer per each dedupAggrShard 2024-06-10 16:39:26 +02:00
Aliaksandr Valialkin
422225bfa5
lib/streamaggr: reduce the number of duplicates per each sample in BenchmarkDedupAggr from 100 to 2
This is closer to typical production setups when deduplication is used for de-duplicating of 2 samples per series.
2024-06-10 16:39:26 +02:00
Aliaksandr Valialkin
d269a95da3
lib/streamaggr: use strings.Clone() instead of bytesutil.InternString() for creating series key in dedupAggr
Our internal testing shows that this reduces GC overhead when deduplicating tens of millions of active series.
2024-06-10 16:08:47 +02:00
Aliaksandr Valialkin
9ed9e766e8
lib/streamaggr: improve performance for dedupAggr.sizeBytes() and dedupAggr.itemsCount()
These functions are called every time `/metrics` page is scraped, so it would be great
if they could be sped up for the cases when dedupAggr tracks tens of millions of active time series.
2024-06-10 16:00:05 +02:00
Aliaksandr Valialkin
387c22da49
lib/streamaggr: remove flushState arg at dedupAggr.flush(), since it is always set to true in production 2024-06-10 16:00:05 +02:00
Hui Wang
028a80613f
lib/httpserver: allow reloadAuthKey and configAuthKey to override htt… (#6338)
…pAuth.*

address https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6329,
makes `reloadAuthKey`, `configAuthKey`, `flagsAuthKey`, `pprofAuthKey`
behavior the same way,
but keys like `-snapshotAuthKey`, `-forceMergeAuthKey` are still
protected by httpAuth.*. All the available key are listed in
https://docs.victoriametrics.com/single-server-victoriametrics/#security.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit 61dce6f2a1)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-10 12:41:29 +02:00
Aliaksandr Valialkin
32aa0751a1
lib/streamaggr: follow-up for 7cb894a777
- Use bytesutil.InternString() instead of strings.Clone() for inputKey and outputKey in aggregatorpushSamples().
  This should reduce string allocation rate, since strings can be re-used between aggrState flushes.
- Reduce memory allocations at dedupAggrShard by storing dedupAggrSample by value in the active series map.
- Remove duplicate call to bytesutil.InternBytes() at Deduplicator, since it is already called inside dedupAggr.pushSamples().
- Add missing string interning at rateAggrState.pushSamples().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6402
2024-06-07 16:35:53 +02:00
Roman Khavronenko
78121642df
lib/streamaggr: reduce number of inuse objects (#6402)
The main change is getting rid of interning of sample key. It was
discovered that for cases with many unique time series aggregated by
vmagent interned keys could grow up to hundreds of millions of objects.
This has negative impact on the following aspects:
1. It slows down garbage collection cycles, as GC has to scan all inuse
objects periodically. The higher is the number of inuse objects, the
longer it takes/the more CPU it takes.
2. It slows down the hot path of samples aggregation where each key
needs to be looked up in the map first.

The change makes code more fragile, but suppose to provide performance
optimization for heavy-loaded vmagents with stream aggregation enabled.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-06-07 16:35:52 +02:00
Roman Khavronenko
fae589bb83
lib/promrelabel: speedup label match by __name__ (#6432)
The change adds a fastpath for `equalValue` comparisons against
`__name__` label by avoiding calls to `toCanonicalLabelName` func. This
speedups matches by metric name like `'foo'`. See bench stats below:
```
benchcmp old.txt new.txt

benchmark                                           old ns/op     new ns/op     delta
BenchmarkIfExpression/equal_label:_last-10          35.6          35.1          -1.18%
BenchmarkIfExpression/equal_label:_middle-10        18.3          17.3          -5.41%
BenchmarkIfExpression/equal_label:_first-10         1.20          1.24          +2.74%
BenchmarkIfExpression/equal___name__:_last-10       10.1          4.96          -50.75%
BenchmarkIfExpression/equal___name__:_middle-10     5.79          3.16          -45.41%
BenchmarkIfExpression/equal___name__:_first-10      1.17          1.05          -9.76%
```

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-06-07 16:35:52 +02:00
Andrii Chubatiuk
93cd08f15f
lib/streamaggr: metrics to track dropped, nan samples and samples lag (#6358)
### Describe Your Changes

Added streamaggr metrics to:
 - `vm_streamaggr_samples_lag_seconds` - samples lag
- `vm_streamaggr_ignored_samples_total{reason="nan"}` - ignored NaN
samples
- `vm_streamaggr_ignored_samples_total{reason="too_old"}` - ignored old
samples

(cherry picked from commit 185fac03b3)
2024-06-06 19:22:45 +02:00
Aliaksandr Valialkin
53382ae837
lib/logstorage: work-in-progress 2024-06-06 12:27:11 +02:00
Aliaksandr Valialkin
a200fb433a
lib/logstorage: allow using eval keyword instead of math keyword in math pipe 2024-06-05 10:08:08 +02:00