Commit graph

1307 commits

Author SHA1 Message Date
Zakhar Bessarab
690959c8e3
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-03-21 16:37:57 +04:00
Guillem Jover
1d8b7faf71
spelling and grammar fixes via codespell ()
### Describe Your Changes

Fix many spelling errors and some grammar, including misspellings in
filenames.

The change also fixes a typo in metric `vm_mmaped_files` to `vm_mmapped_files`.
While this is a breaking change, this metric isn't used in alerts or dashboards.
So it seems to have low impact on users.

The change also deprecates `cspell` as it is much heavier and less usable.
---------

Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>

(cherry picked from commit 76d205feae)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-03-17 16:38:11 +01:00
Roman Khavronenko
27f9eaa852
app/vmselect/promql: optimize binary operator or for common cases ()
The optimization touches 2 things:
1. Reduces amount of allocations when comparing canonical metric names
between left and right parts of expressions.
2. Adds fast path for cases when right part of expression returns
scalar: `series_selector or on() vector(1)`, which is a typical
expression.

```
benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                             old ns/op     new ns/op     delta
BenchmarkBinaryOpOr/tss:1_or_tss:1-14                 291           272           -6.56%
BenchmarkBinaryOpOr/tss:1_or_tss:1000-14              44590         28592         -35.88%
BenchmarkBinaryOpOr/tss:1000_or_tss:1-14              103124        39563         -61.64%
BenchmarkBinaryOpOr/tss:1000_or_tss:1000-14           20386150      1859335       -90.88%
BenchmarkBinaryOpOr/tss:1000_or_on()_vector(0)-14     91382         36805         -59.72%
```

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8382

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit dc1f7ef0d0)
2025-03-14 12:30:08 +01:00
Nikolay
541cd4efe1
app/vmselect: properly cancel multitenant query request
Previously, vmselect didn't stop multitenant query execution if it
receives error from vmstorage. Such as limit error or any other. It
continued to execute queries until it did it for all tenants. It leads
to the potential waste of resources.
 In addition, callback error was incorrectly reference and can be updated by
subsequent callback call.


This commit returns error earlier, cancels sub-sequent requests for
tenants and properly return storageNode request error.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8461
2025-03-14 00:51:08 +01:00
alicja-karasiewicz
5467d68954
feat: make topN limit configurable from CLI
### Describe Your Changes

Implement changes mentioned in  

Allow the administrator to specify the limit of returned TSDB series in
`/api/v1/status/tsdb` by making a TopN limit configurable from CLI.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: alicja-karasiewicz <alicja.karasiewicz@allegro.com>
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-03-12 11:30:17 +04:00
Artem Fetishev
cb94d05ae1
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-03-07 14:20:19 +01:00
Nikolay
773b8b0b28
lib/storage: add tracker for time series metric names statistics
This feature allows to track query requests by metric names. Tracker
state is stored in-memory, capped by 1/100 of allocated memory to the
storage. If cap exceeds, tracker rejects any new items add and instead
registers query requests for already observed metric names.

This feature is disable by default and new flag:
`-storage.trackMetricNamesStats` enables it.

  New API added to the select component:

* /api/v1/status/metric_names_stats - which returns a JSON
object
    with usage statistics.
* /admin/api/v1/status/metric_names_stats/reset - which resets internal
    state of the tracker and reset tsid/cache.

   New metrics were added for this feature:

  * vm_cache_size_bytes{type="storage/metricNamesUsageTracker"}
  * vm_cache_size{type="storage/metricNamesUsageTracker"}
  * vm_cache_size_max_bytes{type="storage/metricNamesUsageTracker"}

  Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4458
---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2025-03-06 22:10:41 +01:00
Zakhar Bessarab
dea3eb20cb
app/vmselect/promql: fix panic with using @ with series which is not present at the start of the query ()
### Describe Your Changes

Previously, "selector @ another_selector" assumed that
"another_selector" metric is supposed to exist since "start" used in the
query.

If the query was evaluated in the following case (timestamps):
- start - 2, end - 10
- "another_selector" 5,6,7,8,9,10
- "selector" The resulting "at" timestamp would be taken from NaN (as
`int64(NaN * 1000)`), causing a panic or invalid behavior later.

Note that type cast of `NaN` to int64 is also platform-dependent, so
value of `int64(math.NaN() * 1000)` can produce `0` or max int64 on
different platforms and versions of Go.

This commit changes this and checks for the first non-NaN value. This
makes it easier to use for users as series are not always aligned and
returning an error in this case would disallow using this for some time
ranges.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8444

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 7dfaef9088)
2025-03-06 16:42:51 +01:00
Zhu Jiekun
774004867b
bugfix: negative rate result when lookbehind window longer than search.maxLookback ()
### Describe Your Changes

 

fix negative rate result when the lookbehind window is longer than
`-search.maxLookback` or `-search.maxStalenessInterval` and data
contains gap.

This issue was introduced since
[v1.110.0](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072).

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2025-02-27 22:55:32 +01:00
f41gh7
084560bb02
make vmui-update 2025-02-21 14:08:10 +01:00
Phuong Le
52fd89e426
docs: search.lookback-delta -> query.lookback-delta ()
(cherry picked from commit 23147c8339)
2025-02-11 23:02:20 +01:00
Zakhar Bessarab
bb05af129e
make vmui-update
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 8e576854b3)
2025-02-07 18:43:03 +04:00
Joost Buskermolen
824f531490
app/vmselect: expose /-/healthy and /-/ready endpoints on full Prometheus path
This commit improves integration with third-party solutions who rely on non-root
endpoints (i.e. MinIO) when the vmselect path has been specified in the
configured Prometheus URL like:
`http://vmselect.:8481/select/0/prometheus`

Comparable change has been done before
(b885a3b6e9), however only takes care of
the root path. This means endpoints `-/healthy` and `-/ready` are still
not available on full vmselect Prometheus paths, resulting in
unsupported path requests.

This change makes these endpoints available on the full paths like:
`/select/0/prometheus/-/healthy` and `/select/0/prometheus/-/ready`,
thus achieving full Prometheus compatibility for external dependencies.

Related issues:
- https://github.com/minio/console/issues/2829
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1833

---

Signed-off-by: Joost Buskermolen <j.buskermolen@cloudmeesters.com>
2025-02-05 17:00:11 +01:00
Roman Khavronenko
e803b9b68b
metricsql: bump to v0.83.0 ()
metricsql: bump to v0.83.0

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7703

The update also returns an error if metric name is specified twice in
metrics selector.
For example, `foo{__name__="bar"}` is not allowed anymore. It would
successfully parse before
this change, but it won't satisfy the search filter any way. So it had
no sense in supporting this. This is why some test cases were removed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit fa2107bbec)
2025-02-01 22:31:56 +01:00
Roman Khavronenko
13cd76347d
app/vmselect/promql: fix discrepancies when using or binary operator
The change covers various corner cases when using `or` binary operator.
See corresponding issues and pull request here to see the cases:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7770

Related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7759
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7640

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 72837919ae)
2025-02-01 22:31:55 +01:00
f41gh7
39e69b103e
app/vmselect: properly cancel long running requests on client connection close
At this time `bufferedwriter` [silently ignores connection close
errors](78eaa056c0/lib/bufferedwriter/bufferedwriter.go (L67)).
It may be very convenient in some situations (to not log such
unimportant errors), but it's too implicit and unsafe for the others.
For example, if you close [export
API](https://docs.victoriametrics.com/#how-to-export-time-series) client
connection in the middle of communication, VictoriaMetrics won't notice
it and will start to hog CPU by exporting all the data into nowhere
until it process all of them. If you'll make a few retries, it will be
effectively a DoS on the server.

This commit replaces this implicit error suppressing with explicit error
handling which fixes the issue with export API.

Issue was introduced at e78f3ac8ac
2025-01-29 16:38:38 +01:00
Aliaksandr Valialkin
7b62086609
lib: consistently use logger.Panicf("BUG: ...") for logging programming bugs
logger.Fatalf("BUG: ...") complicates investigating the bug, since it doesn't show the call stack,
which led to the bug. So it is better to consistently use logger.Panicf("BUG: ...") for logging programming bugs.
2025-01-24 16:40:50 +01:00
f41gh7
43772b9869
make vmui-update 2025-01-24 14:23:49 +01:00
Nikolay
cab5cf3c4c
app/vmselect: fixes panic data race at query tracing
Previously, NewChild elements of querytracer could be referenced by concurrent
storageNode goroutines. After earlier return ( if search.skipSlowReplicas is set), it is
possible, that tracer objects could be still in-use by concurrent workers.
  It may cause panics and data races. Most probable case is when parent tracer is finished, but children
still could write data to itself via Donef() method. It triggers read-write data race at trace
formatting.

This commit adds a new methods to the querytracer package, that allows to
create children not referenced by parent and add it to the parent later.

 Orphaned child must be registered at the parent, when goroutine returns. It's done synchronously by the single caller  via finishQueryTracer call. 
If child didn't finished work and reference for it is used by concurrent goroutine, new child must be created instead with
context message.
 It prevents panics and possible data races.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8114

---------

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2025-01-24 13:55:32 +01:00
Zakhar Bessarab
aef93b1889
app/vmselect/prometheus: fix panic when performing delete with "multitenant" auth token
Initially delete_series API wasn't implemented for mulitenant auth token.

 This commit fixes it and properly handle delete series requests for mulitenant auth token.
It also adds integration tests for this case.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8126

Introduced at v1.104.0 release:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1434
---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2025-01-24 08:32:50 +01:00
Zakhar Bessarab
5f56375564
app/vmselect/prometheus: prevent panic when using "multitenant" at /api/v1/series/count requests
Adding support of multi-tenant reads to /api/v1/series/count would
require introducing a breaking change to a `netstorage` RPC, so
currently vmselect will explicitly deny these requests.

Related issues:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8126

Introduced at v1.104.0 release:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1434

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-01-24 07:58:54 +01:00
Roman Khavronenko
9261da53a0
app/vmselect/promql: respect staleness in removeCounterResets ()
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-01-24 07:52:51 +01:00
Zhu Jiekun
21f6d84b4b
docs: update docs for *authKey, add authKey to HTTP 401 resp body ()
### Describe Your Changes

optimize for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6226

for user who set `*AuthKey` flag, they will receive new response in
body:
```go
// query arg not set
The provided authKey '' doesn't match -search.resetCacheAuthKey

// incorrect query arg
The provided authKey '5dxd71hsz==' doesn't match -search.resetCacheAuthKey
```

previously, they receive:
```
The provided authKey doesn't match -search.resetCacheAuthKey
```

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

(cherry picked from commit 1f0b03aebe)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-01-20 13:49:12 +01:00
Hui Wang
cc24696ade
vmselect: add -search.maxDeleteDuration to limit the duration of th… ()
…e `/api/v1/admin/tsdb/delete_series` call

Previously, it is limited by `-search.maxQueryDuration`, and can be
small for delete calls.

part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857.

(cherry picked from commit 4574958e2e)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-01-17 15:30:26 +01:00
Roman Khavronenko
3dfcbf3229
app/vmselect/promql: limit staleness detection for increase/increase_pure/delta ()
`doInternal` has adaptive staleness detection mechanism. It is
calculated using timestamp distance between samples in selected list of
samples. It is dynamic because VM can store signals from many sources
with different samples resolution. And while it works for most of cases,
there are edge cases for rollup functions that are comparing values
between windows: increase, increase_pure, delta.

The edge case 1.
There was a gap between series because of the missed scrape or two. In
this case staleness will trigger and increase-like functions will assume
the value they need to compare with is 0. In result, this could produce
spikes for a flappy data - see
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/894 This
problem was solved by introducing a `realPrevValue` field -
1f19c167a4.
It stores the closest real sample value on selected interval and is used
for comparison when samples start after the gap.

The edge case 2.
`realPrevValue` doesn't respect staleness interval. In result, even if
gap between samples is huge (hours), the increase-like functions will
not consider it as a new series that started from 0. See
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8002.

Covering both edge cases is tricky, because `realPrevValue` has to
respect and not respect the staleness interval in the same time. In
other words, it should be able to ignore periodic missing gaps, but
reset if the gap is too big. While "too big gap" can't be figured out
empirically, I suggest using `-search.maxStalenessInterval` for this
purpose. If `-search.maxStalenessInterval` is set to 0 (default), then
`realPrevValue` ignores staleness interval. If
`-search.maxStalenessInterval` is > 0, then `realPrevValue` respects it
as a staleness interval.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 7d2a6764e7)
2025-01-16 17:07:38 +01:00
Roman Khavronenko
9de0b8a165
make: bump golangci-lint to v1.63.4 (
New version has additional checks and reduced resource consumption, so
it doesn't timeout for our internal repos.

To make linter happy, I addressed "redefinition of the built-in
function" lint error.

----
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-01-13 07:23:21 +01:00
Roman Khavronenko
0614eb97a5
app/vmselect/promql: account for staleness when populating realPrevValue ()
When vmselect process a rollup function it fetches all the raw samples
on requested `start-end` interval of the query. It then loops through
the raw samples, picks the range of the samples based on provided `step`
interval and invokes a rollup function for each of the picked ranges of
samples.

During this processing, vmselect always populates the `realPrevValue`
field with the closest previous raw sample value before the picked range
of samples. This `realPrevValue` is used by rollup functions like
increase_pure or delta to decide whether the counter change happened or
not. For example, we get the counter value == 1. If we've seen this
counter before and its value was also 1 - then no change happened. If we
didn't see it before, then this counter should have started with value=0
and we need to account for `1-0=1` change. All this is required to deal
with situations when scrapes are missing or `step` is too small.

However, vmselect doesn't check how "old" is the `realPrevValue`. In
other words, it doesn't respect the staleness interval when picking it.
In result, depending on the `start` and `end` params, vmselect can use
`realPrevValue` which is a couple of hours old and is unlikely to be a
temporary scrape fail. In result, some increases can be incorrectly
ingnored by vmselect.

This change makes sure that vmselect doesn't populate `realPrevValue`
with samples that are older than staleness interval.

### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [ x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).


-------------------

To reproduce, create a dataset with one metric `foo` which has samples
with value=1 on interval of couple of hours and resolution 15s, and a
gap for an hour in the middle:
<img width="769" alt="image"
src="https://github.com/user-attachments/assets/a39b2740-b741-45f8-ad18-093b7c57c3b3"
/>

Then run `increase(foo[1m])` expression on this time range (disable
cache):
<img width="1472" alt="image"
src="https://github.com/user-attachments/assets/463cece1-f359-4c75-a96c-60092a31cab2"
/>

In result, there will be one increase on the beginning of the series.
And no increase after the gap. Then change the time range so it starts
in the middle of the gap:
<img width="1505" alt="image"
src="https://github.com/user-attachments/assets/f4a460c3-9fd1-4ec7-ab47-15e716ec1019"
/>

Now, there is an increase>0 because the `realPrevValue` wasn't
populated. This is wrong, because it hides the increase of the series.

With the fix, the original increase query on full time range should show
2 increases:
<img width="1492" alt="image"
src="https://github.com/user-attachments/assets/aa9d8a6b-7b22-41f6-9eb9-83b3113a6982"
/>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-01-13 07:23:17 +01:00
Zakhar Bessarab
357ae06c37
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-01-10 18:02:30 +04:00
Zakhar Bessarab
8f8b29355d
app/vmselect/promql: set tenant information for numbers
Since
44b071296d
`evalNumber` function no longer updating MetricName tenancy information.
This leads to mismatch in metric names between the query result and
evaluated number for all tenants other than 0:0.

For example, query `count(up) or 0` will return different results for
tenants 0:0 and 1:1 (assuming up is present for both tenants):
- tenant 0:0 - will only contain result of `count(up)`
- tenant 1:1 - will return both `count(up)` and `0` since metric names
will not be matched

This restores setting of tenancy information for metric name for
single-tenant queries.

Related issue:

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7987
---
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-01-10 09:57:58 +01:00
YuDong Tang
ec73b22d24
app/select: add command-line flag -search.maxBinaryOpPushdownLabelValues
### Describe Your Changes

Binary operations like `exprFirst op exprSecond` in VictoriaMetrics are
performed in the following way:
1. Execute exprFirst.
2. Extract **common label filters** from the result of step 1.
3. Apply these common label filters to `exprSecond` and execute it, in
order to retrieve less time series from vmstorage nodes.

In step 2, only labels with less than `100` (hard-coded) value could be
used as **common label filter** (e.g. `{common_lb=~"v1|v2|...|v100"}`.

In our scenarios, a label, take `instance` label as an example, could
has thousands of candidate values. Regarding bring more pressure to
vmstorage node, it's still beneficial if labels with more than 100
values could be used as filter in `exprSecond`, with enough vmstorage
resources. After adjusting the value from `100` to `10000`, our query
round-trip time drops significantly from 5s to 2s.

This pull request change the hard-coded value into a configurable flag.
2025-01-03 13:19:44 +01:00
f41gh7
795af212d5
app/vmselect/promql: improve performance of parseCache on systems with many CPU cores
Parse cache is a pretty simple implementation of cache. It's just a
standard map with mutex.
Map with mutex overall has poor performance, plus when the cache
overflow occurs, the whole cache locks until 1k elements have been
deleted (now it's 10% of 10000 max elements in the cache). To avoid this
bottleneck and improve performance of cache on systems with many CPU
cores but keep it rather simple, we can implement cache with per bucket
locks like it's done in fastcache. The logic and API remain the same. So
now each bucket will have a map with approximately 78 elements (with 128
buckets), and overflow will occur now for each bucket, and only 7
elements need to be deleted.
Because exec_test.go has about 10k lines of code, it's better to move
the cache into a separate file to add tests and benchmarks for it,
because now it does not have them.

```
goos: windows
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql
cpu: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz

Current cache implementation performance on 8 cores:
BenchmarkCachePutNoOverFlow-8               1932            618372 ns/op             253 B/op          0 allocs/op
BenchmarkCacheGetNoOverflow-8               6547            211527 ns/op               0 B/op          0 allocs/op
BenchmarkCachePutGetNoOverflow-8            1873            621718 ns/op             261 B/op          0 allocs/op
BenchmarkCachePutOverflow-8                 2262            464328 ns/op              32 B/op          0 allocs/op
BenchmarkCachePutGetOverflow-8              1764            655866 ns/op              38 B/op          0 allocs/op

New cache implementation performance on 8 cores:
BenchmarkCachePutNoOverFlow-8              10408            111412 ns/op               0 B/op          0 allocs/op
BenchmarkCacheGetNoOverflow-8              22407             52809 ns/op               0 B/op          0 allocs/op
BenchmarkCachePutGetNoOverflow-8            6583            168088 ns/op               0 B/op          0 allocs/op
BenchmarkCachePutOverflow-8                 9822            117212 ns/op               2 B/op          0 allocs/op
BenchmarkCachePutGetOverflow-8              6481            175952 ns/op               3 B/op          0 allocs/op

Current cache implementation performance on 16 cores:
BenchmarkCachePutNoOverFlow-16              2331            475307 ns/op             218 B/op          0 allocs/op
BenchmarkCacheGetNoOverflow-16              6069            196905 ns/op               0 B/op          0 allocs/op
BenchmarkCachePutGetNoOverflow-16           1870            644236 ns/op             262 B/op          0 allocs/op
BenchmarkCachePutOverflow-16                2296            509279 ns/op              34 B/op          0 allocs/op
BenchmarkCachePutGetOverflow-16             1726            671510 ns/op              45 B/op          0 allocs/op

New cache implementation performance on 16 cores:
BenchmarkCachePutNoOverFlow-16             13549             82413 ns/op               0 B/op          0 allocs/op
BenchmarkCacheGetNoOverflow-16             30274             38997 ns/op               0 B/op          0 allocs/op
BenchmarkCachePutGetNoOverflow-16           8512            126239 ns/op               0 B/op          0 allocs/op
BenchmarkCachePutOverflow-16               13884             88124 ns/op               1 B/op          0 allocs/op
BenchmarkCachePutGetOverflow-16             7903            131299 ns/op               3 B/op          0 allocs/op
```
From the benchmarks above, we can see that the new implementation is ~5
times faster than the old one.

---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2025-01-02 17:47:54 +01:00
Hui Wang
92c8049647
app/vmstorage: allow to override the default unique time series limit
previously vmstorage ignored limit values from vmselect component.

This behavior is prohibited starting from v1.105.0, with
85f60237e2.

This breaks the original intent of the -search.maxUniqueTimeseries command-line flag, which has been added at vmselect nodes in the commit b843f0e : to be able to override the default limit at vmstorage on the number of unique time series, at different subsets of vmselect nodes.

The behavior should be the following:

*    If -search.maxUniqueTimeseries command-line flag isn't set at both vmselect and vmstorage nodes, then the limit on  the number of unique time series must be automatically detected at vmstorage nodes according to 

* vmstorage: automatically adjust -search.maxUniqueTimeseries max value   . This simplifies configuration of VictoriaMetrics cluster for the typical case.

* If -search.maxUniqueTimeseries command-line flag is explicitly set at vmstorage node, then it must be used as the limit on the number of unique time series, without automatic detection of the limit. Explicitly set limit at vmstorage node cannot be exceeded by the limit from vmselect nodes.
* If the -search.maxUniqueTimeseries command-line flag is explicitly set at vmselect node, then it must override the automatically detected limit at vmstorage node. For example, if vmselect node provides the limit, which exceeds the automatically detected limit at vmstorage node, then the limit from the vmselect node must be applied during query execution at vmstorage node. This will allow properly executing queries from the subset of vmselect nodes for reporting queries described above.

related issue:
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7852
2024-12-30 15:19:03 +01:00
f41gh7
12b79f31fe
make vmui-update 2024-12-18 23:08:41 +01:00
f41gh7
5558841cc1
Fix inconsistent treatment of millisecond-precision time for instant queries ()
This PR fixes . See the points 6 and 7 in `Steps to reproduce`:

> Now let's set time to only 5ms past the timestamp of the first point,
since even 199ms worked for the second point. Surprise, the point isn't
returned 💥:
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456705' -d
'step=1ms' | grep 10 # nothing!```
>
> But, 4ms works: 🤨🤔
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456704' -d
'step=1ms' | grep 10 # found```

This happens so because the actual step becomes 5ms due to jitter being
applied. THe fix is to do not apply jitter if scrape interval was not
detected (the case when vmstorage returns only one result). In this case
the scrape interval is set to `5m+step`.

An integration test has been added to check the steps to reproduce and
then to confirm that fix works. Note that the cluster tests are
currently disabled because the fix is not in cluster branch yet.

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-12-18 22:40:44 +01:00
f41gh7
5feb0336a3
make vmui-update 2024-12-13 12:10:46 +01:00
Artem Fetishev
dd079eb8e6
app/vmselect: respect -search.skipSlowReplicas when -globalReplicationFactor > 1
Previously cluster with the following vmselect configuration:

./bin/vmselect 
  -storageNode=gr1/:8211,gr1/:8212 
  -storageNode=gr2/:8213,gr2/:8214 
  -search.skipSlowReplicas=true
  -globalReplicationFactor=2

Here we have two vmstorage groups and -globalReplicationFactor=2, which effectively means that "every ingested sample is replicated across multiple vmstorage groups". Hence, gr1 and gr2 contain identical data set. And when we set -search.skipSlowReplicas=true it is expected vmselect should return result as soon as at least one storage group returned the full result.
In current state, -search.skipSlowReplicas is ignored on the storage group level. It is only respected within the group (with -replicationFactor flag).
  

   This commit fixes global replication for skipSlowReplicas. 

 To ensure that the fix works and does not break
anything replication tests have been added. For checking the fix for
skipping slow replicas see `testGroupSkipSlowReplicas()`.

To emulate storage groups, the integration test creates a cluster with
multilevel vminsert. The L1 inserts are group-level inserts, each writes
to its own group of vmstorages. The L2 vminsert is a global vminsert
that writes replicated to the L1 vminserts.

To enable multilevel inserts changes in apptest framework and
`lib/ingestserver/clusternative/server.go` were necessary.

related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6924

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2024-12-13 11:56:06 +01:00
f41gh7
b12cea7fbf
make vmui-update 2024-11-29 17:46:04 +01:00
Yury Molodov
0d951a35f2
vmui: fix predefined panels
### Describe Your Changes

- Fixes the handling of the `showLegend` flag.  
- Fixes the handling of `alias`.  
- Adds support for alias templates, allowing dynamic substitutions like
`{{label_name}}`.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7565
2024-11-28 13:50:02 +01:00
f41gh7
ed9ab2ea73
refactoring: changed prompb to prompbmarshal everythere where internal series transformations are happening ()
doing similar changes for both vmagent and vminsert (like one in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7399) ends up
with almost same implementations for each of packages instead of having
this shared code in one place. one of the reasons is the same Timeseries
and Labels structure from different prompb and prompbmarshal packages.
My proposal is to use structures from prompb package only to
marshal/unmarshal sent/received data, but for internal transformations
use only structures from prompbmarshal package

Another example, where it already can help to simplify code is streaming
aggregation pipeline for vmsingle (now it first marshals
prompb.Timeseries to storage.MetricRow and then if streaming aggregation
or deduplication is enabled it unmarshals all the series back but to
prompbmarshal.Timeseries)

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
2024-11-26 19:02:06 +01:00
Nikolay
93061dfa7b
app/vmselect: fixes multitenant cache init
Previously multitenant cache was inited before flag.Parse call. It
didn't allow to change cache expiration value and default value was
always used.

 This commit adds cache init at the first time cache was called.

 Also this commit adds small cache improvements:
* chore for cleanup cache, it now uses common pattern for in-place items
filtering
* fail cache request fast if item is already expired


---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2024-11-25 11:49:34 +01:00
Andrei Baidarov
439d1b932e
app/vmselect: fix panic/incorrect tenant in key
This is a follow-up after 3120dc2
  
- Consistently use key for rollupCache in multitenant mode cache keys use different authTokens. Previously it could lead to panic in rare cases when cache state was inconsistent. 
- Do not share `err` variable across goroutines for `processBlock` function. It could lead to data races. 

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7549
---------
Signed-off-by: Andrei Baidarov <abaidarov@yandex.ru>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-11-25 11:47:24 +01:00
Nikolay
8357c22cc8
app/vmselect: properly return binary pow function result ()
Previously, for `^` aka pow function calls, VictoriaMetrics returned `1`
if left arg was Nan. For example, given query=`(hour()==2)^1` returns 1
for NaN produced by hour() == 2 function. It added additional non-exist
datapoints to the timeseries.

This commit port bugfix from `metricql` package and adds test for it.
Now, VictoriaMetrics
correctly returns `NaN` for such cases.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7359

Signed-off-by: f41gh7 <nik@victoriametrics.com>
(cherry picked from commit bb399518db)
2024-11-21 15:23:49 +01:00
f41gh7
c54b5c542c
make vmui-update 2024-11-15 19:22:00 +01:00
Andrei Baidarov
3120dc2054
app/vmselect: fixes possible panics for multitenant queries
This commit fixes panic for multitenant requests and empty storage node responses for tenants api.

 It also optimizes `populateSqTenantTokensIfNeeded` function calls, by making it only once for query request. Previously it was incorrectly called multiple times per each storage node request.

Related issue: 
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7549
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2024-11-15 16:12:30 +01:00
Evgeniy Negriy
98fe1950a1
app/vmselect: fixes graphite function transformRemoveEmptySeries
Previously it incorrectly applied xFilesFactor, if it's value equal to 0.

 This commit properly handles this case and returns result according to
the graphite documentation:

`xFilesFactor follows the same semantics as in Whisper storage schemas. Setting it to 0 (the default) means that only a single value in the series needs to be non-null for it to be considered non-empty, setting it to 1 means that all values ​​in the series must be non-null. A setting of 0.5 means that at least half the values ​​in the series must be non-null.`

Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Evgeniy Negriy <einegriy@avito.ru>
(cherry picked from commit d27dfac5c6)
2024-11-07 13:00:18 +01:00
Andrii Chubatiuk
93bc205e05
promql: exclude limit_offset from default by metric name sorting ()
### Describe Your Changes

I don't like this solution, but it works. Other possible solutions
described in an issue

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7068

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit a88f896b43)
2024-11-06 15:27:29 +01:00
Zakhar Bessarab
2bdede8ed5
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-11-04 10:59:58 -03:00
Hui Wang
451bd164f0
docs: clarify flags -search.maxxxDuration can only be overridden to… ()
… a smaller value with `timeout` arg
2024-10-27 20:25:17 +01:00
hagen1778
bdc9dcec00
app/vmui: add missing assets after a710d43a20
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit a1882a84fb)
2024-10-21 08:28:37 +02:00
hagen1778
955c1f4da9
app/{vmselect,vlselect}: run make vmui-update vmui-logs-update
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit a710d43a20)
2024-10-18 14:28:38 +02:00