### Describe Your Changes
Fix many spelling and grammar errors, including misspellings in
filenames.
The change also fixes a typo in the metric name `vm_mmaped_files`, renaming it to `vm_mmapped_files`.
While this is a breaking change, the metric isn't used in alerts or dashboards,
so it should have low impact on users.
The change also deprecates `cspell`, as it is much heavier and less usable.
---------
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
(cherry picked from commit 76d205feae)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The optimization touches two things (a rough sketch of the first one follows the list):
1. Reduces the number of allocations when comparing canonical metric names
between the left and right parts of expressions.
2. Adds a fast path for cases when the right part of an expression returns a
scalar: `series_selector or on() vector(1)`, which is a typical
expression.
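A rough sketch of the allocation-reduction idea with simplified types (not the actual VictoriaMetrics internals): build canonical metric name keys into reusable byte slices and compare them without converting to strings.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

type label struct{ name, value string }

// appendCanonicalKey appends a canonical representation of the label set to dst.
// Reusing dst across series avoids one string allocation per compared series.
func appendCanonicalKey(dst []byte, labels []label) []byte {
	sort.Slice(labels, func(i, j int) bool { return labels[i].name < labels[j].name })
	for _, l := range labels {
		dst = append(dst, l.name...)
		dst = append(dst, '=')
		dst = append(dst, l.value...)
		dst = append(dst, ',')
	}
	return dst
}

func main() {
	var bufL, bufR []byte
	left := []label{{"job", "node"}, {"instance", "host1"}}
	right := []label{{"instance", "host1"}, {"job", "node"}}

	bufL = appendCanonicalKey(bufL[:0], left)
	bufR = appendCanonicalKey(bufR[:0], right)
	// bytes.Equal avoids converting the keys to strings just to compare them.
	fmt.Println(bytes.Equal(bufL, bufR)) // true: same canonical metric name
}
```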
```
benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark old ns/op new ns/op delta
BenchmarkBinaryOpOr/tss:1_or_tss:1-14 291 272 -6.56%
BenchmarkBinaryOpOr/tss:1_or_tss:1000-14 44590 28592 -35.88%
BenchmarkBinaryOpOr/tss:1000_or_tss:1-14 103124 39563 -61.64%
BenchmarkBinaryOpOr/tss:1000_or_tss:1000-14 20386150 1859335 -90.88%
BenchmarkBinaryOpOr/tss:1000_or_on()_vector(0)-14 91382 36805 -59.72%
```
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8382
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit dc1f7ef0d0)
Previously, vmselect didn't stop multitenant query execution when it
received an error from vmstorage, such as a limit error or any other error. It
continued to execute queries until it had done so for all tenants, which leads
to a potential waste of resources.
In addition, the callback error was incorrectly referenced and could be updated by
a subsequent callback call.
This commit returns the error earlier, cancels subsequent requests for
the remaining tenants and properly returns the storageNode request error.
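A minimal sketch of the early-cancellation pattern, with a stub per-tenant query function (not the actual netstorage code):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// queryTenant is a stand-in for executing a query for a single tenant.
func queryTenant(ctx context.Context, tenant string) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
	}
	if tenant == "1:2" {
		return errors.New("limit exceeded at vmstorage")
	}
	return nil
}

// queryAllTenants cancels the remaining per-tenant requests as soon as the
// first error is returned, and stores that error in a mutex-protected local
// variable instead of sharing a single reference across callbacks.
func queryAllTenants(ctx context.Context, tenants []string) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, tenant := range tenants {
		wg.Add(1)
		go func(tenant string) {
			defer wg.Done()
			if err := queryTenant(ctx, tenant); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
				cancel() // stop the remaining tenants early
			}
		}(tenant)
	}
	wg.Wait()
	return firstErr
}

func main() {
	err := queryAllTenants(context.Background(), []string{"0:0", "1:2", "3:4"})
	fmt.Println(err)
}
```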
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8461
### Describe Your Changes
Implement changes mentioned in #6898
Allow the administrator to specify the limit of returned TSDB series in
`/api/v1/status/tsdb` by making the TopN limit configurable from the CLI.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: alicja-karasiewicz <alicja.karasiewicz@allegro.com>
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
This feature allows tracking query requests by metric name. The tracker
state is stored in memory and capped at 1/100 of the memory allocated to the
storage. If the cap is exceeded, the tracker rejects adding any new items and
only registers query requests for already observed metric names.
This feature is disabled by default and is enabled by the new
`-storage.trackMetricNamesStats` flag.
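A minimal sketch of the cap behavior with made-up entry sizes (not the actual storage implementation):

```go
package main

import "fmt"

// metricNamesTracker sketches a memory-capped usage tracker: once the cap is
// reached it stops admitting new metric names, but keeps counting requests for
// names it has already seen.
type metricNamesTracker struct {
	maxBytes     uint64
	currentBytes uint64
	counts       map[string]uint64
}

func newMetricNamesTracker(maxBytes uint64) *metricNamesTracker {
	return &metricNamesTracker{maxBytes: maxBytes, counts: make(map[string]uint64)}
}

// RegisterQuery records a query request for the given metric name.
func (t *metricNamesTracker) RegisterQuery(metricName string) {
	if _, ok := t.counts[metricName]; !ok {
		need := uint64(len(metricName)) + 8 // rough per-entry cost, for illustration
		if t.currentBytes+need > t.maxBytes {
			return // cap exceeded: reject new items, keep tracking known ones
		}
		t.currentBytes += need
	}
	t.counts[metricName]++
}

func main() {
	t := newMetricNamesTracker(64)
	for _, name := range []string{"up", "up", "vm_rows", "some_very_long_metric_name_exceeding_the_cap"} {
		t.RegisterQuery(name)
	}
	fmt.Println(t.counts) // the overly long name was rejected, "up" counted twice
}
```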
New API endpoints were added to the select component:
* /api/v1/status/metric_names_stats - returns a JSON object
with usage statistics.
* /admin/api/v1/status/metric_names_stats/reset - resets the internal
state of the tracker and resets the tsid cache.
New metrics were added for this feature:
* vm_cache_size_bytes{type="storage/metricNamesUsageTracker"}
* vm_cache_size{type="storage/metricNamesUsageTracker"}
* vm_cache_size_max_bytes{type="storage/metricNamesUsageTracker"}
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4458
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
### Describe Your Changes
Previously, "selector @ another_selector" assumed that
"another_selector" metric is supposed to exist since "start" used in the
query.
If the query was evaluated in the following case (timestamps):
- start - 2, end - 10
- "another_selector" 5,6,7,8,9,10
- "selector" The resulting "at" timestamp would be taken from NaN (as
`int64(NaN * 1000)`), causing a panic or invalid behavior later.
Note that type cast of `NaN` to int64 is also platform-dependent, so
value of `int64(math.NaN() * 1000)` can produce `0` or max int64 on
different platforms and versions of Go.
This commit changes this and checks for the first non-NaN value. This
makes it easier to use for users as series are not always aligned and
returning an error in this case would disallow using this for some time
ranges.
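A minimal sketch of the fix, assuming a plain slice of evaluated values (not the actual `@` modifier code):

```go
package main

import (
	"fmt"
	"math"
)

// atTimestamp derives the `@` modifier timestamp from the first non-NaN value
// of the evaluated inner expression, rather than from a leading NaN value,
// since int64(NaN*1000) is platform-dependent and breaks later processing.
func atTimestamp(values []float64) (int64, bool) {
	for _, v := range values {
		if !math.IsNaN(v) {
			return int64(v * 1000), true
		}
	}
	return 0, false // no data at all: the caller must handle this explicitly
}

func main() {
	values := []float64{math.NaN(), math.NaN(), 5, 6}
	ts, ok := atTimestamp(values)
	fmt.Println(ts, ok) // 5000 true
}
```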
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8444
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 7dfaef9088)
### Describe Your Changes
#8342
Fix a negative rate result when the lookbehind window is longer than
`-search.maxLookback` or `-search.maxStalenessInterval` and the data
contains gaps.
This issue was introduced in
[v1.110.0](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072).
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
This commit improves integration with third-party solutions that rely on non-root
endpoints (e.g. MinIO) when the vmselect path has been specified in the
configured Prometheus URL, like:
`http://vmselect.:8481/select/0/prometheus`
A comparable change has been done before
(b885a3b6e9), however it only takes care of
the root path. This means the endpoints `-/healthy` and `-/ready` are still
not available on full vmselect Prometheus paths, resulting in
unsupported path requests.
This change makes these endpoints available on the full paths, like
`/select/0/prometheus/-/healthy` and `/select/0/prometheus/-/ready`,
thus achieving full Prometheus compatibility for external dependencies.
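A minimal sketch of the routing idea, assuming a plain net/http handler and placeholder response bodies (not the actual vmselect router):

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// handler accepts -/healthy and -/ready both at the root and under the
// tenant-prefixed Prometheus path, e.g. /select/0/prometheus/-/healthy.
func handler(w http.ResponseWriter, r *http.Request) {
	p := r.URL.Path
	switch {
	case p == "/-/healthy" || strings.HasSuffix(p, "/prometheus/-/healthy"):
		fmt.Fprintln(w, "OK") // placeholder body
	case p == "/-/ready" || strings.HasSuffix(p, "/prometheus/-/ready"):
		fmt.Fprintln(w, "OK") // placeholder body
	default:
		http.NotFound(w, r)
	}
}

func main() {
	http.HandleFunc("/", handler)
	// Try: curl http://localhost:8481/select/0/prometheus/-/healthy
	_ = http.ListenAndServe(":8481", nil)
}
```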
Related issues:
- https://github.com/minio/console/issues/2829
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1833
---
Signed-off-by: Joost Buskermolen <j.buskermolen@cloudmeesters.com>
metricsql: bump to v0.83.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7703
The update also returns an error if the metric name is specified twice in
a metric selector.
For example, `foo{__name__="bar"}` is not allowed anymore. It would
successfully parse before this change, but it can never satisfy the search
filter anyway, so there was no sense in supporting it. This is why some test
cases were removed.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit fa2107bbec)
At this time `bufferedwriter` [silently ignores connection close
errors](78eaa056c0/lib/bufferedwriter/bufferedwriter.go (L67)).
It may be very convenient in some situations (to not log such
unimportant errors), but it's too implicit and unsafe for others.
For example, if you close an [export
API](https://docs.victoriametrics.com/#how-to-export-time-series) client
connection in the middle of communication, VictoriaMetrics won't notice
it and will start to hog CPU by exporting all the data into nowhere
until it has processed all of it. If you make a few retries, it will
effectively be a DoS on the server.
This commit replaces this implicit error suppression with explicit error
handling, which fixes the issue with the export API.
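A minimal sketch of the explicit error handling idea, with a fake writer emulating a closed client connection (not the actual bufferedwriter code):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// writeBlock shows explicit error handling when streaming export data: a failed
// Write (e.g. the client closed the connection) must abort the export instead
// of being silently swallowed while the server keeps serializing data into nowhere.
func writeBlock(w io.Writer, block []byte) error {
	if _, err := w.Write(block); err != nil {
		return fmt.Errorf("cannot send exported block to client: %w", err)
	}
	return nil
}

// failingWriter emulates a client that closed the connection mid-export.
type failingWriter struct{ n int }

func (fw *failingWriter) Write(p []byte) (int, error) {
	fw.n++
	if fw.n > 2 {
		return 0, fmt.Errorf("connection reset by peer")
	}
	return len(p), nil
}

func main() {
	blocks := strings.Split("b1 b2 b3 b4 b5", " ")
	w := &failingWriter{}
	for _, b := range blocks {
		if err := writeBlock(w, []byte(b)); err != nil {
			fmt.Println("stopping export:", err) // stop instead of exporting the rest
			return
		}
	}
}
```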
The issue was introduced at e78f3ac8ac
logger.Fatalf("BUG: ...") complicates investigating the bug, since it doesn't show the call stack,
which led to the bug. So it is better to consistently use logger.Panicf("BUG: ...") for logging programming bugs.
Previously, NewChild elements of querytracer could be referenced by concurrent
storageNode goroutines. After an early return (if search.skipSlowReplicas is set), it is
possible that tracer objects are still in use by concurrent workers.
This may cause panics and data races. The most probable case is when the parent tracer is finished, but children
could still write data to it via the Donef() method. This triggers a read-write data race during trace
formatting.
This commit adds new methods to the querytracer package that allow
creating children not referenced by the parent and adding them to the parent later.
An orphaned child must be registered at the parent when the goroutine returns. This is done synchronously by the single caller via the finishQueryTracer call.
If a child hasn't finished its work and a reference to it is used by a concurrent goroutine, a new child must be created instead with
a context message.
This prevents panics and possible data races.
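A rough sketch of the ownership pattern; the type and method names below are hypothetical and only illustrate the idea, not the actual querytracer API:

```go
package main

import (
	"fmt"
	"sync"
)

// tracer is a simplified stand-in for a query tracer node. NewOrphanedChild and
// RegisterChild are hypothetical names chosen only to illustrate the pattern.
type tracer struct {
	mu       sync.Mutex
	message  string
	children []*tracer
}

// NewOrphanedChild creates a child that is NOT referenced by the parent yet,
// so concurrent workers cannot race with the parent over it.
func NewOrphanedChild(msg string) *tracer { return &tracer{message: msg} }

// RegisterChild attaches a finished orphaned child to the parent. It is called
// synchronously by the single goroutine that owns the parent, after the worker
// goroutine has returned.
func (t *tracer) RegisterChild(child *tracer) {
	t.mu.Lock()
	t.children = append(t.children, child)
	t.mu.Unlock()
}

func main() {
	parent := &tracer{message: "query"}
	results := make(chan *tracer, 2)
	for i := 0; i < 2; i++ {
		go func(i int) {
			child := NewOrphanedChild(fmt.Sprintf("storageNode %d", i))
			// ... the worker writes only into its own child here ...
			results <- child
		}(i)
	}
	for i := 0; i < 2; i++ {
		parent.RegisterChild(<-results) // registered only after the worker is done
	}
	fmt.Println(len(parent.children))
}
```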
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8114
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Initially the delete_series API wasn't implemented for multitenant auth tokens.
This commit fixes it and properly handles delete series requests for multitenant auth tokens.
It also adds integration tests for this case.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8126
Introduced in the v1.104.0 release:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1434
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
Optimize for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6226
Users who set an `*AuthKey` flag will now receive a new response
body:
```go
// query arg not set
The provided authKey '' doesn't match -search.resetCacheAuthKey
// incorrect query arg
The provided authKey '5dxd71hsz==' doesn't match -search.resetCacheAuthKey
```
Previously, they received:
```
The provided authKey doesn't match -search.resetCacheAuthKey
```
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
(cherry picked from commit 1f0b03aebe)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
…e `/api/v1/admin/tsdb/delete_series` call
Previously, it was limited by `-search.maxQueryDuration`, which can be
too small for delete calls.
part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857.
(cherry picked from commit 4574958e2e)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
`doInternal` has an adaptive staleness detection mechanism. It is
calculated using the timestamp distance between samples in the selected list of
samples. It is dynamic because VM can store signals from many sources
with different sample resolutions. And while it works for most cases,
there are edge cases for rollup functions that compare values
between windows: increase, increase_pure, delta.
Edge case 1.
There was a gap between series because of a missed scrape or two. In
this case staleness will trigger and increase-like functions will assume
the value they need to compare with is 0. As a result, this could produce
spikes for flappy data - see
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/894 This
problem was solved by introducing a `realPrevValue` field -
1f19c167a4.
It stores the closest real sample value on the selected interval and is used
for comparison when samples start after the gap.
Edge case 2.
`realPrevValue` doesn't respect the staleness interval. As a result, even if
the gap between samples is huge (hours), the increase-like functions will
not consider it as a new series that started from 0. See
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8002.
Covering both edge cases is tricky, because `realPrevValue` has to
respect and not respect the staleness interval at the same time. In
other words, it should be able to ignore periodic missing gaps, but
reset if the gap is too big. While a "too big" gap can't be figured out
empirically, I suggest using `-search.maxStalenessInterval` for this
purpose. If `-search.maxStalenessInterval` is set to 0 (default), then
`realPrevValue` ignores the staleness interval. If
`-search.maxStalenessInterval` is > 0, then `realPrevValue` respects it
as a staleness interval.
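A minimal sketch of this rule (simplified; the actual rollup code tracks per-window timestamps):

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

var maxStalenessInterval = flag.Duration("search.maxStalenessInterval", 0, "staleness interval override")

// useRealPrevValue sketches the rule described above: with the default
// -search.maxStalenessInterval=0 the previous real sample is always reused,
// otherwise it is dropped when the gap before the current window exceeds the
// configured interval, so the series is treated as started from 0 again.
func useRealPrevValue(gap time.Duration) bool {
	if *maxStalenessInterval <= 0 {
		return true // default: ignore the staleness interval
	}
	return gap <= *maxStalenessInterval
}

func main() {
	flag.Parse()
	fmt.Println(useRealPrevValue(30 * time.Second)) // small gap: keep realPrevValue
	fmt.Println(useRealPrevValue(2 * time.Hour))    // huge gap: keep only if the flag is 0
}
```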
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 7d2a6764e7)
The new version has additional checks and reduced resource consumption, so
it doesn't time out on our internal repos.
To make the linter happy, I addressed the "redefinition of the built-in
function" lint error.
----
Signed-off-by: hagen1778 <roman@victoriametrics.com>
When vmselect processes a rollup function it fetches all the raw samples
on the requested `start-end` interval of the query. It then loops through
the raw samples, picks a range of samples based on the provided `step`
interval and invokes the rollup function for each of the picked ranges of
samples.
During this processing, vmselect always populates the `realPrevValue`
field with the closest previous raw sample value before the picked range
of samples. This `realPrevValue` is used by rollup functions like
increase_pure or delta to decide whether a counter change happened or
not. For example, we get a counter value == 1. If we've seen this
counter before and its value was also 1 - then no change happened. If we
didn't see it before, then this counter should have started with value=0
and we need to account for a `1-0=1` change. All this is required to deal
with situations when scrapes are missing or `step` is too small.
However, vmselect doesn't check how "old" the `realPrevValue` is. In
other words, it doesn't respect the staleness interval when picking it.
As a result, depending on the `start` and `end` params, vmselect can use a
`realPrevValue` which is a couple of hours old and is unlikely to be a
temporary scrape fail. As a result, some increases can be incorrectly
ignored by vmselect.
This change makes sure that vmselect doesn't populate `realPrevValue`
with samples that are older than the staleness interval.
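A minimal sketch of the populate step with this check, using millisecond timestamps and simplified types:

```go
package main

import (
	"fmt"
	"math"
)

// prevValueForWindow walks backwards from the start of the current window and
// returns the closest previous sample, but only if that sample is not older
// than the staleness interval; otherwise the rollup function treats the series
// as freshly started.
func prevValueForWindow(timestamps []int64, values []float64, windowStart, stalenessIntervalMs int64) float64 {
	for i := len(timestamps) - 1; i >= 0; i-- {
		if timestamps[i] >= windowStart {
			continue
		}
		if windowStart-timestamps[i] > stalenessIntervalMs {
			break // the closest previous sample is too old to be trusted
		}
		return values[i]
	}
	return math.NaN() // no usable previous value
}

func main() {
	timestamps := []int64{1000, 2000, 3_700_000} // a big gap before the last sample
	values := []float64{1, 1, 1}
	fmt.Println(prevValueForWindow(timestamps, values, 3_650_000, 300_000)) // NaN: the gap is too big
	fmt.Println(prevValueForWindow(timestamps, values, 3_710_000, 300_000)) // 1: the previous sample is fresh
}
```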
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
-------------------
To reproduce, create a dataset with one metric `foo` which has samples
with value=1 on an interval of a couple of hours with a resolution of 15s, and a
gap of an hour in the middle:
<img width="769" alt="image"
src="https://github.com/user-attachments/assets/a39b2740-b741-45f8-ad18-093b7c57c3b3"
/>
Then run the `increase(foo[1m])` expression on this time range (with the
cache disabled):
<img width="1472" alt="image"
src="https://github.com/user-attachments/assets/463cece1-f359-4c75-a96c-60092a31cab2"
/>
As a result, there will be one increase at the beginning of the series
and no increase after the gap. Then change the time range so it starts
in the middle of the gap:
<img width="1505" alt="image"
src="https://github.com/user-attachments/assets/f4a460c3-9fd1-4ec7-ab47-15e716ec1019"
/>
Now there is an increase>0 because `realPrevValue` wasn't
populated. This is wrong, because it hides the increase of the series.
With the fix, the original increase query on the full time range should show
2 increases:
<img width="1492" alt="image"
src="https://github.com/user-attachments/assets/aa9d8a6b-7b22-41f6-9eb9-83b3113a6982"
/>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Since
44b071296d
the `evalNumber` function no longer updates MetricName tenancy information.
This leads to a mismatch in metric names between the query result and the
evaluated number for all tenants other than 0:0.
For example, the query `count(up) or 0` will return different results for
tenants 0:0 and 1:1 (assuming up is present for both tenants):
- tenant 0:0 - will only contain the result of `count(up)`
- tenant 1:1 - will return both `count(up)` and `0` since the metric names
will not be matched
This restores setting tenancy information on the metric name for
single-tenant queries.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7987
---
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
### Describe Your Changes
Binary operations like `exprFirst op exprSecond` in VictoriaMetrics are
performed in the following way:
1. Execute exprFirst.
2. Extract **common label filters** from the result of step 1.
3. Apply these common label filters to `exprSecond` and execute it, in
order to retrieve fewer time series from vmstorage nodes.
In step 2, only labels with fewer than `100` (hard-coded) values can be
used as a **common label filter** (e.g. `{common_lb=~"v1|v2|...|v100"}`).
In our scenarios, a label - take the `instance` label as an example - can
have thousands of candidate values. Despite bringing more pressure to the
vmstorage nodes, it's still beneficial if labels with more than 100
values can be used as a filter in `exprSecond`, given enough vmstorage
resources. After adjusting the value from `100` to `10000`, our query
round-trip time drops significantly from 5s to 2s.
This pull request changes the hard-coded value into a configurable flag.
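A rough sketch of the common-label-filter extraction in step 2 with a configurable value limit (simplified label representation, not the actual implementation):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// commonLabelFilter collects the distinct values of a label across the first
// expression's result and turns them into a filter for the second expression,
// but only if the number of distinct values does not exceed maxValues.
func commonLabelFilter(labelName string, series []map[string]string, maxValues int) (string, bool) {
	values := make(map[string]struct{})
	for _, labels := range series {
		v, ok := labels[labelName]
		if !ok {
			return "", false // the label is not common to all series
		}
		values[v] = struct{}{}
		if len(values) > maxValues {
			return "", false // too many values to be useful as a filter
		}
	}
	vs := make([]string, 0, len(values))
	for v := range values {
		vs = append(vs, v)
	}
	sort.Strings(vs)
	return fmt.Sprintf(`{%s=~%q}`, labelName, strings.Join(vs, "|")), true
}

func main() {
	series := []map[string]string{
		{"instance": "host1", "job": "node"},
		{"instance": "host2", "job": "node"},
	}
	f, ok := commonLabelFilter("instance", series, 100)
	fmt.Println(f, ok) // {instance=~"host1|host2"} true
}
```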
The parse cache is a pretty simple cache implementation: just a
standard map with a mutex.
A map with a mutex has poor overall performance; plus, when a cache
overflow occurs, the whole cache locks until 1k elements have been
deleted (now it's 10% of the 10000 max elements in the cache). To avoid this
bottleneck and improve the cache performance on systems with many CPU
cores, while keeping it rather simple, we can implement the cache with per-bucket
locks, like it's done in fastcache. The logic and API remain the same. So
now each bucket will have a map with approximately 78 elements (with 128
buckets), and overflow will now occur per bucket, with only 7
elements needing to be deleted.
Because exec_test.go has about 10k lines of code, it's better to move
the cache into a separate file in order to add tests and benchmarks for it,
which it currently doesn't have.
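A sketch of the per-bucket locking idea, with strings standing in for parsed expressions:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const bucketsCount = 128

// parseBucket is one shard of the cache: its mutex protects only its own map,
// so an overflow cleanup in one bucket doesn't block lookups in the others.
type parseBucket struct {
	mu          sync.Mutex
	m           map[string]string
	maxElements int
}

type parseCache struct {
	buckets [bucketsCount]parseBucket
}

func newParseCache(maxElements int) *parseCache {
	c := &parseCache{}
	for i := range c.buckets {
		c.buckets[i] = parseBucket{m: make(map[string]string), maxElements: maxElements / bucketsCount}
	}
	return c
}

func (c *parseCache) bucket(key string) *parseBucket {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &c.buckets[h.Sum32()%bucketsCount]
}

func (c *parseCache) Put(key, value string) {
	b := c.bucket(key)
	b.mu.Lock()
	if len(b.m) >= b.maxElements {
		// Overflow is handled per bucket: drop ~10% of this bucket only,
		// instead of locking the whole cache while 1k entries are removed.
		toDelete := b.maxElements / 10
		for k := range b.m {
			if toDelete == 0 {
				break
			}
			delete(b.m, k)
			toDelete--
		}
	}
	b.m[key] = value
	b.mu.Unlock()
}

func (c *parseCache) Get(key string) (string, bool) {
	b := c.bucket(key)
	b.mu.Lock()
	v, ok := b.m[key]
	b.mu.Unlock()
	return v, ok
}

func main() {
	c := newParseCache(10000)
	c.Put(`rate(foo[5m])`, "parsed expression placeholder")
	v, ok := c.Get(`rate(foo[5m])`)
	fmt.Println(v, ok)
}
```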
```
goos: windows
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql
cpu: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
Current cache implementation performance on 8 cores:
BenchmarkCachePutNoOverFlow-8 1932 618372 ns/op 253 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-8 6547 211527 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-8 1873 621718 ns/op 261 B/op 0 allocs/op
BenchmarkCachePutOverflow-8 2262 464328 ns/op 32 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-8 1764 655866 ns/op 38 B/op 0 allocs/op
New cache implementation performance on 8 cores:
BenchmarkCachePutNoOverFlow-8 10408 111412 ns/op 0 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-8 22407 52809 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-8 6583 168088 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutOverflow-8 9822 117212 ns/op 2 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-8 6481 175952 ns/op 3 B/op 0 allocs/op
Current cache implementation performance on 16 cores:
BenchmarkCachePutNoOverFlow-16 2331 475307 ns/op 218 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-16 6069 196905 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-16 1870 644236 ns/op 262 B/op 0 allocs/op
BenchmarkCachePutOverflow-16 2296 509279 ns/op 34 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-16 1726 671510 ns/op 45 B/op 0 allocs/op
New cache implementation performance on 16 cores:
BenchmarkCachePutNoOverFlow-16 13549 82413 ns/op 0 B/op 0 allocs/op
BenchmarkCacheGetNoOverflow-16 30274 38997 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutGetNoOverflow-16 8512 126239 ns/op 0 B/op 0 allocs/op
BenchmarkCachePutOverflow-16 13884 88124 ns/op 1 B/op 0 allocs/op
BenchmarkCachePutGetOverflow-16 7903 131299 ns/op 3 B/op 0 allocs/op
```
From the benchmarks above, we can see that the new implementation is ~5
times faster than the old one.
---------
Co-authored-by: f41gh7 <nik@victoriametrics.com>
Previously vmstorage ignored the limit values sent from the vmselect component.
This behavior was prohibited starting from v1.105.0, with
85f60237e2.
This breaks the original intent of the -search.maxUniqueTimeseries command-line flag, which was added to vmselect nodes in commit b843f0e: to be able to override the default limit at vmstorage on the number of unique time series at different subsets of vmselect nodes.
The behavior should be the following (a simplified sketch follows this list):
* If the -search.maxUniqueTimeseries command-line flag isn't set at both vmselect and vmstorage nodes, then the limit on the number of unique time series must be automatically detected at vmstorage nodes (see "vmstorage: automatically adjust -search.maxUniqueTimeseries max value"). This simplifies configuration of a VictoriaMetrics cluster for the typical case.
* If the -search.maxUniqueTimeseries command-line flag is explicitly set at the vmstorage node, then it must be used as the limit on the number of unique time series, without automatic detection of the limit. The explicitly set limit at the vmstorage node cannot be exceeded by the limit from vmselect nodes.
* If the -search.maxUniqueTimeseries command-line flag is explicitly set at the vmselect node, then it must override the automatically detected limit at the vmstorage node. For example, if the vmselect node provides a limit which exceeds the automatically detected limit at the vmstorage node, then the limit from the vmselect node must be applied during query execution at the vmstorage node. This will allow properly executing queries from the subset of vmselect nodes for the reporting queries described above.
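A simplified sketch of how the effective limit could be resolved on the vmstorage side under these rules (the lower-of-two behavior for an explicitly set vmstorage limit is an assumption of this sketch, not a confirmed detail):

```go
package main

import "fmt"

// effectiveMaxUniqueTimeseries resolves the limit as seen from vmstorage: an
// explicitly set vmstorage limit wins and cannot be exceeded by vmselect;
// otherwise a vmselect-provided limit overrides the automatically detected one.
func effectiveMaxUniqueTimeseries(storageFlag, autoDetected, fromVMSelect int) int {
	if storageFlag > 0 {
		// Explicit vmstorage limit: vmselect cannot raise it.
		if fromVMSelect > 0 && fromVMSelect < storageFlag {
			return fromVMSelect
		}
		return storageFlag
	}
	if fromVMSelect > 0 {
		return fromVMSelect // vmselect overrides the auto-detected limit
	}
	return autoDetected // nothing is set explicitly: use the auto-detected limit
}

func main() {
	fmt.Println(effectiveMaxUniqueTimeseries(0, 300_000, 0))               // 300000
	fmt.Println(effectiveMaxUniqueTimeseries(0, 300_000, 1_000_000))       // 1000000
	fmt.Println(effectiveMaxUniqueTimeseries(500_000, 300_000, 1_000_000)) // 500000
}
```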
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7852
This PR fixes #5796. See points 6 and 7 in `Steps to reproduce`:
> Now let's set time to only 5ms past the timestamp of the first point,
since even 199ms worked for the second point. Surprise, the point isn't
returned 💥:
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456705' -d
'step=1ms' | grep 10 # nothing!```
>
> But, 4ms works: 🤨🤔
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456704' -d
'step=1ms' | grep 10 # found```
This happens because the actual step becomes 5ms due to jitter being
applied. The fix is to not apply jitter if the scrape interval was not
detected (the case when vmstorage returns only one result). In this case
the scrape interval is set to `5m+step`.
An integration test has been added to check the steps to reproduce and
to confirm that the fix works. Note that the cluster tests are
currently disabled because the fix is not in the cluster branch yet.
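A sketch of the idea, with a made-up jitter fraction and simplified fallback handling (the actual values and code paths differ):

```go
package main

import (
	"fmt"
	"time"
)

// adjustScrapeInterval applies jitter only when the scrape interval was
// actually detected from multiple samples. When vmstorage returns a single
// sample, the interval falls back to 5m+step and must not be inflated further,
// otherwise points right at the edge of the window can be lost.
func adjustScrapeInterval(detected, step time.Duration, intervalDetected bool) time.Duration {
	if !intervalDetected {
		return 5*time.Minute + step // fallback, no jitter
	}
	// Allow a small jitter (made-up 2% here) to tolerate slightly irregular scrapes.
	return detected + detected/50
}

func main() {
	fmt.Println(adjustScrapeInterval(0, time.Millisecond, false))
	fmt.Println(adjustScrapeInterval(15*time.Second, time.Millisecond, true))
}
```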
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Previously, a cluster with the following vmselect configuration:
./bin/vmselect
-storageNode=gr1/:8211,gr1/:8212
-storageNode=gr2/:8213,gr2/:8214
-search.skipSlowReplicas=true
-globalReplicationFactor=2
Here we have two vmstorage groups and -globalReplicationFactor=2, which effectively means that "every ingested sample is replicated across multiple vmstorage groups". Hence, gr1 and gr2 contain identical data sets. And when we set -search.skipSlowReplicas=true, it is expected that vmselect should return a result as soon as at least one storage group has returned the full result.
In the current state, -search.skipSlowReplicas is ignored at the storage group level. It is only respected within a group (with the -replicationFactor flag).
This commit fixes global replication for skipSlowReplicas.
To ensure that the fix works and does not break
anything, replication tests have been added. For checking the fix for
skipping slow replicas see `testGroupSkipSlowReplicas()`.
To emulate storage groups, the integration test creates a cluster with a
multilevel vminsert setup. The L1 vminserts are group-level inserts, each writing
to its own group of vmstorages. The L2 vminsert is a global vminsert
that writes replicated data to the L1 vminserts.
To enable multilevel inserts, changes in the apptest framework and
`lib/ingestserver/clusternative/server.go` were necessary.
related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6924
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
### Describe Your Changes
- Fixes the handling of the `showLegend` flag.
- Fixes the handling of `alias`.
- Adds support for alias templates, allowing dynamic substitutions like
`{{label_name}}`.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7565
Doing similar changes for both vmagent and vminsert (like the one in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7399) ends up
with almost the same implementation in each of the packages instead of having
this shared code in one place. One of the reasons is the same Timeseries
and Labels structures coming from the different prompb and prompbmarshal packages.
My proposal is to use the structures from the prompb package only to
marshal/unmarshal sent/received data, and to use only the structures from the
prompbmarshal package for internal transformations.
Another example where it can already help to simplify code is the streaming
aggregation pipeline for vmsingle (now it first marshals
prompb.Timeseries to storage.MetricRow, and then, if streaming aggregation
or deduplication is enabled, it unmarshals all the series back, but into
prompbmarshal.Timeseries).
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
Previously the multitenant cache was initialized before the flag.Parse call. This
didn't allow changing the cache expiration value, so the default value was
always used.
This commit initializes the cache the first time it is accessed.
This commit also adds small cache improvements:
* chore for cache cleanup: it now uses the common pattern for in-place item
filtering
* fail a cache request fast if the item is already expired
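A minimal sketch of the lazy-init pattern, using a hypothetical flag name for illustration:

```go
package main

import (
	"flag"
	"fmt"
	"sync"
	"time"
)

// Hypothetical flag name, for illustration only.
var cacheExpireDuration = flag.Duration("search.tenantCacheExpireDuration", 5*time.Minute, "tenant cache expiration")

var (
	tenantsCacheOnce sync.Once
	tenantsCacheV    *tenantsCache
)

type tenantsCache struct{ expire time.Duration }

// getTenantsCache initializes the cache lazily on first use, which happens
// after flag.Parse, so the configured expiration value is actually honored
// instead of the default captured at package init time.
func getTenantsCache() *tenantsCache {
	tenantsCacheOnce.Do(func() {
		tenantsCacheV = &tenantsCache{expire: *cacheExpireDuration}
	})
	return tenantsCacheV
}

func main() {
	flag.Parse()
	fmt.Println(getTenantsCache().expire)
}
```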
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
This is a follow-up after 3120dc2
- Consistently use the key for rollupCache: in multitenant mode cache keys use different authTokens. Previously this could lead to a panic in rare cases when the cache state was inconsistent.
- Do not share the `err` variable across goroutines for the `processBlock` function. It could lead to data races.
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7549
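A minimal sketch of the race-free error handling pattern (simplified; block processing is a stub):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// processBlocks gives each goroutine its own error slot instead of assigning a
// single shared `err` variable, then combines the results after all workers
// have finished, so there is no data race on the error value.
func processBlocks(blocks []int, processBlock func(int) error) error {
	errs := make([]error, len(blocks))
	var wg sync.WaitGroup
	for i, b := range blocks {
		wg.Add(1)
		go func(i, b int) {
			defer wg.Done()
			errs[i] = processBlock(b)
		}(i, b)
	}
	wg.Wait()
	return errors.Join(errs...)
}

func main() {
	err := processBlocks([]int{1, 2, 3}, func(b int) error {
		if b == 2 {
			return fmt.Errorf("cannot process block %d", b)
		}
		return nil
	})
	fmt.Println(err)
}
```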
---------
Signed-off-by: Andrei Baidarov <abaidarov@yandex.ru>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
Previously, for `^` (aka pow) function calls, VictoriaMetrics returned `1`
if the left arg was NaN. For example, the query `(hour()==2)^1` returned 1
for the NaN values produced by the hour() == 2 expression. This added
non-existent datapoints to the time series.
This commit ports the bugfix from the `metricsql` package and adds a test for it.
Now VictoriaMetrics correctly returns `NaN` for such cases.
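A minimal sketch of the fixed semantics:

```go
package main

import (
	"fmt"
	"math"
)

// binaryOpPow propagates NaN from the left side instead of ever yielding 1, so
// filtered-out points (like the NaN values produced by `hour()==2`) don't
// reappear in the result.
func binaryOpPow(left, right float64) float64 {
	if math.IsNaN(left) {
		return math.NaN()
	}
	return math.Pow(left, right)
}

func main() {
	fmt.Println(binaryOpPow(2, 3))          // 8
	fmt.Println(binaryOpPow(math.NaN(), 1)) // NaN instead of 1
}
```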
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7359
Signed-off-by: f41gh7 <nik@victoriametrics.com>
(cherry picked from commit bb399518db)
This commit fixes a panic for multitenant requests and empty storage node responses in the tenants API.
It also optimizes `populateSqTenantTokensIfNeeded` function calls by making the call only once per query request. Previously it was incorrectly called multiple times, once per each storage node request.
Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7549
---------
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
Previously it incorrectly applied xFilesFactor if its value was equal to 0.
This commit properly handles this case and returns the result according to
the graphite documentation:
`xFilesFactor follows the same semantics as in Whisper storage schemas. Setting it to 0 (the default) means that only a single value in the series needs to be non-null for it to be considered non-empty, setting it to 1 means that all values in the series must be non-null. A setting of 0.5 means that at least half the values in the series must be non-null.`
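A minimal sketch of these semantics:

```go
package main

import (
	"fmt"
	"math"
)

// passesXFilesFactor implements the documented behavior: with xFilesFactor == 0
// a single non-null value is enough, with 1 every value must be non-null, and
// with 0.5 at least half of them must be.
func passesXFilesFactor(values []float64, xFilesFactor float64) bool {
	nonNull := 0
	for _, v := range values {
		if !math.IsNaN(v) {
			nonNull++
		}
	}
	if xFilesFactor <= 0 {
		return nonNull > 0
	}
	return float64(nonNull)/float64(len(values)) >= xFilesFactor
}

func main() {
	values := []float64{math.NaN(), math.NaN(), 42}
	fmt.Println(passesXFilesFactor(values, 0))   // true: one non-null value is enough
	fmt.Println(passesXFilesFactor(values, 0.5)) // false: only 1/3 of the values are non-null
}
```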
Signed-off-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Evgeniy Negriy <einegriy@avito.ru>
(cherry picked from commit d27dfac5c6)
### Describe Your Changes
I don't like this solution, but it works. Other possible solutions are
described in the issue.
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7068
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit a88f896b43)