### Describe Your Changes
#8342
fix negative rate result when the lookbehind window is longer than
`-search.maxLookback` or `-search.maxStalenessInterval` and data
contains gap.
This issue was introduced since
[v1.110.0](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072).
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8072
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
`doInternal` has adaptive staleness detection mechanism. It is
calculated using timestamp distance between samples in selected list of
samples. It is dynamic because VM can store signals from many sources
with different samples resolution. And while it works for most of cases,
there are edge cases for rollup functions that are comparing values
between windows: increase, increase_pure, delta.
The edge case 1.
There was a gap between series because of the missed scrape or two. In
this case staleness will trigger and increase-like functions will assume
the value they need to compare with is 0. In result, this could produce
spikes for a flappy data - see
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/894 This
problem was solved by introducing a `realPrevValue` field -
1f19c167a4.
It stores the closest real sample value on selected interval and is used
for comparison when samples start after the gap.
The edge case 2.
`realPrevValue` doesn't respect staleness interval. In result, even if
gap between samples is huge (hours), the increase-like functions will
not consider it as a new series that started from 0. See
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8002.
Covering both edge cases is tricky, because `realPrevValue` has to
respect and not respect the staleness interval in the same time. In
other words, it should be able to ignore periodic missing gaps, but
reset if the gap is too big. While "too big gap" can't be figured out
empirically, I suggest using `-search.maxStalenessInterval` for this
purpose. If `-search.maxStalenessInterval` is set to 0 (default), then
`realPrevValue` ignores staleness interval. If
`-search.maxStalenessInterval` is > 0, then `realPrevValue` respects it
as a staleness interval.
### Checklist
The following checks are **mandatory**:
- [ ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
New version has additional checks and reduced resource consumption, so
it doesn't timeout for our internal repos.
To make linter happy, I addressed "redefinition of the built-in
function" lint error.
----
Signed-off-by: hagen1778 <roman@victoriametrics.com>
When vmselect process a rollup function it fetches all the raw samples
on requested `start-end` interval of the query. It then loops through
the raw samples, picks the range of the samples based on provided `step`
interval and invokes a rollup function for each of the picked ranges of
samples.
During this processing, vmselect always populates the `realPrevValue`
field with the closest previous raw sample value before the picked range
of samples. This `realPrevValue` is used by rollup functions like
increase_pure or delta to decide whether the counter change happened or
not. For example, we get the counter value == 1. If we've seen this
counter before and its value was also 1 - then no change happened. If we
didn't see it before, then this counter should have started with value=0
and we need to account for `1-0=1` change. All this is required to deal
with situations when scrapes are missing or `step` is too small.
However, vmselect doesn't check how "old" is the `realPrevValue`. In
other words, it doesn't respect the staleness interval when picking it.
In result, depending on the `start` and `end` params, vmselect can use
`realPrevValue` which is a couple of hours old and is unlikely to be a
temporary scrape fail. In result, some increases can be incorrectly
ingnored by vmselect.
This change makes sure that vmselect doesn't populate `realPrevValue`
with samples that are older than staleness interval.
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
### Checklist
The following checks are **mandatory**:
- [ x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
-------------------
To reproduce, create a dataset with one metric `foo` which has samples
with value=1 on interval of couple of hours and resolution 15s, and a
gap for an hour in the middle:
<img width="769" alt="image"
src="https://github.com/user-attachments/assets/a39b2740-b741-45f8-ad18-093b7c57c3b3"
/>
Then run `increase(foo[1m])` expression on this time range (disable
cache):
<img width="1472" alt="image"
src="https://github.com/user-attachments/assets/463cece1-f359-4c75-a96c-60092a31cab2"
/>
In result, there will be one increase on the beginning of the series.
And no increase after the gap. Then change the time range so it starts
in the middle of the gap:
<img width="1505" alt="image"
src="https://github.com/user-attachments/assets/f4a460c3-9fd1-4ec7-ab47-15e716ec1019"
/>
Now, there is an increase>0 because the `realPrevValue` wasn't
populated. This is wrong, because it hides the increase of the series.
With the fix, the original increase query on full time range should show
2 increases:
<img width="1492" alt="image"
src="https://github.com/user-attachments/assets/aa9d8a6b-7b22-41f6-9eb9-83b3113a6982"
/>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
This PR fixes#5796. See the points 6 and 7 in `Steps to reproduce`:
> Now let's set time to only 5ms past the timestamp of the first point,
since even 199ms worked for the second point. Surprise, the point isn't
returned 💥:
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456705' -d
'step=1ms' | grep 10 # nothing!```
>
> But, 4ms works: 🤨🤔
>
> ```curl -s $VMQURL -d 'query=series1' -d 'time=1707123456704' -d
'step=1ms' | grep 10 # found```
This happens so because the actual step becomes 5ms due to jitter being
applied. THe fix is to do not apply jitter if scrape interval was not
detected (the case when vmstorage returns only one result). In this case
the scrape interval is set to `5m+step`.
An integration test has been added to check the steps to reproduce and
then to confirm that fix works. Note that the cluster tests are
currently disabled because the fix is not in cluster branch yet.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
'any' type is supported starting from Go1.18. Let's consistently use it
instead of 'interface{}' type across the code base, since `any` is easier to read than 'interface{}'.
Prevsiously they were swapped - the first arg should be the label name and the second arg should be label filters
This is a follow-up for e389b7b959e8144fdff5075bf7a5a39b2b0c6dd3
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5847
This should improve maintainability of the code related to rollup functions,
since it is located in rollup.go
While at it, properly return empty results from holt_winters(), rate_over_sum(),
sum2_over_time(), geomean_over_time() and distinct_over_time() when there are no real samples
on the selected lookbehind window. Previously the previous sample value was mistakenly
returned from these functions.
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.
The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
* app/vmselect/promql: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase()
* fix test
* app/vmselect: drop `rollupDefault` function as duplicate
It is unclear why there are two identical fns `rollupDefault`
and `rollupDistinct`. Dropping one of them.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update app/vmselect/promql/rollup.go
* Update app/vmselect/promql/rollup.go
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously the lower bound could be too small, which could result in missing values at the beginning of the graph
for default_rollup() function. This function is automatically applied to all the series selectors if they aren't
explicitly wrapped into a rollup function - see https://docs.victoriametrics.com/MetricsQL.html#implicit-query-conversions
While at it, properly take into account `-search.minStalenessInterval` command-line flag when adjusting
the lower bound for the selected time range.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5388
`median_over_time` is handled by predefined WITH template in MetricsQL library which translates it to `quantile_over_time(0.5)`
This makes it impossble to use `median_over_time` as a usual rollup function for `aggr_over_time`.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* metricsql: support optional 2nd argument for rollup functions
Support optional 2nd argument `min`, `max` or `avg` for rollup functions:
* rollup
* rollup_delta
* rollup_deriv
* rollup_increase
* rollup_rate
* rollup_scrape_interval
If second argument is passed, then rollup function will return only the selected aggregation type.
This change can be useful for situations where only one type of rollup calculation is needed.
For example, `rollup_rate(requests_total[5m], "max")`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* wip
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously too short lookbehind window d for rate(m[d]) could be automatically extended
if it didn't cover at least two raw samples. This was needed in order to guarantee
non-empty results from rate(m[d]) on short time ranges.
Now the lookbehind window isn't extended if it is set explicitly,
since it is expected that the user knows what he is doing.
The lookbehind window continues to be extended when needed if it isn't set explicitly.
For example, in the case of rate(m).
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3483
The workaround was introduced to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/962.
However, it didn't prove itself useful. Instead, it is recommended using `increase_pure` function.
Removing the workaround makes VM to produce accurate results when calculating
`delta` or `increase` functions over slow-changing counters with vary intervals
between data points.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The change allows to specify default value for `getScrapeInterval`
function when actual interval can't be calculated.
Before the change, function were returning `maxSilenceInterval` (5m)
in such cases, which may be not correct for instant queries processing.
The specific scenario where using `maxSilenceInterval` caused issues
is the following:
1. Series becomes stale;
2. Client (in this case vmalert) continues to request series every 15s;
3. Database returns empty results as expected;
4. But at some specific moment of time database returns datapoints from `now()-5m`,
because lookback window was extended to `maxSilenceInterval`.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Document changes_prometheus(), increase_prometheus() and delta_prometheus() functions.
* Simplify their implementation
* Mention these functions in docs/CHANGELOG.md