### Describe Your Changes
In most cases histograms are exposed in sorted manner with lower buckets
being first. This means that during scraping buckets with lower bounds
have higher chance of being updated earlier than upper ones.
Previously, values were propagated from upper to lower bounds, which
means that in most cases that would produce results higher than expected
once all buckets will become updated.
Propagating from upper bound effectively limits highest value of
histogram to the value of previous scrape. Once the data will become
consistent in the subsequent evaluation this causes spikes in the
result.
Changing propagation to be from lower to higher buckets reduces value
spikes in most cases due to nature of the original inconsistency.
See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580
An example histogram with previous(red) and updated(blue) versions:

This also makes logic of filling nan values with lower buckets values: [1 2 3 nan nan nan] => [1 2 3 3 3 3] obsolete.
Since buckets are now fixed from lower ones to upper this happens in the main loop, so there is no need in a second one.
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 6a4bd5049b)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
fix#6421
some aggregation func don't return \_\_name\_\_ value
(cherry picked from commit 65f414acee)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Unsupported path is already handled by `lib/httpserver`.
This prevents from misleading errors in logs caused by double-writing response headers.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit a5f81f67fd)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Occasionally, vmagent sends empty blocks to downstream servers. If a
downstream server returns an unexpected response, vmagent gets stuck in
a retry loop. While vmagent handles 400 and 409 errors, there are
various prometheus remote write implementations that return different
error codes. For example, vector returns a 422 error. To mitigate the
risk of vmagent getting stuck in a retry loop, it is advisable to skip
sending empty blocks to downstream servers.
Co-authored-by: hao.peng <hao.peng@smartx.com>
Co-authored-by: Zhu Jiekun <jiekun.dev@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit 3661373cc2)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
During shutdown period of vmalert, remotewrite client retrieve all pending time series from buffer queue, compose them into 1 batch and execute remote write.
This final batch may exceed the limit of -remoteWrite.maxBatchSize, and be rejected by the receiver (gateway, vmcluster or others).
This changes ensures that even during shutdown vmalert won't exceed the max batch size limit for remote write
destination.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6025
(cherry picked from commit 623d257faf)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* vmalert: fix sending alert messages
1. fix `endsAt` field in messages that send to alertmanager, previously rule with small interval could never be triggered;
2. fix behavior of `-rule.resendDelay`, before it could prevent sending firing message when rule state is volatile.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
(cherry picked from commit d7224b2d1c)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* app/{vmagent,vminsert}: adds /v1/metrics suffix for opentelemetry route path
it must fix compatibility with opentemetry-collector [spec](https://opentelemetry.io/docs/specs/otlp/\#otlphttp-request)
this suffix is hard-coded and cannot be changed with collector configuration
* Apply suggestions from code review
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* metricsql: fix label_join() when `dst_label` is equal to one of the `src_label`
* Update app/vmselect/promql/transform.go
* Update docs/CHANGELOG.md
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* [lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)
* fixed tests
* fixed test
* Revert "fixed test"
This reverts commit 8a29764806.
* Revert "fixed tests"
This reverts commit 9ce13d1042.
* Revert "[lib/promutils, lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)"
This reverts commit a7a04bd4
* [lib/httputils] fixed floating-point error when parsing time in RFC3339 format (#5801)
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.
The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
* app/vmselect/promql: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase()
* fix test
Properly determine time range search for instant queries with too big look-behind window like `foo[100y]`.
Previously, such queries could return empty responses even if `foo` is present in database.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Before, retries happened only on writes into a network connection
between source and destination. But errors returned by server after
all the data was transmitted were logged, but not retried.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
app/vmselect/promql/eval.go:evalAggrFunc shunts evaluation
of AggrFuncExpr over rollupFunc over MetricsExpr to an optimized
path. tryGetArgRollupFuncWithMetricExpr() checks whether expression
can be shunted, but it mangles the AggrFuncExpr when the aggregation
function has more than one argument. This results in queries like
`sum(aggr_over_time("avg_over_time",m))` failing with error message
'expecting at least 2 args to "aggr_over_time"; got 1 args' while
the analogous query `sum(avg_over_time(m))` executes successfully.
This fix removes the unnecessary mangling.
Signed-off-by: Anton Tykhyy <atykhyy@gmail.com>