### Describe Your Changes
* check if `lastValue` was seen at least twice with different
timestamps. Otherwise, the difference between last timestamp and
previous timestamp could be `0` and will result into `NaN` calculation
* check if there items left in lastValue map after staleness cleanup.
Otherwise, `rate_avg` could have produce `NaN` result.
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
This change fixes the following panic:
```
2024-06-04T11:16:52.899Z warn app/vmauth/auth_config.go:353 cannot discover backend SRV records for http://srv+localhost:8080: lookup localhost on 10.100.10.4:53: server misbehaving; use it literally
panic: runtime error: integer divide by zero
goroutine 9 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper.func1()
/Users/lhhdz/wd/projects/go/VictoriaMetrics/lib/httpserver/httpserver.go:291 +0x58
panic({0x103115100?, 0x10338d700?})
/Users/lhhdz/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-arm64/src/runtime/panic.go:770 +0x124
main.getLeastLoadedBackendURL({0x0?, 0x22?, 0x1400014757b?}, 0x1400013c120?)
/Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:473 +0x210
main.(*URLPrefix).getBackendURL(0x140000aa080)
/Users/lhhdz/wd/projects/go/VictoriaMetrics/app/vmauth/auth_config.go:312 +0xb8
```
---------
Co-authored-by: Haley Wang <haley@victoriametrics.com>
It is better to remove deprecated flag completely, so vmctl will
fail if this flag is used and user can immediately fix the issue.
Before, flag was ignored and it is worse then fail fast.
follow-up after 8b46bb0c41 (diff-2bfab3db5cc1baf4c6d3ff6b19901926e3bdf4411ec685dac973e5fcff1c723b)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* adds idleConnTimeout flag, which must reduce probability of `broken
pipe` and `connection reset` errors.
* one-time retry trivial network requests for the same backend
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* deprecate `--vm-disable-progress-bar` in favour of `--disable-progress-bar`
* new `--disable-progress-bar` consistently disables usage of progress bar
for all migration modes.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6367
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
- Use bytesutil.InternString() instead of strings.Clone() for inputKey and outputKey in aggregatorpushSamples().
This should reduce string allocation rate, since strings can be re-used between aggrState flushes.
- Reduce memory allocations at dedupAggrShard by storing dedupAggrSample by value in the active series map.
- Remove duplicate call to bytesutil.InternBytes() at Deduplicator, since it is already called inside dedupAggr.pushSamples().
- Add missing string interning at rateAggrState.pushSamples().
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6402
The main change is getting rid of interning of sample key. It was
discovered that for cases with many unique time series aggregated by
vmagent interned keys could grow up to hundreds of millions of objects.
This has negative impact on the following aspects:
1. It slows down garbage collection cycles, as GC has to scan all inuse
objects periodically. The higher is the number of inuse objects, the
longer it takes/the more CPU it takes.
2. It slows down the hot path of samples aggregation where each key
needs to be looked up in the map first.
The change makes code more fragile, but suppose to provide performance
optimization for heavy-loaded vmagents with stream aggregation enabled.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
### Describe Your Changes
Added streamaggr metrics to:
- `vm_streamaggr_samples_lag_seconds` - samples lag
- `vm_streamaggr_ignored_samples_total{reason="nan"}` - ignored NaN
samples
- `vm_streamaggr_ignored_samples_total{reason="too_old"}` - ignored old
samples
### Describe Your Changes
Fix for issue #6396: according to rmdir manpage, ENOTEMPTY and EEXIST
should be treated equally
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6396
### Checklist
The following checks are **mandatory**:
- [x ] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Co-authored-by: Ludovic Pollet <ludovic.pollet@exfo.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Unsupported path is already handled by `lib/httpserver`.
This prevents from misleading errors in logs caused by double-writing response headers.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
### Describe Your Changes
Scratch based images will be using a separate tag: "(version)-scratch"
and will be built for the same architecture as regular images.
This is useful for environments with higher security standards. In this
case using alpine as base layer requires updating images more frequently
in order to get the latest updates for the base image, even in case the
user did not need to update VictoriaMetrics version.
Tested that scratch images work for:
- vmagent - enterprise with kafka and opensource
- cluster
- single-node
No issues observed so far.
cc: @tenmozes
### Checklist
The following checks are **mandatory**:
- [x] My change adheres [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* "*.idleConnTimeout" flags must reduce probability of `write: broken
pipe` and `read: connection reset by peer` errors Those errors may occur
if remote server closes TCP socket for connection, while it's still
exist at client.
* single time retries for `write: broken pipe` and `read: connection
reset by peer` must handle a case for incorrectly configured timeouts at
middleware proxies, mitigate minor network issues.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5661
### Describe Your Changes
Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.
---------
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
It allows to create alert for possible item drops at indexdb. It may
happen, if ingested metric size exceeds max indexdb item size.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* It must reduce memory usage for misbehaving clients. Since
VictoriaMetrics stores sparse index inmemory.
* Reduce disk space usage for indexdb.
* Prevent possible indexDB items drops.
* It may trigger slow insert and new timeseries registration due to
default value for flag change
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6176
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
This should improve precision of `restarts` and `version change` annotations when
zooming-in/zooming-out on the dashboards.
The change also makes `restarts` dashboard visible on the panels, so user can disable it from
displaying if needed. This could be useful when restarts overlap with version change events.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* The new flag can be used for for skipping TLS certificates
verification when connecting to S3 endpoint. Affects vmbackup,
vmrestore, vmbackupmanager.
* replace deprecated `EndpointResolver` with `BaseEndpoint`
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1056
Signed-off-by: hagen1778 <roman@victoriametrics.com>
The error appears when executing annotations query against Prometheus backend
because the query itself hasn't specified look-behind window (which is allowed
in VictoriaMetrics query engine).
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6309
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Prevent excessive resource usage when stream aggregation config file
contains no matchers by prevent pushing data into Aggregators object.
Before this change a lot of extra work was invoked without reason.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Occasionally, vmagent sends empty blocks to downstream servers. If a
downstream server returns an unexpected response, vmagent gets stuck in
a retry loop. While vmagent handles 400 and 409 errors, there are
various prometheus remote write implementations that return different
error codes. For example, vector returns a 422 error. To mitigate the
risk of vmagent getting stuck in a retry loop, it is advisable to skip
sending empty blocks to downstream servers.
Co-authored-by: hao.peng <hao.peng@smartx.com>
Co-authored-by: Zhu Jiekun <jiekun.dev@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Allocations are reduced by implementing custom json parser via fastjson
lib.
The change also re-uses `promInstant` object in attempt to reduce number
of
allocations when parsing big responses, as usually happens with heavy
recording rules.
```
name old allocs/op new allocs/op delta
ParsePrometheusResponse/Instant-10 9.65k ± 0% 5.60k ± 0% ~ (p=1.000 n=1+1)
```
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
it's needed to remove Summary metric type from the global state of
metrics package. metrics package tracks each bucket of summary and
periodically swaps old buckets with new.
Simple set unregister is not enough to release memory used by Set
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6247
Added `rate` and `rate_avg` output
Resource usage is the same as for increase output, tested on a benchmark
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html):
allow configuring `-remoteWrite.disableOnDiskQueue` and
`-remoteWrite.dropSamplesOnOverload` cmd-line flags per each
`-remoteWrite.url`. See this [pull
request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6065).
Thanks to @rbizos for implementaion!
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add
labels `path` and `url` to metrics
`vmagent_remotewrite_push_failures_total` and
`vmagent_remotewrite_samples_dropped_total`. Now number of failed pushes
and dropped samples can be tracked per `-remoteWrite.url`.
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Raphael Bizos <r.bizos@criteo.com>
Set correct suffix `<output>_prometheus` for aggregation outputs
`increase_prometheus` and `total_prometheus`
Before, outputs `total` and `total_prometheus` or `increase` and
`increase_prometheus` had the same suffix.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Though labels compressor is quite resource intensive, each aggregator
and deduplicator instance has it's own compressor. Made it shared across
all aggregators to consume less resources while using multiple
aggregators.
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>