Reason for revert:
There are many statsd servers exist:
- https://github.com/statsd/statsd - classical statsd server
- https://docs.datadoghq.com/developers/dogstatsd/ - statsd server from DataDog built into DatDog Agent ( https://docs.datadoghq.com/agent/ )
- https://github.com/avito-tech/bioyino - high-performance statsd server
- https://github.com/atlassian/gostatsd - statsd server in Go
- https://github.com/prometheus/statsd_exporter - statsd server, which exposes the aggregated data as Prometheus metrics
These servers can be used for efficient aggregating of statsd data and sending it to VictoriaMetrics
according to https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd (
the https://github.com/prometheus/statsd_exporter can be scraped as usual Prometheus target
according to https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter ).
Adding support for statsd data ingestion protocol into VictoriaMetrics makes sense only if it provides
significant advantages over the existing statsd servers, while has no significant drawbacks comparing
to existing statsd servers.
The main advantage of statsd server built into VictoriaMetrics and vmagent - getting rid of additional statsd server.
The main drawback is non-trivial and inconvenient streaming aggregation configs, which must be used for the ingested statsd metrics (
see https://docs.victoriametrics.com/stream-aggregation/ ). These configs are incompatible with the configs for standalone statsd servers.
So you need to manually translate configs of the used statsd server to stream aggregation configs when migrating
from standalone statsd server to statsd server built into VictoriaMetrics (or vmagent).
Another important drawback is that it is very easy to shoot yourself in the foot when using built-in statsd server
with the -statsd.disableAggregationEnforcement command-line flag or with improperly configured streaming aggregation.
In this case the ingested statsd metrics will be stored to VictoriaMetrics as is without any aggregation.
This may result in high CPU usage during data ingestion, high disk space usage for storing all the unaggregated
statsd metrics and high CPU usage during querying, since all the unaggregated metrics must be read, unpacked and processed
during querying.
P.S. Built-in statsd server can be added to VictoriaMetrics and vmagent after figuring out more ergonomic
specialized configuration for aggregating of statsd metrics. The main requirements for this configuration:
- easy to write, read and update (ideally it should work out of the box for most cases without additional configuration)
- hard to misconfigure (e.g. hard to shoot yourself in the foot)
It would be great if this configuration will be compatible with the configuration of the most widely used statsd server.
In the mean time it is recommended continue using external statsd server.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6265
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5053
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5052
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4600
* app/vmstorage: close vminsert connections gradually before stopping storage
Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878
Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* app/vmstorage: update graceful shutdown logic
- close connections from vminsert in determenistic order
- update flag description
- lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s(default value in Kubernetes and ansible-playbooks).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs/cluster: add information about re-routing enhancement during restart
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs/changelog: add entry for new command-line flag
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* {app/vmstorage,lib/ingestserver}: address review feedback
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs/cluster: add note to update workload scheduler timeout
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* wip
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
* {lib/server, app/}: use `httpAuth.*` flag as fallback for `*AuthKey` if it is not set
* lib/ingestserver/opentsdbhttp: fix opentdb HTTP handler not respecting `httpAuth.*` flags
* Apply suggestions from code review
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Previously bytesutil.Resize() was copying the original byte slice contents to a newly allocated slice.
This wasted CPU cycles and memory bandwidth in some places, where the original slice contents wasn't needed
after slize resizing. Switch such places to bytesutil.ResizeNoCopy().
Rename the original bytesutil.Resize() function to bytesutil.ResizeWithCopy() for the sake of improved readability.
Additionally, allocate new slice with `make()` instead of `append()`. This guarantees that the capacity of the allocated slice
exactly matches the requested size. The `append()` could return a slice with bigger capacity as an optimization for further `append()` calls.
This could result in excess memory usage when the returned byte slice was cached (for instance, in lib/blockcache).
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
Previously the duration for graceful shutdown for http server could take more than a minute
because of imporperly set timeouts in setNetworkTimeout.
Now typical duration for graceful shutdown should be reduced to less than 5 seconds.