Nikolay
561dd2900a
app/vminsert: properly close vmstorage connection (#4935)
...
* app/vminsert: properly close vmstorage connection
previously vmstorage could get stuck in a broken state until vminsert restarted,
since vmstorage was marked as read-only and the connection to it was broken.
The checkReadonly function never marked the connection as broken.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4870
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-01 18:03:53 +02:00
Nikolay
ab4c3817ed
app/vminsert: fixes readonly check (#4892)
...
* app/vminsert: fixes readonly check
previously vminsert didn't check the readOnly state for vmstorage, since the check was never performed for a nil buffer.
In this case the storage node lost its read-only state every 30 seconds and received some data.
This caused re-routing and a possible slowdown of ingestion.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4870
* wip
---------
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-08-30 16:27:23 +02:00
Nikolay
ff8eceb9f2
app/vminsert: correctly allocate buffer for storagenodes (#554)
...
in case of dynamic discovery the number of nodes may change, so a new buffer has to be allocated for this case;
otherwise vminsert may panic
2023-05-08 08:52:59 -07:00
Aliaksandr Valialkin
3a25a4b1de
app/{vminsert,vmselect}: speed up TestInitStopNodes()
2022-12-03 23:53:14 -08:00
Aliaksandr Valialkin
fe8d40f12c
app/{vminsert,vmselect}: test initialization with different number of storage nodes
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3329
2022-11-09 11:48:39 +02:00
Aliaksandr Valialkin
8540dd669b
app/vminsert/netstorage: move nodesHash from global state to storageNodesBucket
...
This should prevent panics when the list of discovered vmstorage nodes changes.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3329
2022-11-09 11:45:24 +02:00
Aliaksandr Valialkin
976bbe3677
app/{vminsert,vmselect}: limit the access to storageNodes to getStorageNodesBucket and setStorageNodesBucket functions
...
This makes the code more maintainable and easier to test.
2022-10-28 11:41:55 +03:00
Aliaksandr Valialkin
4f53147ed4
app/{vminsert,vmselect}/netstorage: allow calling Init()+MustStop() in a loop
...
Previously netstorage.MustStop() call didn't free up all the resources,
so the subsequent call to netstorage.Init() would panic.
This allows writing tests, which call netstorage.Init() + netstorage.MustStop() in a loop.
2022-10-25 14:43:05 +03:00
Nikolay
505d359b39
app/vminsert: allows parsing tenant id from labels (#3009)
...
* app/vminsert: allows parsing tenant id from labels
it should help mitigate issues with vmagent's multiTenant mode, which works incorrectly under heavy load
and cannot handle more than 100 different tenants.
This functionality is hidden behind a flag and does not change the default vminsert behaviour
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2970
* Update docs/Cluster-VictoriaMetrics.md
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
* wip
* app/vminsert/netstorage: clean remaining labels in order to free up GC
* docs/Cluster-VictoriaMetrics.md: typo fix
* wip
* wip
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-09-30 17:28:35 +03:00
Aliaksandr Valialkin
dc58cb3fd8
app/vminsert/netstorage: pre-initialize the remaining throttled loggers
...
This is a follow-up after 6c66804fd3
2022-08-04 18:34:42 +03:00
Aliaksandr Valialkin
6c66804fd3
all: locate throttled loggers via logger.WithThrottler() only once and then use them
...
This reduces the contention on logThrottlerRegistryMu mutex when logger.WithThrottler()
is called frequently from concurrent goroutines.
2022-06-27 12:34:30 +03:00
Aliaksandr Valialkin
dceca7e864
all: remove explicit "xxhash" name when importing github.com/cespare/xxhash/v2 package
...
This is a follow-up for fe2269b999
2022-06-21 20:27:30 +03:00
Aliaksandr Valialkin
b28c6febf9
app/{vminsert,vmselect}: add -vmstorageDialTimeout command-line flag for tuning the maximum time needed for establishing connections to vmstorage
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
2022-06-20 15:17:34 +03:00
Aliaksandr Valialkin
e613ca9ba8
app/vminsert/netstorage: re-route samples from readonly vmstorage nodes to healthy nodes if -dropSamplesOnOverload command-line flag is set
2022-05-07 01:40:02 +03:00
Aliaksandr Valialkin
925fa9a7de
docs/Cluster-VictoriaMetrics.md: make the description for -rpc.disableCompression command-line flag more clear
2022-05-06 16:24:56 +03:00
Aliaksandr Valialkin
8752cce157
app/vminsert: reduce the max packet size, which vminsert can send to vmstorage
...
This reduces the max memory usage for vminsert and vmstorage under heavy ingestion rate
by up to 50% on production workload
2022-04-05 15:39:58 +03:00
Aliaksandr Valialkin
755e26e67b
app/vminsert/netstorage: throttle warning logs, which can be too verbose when vminsert cannot send data to vmstorage nodes
2022-02-07 15:39:06 +02:00
Aliaksandr Valialkin
9fb5ce5fb6
app/vminsert/netstorage: log vmstorage addr, which cannot accept new samples due to overload and/or unavailability
2022-02-07 15:03:05 +02:00
Aliaksandr Valialkin
5aee6eb406
app/vminsert/netstorage: tune re-routing algorithm further
2022-02-07 14:35:39 +02:00
Aliaksandr Valialkin
1a5546006d
app/vminsert/netstorage: typo fix after 021ee53ba8
2022-02-07 13:20:43 +02:00
Aliaksandr Valialkin
c738739494
app/vminsert: add -dropSamplesOnOverload command-line flag
...
Drop incoming samples if the destination vmstorage node is unavailable
and/or accepts data at slower rate than other vmstorage nodes
2022-02-07 12:32:18 +02:00
Aliaksandr Valialkin
021ee53ba8
app/vminsert: improve re-routing logic in order to spread rows more evenly among the available storage nodes
2022-02-06 20:20:02 +02:00
Aliaksandr Valialkin
4fddcf4c83
app/{vminsert,vmstorage}: follow-up after a171916ef5
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-10-08 14:09:51 +03:00
Nikolay
a171916ef5
Adds read-only mode for vmstorage node (#1680)
...
* adds read-only mode for vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
* changes order a bit
* moves isFreeDiskLimitReached var to storage struct
renames functions to be consistent
change protoparser api - with optional storage limit check for given opened storage
* renames freeSpaceLimit to ReadOnly
2021-10-08 12:52:56 +03:00
Aliaksandr Valialkin
64b6f3f1c8
app/vminsert: fix uneven distribution of time series among storage nodes
...
Use distinct seed for distribution hash calculations on the second level of vminsert nodes.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1672
2021-10-07 12:22:39 +03:00
Aliaksandr Valialkin
7ad54041fe
app/{vminsert,vmselect}: automatically add missing port in -storageNode lists passed to vminsert and vmselect
...
This should simplify manual setup of the cluster according to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#cluster-setup
2021-09-15 18:04:30 +03:00
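The idea of adding a missing port can be sketched with the standard `net.SplitHostPort` function. This is an illustration under assumptions rather than the code from the commit: `addMissingPort` is a hypothetical helper, and the port 8400 used in `main` is assumed to be the port on which vmstorage accepts data from vminsert.

```go
package main

import (
	"fmt"
	"net"
)

// addMissingPort returns addr unchanged if it already has the host:port form.
// Otherwise it appends defaultPort. Hypothetical helper for illustration.
// Note: a bare IPv6 address must be wrapped in brackets, e.g. "[::1]".
func addMissingPort(addr string, defaultPort int) string {
	if _, _, err := net.SplitHostPort(addr); err == nil {
		return addr
	}
	return fmt.Sprintf("%s:%d", addr, defaultPort)
}

func main() {
	// 8400 is an assumed default port, not taken from the commit message.
	for _, addr := range []string{"vmstorage-1", "vmstorage-2:8500", "[::1]"} {
		fmt.Println(addMissingPort(addr, 8400))
	}
}
```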
Aliaksandr Valialkin
a1b298a842
app/vminsert/netstorage: disable rerouting by default
...
Production clusters are more stable with rerouting disabled during rolling restarts and/or
during spikes in time series churn rate. So it is better to disable rerouting by default.
The re-routing can be enabled by passing `-disableRerouting=false` command-line flag to `vminsert` nodes.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1165
2021-09-13 18:50:47 +03:00
Aliaksandr Valialkin
dca6f0f7de
app/vminsert/netstorage: remove the limit on the number of -storageNode addresses
...
There is no reason to limit the number of `-storageNode` addresses to 255.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1599
2021-09-09 19:11:36 +03:00
Aliaksandr Valialkin
9eb828b2c2
app/vminsert: add vm_rpc_send_duration_seconds_total metric per each vminsert->vmstorage link
...
This metric is useful for determining high link saturation with the following alerting rule:
rate(vm_rpc_send_duration_seconds_total) > 0.9s
2021-08-11 11:42:33 +03:00
Aliaksandr Valialkin
c18017a9c3
app/vminsert/netstorage: sort the -storageNode list passed to vminsert nodes
...
This should reduce resource usage (CPU, RAM, disk IO) at vmstorage nodes
if the addresses of vmstorage nodes are passed in random order to vminsert nodes.
2021-06-23 14:00:08 +03:00
Aliaksandr Valialkin
2c6b917749
app/vminsert/netstorage: update storageNode.lastRerouteTime before the rerouting
...
This is needed for reliable detection of storage nodes with recent rerouting
2021-06-08 12:06:32 +03:00
Aliaksandr Valialkin
0d067eb112
app/vminsert/netstorage: tune re-routing algorithm
...
Do not re-route data to unavailable storage node. Send it to the remaining storage nodes instead
even if they cannot keep up with the load. This should spread the load more evenly among available
storage nodes.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1165
2021-06-05 16:23:44 +03:00
Aliaksandr Valialkin
1c09e71f5b
app/vminsert: add -disableRerouting command-line flag for disabling re-routing if some vmstorage nodes have lower performance than the others
...
Refactor the rerouting mechanism and make it more resilient to cases when some of the vmstorage nodes are temporarily unavailable.
Reduce the probability of a rerouting storm.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1165
2021-06-04 04:33:52 +03:00
Aliaksandr Valialkin
4a5f45c77e
app/vminsert: add support for data ingestion via other vminsert nodes
2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin
512addc608
app/{vminsert,vmagent}: add -sortLabels command-line option for sorting time series labels before ingesting them in the storage
...
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
2021-03-31 23:27:21 +03:00
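The effect of sorting labels can be sketched as follows. This is a minimal illustration, assuming a simplified `Label` type and a hypothetical `seriesKey` function that stands in for whatever identifies a time series in the storage; it is not the code from the commit.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Label is a simplified name/value pair used only for this sketch.
type Label struct{ Name, Value string }

// sortLabels orders labels by name in place.
func sortLabels(labels []Label) {
	sort.Slice(labels, func(i, j int) bool { return labels[i].Name < labels[j].Name })
}

// seriesKey builds a textual key that depends on the order of labels.
// Hypothetical stand-in for the series identity.
func seriesKey(metric string, labels []Label) string {
	var sb strings.Builder
	sb.WriteString(metric)
	sb.WriteByte('{')
	for i, l := range labels {
		if i > 0 {
			sb.WriteByte(',')
		}
		fmt.Fprintf(&sb, "%s=%q", l.Name, l.Value)
	}
	sb.WriteByte('}')
	return sb.String()
}

func main() {
	a := []Label{{"k1", "v1"}, {"k2", "v2"}}
	b := []Label{{"k2", "v2"}, {"k1", "v1"}}
	// Without sorting the two samples look like distinct series.
	fmt.Println(seriesKey("metric", a) == seriesKey("metric", b))
	sortLabels(a)
	sortLabels(b)
	// After sorting both samples map to the same series.
	fmt.Println(seriesKey("metric", a) == seriesKey("metric", b))
}
```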
Aliaksandr Valialkin
9e79fc27c8
app/vminsert/netstorage: properly update vm_rpc_rerouted_rows_processed_total metric
...
Previously this metric wasn't updated because of improper defer call.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/955
Thanks to @xemxx for spotting the bug.
2020-12-11 13:07:05 +02:00
Aliaksandr Valialkin
1a237c6903
all: properly handle CPU limits set on the host system/container
...
This can reduce memory usage on systems with enabled CPU limits.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946
2020-12-08 21:07:03 +02:00
Aliaksandr Valialkin
c6adcafedb
app/vminsert: export vm_rpc_vmstorage_is_reachable metric, which can be used for monitoring reachability of vmstorage nodes from vminsert nodes
2020-11-17 22:13:26 +02:00
Aliaksandr Valialkin
882e2e2099
app/vminsert/netstorage: return 503 status code to client when all the vmstorage nodes are unavailable
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/896
2020-11-14 00:44:41 +02:00
Aliaksandr Valialkin
ffa6581c46
app/vminsert: refresh the list of healthy storage nodes only if the row cannot be sent to destination storage node
...
Previously the list had been generated for each rerouted row. This could consume additional CPU time during rerouting,
which could lead to rerouting slowdown.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
2020-09-29 01:29:24 +03:00
Aliaksandr Valialkin
bc37f1cbec
app/vminsert: do not pollute logs with repeated cannot dial storageNode errors
...
Log only the first error per -storageNode
2020-09-29 00:20:32 +03:00
Aliaksandr Valialkin
9d123eb22a
app/vminsert: remove useless delays when sending data to vmstorage
...
This improves the maximum data ingestion performance for cluster VictoriaMetrics
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
2020-09-28 21:41:15 +03:00
Aliaksandr Valialkin
fe08b1eb26
app/vminsert: improve error message when the data cannot be sent to vmstorage - log reroutedBR buffer size
...
This should improve debuggability for improperly configured cluster
2020-08-31 17:51:44 +03:00
Aliaksandr Valialkin
c91ccce50c
app/vminsert: fix relabeling for metrics ingested via Influx line protocol
...
Previously relabeling enabled with the `-relabelConfig` command-line flag could result in missing labels
if a single Influx line protocol message contains multiple field values.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/638
2020-07-23 13:25:37 +03:00
Aliaksandr Valialkin
6ebac3ab63
app/vminsert: add ability to apply relabeling to all the incoming metrics if -relabelConfig command-line arg points to a file with a list of relabel_config entries
...
See https://victoriametrics.github.io/#relabeling
2020-07-02 20:36:33 +03:00
Aliaksandr Valialkin
d962568e93
all: use %w instead of %s for wrapping errors in fmt.Errorf
...
This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode.
See https://blog.golang.org/go1.13-errors for details.
2020-06-30 23:33:46 +03:00
Aliaksandr Valialkin
a586b8b6d4
app/vminsert/netstorage: do not re-route every time series to more than two vmstorage nodes when certain vmstorage nodes are temporarily slower than the rest of them
...
Previously vminsert could spread data for a single time series across all the available vmstorage nodes
when vmstorage nodes couldn't handle the given ingestion rate. This could lead to increased usage
of CPU and memory on every vmstorage node, since every vmstorage node had to register all the time
series seen in the cluster. Now a time series may be spread across at most two vmstorage nodes under heavy load.
Every time series is routed to a single vmstorage node under normal load.
2020-06-25 16:42:37 +03:00
Aliaksandr Valialkin
2fc2679a3f
app/vminsert/netstorage: remove possible race condition when broken connection may be recovered before acquiring storageNode.bcLock
2020-06-20 16:38:08 +03:00
Aliaksandr Valialkin
4400700832
app/vminsert: properly replicate data for the last RF-1 storage nodes for -replicationFactor=RF
...
Previously the data for the last `RF-1` storage nodes was incorrectly replicated to the first storage node.
2020-06-19 12:40:22 +03:00
Aliaksandr Valialkin
85c1ccb8b8
app/vminsert/netstorage: add missing return in storageNode.checkHealth on connection failure
2020-06-18 20:51:51 +03:00