vmselect: introduce search.skipSlowReplicas cmd-line flag (#4538)

* vmselect: introduce `search.skipSlowReplicas` cmd-line flag

vmselect has two logical conditions during request processing when
`-replicationFactor` cmd-line flag is set:
1. If at least `len(storageNodes) - replicationFactor` responded, it could skip
waiting for the rest of nodes to respond. This could lead to problems described
here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207.
2. Mark response as partial if less than `len(storageNodes) - replicationFactor` responded
without an error.

The P1 showed itself error-prone and became the main reason why
`-replicationFactor` wasn't recommended to use at vmselect level.
However, this optimization could be still very useful in situations
when there are slow and fast replicas in cluster.

But P2 remains viable and important conditionless.
Hiding P1 behind the feature-flag `search.skipSlowReplicas`
should make `-replicationFactor` flag usable again. And let users
choose whether they want P1 to be respected.

Related issues
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docs: update changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
This commit is contained in:
Roman Khavronenko 2023-07-07 11:50:26 +02:00 committed by GitHub
parent e2a2d64053
commit 173ccf4333
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 23 additions and 13 deletions

View file

@ -648,6 +648,7 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha
By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason.
The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time.
The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable.
@ -1196,6 +1197,8 @@ Below is the output for `/path/to/vmselect -help`:
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
-search.setLookbackToStep
Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored
-search.skipSlowReplicas
Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.
-search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selectNode array

View file

@ -37,6 +37,9 @@ var (
"vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. "+
"It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses "+
"after adding new vmstorage nodes even if the replication is enabled at vminsert nodes")
skipSlowReplicas = flag.Bool("search.skipSlowReplicas", false, "Whether to skip waiting for all replicas to respond during search query. "+
"Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. "+
"But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.")
maxSamplesPerSeries = flag.Int("search.maxSamplesPerSeries", 30e6, "The maximum number of raw samples a single query can scan per each time series. See also -search.maxSamplesPerQuery")
maxSamplesPerQuery = flag.Int("search.maxSamplesPerQuery", 1e9, "The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries")
vmstorageDialTimeout = flag.Duration("vmstorageDialTimeout", 5*time.Second, "Timeout for establishing RPC connections from vmselect to vmstorage")
@ -1726,6 +1729,18 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
result := <-snr.resultsCh
if err := f(result.data); err != nil {
snr.finishQueryTracer(result.qt, fmt.Sprintf("error: %s", err))
resultsCollected++
if *skipSlowReplicas && resultsCollected > len(sns)-*replicationFactor {
// There is no need in waiting for the remaining results,
// because the collected results contain all the data according to the given -replicationFactor.
// This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
//
// It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible.
snr.finishQueryTracers(fmt.Sprintf("cancel request because %d out of %d nodes already returned response according to -replicationFactor=%d",
resultsCollected, len(sns), *replicationFactor))
return false, nil
}
var er *errRemote
if errors.As(err, &er) {
// Immediately return the error reported by vmstorage to the caller,
@ -1752,18 +1767,6 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
continue
}
snr.finishQueryTracer(result.qt, "")
resultsCollected++
if resultsCollected > len(sns)-*replicationFactor {
// There is no need in waiting for the remaining results,
// because the collected results contain all the data according to the given -replicationFactor.
// This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
//
// It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible.
snr.finishQueryTracers(fmt.Sprintf("cancel request because %d out of %d nodes already returned response according to -replicationFactor=%d",
resultsCollected, len(sns), *replicationFactor))
return false, nil
}
}
if len(errsPartial) < *replicationFactor {
// Assume that the result is full if the the number of failing vmstorage nodes

View file

@ -65,6 +65,7 @@ Released at 2023-06-30
* BUGFIX: [storage](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html): prevent from possible crashloop after the migration from versions below `v1.90.0` to newer versions. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336) for details.
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix a memory leak issue associated with chart updates. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4455).
* BUGFIX: [vmbackupmanager](https://docs.victoriametrics.com/vmbackupmanager.html): fix removing storage data dir before restoring from backup.
* BUGFIX: vmselect: wait for all vmstorage nodes to respond when the `-replicationFactor` flag is set bigger than > 1. Before, vmselect could have [skip waiting for the slowest replicas](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711) to respond. This could have resulted in issues illustrated [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207). Now, this optimization is disabled by default and could be re-enabled by passing `-search.skipSlowReplicas` cmd-line flag to vmselect. See more details [here](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4538).
## [v1.91.2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.91.2)

View file

@ -659,6 +659,7 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha
By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason.
The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time.
The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable.
@ -1207,6 +1208,8 @@ Below is the output for `/path/to/vmselect -help`:
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
-search.setLookbackToStep
Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored
-search.skipSlowReplicas
Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.
-search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selectNode array

View file

@ -932,7 +932,7 @@ The shortlist of configuration flags is the following:
-datasource.disableKeepAlive
Whether to disable long-lived connections to the datasource. If true, disables HTTP keep-alives and will only use the connection to the server for a single HTTP request.
-datasource.disableStepParam
Whether to disable adding 'step' param to the issued instant queries. This might be useful when using vmalert with datasources that do not support 'step' param for instant queries, like Google Managed Prometheus. It is not recommended to enable this flag if you use vmalert to query VictoriaMetrics.
Whether to disable adding 'step' param to the issued instant queries. This might be useful when using vmalert with datasources that do not support 'step' param for instant queries, like Google Managed Prometheus. It is not recommended to enable this flag if you use vmalert with VictoriaMetrics.
-datasource.headers string
Optional HTTP extraHeaders to send with each request to the corresponding -datasource.url. For example, -datasource.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -datasource.url. Multiple headers must be delimited by '^^': -datasource.headers='header1:value1^^header2:value2'
-datasource.lookback duration