diff --git a/README.md b/README.md index a5d60ed41f..e7c770df56 100644 --- a/README.md +++ b/README.md @@ -648,6 +648,7 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason. The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable. +Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time. The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable. @@ -1196,6 +1197,8 @@ Below is the output for `/path/to/vmselect -help`: Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call -search.setLookbackToStep Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored + -search.skipSlowReplicas + Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data. -search.treatDotsAsIsInRegexps Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter -selectNode array diff --git a/app/vmselect/netstorage/netstorage.go b/app/vmselect/netstorage/netstorage.go index 5c5e654dfb..d47adf5329 100644 --- a/app/vmselect/netstorage/netstorage.go +++ b/app/vmselect/netstorage/netstorage.go @@ -37,6 +37,9 @@ var ( "vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. "+ "It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses "+ "after adding new vmstorage nodes even if the replication is enabled at vminsert nodes") + skipSlowReplicas = flag.Bool("search.skipSlowReplicas", false, "Whether to skip waiting for all replicas to respond during search query. "+ + "Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. "+ + "But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.") maxSamplesPerSeries = flag.Int("search.maxSamplesPerSeries", 30e6, "The maximum number of raw samples a single query can scan per each time series. See also -search.maxSamplesPerQuery") maxSamplesPerQuery = flag.Int("search.maxSamplesPerQuery", 1e9, "The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries") vmstorageDialTimeout = flag.Duration("vmstorageDialTimeout", 5*time.Second, "Timeout for establishing RPC connections from vmselect to vmstorage") @@ -1726,6 +1729,18 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co result := <-snr.resultsCh if err := f(result.data); err != nil { snr.finishQueryTracer(result.qt, fmt.Sprintf("error: %s", err)) + resultsCollected++ + if *skipSlowReplicas && resultsCollected > len(sns)-*replicationFactor { + // There is no need in waiting for the remaining results, + // because the collected results contain all the data according to the given -replicationFactor. + // This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable. + // See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711 + // + // It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible. + snr.finishQueryTracers(fmt.Sprintf("cancel request because %d out of %d nodes already returned response according to -replicationFactor=%d", + resultsCollected, len(sns), *replicationFactor)) + return false, nil + } var er *errRemote if errors.As(err, &er) { // Immediately return the error reported by vmstorage to the caller, @@ -1752,18 +1767,6 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co continue } snr.finishQueryTracer(result.qt, "") - resultsCollected++ - if resultsCollected > len(sns)-*replicationFactor { - // There is no need in waiting for the remaining results, - // because the collected results contain all the data according to the given -replicationFactor. - // This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable. - // See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711 - // - // It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible. - snr.finishQueryTracers(fmt.Sprintf("cancel request because %d out of %d nodes already returned response according to -replicationFactor=%d", - resultsCollected, len(sns), *replicationFactor)) - return false, nil - } } if len(errsPartial) < *replicationFactor { // Assume that the result is full if the the number of failing vmstorage nodes diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md index 22c7b9bce6..37fea4bb8d 100644 --- a/docs/CHANGELOG.md +++ b/docs/CHANGELOG.md @@ -65,6 +65,7 @@ Released at 2023-06-30 * BUGFIX: [storage](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html): prevent from possible crashloop after the migration from versions below `v1.90.0` to newer versions. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336) for details. * BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix a memory leak issue associated with chart updates. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4455). * BUGFIX: [vmbackupmanager](https://docs.victoriametrics.com/vmbackupmanager.html): fix removing storage data dir before restoring from backup. +* BUGFIX: vmselect: wait for all vmstorage nodes to respond when the `-replicationFactor` flag is set bigger than > 1. Before, vmselect could have [skip waiting for the slowest replicas](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711) to respond. This could have resulted in issues illustrated [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207). Now, this optimization is disabled by default and could be re-enabled by passing `-search.skipSlowReplicas` cmd-line flag to vmselect. See more details [here](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4538). ## [v1.91.2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.91.2) diff --git a/docs/Cluster-VictoriaMetrics.md b/docs/Cluster-VictoriaMetrics.md index 4f6a9cedcb..b6503b8ad1 100644 --- a/docs/Cluster-VictoriaMetrics.md +++ b/docs/Cluster-VictoriaMetrics.md @@ -659,6 +659,7 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason. The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable. +Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time. The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable. @@ -1207,6 +1208,8 @@ Below is the output for `/path/to/vmselect -help`: Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call -search.setLookbackToStep Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored + -search.skipSlowReplicas + Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data. -search.treatDotsAsIsInRegexps Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter -selectNode array diff --git a/docs/vmalert.md b/docs/vmalert.md index a914402a56..9646759ab4 100644 --- a/docs/vmalert.md +++ b/docs/vmalert.md @@ -932,7 +932,7 @@ The shortlist of configuration flags is the following: -datasource.disableKeepAlive Whether to disable long-lived connections to the datasource. If true, disables HTTP keep-alives and will only use the connection to the server for a single HTTP request. -datasource.disableStepParam - Whether to disable adding 'step' param to the issued instant queries. This might be useful when using vmalert with datasources that do not support 'step' param for instant queries, like Google Managed Prometheus. It is not recommended to enable this flag if you use vmalert to query VictoriaMetrics. + Whether to disable adding 'step' param to the issued instant queries. This might be useful when using vmalert with datasources that do not support 'step' param for instant queries, like Google Managed Prometheus. It is not recommended to enable this flag if you use vmalert with VictoriaMetrics. -datasource.headers string Optional HTTP extraHeaders to send with each request to the corresponding -datasource.url. For example, -datasource.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -datasource.url. Multiple headers must be delimited by '^^': -datasource.headers='header1:value1^^header2:value2' -datasource.lookback duration