app/vmselect/netstorage: follow-up after 173ccf4333

- Clarify docs about -replicationFactor command-line flag at vmselect
- Clarify description for -replicationFactor and -search.skipSlowReplicas command-line flags
- Fix the logic for returning responses if -search.skipSlowReplicas command-line flag
  is enabled. The logic was broken in the 173ccf4333,
  so it could return responses only if some of vmstorage nodes return error,
  while it should return when query results are successfully collected from more than
  (len(storageNodes) - replicationFactor) vmstorage nodes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
This commit is contained in:
Aliaksandr Valialkin 2023-07-09 11:58:12 -07:00
parent 3c2308fd52
commit e1a2404db5
No known key found for this signature in database
GPG key ID: A72BEC6CD3D0DED1
3 changed files with 33 additions and 24 deletions

View file

@ -513,6 +513,10 @@ The cluster works in the following way when some of `vmstorage` nodes are unavai
In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable. In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
Another option is to pass `deny_partial_response=1` query arg to requests to `vmselect` nodes. Another option is to pass `deny_partial_response=1` query arg to requests to `vmselect` nodes.
`vmselect` also accepts `-replicationFactor=N` command-line flag. This flag instructs `vmselect` to return full response
if less than `-replicationFactor` vmstorage nodes are unavailable during querying, since it assumes that the remaining
`vmstorage` nodes contain the full data. See [these docs](#replication-and-data-safety) for details.
`vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete. `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details. Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details.
@ -648,7 +652,8 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha
By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason. By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason.
The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable. The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time.
Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less than `-replicationFactor` vmstorage nodes are unavailable during the query. See [cluster availability docs](#cluster-availability) for details.
The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable. The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable.
@ -1117,7 +1122,7 @@ Below is the output for `/path/to/vmselect -help`:
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags. Supports an array of values separated by comma or specified via multiple flags.
-replicationFactor int -replicationFactor int
How many copies of every time series is available on vmstorage nodes. vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses after adding new vmstorage nodes even if the replication is enabled at vminsert nodes (default 1) How many copies of every time series is available on the provided -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. See also -search.skipSlowReplicas (default 1)
-search.cacheTimestampOffset duration -search.cacheTimestampOffset duration
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s) The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
-search.denyPartialResponse -search.denyPartialResponse
@ -1198,7 +1203,7 @@ Below is the output for `/path/to/vmselect -help`:
-search.setLookbackToStep -search.setLookbackToStep
Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored
-search.skipSlowReplicas -search.skipSlowReplicas
Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data. Whether to skip -replicationFactor - 1 slowest vmstorage nodes during querying. Enabling this setting may improve query speed, but it could also lead to incomplete results if some queried data has less than -replicationFactor copies at vmstorage nodes. Consider enabling this setting only if all the queried data contains -replicationFactor copies in the cluster
-search.treatDotsAsIsInRegexps -search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selectNode array -selectNode array

View file

@ -33,13 +33,12 @@ import (
) )
var ( var (
replicationFactor = flag.Int("replicationFactor", 1, "How many copies of every time series is available on vmstorage nodes. "+ replicationFactor = flag.Int("replicationFactor", 1, "How many copies of every time series is available on the provided -storageNode nodes. "+
"vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. "+ "vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. "+
"It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses "+ "See also -search.skipSlowReplicas")
"after adding new vmstorage nodes even if the replication is enabled at vminsert nodes") skipSlowReplicas = flag.Bool("search.skipSlowReplicas", false, "Whether to skip -replicationFactor - 1 slowest vmstorage nodes during querying. "+
skipSlowReplicas = flag.Bool("search.skipSlowReplicas", false, "Whether to skip waiting for all replicas to respond during search query. "+ "Enabling this setting may improve query speed, but it could also lead to incomplete results if some queried data has less than -replicationFactor "+
"Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. "+ "copies at vmstorage nodes. Consider enabling this setting only if all the queried data contains -replicationFactor copies in the cluster")
"But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data.")
maxSamplesPerSeries = flag.Int("search.maxSamplesPerSeries", 30e6, "The maximum number of raw samples a single query can scan per each time series. See also -search.maxSamplesPerQuery") maxSamplesPerSeries = flag.Int("search.maxSamplesPerSeries", 30e6, "The maximum number of raw samples a single query can scan per each time series. See also -search.maxSamplesPerQuery")
maxSamplesPerQuery = flag.Int("search.maxSamplesPerQuery", 1e9, "The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries") maxSamplesPerQuery = flag.Int("search.maxSamplesPerQuery", 1e9, "The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries")
vmstorageDialTimeout = flag.Duration("vmstorageDialTimeout", 5*time.Second, "Timeout for establishing RPC connections from vmselect to vmstorage") vmstorageDialTimeout = flag.Duration("vmstorageDialTimeout", 5*time.Second, "Timeout for establishing RPC connections from vmselect to vmstorage")
@ -1729,17 +1728,6 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
result := <-snr.resultsCh result := <-snr.resultsCh
if err := f(result.data); err != nil { if err := f(result.data); err != nil {
snr.finishQueryTracer(result.qt, fmt.Sprintf("error: %s", err)) snr.finishQueryTracer(result.qt, fmt.Sprintf("error: %s", err))
if *skipSlowReplicas && resultsCollected > len(sns)-*replicationFactor {
// There is no need in waiting for the remaining results,
// because the collected results contain all the data according to the given -replicationFactor.
// This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
//
// It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible.
snr.finishQueryTracers(fmt.Sprintf("cancel request because %d out of %d nodes already returned response according to -replicationFactor=%d",
resultsCollected, len(sns), *replicationFactor))
return false, nil
}
var er *errRemote var er *errRemote
if errors.As(err, &er) { if errors.As(err, &er) {
// Immediately return the error reported by vmstorage to the caller, // Immediately return the error reported by vmstorage to the caller,
@ -1767,6 +1755,17 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
} }
snr.finishQueryTracer(result.qt, "") snr.finishQueryTracer(result.qt, "")
resultsCollected++ resultsCollected++
if *skipSlowReplicas && resultsCollected > len(sns)-*replicationFactor {
// There is no need in waiting for the remaining results,
// because the collected results contain all the data according to the given -replicationFactor.
// This should speed up responses when a part of vmstorage nodes are slow and/or temporarily unavailable.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
//
// It is expected that cap(snr.resultsCh) == len(sns), otherwise goroutine leak is possible.
snr.finishQueryTracers(fmt.Sprintf("cancel request because -search.skipSlowReplicas is set and %d out of %d nodes already returned response "+
"according to -replicationFactor=%d", resultsCollected, len(sns), *replicationFactor))
return false, nil
}
} }
if len(errsPartial) < *replicationFactor { if len(errsPartial) < *replicationFactor {
// Assume that the result is full if the the number of failing vmstorage nodes // Assume that the result is full if the the number of failing vmstorage nodes

View file

@ -524,6 +524,10 @@ The cluster works in the following way when some of `vmstorage` nodes are unavai
In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable. In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
Another option is to pass `deny_partial_response=1` query arg to requests to `vmselect` nodes. Another option is to pass `deny_partial_response=1` query arg to requests to `vmselect` nodes.
`vmselect` also accepts `-replicationFactor=N` command-line flag. This flag instructs `vmselect` to return full response
if less than `-replicationFactor` vmstorage nodes are unavailable during querying, since it assumes that the remaining
`vmstorage` nodes contain the full data. See [these docs](#replication-and-data-safety) for details.
`vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete. `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details. Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details.
@ -659,7 +663,8 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha
By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason. By default, VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath` such as [Google compute persistent disk](https://cloud.google.com/compute/docs/disks#pdspecs), which guarantees data durability. VictoriaMetrics supports application-level replication if replicated durable persistent disks cannot be used for some reason.
The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable. The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`. This instructs `vminsert` to store `N` copies for every ingested sample on `N` distinct `vmstorage` nodes. This guarantees that all the stored data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less `replicationFactor` storage nodes failed to respond on query time.
Passing `-replicationFactor=N` command-line flag to `vmselect` instructs it to not mark responses as `partial` if less than `-replicationFactor` vmstorage nodes are unavailable during the query. See [cluster availability docs](#cluster-availability) for details.
The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable. The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are unavailable.
@ -1128,7 +1133,7 @@ Below is the output for `/path/to/vmselect -help`:
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags. Supports an array of values separated by comma or specified via multiple flags.
-replicationFactor int -replicationFactor int
How many copies of every time series is available on vmstorage nodes. vmselect cancels responses from the slowest -replicationFactor-1 vmstorage nodes if -replicationFactor is set by assuming it already received complete data. It isn't recommended setting this flag to values other than 1 at vmselect nodes, since it may result in incomplete responses after adding new vmstorage nodes even if the replication is enabled at vminsert nodes (default 1) How many copies of every time series is available on the provided -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. See also -search.skipSlowReplicas (default 1)
-search.cacheTimestampOffset duration -search.cacheTimestampOffset duration
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s) The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
-search.denyPartialResponse -search.denyPartialResponse
@ -1209,7 +1214,7 @@ Below is the output for `/path/to/vmselect -help`:
-search.setLookbackToStep -search.setLookbackToStep
Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored Whether to fix lookback interval to 'step' query arg value. If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored
-search.skipSlowReplicas -search.skipSlowReplicas
Whether to skip waiting for all replicas to respond during search query. Enabling this setting may improve query speed by serving results from the fastest vmstorage replicas in the cluster. But could also lead to incomplete results if replicas contain data gaps. Consider enabling this setting only if all replicas contain identical data. Whether to skip -replicationFactor - 1 slowest vmstorage nodes during querying. Enabling this setting may improve query speed, but it could also lead to incomplete results if some queried data has less than -replicationFactor copies at vmstorage nodes. Consider enabling this setting only if all the queried data contains -replicationFactor copies in the cluster
-search.treatDotsAsIsInRegexps -search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selectNode array -selectNode array