mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2025-03-11 15:34:56 +00:00
app/vmselect/netstorage: add support for the ability to set cross-group replication factor at vmselect
The cross-group replication factor can be set via `-globalReplicationFactor` command-line flag at vmselect. In this case vmselect continues returning full responses if up to globalReplicationFactor-1 groups are unavailable. See https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect for details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6054
Parent: e138f4827e
Commit: 90794e84bc
4 changed files with 124 additions and 60 deletions
66
README.md
|
@ -773,38 +773,60 @@ These issues are addressed by [vmagent](https://docs.victoriametrics.com/vmagent
|
|||
|
||||
`vmselect` can be configured to query multiple distinct groups of `vmstorage` nodes with individual `-replicationFactor` per each group.
|
||||
The following format for `-storageNode` command-line flag value should be used for assigning a particular `addr` of `vmstorage` to a particular `groupName` -
|
||||
`-storageNode=groupName/addr`. For example, the following command runs `vmselect`, which continues returning full responses if up to one node per each group is temporarily unavailable
|
||||
`-storageNode=groupName/addr`. The `groupName` can contain an arbitrary value. The only rule is that every `vmstorage` group must have a unique name.
|
||||
|
||||
For example, the following command runs `vmselect`, which continues returning full responses if up to one node per each group is temporarily unavailable
|
||||
because the given `-replicationFactor=2` is applied individually per each group:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-replicationFactor=2 \
|
||||
-storageNode=group1/host1 \
|
||||
-storageNode=group1/host2 \
|
||||
-storageNode=group1/host3 \
|
||||
-storageNode=group2/host4 \
|
||||
-storageNode=group2/host5 \
|
||||
-storageNode=group2/host6 \
|
||||
-storageNode=group3/host7 \
|
||||
-storageNode=group3/host8 \
|
||||
-storageNode=group3/host9
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
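The `groupName/addr` convention above can be illustrated with a minimal Go sketch. The helper `groupStorageNodes` is hypothetical and is not the parser used by `vmselect`; it assumes that the group name is everything before the first `/` and that values without a `/` belong to an unnamed group:

```go
package main

import (
	"fmt"
	"strings"
)

// groupStorageNodes is a hypothetical helper, not vmselect's actual parser.
// It splits every "groupName/addr" value at the first slash and collects
// the addresses per group name.
// Assumption: values without a slash belong to the unnamed group "".
func groupStorageNodes(values []string) map[string][]string {
	groups := make(map[string][]string)
	for _, v := range values {
		name, addr, ok := strings.Cut(v, "/")
		if !ok {
			// No group prefix: the whole value is the address.
			name, addr = "", v
		}
		groups[name] = append(groups[name], addr)
	}
	return groups
}

func main() {
	values := []string{
		"g1/host1", "g1/host2", "g1/host3",
		"g2/host4", "g2/host5", "g2/host6",
	}
	groups := groupStorageNodes(values)
	fmt.Println("g1:", groups["g1"])
	fmt.Println("g2:", groups["g2"])
}
```

Keying the result by group name makes the uniqueness rule explicit: two groups with the same name would silently merge into one.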
|
||||
|
||||
It is possible to specify distinct `-replicationFactor` per each group via the following format - `-replicationFactor=groupName:rf`.
|
||||
For example, the following command runs `vmselect`, which uses `-replicationFactor=3` for the `group1`, while it uses `-replicationFactor=1` for the `group2`:
|
||||
Distinct `-replicationFactor` values can be specified per each group via the following format - `-replicationFactor=groupName:rf`.
|
||||
For example, the following command runs `vmselect`, which uses `-replicationFactor=3` for the group `g1`, `-replicationFactor=2` for the group `g2`
|
||||
and `-replicationFactor=1` for the group `g3`:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-replicationFactor=group1:3 \
|
||||
-storageNode=group1/host1 \
|
||||
-storageNode=group1/host2 \
|
||||
-storageNode=group1/host3 \
|
||||
-replicationFactor=group2:1 \
|
||||
-storageNode=group2/host4 \
|
||||
-storageNode=group2/host5 \
|
||||
-storageNode=group2/host6
|
||||
-replicationFactor=g1:3 \
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-replicationFactor=g2:2 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-replicationFactor=g3:1 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
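The per-group lookup can be sketched in Go as follows. This is an illustration only: the type `replicationFactors` and its functions are hypothetical and do not reproduce `flagutil.NewDictInt`. The sketch assumes that a value without a `groupName:` prefix sets the default for all groups and that the default is `1` when nothing is set:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// replicationFactors is a hypothetical container for -replicationFactor values.
type replicationFactors struct {
	def      int            // applies to groups without an explicit entry
	perGroup map[string]int // explicit "groupName:rf" entries
}

// parseReplicationFactors parses values such as "2" or "g1:3".
// Assumption: a value without "groupName:" sets the default for all groups.
func parseReplicationFactors(values []string) (*replicationFactors, error) {
	rfs := &replicationFactors{def: 1, perGroup: make(map[string]int)}
	for _, v := range values {
		name, rfStr, ok := strings.Cut(v, ":")
		if !ok {
			name, rfStr = "", v
		}
		rf, err := strconv.Atoi(rfStr)
		if err != nil {
			return nil, fmt.Errorf("cannot parse replication factor %q: %w", v, err)
		}
		if rf < 1 {
			return nil, fmt.Errorf("replication factor in %q must be at least 1", v)
		}
		if name == "" {
			rfs.def = rf
		} else {
			rfs.perGroup[name] = rf
		}
	}
	return rfs, nil
}

// get returns the replication factor for the given group,
// falling back to the default when the group has no explicit entry.
func (rfs *replicationFactors) get(group string) int {
	if rf, ok := rfs.perGroup[group]; ok {
		return rf
	}
	return rfs.def
}

func main() {
	rfs, err := parseReplicationFactors([]string{"g1:3", "g2:2", "g3:1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rfs.get("g1"), rfs.get("g2"), rfs.get("g3"))
}
```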
|
||||
|
||||
If every ingested sample is replicated across multiple `vmstorage` groups, then pass `-globalReplicationFactor=N` command-line flag to `vmselect`,
|
||||
so it could continue returning full responses if up to `N-1` `vmstorage` groups are temporarily unavailable.
|
||||
For example, the following command runs `vmselect`, which continues returning full responses if any number of `vmstorage` nodes
|
||||
in a single `vmstorage` group are temporarily unavailable:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-globalReplicationFactor=2 \
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
|
||||
|
||||
It is OK to mix `-replicationFactor` and `-globalReplicationFactor`. For example, the following command runs `vmselect`, which continues returning full responses
|
||||
if any number of `vmstorage` nodes in a single `vmstorage` group are temporarily unavailable and each of the remaining groups contains up to two unavailable `vmstorage` nodes:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-globalReplicationFactor=2 \
|
||||
-replicationFactor=3 \
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
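How the two flags interact can be sketched in Go. The sketch below is a simplified reading of the `collectResults` change in this commit, not the actual implementation: the names `groupStatus`, `outcome` and `classify` are hypothetical, and handling of denied partial responses is omitted. A group counts as failed when the number of its failed nodes reaches its replication factor, and as missing when all of its nodes failed:

```go
package main

import "fmt"

// groupStatus is a hypothetical per-group summary collected after a query.
type groupStatus struct {
	name              string
	nodesCount        int // total vmstorage nodes in the group
	replicationFactor int // the group's -replicationFactor
	failedNodes       int // nodes that returned an error
}

type outcome string

const (
	fullResponse    outcome = "full"
	partialResponse outcome = "partial"
	unavailable     outcome = "unavailable"
)

// classify decides which kind of response can be returned.
func classify(groups []groupStatus, globalRF int) outcome {
	failedGroups := 0  // groups that may miss some data
	missingGroups := 0 // groups where every node failed
	for _, g := range groups {
		if g.failedNodes == 0 {
			// Only groups with at least one error are considered.
			continue
		}
		if g.failedNodes >= g.replicationFactor {
			failedGroups++
		}
		if g.failedNodes == g.nodesCount {
			missingGroups++
		}
	}
	if failedGroups < globalRF {
		// Enough complete groups remain, so the result is full.
		return fullResponse
	}
	if missingGroups >= globalRF {
		// Too many groups returned nothing at all.
		return unavailable
	}
	return partialResponse
}

func main() {
	// -globalReplicationFactor=2 -replicationFactor=3, three groups of three nodes.
	groups := []groupStatus{
		{name: "g1", nodesCount: 3, replicationFactor: 3, failedNodes: 3},
		{name: "g2", nodesCount: 3, replicationFactor: 3, failedNodes: 2},
		{name: "g3", nodesCount: 3, replicationFactor: 3, failedNodes: 0},
	}
	fmt.Println(classify(groups, 2)) // prints "full"
}
```

In this example `g1` is entirely down and `g2` has lost two nodes, yet only one group reached its replication factor, which is below `-globalReplicationFactor=2`, so the response is still treated as full.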
|
||||
|
||||
See also [multi-level cluster setup](#multi-level-cluster-setup).
|
||||
|
||||
## Helm
|
||||
|
||||
Helm chart simplifies managing cluster version of VictoriaMetrics in Kubernetes.
|
||||
|
@ -1360,6 +1382,8 @@ Below is the output for `/path/to/vmselect -help`:
|
|||
Flag value can be read from the given file when using -flagsAuthKey=file:///abs/path/to/file or -flagsAuthKey=file://./relative/path/to/file . Flag value can be read from the given http/https url when using -flagsAuthKey=http://host/path or -flagsAuthKey=https://host/path
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-globalReplicationFactor int
|
||||
How many copies of every ingested sample is available across vmstorage groups. vmselect continues returning full responses when up to globalReplicationFactor-1 vmstorage groups are temporarily unavailable. See https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect . See also -replicationFactor (default 1)
|
||||
-http.connTimeout duration
|
||||
Incoming connections to -httpListenAddr are closed after the configured timeout. This may help evenly spreading load among a cluster of services behind TCP-level load balancer. Zero value disables closing of incoming connections (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
|
@ -1461,7 +1485,7 @@ Below is the output for `/path/to/vmselect -help`:
|
|||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Value can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
|
||||
-replicationFactor array
|
||||
How many copies of every time series is available on the provided -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. See also -search.skipSlowReplicas (default 1)
|
||||
How many copies of every ingested sample is available across -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable. See also -globalReplicationFactor and -search.skipSlowReplicas (default 1)
|
||||
Supports an array of `key:value` entries separated by comma or specified via multiple flags.
|
||||
-search.cacheTimestampOffset duration
|
||||
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
|
||||
|
|
|
@ -35,9 +35,12 @@ import (
|
|||
)
|
||||
|
||||
var (
|
||||
replicationFactor = flagutil.NewDictInt("replicationFactor", 1, "How many copies of every time series is available on the provided -storageNode nodes. "+
|
||||
"vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. "+
|
||||
"See also -search.skipSlowReplicas")
|
||||
globalReplicationFactor = flag.Int("globalReplicationFactor", 1, "How many copies of every ingested sample is available across vmstorage groups. "+
|
||||
"vmselect continues returning full responses when up to globalReplicationFactor-1 vmstorage groups are temporarily unavailable. "+
|
||||
"See https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect . See also -replicationFactor")
|
||||
replicationFactor = flagutil.NewDictInt("replicationFactor", 1, "How many copies of every ingested sample is available across -storageNode nodes. "+
|
||||
"vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable. "+
|
||||
"See also -globalReplicationFactor and -search.skipSlowReplicas")
|
||||
skipSlowReplicas = flag.Bool("search.skipSlowReplicas", false, "Whether to skip -replicationFactor - 1 slowest vmstorage nodes during querying. "+
|
||||
"Enabling this setting may improve query speed, but it could also lead to incomplete results if some queried data has less than -replicationFactor "+
|
||||
"copies at vmstorage nodes. Consider enabling this setting only if all the queried data contains -replicationFactor copies in the cluster")
|
||||
|
@ -1870,6 +1873,7 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
|
|||
groupsCount := sns[0].group.groupsCount
|
||||
resultsCollectedPerGroup := make(map[*storageNodesGroup]int, groupsCount)
|
||||
errsPartialPerGroup := make(map[*storageNodesGroup][]error)
|
||||
groupsPartial := make(map[*storageNodesGroup]struct{})
|
||||
for range sns {
|
||||
// There is no need in timer here, since all the goroutines executing the f function
|
||||
// passed to startStorageNodesRequest must be finished until the deadline.
|
||||
|
@ -1895,6 +1899,12 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
|
|||
|
||||
errsPartialPerGroup[group] = append(errsPartialPerGroup[group], err)
|
||||
if snr.denyPartialResponse && len(errsPartialPerGroup[group]) >= group.replicationFactor {
|
||||
groupsPartial[group] = struct{}{}
|
||||
if len(groupsPartial) < *globalReplicationFactor {
|
||||
// Ignore this error, since the number of groups with partial results is smaller than the globalReplicationFactor.
|
||||
continue
|
||||
}
|
||||
|
||||
// Return the error to the caller if partial responses are denied
|
||||
// and the number of partial responses for the given group reach its replicationFactor,
|
||||
// since this means that the response is partial.
|
||||
|
@ -1932,36 +1942,41 @@ func (snr *storageNodesRequest) collectResults(partialResultsCounter *metrics.Co
|
|||
}
|
||||
|
||||
// Verify whether the full result can be returned
|
||||
isFullResponse := true
|
||||
failedGroups := 0
|
||||
for g, errsPartial := range errsPartialPerGroup {
|
||||
if len(errsPartial) >= g.replicationFactor {
|
||||
isFullResponse = false
|
||||
break
|
||||
failedGroups++
|
||||
}
|
||||
}
|
||||
if isFullResponse {
|
||||
// Assume that the result is full if the number of failing vmstorage nodes
|
||||
// is smaller than the replicationFactor per each group.
|
||||
if failedGroups < *globalReplicationFactor {
|
||||
// Assume that the result is full if the number of failed groups is smaller than the globalReplicationFactor.
|
||||
return false, nil
|
||||
}
|
||||
|
||||
// Verify whether there is at least a single node per each group, which successfully returned result,
|
||||
// in order to return partial result.
|
||||
// Verify whether at least a single node per each group successfully returned a result, so that a partial result can be returned.
|
||||
missingGroups := 0
|
||||
var firstErr error
|
||||
for g, errsPartial := range errsPartialPerGroup {
|
||||
if len(errsPartial) == g.nodesCount {
|
||||
// All the vmstorage nodes at the given group g returned error.
|
||||
// Return only the first error, since there is no sense in returning all the errors.
|
||||
// Returns 503 status code for partial response, so the caller could retry it if needed.
|
||||
err := &httpserver.ErrorWithStatusCode{
|
||||
Err: errsPartial[0],
|
||||
StatusCode: http.StatusServiceUnavailable,
|
||||
missingGroups++
|
||||
if firstErr == nil {
|
||||
// Return only the first error, since there is no sense in returning all the errors.
|
||||
firstErr = errsPartial[0]
|
||||
}
|
||||
return false, err
|
||||
}
|
||||
if len(errsPartial) > 0 {
|
||||
partialErrorsLogger.Warnf("%d out of %d vmstorage nodes at group %q were unavailable during the query; a sample error: %s", len(errsPartial), len(sns), g.name, errsPartial[0])
|
||||
}
|
||||
}
|
||||
if missingGroups >= *globalReplicationFactor {
|
||||
// Too many groups have all their vmstorage nodes unavailable.
|
||||
// Returns 503 status code, so the caller could retry it if needed.
|
||||
err := &httpserver.ErrorWithStatusCode{
|
||||
Err: firstErr,
|
||||
StatusCode: http.StatusServiceUnavailable,
|
||||
}
|
||||
return false, err
|
||||
}
|
||||
|
||||
// Return partial results.
|
||||
// This allows continuing returning responses in the case
|
||||
|
|
|
@ -30,6 +30,7 @@ See also [LTS releases](https://docs.victoriametrics.com/lts-releases/).
|
|||
|
||||
## tip
|
||||
|
||||
* FEATURE: [VictoriaMetrics cluster](https://docs.victoriametrics.com/cluster-victoriametrics/): add support for fault domain awareness to `vmselect`. It can be configured to return full responses if up to `-globalReplicationFactor - 1` fault domains (aka `vmstorage` groups) are unavailable. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6054) and [these docs](https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect).
|
||||
* FEATURE: all VictoriaMetrics [enterprise](https://docs.victoriametrics.com/enterprise/) components: add support for automatic issuing of TLS certificates for HTTPS server at `-httpListenAddr` via [Let's Encrypt service](https://letsencrypt.org/). See [these docs](https://docs.victoriametrics.com/#automatic-issuing-of-tls-certificates) and [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5949).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent/): support [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) addresses in `-remoteWrite.url` command-line option and in scrape target urls. For example, `-remoteWrite.url=http://srv+victoria-metrics/api/v1/write` automatically resolves the `victoria-metrics` DNS SRV to a list of hostnames with TCP ports and then sends the collected metrics to these TCP addresses. See [these docs](https://docs.victoriametrics.com/vmagent/#srv-urls) and [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6053).
|
||||
* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth/): support automatic discovering and load balancing for TCP addresses behind DNS SRV addresses. These addresses can be put inside `url_prefix` urls in the form `http://srv+addr/path`, where the `addr` is the [DNS SRV](https://en.wikipedia.org/wiki/SRV_record) address, which is automatically resolved to hostnames with TCP ports. See [these docs](https://docs.victoriametrics.com/vmauth/#srv-urls) for details.
|
||||
|
|
|
@ -784,38 +784,60 @@ These issues are addressed by [vmagent](https://docs.victoriametrics.com/vmagent
|
|||
|
||||
`vmselect` can be configured to query multiple distinct groups of `vmstorage` nodes with individual `-replicationFactor` per each group.
|
||||
The following format for `-storageNode` command-line flag value should be used for assigning a particular `addr` of `vmstorage` to a particular `groupName` -
|
||||
`-storageNode=groupName/addr`. For example, the following command runs `vmselect`, which continues returning full responses if up to one node per each group is temporarily unavailable
|
||||
`-storageNode=groupName/addr`. The `groupName` can contain an arbitrary value. The only rule is that every `vmstorage` group must have a unique name.
|
||||
|
||||
For example, the following command runs `vmselect`, which continues returning full responses if up to one node per each group is temporarily unavailable
|
||||
because the given `-replicationFactor=2` is applied individually per each group:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-replicationFactor=2 \
|
||||
-storageNode=group1/host1 \
|
||||
-storageNode=group1/host2 \
|
||||
-storageNode=group1/host3 \
|
||||
-storageNode=group2/host4 \
|
||||
-storageNode=group2/host5 \
|
||||
-storageNode=group2/host6 \
|
||||
-storageNode=group3/host7 \
|
||||
-storageNode=group3/host8 \
|
||||
-storageNode=group3/host9
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
|
||||
|
||||
It is possible to specify distinct `-replicationFactor` per each group via the following format - `-replicationFactor=groupName:rf`.
|
||||
For example, the following command runs `vmselect`, which uses `-replicationFactor=3` for the `group1`, while it uses `-replicationFactor=1` for the `group2`:
|
||||
Distinct `-replicationFactor` values can be specified per each group via the following format - `-replicationFactor=groupName:rf`.
|
||||
For example, the following command runs `vmselect`, which uses `-replicationFactor=3` for the group `g1`, `-replicationFactor=2` for the group `g2`
|
||||
and `-replicationFactor=1` for the group `g3`:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-replicationFactor=group1:3 \
|
||||
-storageNode=group1/host1 \
|
||||
-storageNode=group1/host2 \
|
||||
-storageNode=group1/host3 \
|
||||
-replicationFactor=group2:1 \
|
||||
-storageNode=group2/host4 \
|
||||
-storageNode=group2/host5 \
|
||||
-storageNode=group2/host6
|
||||
-replicationFactor=g1:3 \
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-replicationFactor=g2:2 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-replicationFactor=g3:1 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
|
||||
|
||||
If every ingested sample is replicated across multiple `vmstorage` groups, then pass `-globalReplicationFactor=N` command-line flag to `vmselect`,
|
||||
so it could continue returning full responses if up to `N-1` `vmstorage` groups are temporarily unavailable.
|
||||
For example, the following command runs `vmselect`, which continues returning full responses if any number of `vmstorage` nodes
|
||||
in a single `vmstorage` group are temporarily unavailable:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-globalReplicationFactor=2 \
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
|
||||
|
||||
It is OK to mix `-replicationFactor` and `-globalReplicationFactor`. For example, the following command runs `vmselect`, which continues returning full responses
|
||||
if any number of `vmstorage` nodes in a single `vmstorage` group are temporarily unavailable and each of the remaining groups contains up to two unavailable `vmstorage` nodes:
|
||||
|
||||
```
|
||||
/path/to/vmselect \
|
||||
-globalReplicationFactor=2 \
|
||||
-replicationFactor=3 \
|
||||
-storageNode=g1/host1,g1/host2,g1/host3 \
|
||||
-storageNode=g2/host4,g2/host5,g2/host6 \
|
||||
-storageNode=g3/host7,g3/host8,g3/host9
|
||||
```
|
||||
|
||||
See also [multi-level cluster setup](#multi-level-cluster-setup).
|
||||
|
||||
## Helm
|
||||
|
||||
Helm chart simplifies managing cluster version of VictoriaMetrics in Kubernetes.
|
||||
|
@ -1371,6 +1393,8 @@ Below is the output for `/path/to/vmselect -help`:
|
|||
Flag value can be read from the given file when using -flagsAuthKey=file:///abs/path/to/file or -flagsAuthKey=file://./relative/path/to/file . Flag value can be read from the given http/https url when using -flagsAuthKey=http://host/path or -flagsAuthKey=https://host/path
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-globalReplicationFactor int
|
||||
How many copies of every ingested sample is available across vmstorage groups. vmselect continues returning full responses when up to globalReplicationFactor-1 vmstorage groups are temporarily unavailable. See https://docs.victoriametrics.com/cluster-victoriametrics/#vmstorage-groups-at-vmselect . See also -replicationFactor (default 1)
|
||||
-http.connTimeout duration
|
||||
Incoming connections to -httpListenAddr are closed after the configured timeout. This may help evenly spreading load among a cluster of services behind TCP-level load balancer. Zero value disables closing of incoming connections (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
|
@ -1472,7 +1496,7 @@ Below is the output for `/path/to/vmselect -help`:
|
|||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Value can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
|
||||
-replicationFactor array
|
||||
How many copies of every time series is available on the provided -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable during querying. See also -search.skipSlowReplicas (default 1)
|
||||
How many copies of every ingested sample is available across -storageNode nodes. vmselect continues returning full responses when up to replicationFactor-1 vmstorage nodes are temporarily unavailable. See also -globalReplicationFactor and -search.skipSlowReplicas (default 1)
|
||||
Supports an array of `key:value` entries separated by comma or specified via multiple flags.
|
||||
-search.cacheTimestampOffset duration
|
||||
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
|
||||
|
|