docs/Cluster-VictoriaMetrics.md: clarify replication docs

Aliaksandr Valialkin 2021-02-15 01:44:11 +02:00
parent 54a09de037
commit cb6eba2ce0
2 changed files with 22 additions and 10 deletions

@@ -338,9 +338,15 @@ It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-cha

 ## Replication and data safety

-In order to enable application-level replication, `-replicationFactor=N` command-line flag must be passed to `vminsert`.
+By default VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath`.
+
+The replication can be enabled by passing the `-replicationFactor=N` command-line flag to `vminsert`.
 This guarantees that all the data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
-For example, when `-replicationFactor=3` is passed to `vminsert`, then it replicates all the ingested data to 3 distinct `vmstorage` nodes.
+The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N` is the replication factor,
+in order to maintain the given replication factor for newly ingested data when `N-1` of the storage nodes are lost.
+For example, when `-replicationFactor=3` is passed to `vminsert`, it replicates all the ingested data to 3 distinct `vmstorage` nodes,
+so up to 2 `vmstorage` nodes can be lost without data loss. The minimum number of `vmstorage` nodes should be `2*3-1 = 5`, so when 2 `vmstorage` nodes are lost,
+the remaining 3 `vmstorage` nodes can provide `-replicationFactor=3` for newly ingested data.

 When the replication is enabled, the `-replicationFactor=N` and `-dedup.minScrapeInterval=1ms` command-line flags must be passed to `vmselect` nodes.
 The `-replicationFactor=N` flag improves query performance when some of the `vmstorage` nodes respond slowly and/or are temporarily unavailable.
@@ -350,9 +356,9 @@ when [deduplication](https://victoriametrics.github.io/Single-server-VictoriaMet

 Note that [replication doesn't save from disaster](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883),
 so it is recommended to perform regular backups. See [these docs](#backups) for details.

-By default VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath`.
-It is recommended storing data on [Google Compute Engine persistent disks](https://cloud.google.com/compute/docs/disks/#pdspecs),
-since they are protected from data loss and data corruption. They also provide consistently high performance
+Note that the replication increases resource usage - CPU, RAM, disk space, network bandwidth - by up to `-replicationFactor` times. So it may be worth
+offloading the replication to the underlying storage pointed by `-storageDataPath`, such as a [Google Compute Engine persistent disk](https://cloud.google.com/compute/docs/disks/#pdspecs),
+which is protected from data loss and data corruption. It also provides consistently high performance
 and [may be resized](https://cloud.google.com/compute/docs/disks/add-persistent-disk) without downtime.
 HDD-based persistent disks should be enough for the majority of use cases.
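
For reference, here is a minimal sketch of the setup the updated docs describe: `-replicationFactor=3` across the minimum of `2*3-1 = 5` `vmstorage` nodes. The hostnames (`vmstorage-1` .. `vmstorage-5`), binary paths and data directory are illustrative assumptions, not part of the commit; the flags (`-replicationFactor`, `-dedup.minScrapeInterval`, `-storageNode`, `-storageDataPath`) and the default `vmstorage` ports (8400 for `vminsert`, 8401 for `vmselect`) are the standard cluster ones.

```sh
# Hypothetical hosts vmstorage-1 .. vmstorage-5 - five nodes are the minimum
# needed to keep -replicationFactor=3 after losing up to 2 of them.
# On every vmstorage host:
/path/to/vmstorage-prod -storageDataPath=/var/lib/vmstorage-data

# vminsert replicates each ingested sample to 3 distinct vmstorage nodes:
/path/to/vminsert-prod -replicationFactor=3 \
  -storageNode=vmstorage-1:8400 \
  -storageNode=vmstorage-2:8400 \
  -storageNode=vmstorage-3:8400 \
  -storageNode=vmstorage-4:8400 \
  -storageNode=vmstorage-5:8400

# vmselect must know the replication factor and deduplicate the replicated
# samples it receives from vmstorage nodes:
/path/to/vmselect-prod -replicationFactor=3 -dedup.minScrapeInterval=1ms \
  -storageNode=vmstorage-1:8401 \
  -storageNode=vmstorage-2:8401 \
  -storageNode=vmstorage-3:8401 \
  -storageNode=vmstorage-4:8401 \
  -storageNode=vmstorage-5:8401
```

As the second hunk notes, such a setup increases CPU, RAM, disk space and network usage by up to 3 times, so offloading replication to a durable disk under `-storageDataPath` remains a valid alternative.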
