docs/Cluster-VictoriaMetrics.md: clarify required conditions for cluster availability

2025-01-10 15:14:09 +00:00 · 2022-08-20 09:39:07 +03:00 · 2022-08-20 09:39:07 +03:00 · 34d764874f
commit 34d764874f
parent 6d2354e7a4
2 changed files with 86 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -301,11 +301,50 @@ the update process. See [cluster availability](#cluster-availability) section fo

 ## Cluster availability

- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
- The cluster remains available if at least a single `vmstorage` node exists:
+VictoriaMetrics cluster architecture prioritizes availability over data consistency.
+This means that the cluster remains available for data ingestion and data querying
+if some of its components are temporarily unavailable.

-  - `vminsert` re-routes incoming data from unavailable `vmstorage` nodes to healthy `vmstorage` nodes
-  - `vmselect` continues serving partial responses if at least a single `vmstorage` node is available. If consistency over availability is preferred, then either pass `-search.denyPartialResponse` command-line flag to `vmselect` or pass `deny_partial_response=1` query arg in requests to `vmselect`.
+VictoriaMetrics cluster remains available if the following conditions are met:
+
+- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
+
+- At least a single `vminsert` node must remain available in the cluster for processing data ingestion workload.
+  The remaining active `vminsert` nodes must have enough compute capacity (CPU, RAM, network bandwidth)
+  for handling the current data ingestion workload.
+  If the remaining active `vminsert` nodes don't have enough resources for processing the data ingestion workload,
+  then arbitrary delays may occur during data ingestion.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmselect` node must remain available in the cluster for processing query workload.
+  The remaining active `vmselect` nodes must have enough compute capacity (CPU, RAM, network bandwidth, disk IO)
+  for handling the current query workload.
+  If the remaining active `vmselect` nodes don't have enough resources for processing the query workload,
+  then arbitrary failures and delays may occur during query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmstorage` node must remain available in the cluster for accepting newly ingested data
+  and for processing incoming queries. The remaining active `vmstorage` nodes must have enough compute capacity
+  (CPU, RAM, network bandwidth, disk IO, free disk space) for handling the current workload.
+  If the remaining active `vmstorage` nodes don't have enough resources for processing the current workload,
+  then arbitrary failures and delays may occur during data ingestion and query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+The cluster works in the following way when some of the `vmstorage` nodes are unavailable:
+
+- `vminsert` re-routes newly ingested data from unavailable `vmstorage` nodes to remaining healthy `vmstorage` nodes.
+  This guarantees that the newly ingested data is properly saved if the healthy `vmstorage` nodes have enough CPU, RAM, disk IO and network bandwidth
+  for processing the increased data ingestion workload.
+  `vminsert` spreads the additional data evenly among the healthy `vmstorage` nodes,
+  so that the increased load is distributed evenly across these nodes.
+
+- `vmselect` continues serving queries if at least a single `vmstorage` node is available.
+  It marks responses as partial for queries served from the remaining healthy `vmstorage` nodes,
+  since such responses may miss historical data stored on the temporarily unavailable `vmstorage` nodes.
+  Every partial JSON response contains the `"isPartial": true` field.
+  If you prefer consistency over availability, then run `vmselect` nodes with `-search.denyPartialResponse` command-line flag.
+  In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
+  Another option is to pass `deny_partial_response=1` query arg in requests to `vmselect` nodes.

 `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.

--- a/docs/Cluster-VictoriaMetrics.md
+++ b/docs/Cluster-VictoriaMetrics.md
@@ -305,11 +305,50 @@ the update process. See [cluster availability](#cluster-availability) section fo

 ## Cluster availability

- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
- The cluster remains available if at least a single `vmstorage` node exists:
+VictoriaMetrics cluster architecture prioritizes availability over data consistency.
+This means that the cluster remains available for data ingestion and data querying
+if some of its components are temporarily unavailable.

-  - `vminsert` re-routes incoming data from unavailable `vmstorage` nodes to healthy `vmstorage` nodes
-  - `vmselect` continues serving partial responses if at least a single `vmstorage` node is available. If consistency over availability is preferred, then either pass `-search.denyPartialResponse` command-line flag to `vmselect` or pass `deny_partial_response=1` query arg in requests to `vmselect`.
+VictoriaMetrics cluster remains available if the following conditions are met:
+
+- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
+
+- At least a single `vminsert` node must remain available in the cluster for processing data ingestion workload.
+  The remaining active `vminsert` nodes must have enough compute capacity (CPU, RAM, network bandwidth)
+  for handling the current data ingestion workload.
+  If the remaining active `vminsert` nodes don't have enough resources for processing the data ingestion workload,
+  then arbitrary delays may occur during data ingestion.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmselect` node must remain available in the cluster for processing query workload.
+  The remaining active `vmselect` nodes must have enough compute capacity (CPU, RAM, network bandwidth, disk IO)
+  for handling the current query workload.
+  If the remaining active `vmselect` nodes don't have enough resources for processing the query workload,
+  then arbitrary failures and delays may occur during query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmstorage` node must remain available in the cluster for accepting newly ingested data
+  and for processing incoming queries. The remaining active `vmstorage` nodes must have enough compute capacity
+  (CPU, RAM, network bandwidth, disk IO, free disk space) for handling the current workload.
+  If the remaining active `vmstorage` nodes don't have enough resources for processing the current workload,
+  then arbitrary failures and delays may occur during data ingestion and query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+The cluster works in the following way when some of the `vmstorage` nodes are unavailable:
+
+- `vminsert` re-routes newly ingested data from unavailable `vmstorage` nodes to remaining healthy `vmstorage` nodes.
+  This guarantees that the newly ingested data is properly saved if the healthy `vmstorage` nodes have enough CPU, RAM, disk IO and network bandwidth
+  for processing the increased data ingestion workload.
+  `vminsert` spreads the additional data evenly among the healthy `vmstorage` nodes,
+  so that the increased load is distributed evenly across these nodes.
+
+- `vmselect` continues serving queries if at least a single `vmstorage` node is available.
+  It marks responses as partial for queries served from the remaining healthy `vmstorage` nodes,
+  since such responses may miss historical data stored on the temporarily unavailable `vmstorage` nodes.
+  Every partial JSON response contains the `"isPartial": true` field.
+  If you prefer consistency over availability, then run `vmselect` nodes with `-search.denyPartialResponse` command-line flag.
+  In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
+  Another option is to pass `deny_partial_response=1` query arg in requests to `vmselect` nodes.

 `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
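
The partial-response behaviour described in this change can be handled on the client side roughly as follows. This is a minimal sketch and not part of the commit: the helper names `deny_partial` and `is_partial` are hypothetical, and the example URL (host, port and path) is an assumption for illustration. Only the `deny_partial_response=1` query arg and the `"isPartial": true` field come from the docs above.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def deny_partial(url: str) -> str:
    """Return `url` with deny_partial_response=1 set in its query string.

    Hypothetical helper: it only rewrites the URL and sends no request.
    """
    parts = urlsplit(url)
    # Drop any existing deny_partial_response arg so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "deny_partial_response"
    ]
    query.append(("deny_partial_response", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def is_partial(body: str) -> bool:
    """Return True if a JSON response body carries "isPartial": true."""
    return json.loads(body).get("isPartial") is True


# The URL below is an assumed example, not taken from the commit.
strict_url = deny_partial("http://vmselect:8481/select/0/prometheus/api/v1/query?query=up")
```

A client that tolerates partial data could call `is_partial` on each response body and surface a warning, while a client that needs complete data could send requests to the URL returned by `deny_partial` and treat the resulting error as "some `vmstorage` node is unavailable".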