From 34d764874ff7bed298a189aa666421cf20ef9818 Mon Sep 17 00:00:00 2001
From: Aliaksandr Valialkin
Date: Sat, 20 Aug 2022 09:39:07 +0300
Subject: [PATCH] docs/Cluster-VictoriaMetrics.md: clarify required conditions for cluster availability

---
 README.md                       | 47 ++++++++++++++++++++++++++++++---
 docs/Cluster-VictoriaMetrics.md | 47 ++++++++++++++++++++++++++++++---
 2 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 6ffe1b9a7..74f448b9f 100644
--- a/README.md
+++ b/README.md
@@ -301,11 +301,50 @@ the update process. See [cluster availability](#cluster-availability) section fo
 
 ## Cluster availability
 
-- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
-- The cluster remains available if at least a single `vmstorage` node exists:
+VictoriaMetrics cluster architecture prioritizes availability over data consistency.
+This means that the cluster remains available for data ingestion and data querying
+even if some of its components are temporarily unavailable.
 
-  - `vminsert` re-routes incoming data from unavailable `vmstorage` nodes to healthy `vmstorage` nodes
-  - `vmselect` continues serving partial responses if at least a single `vmstorage` node is available. If consistency over availability is preferred, then either pass `-search.denyPartialResponse` command-line flag to `vmselect` or pass `deny_partial_response=1` query arg in requests to `vmselect`.
+VictoriaMetrics cluster remains available if the following conditions are met:
+
+- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
+
+- At least a single `vminsert` node must remain available in the cluster for processing the data ingestion workload.
+  The remaining active `vminsert` nodes must have enough compute capacity (CPU, RAM, network bandwidth)
+  for handling the current data ingestion workload.
+  If the remaining active `vminsert` nodes do not have enough resources for processing the data ingestion workload,
+  then arbitrary delays may occur during data ingestion.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmselect` node must remain available in the cluster for processing the query workload.
+  The remaining active `vmselect` nodes must have enough compute capacity (CPU, RAM, network bandwidth, disk IO)
+  for handling the current query workload.
+  If the remaining active `vmselect` nodes do not have enough resources for processing the query workload,
+  then arbitrary failures and delays may occur during query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmstorage` node must remain available in the cluster for accepting newly ingested data
+  and for processing incoming queries. The remaining active `vmstorage` nodes must have enough compute capacity
+  (CPU, RAM, network bandwidth, disk IO, free disk space) for handling the current workload.
+  If the remaining active `vmstorage` nodes do not have enough resources for processing the current workload,
+  then arbitrary failures and delays may occur during data ingestion and query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+The cluster works in the following way when some of the `vmstorage` nodes are unavailable:
+
+- `vminsert` re-routes newly ingested data from unavailable `vmstorage` nodes to remaining healthy `vmstorage` nodes.
+  This guarantees that the newly ingested data is properly saved if the healthy `vmstorage` nodes have enough CPU, RAM, disk IO and network bandwidth
+  for processing the increased data ingestion workload.
+  `vminsert` spreads the additional data evenly among the healthy `vmstorage` nodes in order to spread
+  the increased load evenly across these nodes.
+
+- `vmselect` continues serving queries if at least a single `vmstorage` node is available.
+  It marks responses as partial for queries served from the remaining healthy `vmstorage` nodes,
+  since such responses may miss historical data stored on the temporarily unavailable `vmstorage` nodes.
+  Every partial JSON response contains the `"isPartial": true` field.
+  If you prefer consistency over availability, then run `vmselect` nodes with `-search.denyPartialResponse` command-line flag.
+  In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
+  Another option is to pass `deny_partial_response=1` query arg in requests to `vmselect` nodes.
 
 `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
 
diff --git a/docs/Cluster-VictoriaMetrics.md b/docs/Cluster-VictoriaMetrics.md
index f7286f355..08bc75688 100644
--- a/docs/Cluster-VictoriaMetrics.md
+++ b/docs/Cluster-VictoriaMetrics.md
@@ -305,11 +305,50 @@ the update process. See [cluster availability](#cluster-availability) section fo
 
 ## Cluster availability
 
-- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
-- The cluster remains available if at least a single `vmstorage` node exists:
+VictoriaMetrics cluster architecture prioritizes availability over data consistency.
+This means that the cluster remains available for data ingestion and data querying
+even if some of its components are temporarily unavailable.
 
-  - `vminsert` re-routes incoming data from unavailable `vmstorage` nodes to healthy `vmstorage` nodes
-  - `vmselect` continues serving partial responses if at least a single `vmstorage` node is available. If consistency over availability is preferred, then either pass `-search.denyPartialResponse` command-line flag to `vmselect` or pass `deny_partial_response=1` query arg in requests to `vmselect`.
+VictoriaMetrics cluster remains available if the following conditions are met:
+
+- HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
+
+- At least a single `vminsert` node must remain available in the cluster for processing the data ingestion workload.
+  The remaining active `vminsert` nodes must have enough compute capacity (CPU, RAM, network bandwidth)
+  for handling the current data ingestion workload.
+  If the remaining active `vminsert` nodes do not have enough resources for processing the data ingestion workload,
+  then arbitrary delays may occur during data ingestion.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmselect` node must remain available in the cluster for processing the query workload.
+  The remaining active `vmselect` nodes must have enough compute capacity (CPU, RAM, network bandwidth, disk IO)
+  for handling the current query workload.
+  If the remaining active `vmselect` nodes do not have enough resources for processing the query workload,
+  then arbitrary failures and delays may occur during query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+- At least a single `vmstorage` node must remain available in the cluster for accepting newly ingested data
+  and for processing incoming queries. The remaining active `vmstorage` nodes must have enough compute capacity
+  (CPU, RAM, network bandwidth, disk IO, free disk space) for handling the current workload.
+  If the remaining active `vmstorage` nodes do not have enough resources for processing the current workload,
+  then arbitrary failures and delays may occur during data ingestion and query processing.
+  See [capacity planning](#capacity-planning) and [cluster resizing](#cluster-resizing-and-scalability) docs for more details.
+
+The cluster works in the following way when some of the `vmstorage` nodes are unavailable:
+
+- `vminsert` re-routes newly ingested data from unavailable `vmstorage` nodes to remaining healthy `vmstorage` nodes.
+  This guarantees that the newly ingested data is properly saved if the healthy `vmstorage` nodes have enough CPU, RAM, disk IO and network bandwidth
+  for processing the increased data ingestion workload.
+  `vminsert` spreads the additional data evenly among the healthy `vmstorage` nodes in order to spread
+  the increased load evenly across these nodes.
+
+- `vmselect` continues serving queries if at least a single `vmstorage` node is available.
+  It marks responses as partial for queries served from the remaining healthy `vmstorage` nodes,
+  since such responses may miss historical data stored on the temporarily unavailable `vmstorage` nodes.
+  Every partial JSON response contains the `"isPartial": true` field.
+  If you prefer consistency over availability, then run `vmselect` nodes with `-search.denyPartialResponse` command-line flag.
+  In this case `vmselect` returns an error if at least a single `vmstorage` node is unavailable.
+  Another option is to pass `deny_partial_response=1` query arg in requests to `vmselect` nodes.
 
 `vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
 
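
Reviewer note: the partial-response behaviour documented in this patch can be illustrated from the client side. The following is a minimal sketch, not part of the commit. It assumes that `"isPartial"` sits at the top level of the JSON body and that queries go to a Prometheus-style `/api/v1/query` path; the helper names `build_query_url` and `is_partial` and the example host are hypothetical. Only `deny_partial_response=1` and `"isPartial": true` come from the patch itself.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, query: str, deny_partial: bool = False) -> str:
    """Build an instant-query URL for vmselect (hypothetical helper).

    The /api/v1/query path is an assumption; adjust it to your deployment.
    """
    params = {"query": query}
    if deny_partial:
        # Query arg named in the patch: ask vmselect to return an error
        # instead of a partial response.
        params["deny_partial_response"] = "1"
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def is_partial(response_body: str) -> bool:
    """Return True if the JSON body carries "isPartial": true.

    Assumes the flag is a top-level key of the response object.
    """
    return json.loads(response_body).get("isPartial") is True


if __name__ == "__main__":
    # Example host and URL prefix are placeholders, not taken from the patch.
    url = build_query_url("http://vmselect.example:8481/select/0/prometheus", "up", deny_partial=True)
    print(url)
    print(is_partial('{"status": "success", "isPartial": true, "data": {}}'))
```

A client that prefers consistency can either pass `deny_partial=True` when building the URL or keep the default and discard any response for which `is_partial` returns `True`.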