From 6823aaaf08bfd5fc6fded0e0a516e31b5a785376 Mon Sep 17 00:00:00 2001
From: Aliaksandr Valialkin <valyala@gmail.com>
Date: Sat, 19 Oct 2019 10:47:46 +0300
Subject: [PATCH] README.md: add `capacity planning` chapter

---
 README.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/README.md b/README.md
index f33d534d6..a2817f104 100644
--- a/README.md
+++ b/README.md
@@ -184,6 +184,33 @@ Cluster should remain in working state if at least a single node of each type re
 the update process. See [cluster availability](#cluster-availability) section for details.
 
 
+### Capacity planning
+
+Each instance type - `vminsert`, `vmselect` and `vmstorage` - can run on the most suitable hardware.
+
+#### vminsert
+
+* The recommended total number of vCPU cores for all the `vminsert` instances can be calculated from the ingestion rate: `vCPUs = ingestion_rate / 150K`.
+* The recommended number of vCPU cores per each `vminsert` instance should equal to the number of `vmstorage` instances in the cluster.
+* The amount of RAM per each `vminsert` instance should be 1GB or more. RAM is used as a buffer for spikes in ingestion rate.
+* Sometimes `-rpc.disableCompression` command-line flag on `vminsert` instances could increase ingestion capacity at the cost
+  of higher network bandwidth usage between `vminsert` and `vmstorage`.
+
+#### vmstorage
+
+* The recommended total number of vCPU cores for all the `vmstorage` instances can be calculated from the ingestion rate: `vCPUs = ingestion_rate / 150K`.
+* The recommended total amount of RAM for all the `vmstorage` instances can be calculated from the number of active time series: `RAM = active_time_series * 1KB`.
+  Time series is active if it received at least a single data point during the last hour or if it has been queried during the last hour.
+* The recommended total amount of storage space for all the `vmstorage` instances can be calculated
+  from the ingestion rate and retention: `storage_space = ingestion_rate * retention_seconds`.
+
+#### vmselect
+
+The recommended hardware for `vmselect` instances highly depends on the type of queries. Lightweight queries over small number of time series usually require
+small number of vCPU cores and small amount of RAM on `vmselect`, while heavy queries over big number of time series (>10K) usually require
+bigger number of vCPU cores and bigger amounts of RAM.
+
+
 ### Helm
 
 Helm chart simplifies managing cluster version of VictoriaMetrics in Kubernetes.