VictoriaMetrics/docs/SampleSizeCalculations.md
Roman Khavronenko c6a8ebb11f
docs: update docs ordering and formatting (#1192)
The major change is adding `sort` directive to docs. For those docs which are copied
from internal packages `sort` is added via makefile command. For the rest it is added
manually since they're updated manually as well.

The rest of changes is connected with markdown formatting. For example, changing headers
in some files (`##` => `#`) makes navigation on .github.io to look better. This especially
useful for `changelog` docs.

Table of contents for `vmctl` is dropped, since we already have it autogenerated on .github.io.

No link changes expected. The corresponding PR to `cluster` branch will be made in follow-up PR.
2021-04-07 13:39:16 +03:00

79 lines
3.3 KiB
Markdown

---
sort: 15
---
# Sample size calculations
These calculations are for the “Lowest sample size” graph at https://victoriametrics.com/ .
How many metrics can be stored in 2tb disk for 2 years?
Seconds in 2 years:
2 years * 365 days * 24 hours * 60 minutes * 60 seconds = 63072000 seconds
Resolution = 1 point per 10 second
That means each metric will contain 6307200 points.
2tb disk contains
2 (tb) * 1024 (gb) * 1024 (mb) * 1024 (kb) * 1024 (b) = 2199023255552 bytes
## VictoriaMetrics
Based on production data from our customers, sample size is 0.4 byte
That means one metric with 10 seconds resolution will need
6307200 points * 0.4 bytes/point = 2522880 bytes or 2.4 megabytes.
Calculation for number of metrics can be stored in 2 tb disk:
2199023255552 (disk size) / 2522880 (one metric for 2 year) = 871632 metrics
So in 2tb we can store 871 632 metrics
## Graphite
Based on https://m30m.github.io/whisper-calculator/ sample size of graphite metrics is 12b + 28b for each metric
That means, one metric with 10 second resolution will need 75686428 bytes or 72.18 megabytes
Calculation for number of metrics can be stored in 2 tb disk:
2199023255552 / 75686428 = 29 054 metrics
## OpenTSDB
Let's check official openTSDB site
http://opentsdb.net/faq.html
16 bytes of HBase overhead, 3 bytes for the metric, 4 bytes for the timestamp, 6 bytes per tag, 2 bytes of OpenTSDB overhead, up to 8 bytes for the value. Integers are stored with variable length encoding and can consume 1, 2, 4 or 8 bytes.
That means, one metric with 10 second resolution will need
6307200 * (1 + 4) + 3 + 16 + 2 = 31536021 bytes or 30 megabytes in the best scenario and
6307200 * (8 + 4) + 3 + 16 + 2 = 75686421 bytes or 72 megabytes in the worst scenario.
Calculation for number of metrics can be stored in 2 tb disk:
2199023255552 / 31536021 = 69 730 metrics for best scenario
2199023255552 / 75686421 = 29 054 metrics for worst scenario
Also, openTSDB allows to use compression
" LZO is able to achieve a compression factor of 4.2x "
So, let's multiply numbers on 4.2
69 730 * 4,2 = 292 866 metrics for best scenario
29 054 * 4,2 = 122 026 metrics for worst scenario
## M3DB
Let's look at official m3db site https://m3db.github.io/m3/m3db/architecture/engine/
They can achieve a sample size of 1.45 bytes/datapoint
That means, one metric with 10 second resolution will need 9145440 bytes or 8,72177124 megabytes
Calculation for number of metrics can be stored in 2 tb disk:
2199023255552 / 9145440 = 240 450 metrics
## InfluxDB
Based on official influxDB site https://docs.influxdata.com/influxdb/v1.8/guides/hardware_sizing/#bytes-and-compression
"Non-string values require approximately three bytes". That means, one metric with 10 second resolution will need
6307200 * 3 = 18921600 bytes or 18 megabytes
Calculation for number of metrics can be stored in 2 tb disk:
2199023255552 / 18921600 = 116 217 metrics
## Prometheus
Let's check official site: https://prometheus.io/docs/prometheus/latest/storage/
"On average, Prometheus uses only around 1-2 bytes per sample."
That means, one metric with 10 second resolution will need
6307200 * 1 = 6307200 bytes in best scenario
6307200 * 2 = 12614400 bytes in worst scenario.
Calculation for number of metrics can be stored in 2 tb disk:
2199023255552 / 6307200 = 348 652 metrics for the best case
2199023255552 / 12614400 = 174 326 metrics for the worst cases