docs: clarify Retention, improve English (#2266)

This commit is contained in:
Dan Dascalescu 2022-03-03 18:28:46 +02:00 committed by Aliaksandr Valialkin
parent cd522dee14
commit e15b27b792
No known key found for this signature in database
GPG key ID: A72BEC6CD3D0DED1

View file

@ -102,7 +102,7 @@ Just download [VictoriaMetrics executable](https://github.com/VictoriaMetrics/Vi
The following command-line flags are used the most: The following command-line flags are used the most:
* `-storageDataPath` - VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory. * `-storageDataPath` - VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details. * `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [the Retention section](#retention) for more details.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags). Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
@ -744,7 +744,7 @@ The delete API is intended mainly for the following cases:
* One-off deleting of accidentally written invalid (or undesired) time series. * One-off deleting of accidentally written invalid (or undesired) time series.
* One-off deleting of user data due to [GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation). * One-off deleting of user data due to [GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation).
It isn't recommended using delete API for the following cases, since it brings non-zero overhead: Using the delete API is not recommended in the following cases, since it brings a non-zero overhead:
* Regular cleanups for unneeded data. Just prevent writing unneeded data into VictoriaMetrics. * Regular cleanups for unneeded data. Just prevent writing unneeded data into VictoriaMetrics.
This can be done with [relabeling](#relabeling). This can be done with [relabeling](#relabeling).
@ -753,7 +753,7 @@ It isn't recommended using delete API for the following cases, since it brings n
time series occupy disk space until the next merge operation, which can never occur when deleting too old data. time series occupy disk space until the next merge operation, which can never occur when deleting too old data.
[Forced merge](#forced-merge) may be used for freeing up disk space occupied by old data. [Forced merge](#forced-merge) may be used for freeing up disk space occupied by old data.
It is better using `-retentionPeriod` command-line flag for efficient pruning of old data. It's better to use the `-retentionPeriod` command-line flag for efficient pruning of old data.
## Forced merge ## Forced merge
@ -1147,7 +1147,7 @@ write data to the same VictoriaMetrics instance. These vmagent or Prometheus ins
VictoriaMetrics stores time series data in [MergeTree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like VictoriaMetrics stores time series data in [MergeTree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like
data structures. On insert, VictoriaMetrics accumulates up to 1s of data and dumps it on disk to data structures. On insert, VictoriaMetrics accumulates up to 1s of data and dumps it on disk to
`<-storageDataPath>/data/small/YYYY_MM/` subdirectory forming a `part` with the following `<-storageDataPath>/data/small/YYYY_MM/` subdirectory forming a `part` with the following
name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`. Each part consists of two "columns": name pattern: `rowsCount_blocksCount_minTimestamp_maxTimestamp`. Each part consists of two "columns":
values and timestamps. These are sorted and compressed raw time series values. Additionally, part contains values and timestamps. These are sorted and compressed raw time series values. Additionally, part contains
index files for searching for specific series in the values and timestamps files. index files for searching for specific series in the values and timestamps files.
@ -1177,24 +1177,24 @@ See also [how to work with snapshots](#how-to-work-with-snapshots).
## Retention ## Retention
Retention is configured with `-retentionPeriod` command-line flag. For instance, `-retentionPeriod=3` means Retention is configured with the `-retentionPeriod` command-line flag, which takes a number followed by a time unit character - `h(ours)`, `d(ays)`, `w(eeks)`, `y(ears)`. If the time unit is not specified, a month is assumed. For instance, `-retentionPeriod=3` means that the data will be stored for 3 months and then deleted. The default retention period is one month.
that the data will be stored for 3 months and then deleted.
Data is split in per-month partitions inside `<-storageDataPath>/data/{small,big}` folders.
Data partitions outside the configured retention are deleted on the first day of new month.
Data is split in per-month partitions inside `<-storageDataPath>/data/{small,big}` folders.
Data partitions outside the configured retention are deleted on the first day of the new month.
Each partition consists of one or more data parts with the following name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`. Each partition consists of one or more data parts with the following name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`.
Data parts outside of the configured retention are eventually deleted during Data parts outside of the configured retention are eventually deleted during
[background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282). [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
In order to keep data according to `-retentionPeriod` max disk space usage is going to be `-retentionPeriod` + 1 month. The maximum disk space usage for a given `-retentionPeriod` is going to be (`-retentionPeriod` + 1) months.
For example if `-retentionPeriod` is set to 1, data for January is deleted on March 1st. For example, if `-retentionPeriod` is set to 1, data for January is deleted on March 1st.
VictoriaMetrics supports retention smaller than 1 month. For example, `-retentionPeriod=5d` would set data retention for 5 days. Please note, the time range covered by data part is not limited by retention period unit. Hence, data part may contain data
Please note, time range covered by data part is not limited by retention period unit. Hence, data part may contain data
for multiple days and will be deleted only when fully outside of the configured retention. for multiple days and will be deleted only when fully outside of the configured retention.
It is safe to extend `-retentionPeriod` on existing data. If `-retentionPeriod` is set to lower It is safe to extend `-retentionPeriod` on existing data. If `-retentionPeriod` is set to a lower
value than before then data outside the configured period will be eventually deleted. value than before, then data outside the configured period will be eventually deleted.
VictoriaMetrics does not support indefinite retention, but you can specify an arbitrarily high duration, e.g. `-retentionPeriod=100y`.
## Multiple retentions ## Multiple retentions