docs/{vmbackup,vmbackupmanager}.md: clarify why storing backups to S3 Glacier can be time-consuming and expensive

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5614
This is a follow-up for e14e3d9c8c
This commit is contained in:
Aliaksandr Valialkin 2024-01-16 21:50:15 +02:00
parent b0287867fe
commit c830064c2f
No known key found for this signature in database
GPG key ID: 52C003EE2BCDB9EB
2 changed files with 23 additions and 11 deletions

View file

@ -62,6 +62,10 @@ with the following command:
```
It saves time and network bandwidth costs by performing server-side copy for the shared data from the `-origin` to `-dst`.
Typical object storage just creates new names for already existing objects when performing server-side copy,
so this operation should be fast and inexpensive. Unfortunately, there are object storage systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/),
which make full copies for the copied objects during server-side copy. This may significantly slow down server-side copy
and make it very expensive.
### Incremental backups
@ -82,20 +86,24 @@ Smart backups mean storing full daily backups into `YYYYMMDD` folders and creati
./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs://<bucket>/latest
```
Where `<latest-snapshot>` is the latest [snapshot](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
The command will upload only changed data to `gs://<bucket>/latest`.
This command creates an [instant snapshot](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots)
and uploads it to `gs://<bucket>/latest`. It uploads only the changed data (aka incremental backup). This saves network bandwidth costs and time
when backing up large amounts of data.
* Run the following command once a day:
```console
./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs://<bucket>/<YYYYMMDD> -origin=gs://<bucket>/latest
./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -origin=gs://<bucket>/latest -dst=gs://<bucket>/<YYYYMMDD>
```
Where `<daily-snapshot>` is the snapshot for the last day `<YYYYMMDD>`.
This command creates server-side copy of the backup from `gs://<bucket>/latest` to `gs://<bucket>/<YYYYMMDD>`, were `<YYYYMMDD>` is the current
date like `20240125`. Server-side copy of the backup should be fast on most object storage systems, since it just creates new names for already
existing objects. The server-side copy can be slow on some object storage systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/),
since they may perform full object copy instead of creating new names for already existing objects. This may be slow and expensive.
The `smart backups` approach described above saves network bandwidth costs on hourly backups (since they are incremental)
and allows recovering data from either the last hour (the `latest` backup) or from any day (`YYYYMMDD` backups).
This approach saves network bandwidth costs on hourly backups (since they are incremental) and allows recovering data from either the last hour (`latest` backup)
or from any day (`YYYYMMDD` backups). Because of this feature, it is not recommended to store `latest` data folder
in storages with expensive reads or additional archiving features (like [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/)).
Note that hourly backup shouldn't run when creating daily backup.
Do not forget to remove old backups when they are no longer needed in order to save storage costs.
@ -115,7 +123,9 @@ from `gs://bucket/foo` to `gs://bucket/bar`:
The `-origin` and `-dst` must point to the same object storage bucket or to the same filesystem.
The server-side backup copy is usually performed at much faster speed comparing to the usual backup, since backup data isn't transferred
between the remote storage and locally running `vmbackup` tool.
between the remote storage and locally running `vmbackup` tool. Object storage systems usually just make new names for already existing
objects during server-side copy. Unfortunately there are systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/),
which perform full object copy during server-side copying. This may be slow and expensive.
If the `-dst` already contains some data, then its' contents is synced with the `-origin` data. This allows making incremental server-side copies of backups.

View file

@ -124,9 +124,11 @@ The result on the GCS bucket
<img alt="latest folder" src="vmbackupmanager_latest_folder.webp">
Please note, `latest` data folder is used for [smart backups](https://docs.victoriametrics.com/vmbackup.html#smart-backups).
It is not recommended to store `latest` data folder in storages with expensive reads or additional archiving features
(like [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/)).
`vmbackupmanager` uses [smart backups](https://docs.victoriametrics.com/vmbackup.html#smart-backups) technique in order
to speed up backups and save both data transfer costs and data copying costs. This includes server-side copy of already existing
objects. Typical object storage systems implement server-side copy by creating new names for already existing objects.
This is very fast and efficient. Unfortunately there are systems such as [S3 Glacier](https://aws.amazon.com/s3/storage-classes/glacier/),
which perform full object copy during server-side copying. This may be slow and expensive.
Please, see [vmbackup docs](https://docs.victoriametrics.com/vmbackup.html#advanced-usage) for more examples of authentication with different
storage types.