vmbackup

vmbackup creates VictoriaMetrics data backups from instant snapshots.

vmbackup supports incremental and full backups. Incremental backups are created automatically if the destination path already contains data from the previous backup. Full backups can be sped up with -origin pointing to an already existing backup on the same remote storage. In this case vmbackup makes a server-side copy of the data shared between the existing backup and the new backup, which saves time and data transfer costs.

Backup process can be interrupted at any time. It is automatically resumed from the interruption point when restarting vmbackup with the same args.

Backed up data can be restored with vmrestore.

See this article for more details.

See also vmbackupmanager tool built on top of vmbackup. This tool simplifies creation of hourly, daily, weekly and monthly backups.

Supported storage types

vmbackup supports the following -dst storage types:

  • GCS. Example: gs://<bucket>/<path/to/backup>
  • S3. Example: s3://<bucket>/<path/to/backup>
  • Azure Blob Storage. Example: azblob://<container>/<path/to/backup>
  • Any S3-compatible storage such as MinIO, Ceph or Swift. See these docs for details.
  • Local filesystem. Example: fs://</absolute/path/to/backup>. Note that vmbackup refuses to store the backup in the directory pointed to by the -storageDataPath command-line flag, since this directory should be managed solely by VictoriaMetrics or vmstorage.

Use cases

Regular backups

Regular backup can be performed with the following command:

./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs://<bucket>/<path/to/new/backup>
  • </path/to/victoria-metrics-data> - path to VictoriaMetrics data pointed by -storageDataPath command-line flag in single-node VictoriaMetrics or in cluster vmstorage. There is no need to stop VictoriaMetrics for creating backups since they are performed from immutable instant snapshots.
  • http://localhost:8428/snapshot/create is the URL for creating snapshots according to these docs. vmbackup creates a snapshot by querying the provided -snapshot.createURL, then performs the backup and then automatically removes the created snapshot.
  • <bucket> is the name of an already existing GCS bucket.
  • <path/to/new/backup> is the destination path where new backup will be placed.

Regular backups with server-side copy from existing backup

If the destination GCS bucket already contains the previous backup at -origin path, then new backup can be sped up with the following command:

./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs://<bucket>/<path/to/new/backup> -origin=gs://<bucket>/<path/to/existing/backup>

It saves time and network bandwidth costs by performing server-side copy for the shared data from the -origin to -dst.

Incremental backups

Incremental backups are performed if -dst points to an already existing backup. In this case only new data is uploaded to remote storage. It saves time and network bandwidth costs when working with big backups:

./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs://<bucket>/<path/to/existing/backup>

Smart backups

Smart backups mean storing full daily backups into YYYYMMDD folders and creating incremental hourly backup into latest folder:

  • Run the following command every hour:
./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs://<bucket>/latest

The command creates a new snapshot via -snapshot.createURL and uploads only the changed data to gs://<bucket>/latest.

  • Run the following command once a day:
./vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshot.createURL=http://localhost:8428/snapshot/create -dst=gs://<bucket>/<YYYYMMDD> -origin=gs://<bucket>/latest

Where <YYYYMMDD> is the date of the daily backup. The data shared with gs://<bucket>/latest is copied server-side instead of being uploaded again.

This approach saves network bandwidth costs on hourly backups (since they are incremental) and allows recovering data from either the last hour (latest backup) or from any day (YYYYMMDD backups). Note that hourly backup shouldn't run when creating daily backup.
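
The hourly and daily runs differ only in their -dst and -origin values, so both can be derived from the bucket name and the date. The following is a minimal sketch for a POSIX shell. The helper name smart_backup_args, the bucket name my-bucket and the choice of UTC for the date are assumptions made for illustration and are not part of vmbackup:

```shell
# smart_backup_args MODE BUCKET [DAY]
# Prints the -dst/-origin flags for the hourly or the daily run.
# Hypothetical helper: the name and argument order are illustrative only.
smart_backup_args() {
    mode=$1
    bucket=$2
    # Default to the current UTC date in YYYYMMDD form (an assumption).
    day=${3:-$(date -u +%Y%m%d)}
    case $mode in
        hourly) printf '%s\n' "-dst=gs://$bucket/latest" ;;
        daily)  printf '%s\n' "-dst=gs://$bucket/$day -origin=gs://$bucket/latest" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Example call; the unquoted substitution is split into two flags on purpose:
# ./vmbackup -storageDataPath=</path/to/victoria-metrics-data> \
#   -snapshot.createURL=http://localhost:8428/snapshot/create \
#   $(smart_backup_args daily my-bucket)
```

Whatever scheduler runs these commands should make sure the hourly run does not overlap with the daily one.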

Do not forget to remove old backups when they are no longer needed in order to save storage costs.

See also vmbackupmanager tool for automating smart backups.

Server-side copy of the existing backup

Sometimes a server-side copy of an existing backup is needed. Specify the source backup path via the -origin command-line flag and the destination path for the copy via the -dst command-line flag. For example, the following command copies the backup from gs://bucket/foo to gs://bucket/bar:

./vmbackup -origin=gs://bucket/foo -dst=gs://bucket/bar

The -origin and -dst must point to the same object storage bucket or to the same filesystem.

The server-side backup copy is usually much faster than a usual backup, since backup data isn't transferred between the remote storage and the locally running vmbackup tool.

If the -dst already contains some data, then its contents are synced with the -origin data. This allows making incremental server-side copies of backups.

How does it work?

The backup algorithm is the following:

  1. Create a snapshot by querying the provided -snapshot.createURL
  2. Collect information about files in the created snapshot, in the -dst and in the -origin.
  3. Determine which files in -dst are missing in the created snapshot, and delete them. These are usually small files, which are already merged into bigger files in the snapshot.
  4. Determine which files in the created snapshot are missing in -dst. These are usually small new files and bigger merged files.
  5. Determine which files from step 4 exist in the -origin, and perform server-side copy of these files from -origin to -dst. These are usually the biggest and the oldest files, which are shared between backups.
  6. Upload the remaining files from step 4 from the created snapshot to -dst.
  7. Delete the created snapshot.
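
The file-selection steps of this algorithm are plain set operations over three file listings. The sketch below reproduces them with the standard comm utility. It is an illustration only: the function name plan_backup is hypothetical, each input is assumed to be a file containing one name per line sorted in the C locale, and files are compared by name only, which may be simpler than what vmbackup actually does.

```shell
# plan_backup SNAPSHOT_LIST DST_LIST ORIGIN_LIST
# Each argument is a file with one file name per line, sorted in the C locale.
# Prints one action per line: delete, copy (server-side) or upload.
plan_backup() {
    snapshot=$1
    dst=$2
    origin=$3
    # Files present in dst but missing in the snapshot are deleted from dst.
    LC_ALL=C comm -13 "$snapshot" "$dst" | sed 's/^/delete /'
    # Files missing in dst that exist in origin are copied server-side.
    LC_ALL=C comm -23 "$snapshot" "$dst" | LC_ALL=C comm -12 - "$origin" | sed 's/^/copy /'
    # The remaining files missing in dst are uploaded from the snapshot.
    LC_ALL=C comm -23 "$snapshot" "$dst" | LC_ALL=C comm -23 - "$origin" | sed 's/^/upload /'
}
```

With an empty origin listing every missing file ends up in the upload group, which corresponds to a backup without -origin.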

The algorithm splits source files into 1 GiB chunks in the backup. Each chunk is stored as a separate file in the backup. Such splitting balances the number of files in the backup against the amount of data that needs to be re-transferred after temporary errors.
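
To get a feel for how many backup files a source file turns into, the chunk count is the file size divided by 1 GiB, rounded up. A small sketch of that arithmetic follows. The helper name chunks_for is hypothetical, and the sketch says nothing about how vmbackup names or lays out the chunks:

```shell
# chunks_for SIZE_IN_BYTES
# Prints the number of 1 GiB chunks needed to hold a file of the given size.
chunks_for() {
    size=$1
    chunk=$((1024 * 1024 * 1024))
    # Integer ceiling division: round up any partial chunk.
    echo $(( (size + chunk - 1) / chunk ))
}
```

For example, a 2.5 GiB file (2684354560 bytes) needs three chunks, so a temporary error re-transfers at most 1 GiB of it instead of the whole file.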

vmbackup relies on instant snapshot properties:

  • All the files in the snapshot are immutable.
  • Old files are periodically merged into new files.
  • Smaller files have higher probability to be merged.
  • Consecutive snapshots share many identical files.

These properties allow performing fast and cheap incremental backups and server-side copying from -origin paths. See this article for more details. vmbackup can work improperly or slowly when these properties are violated.

Troubleshooting

  • If the backup is slow, then try setting higher value for -concurrency flag. This will increase the number of concurrent workers that upload data to backup storage.
  • If vmbackup eats all the network bandwidth or CPU, then either decrease the -concurrency command-line flag value or set -maxBytesPerSecond command-line flag value to lower value.
  • If vmbackup consumes all the CPU on systems with big number of CPU cores, then try running it with -filestream.disableFadvise command-line flag.
  • If vmbackup has been interrupted due to temporary error, then just restart it with the same args. It will resume the backup process.
  • Backups created from single-node VictoriaMetrics cannot be restored at cluster VictoriaMetrics and vice versa.
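
Since an interrupted backup resumes when vmbackup is restarted with the same args, recovery from temporary errors can be automated with a small retry wrapper. This is a generic sketch, not a vmbackup feature: the function name retry, the RETRY_DELAY variable and the attempt limit are assumptions chosen for illustration.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Re-runs the command with the same args until it succeeds or the
# attempt limit is reached. RETRY_DELAY (seconds, default 5) is a
# hypothetical knob of this sketch.
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example:
# retry 3 ./vmbackup -storageDataPath=</path/to/victoria-metrics-data> \
#   -snapshot.createURL=http://localhost:8428/snapshot/create \
#   -dst=gs://<bucket>/<path/to/backup>
```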

Advanced usage

Providing credentials as a file

Obtaining credentials from a file.

Add flag -credsFilePath=/etc/credentials with the following content:

  • for S3 (AWS, MinIO or other S3 compatible storages):

    [default]
    aws_access_key_id=theaccesskey
    aws_secret_access_key=thesecretaccesskeyvalue
    
  • for GCP cloud storage:

    {
           "type": "service_account",
           "project_id": "project-id",
           "private_key_id": "key-id",
           "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
           "client_email": "service-account-email",
           "client_id": "client-id",
           "auth_uri": "https://accounts.google.com/o/oauth2/auth",
           "token_uri": "https://accounts.google.com/o/oauth2/token",
           "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
           "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
    }
    

Providing credentials via env variables

Obtaining credentials from env variables.

  • For AWS S3 compatible storages set env variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. You can also set env variable AWS_SHARED_CREDENTIALS_FILE with the path to the credentials file.
  • For Google Cloud Storage set env variable GOOGLE_APPLICATION_CREDENTIALS with the path to the credentials file.
  • For Azure storage either set env variables AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY, or AZURE_STORAGE_ACCOUNT_CONNECTION_STRING.

Please, note that vmbackup will use credentials provided by cloud providers metadata service when applicable.
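
Before starting a backup it can be handy to check which of these S3 credential sources is configured in the current environment. The sketch below only inspects the env variables named above. The function name s3_creds_source is hypothetical, and the order in which the sources are checked is an assumption of this sketch, not a statement about how vmbackup resolves credentials:

```shell
# s3_creds_source
# Prints "env" if both access key variables are set, "file" if only a shared
# credentials file is configured, and "none" otherwise.
s3_creds_source() {
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo env
    elif [ -n "${AWS_SHARED_CREDENTIALS_FILE:-}" ]; then
        echo file
    else
        echo none
    fi
}
```

A result of none is not necessarily an error, since credentials may still come from the cloud provider's metadata service.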

Using cloud providers metadata service

vmbackup and vmbackupmanager will automatically use cloud providers metadata service in order to obtain credentials if they are running in cloud environment and credentials are not explicitly provided via flags or env variables.

Providing credentials in Kubernetes

The simplest way to provide credentials in Kubernetes is to use Secrets and inject them into the pod as environment variables. For example, the following secret can be used for AWS S3 credentials:

apiVersion: v1
kind: Secret
metadata:
  name: vmbackup-credentials
stringData:
  access_key: key
  secret_key: secret
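
Note that Kubernetes expects values under a Secret's data field to be base64-encoded, while plain-text values belong under stringData. As a sketch, the helper below renders the same Secret with base64-encoded data values. The function name render_secret is hypothetical, and the base64 utility is assumed to be available:

```shell
# render_secret ACCESS_KEY SECRET_KEY
# Prints a Secret manifest with base64-encoded values under data.
render_secret() {
    # tr strips the line wrapping that some base64 implementations add.
    ak=$(printf '%s' "$1" | base64 | tr -d '\n')
    sk=$(printf '%s' "$2" | base64 | tr -d '\n')
    cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: vmbackup-credentials
data:
  access_key: $ak
  secret_key: $sk
EOF
}
```

The output can be saved to a file and applied like any other manifest. Avoid leaving real keys in shell history when calling such a helper.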

And then it can be injected into the pod as environment variables:

...
env:
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      key: access_key
      name: vmbackup-credentials
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      key: secret_key
      name: vmbackup-credentials
...

A more secure way is to use IAM roles to provide tokens for pods instead of managing credentials manually.

For AWS deployments it will be required to configure IAM roles for service accounts. In order to use IAM roles for service accounts with vmbackup or vmbackupmanager it is required to create ServiceAccount with IAM role mapping:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitoring-backups
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::{ACCOUNT_ID}:role/{ROLE_NAME}

And configure pod to use service account. After this vmbackup and vmbackupmanager will automatically use IAM role for service account in order to obtain credentials.

For GCP deployments it will be required to configure Workload Identity. In order to use Workload Identity with vmbackup or vmbackupmanager it is required to create ServiceAccount with Workload Identity annotation:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitoring-backups
  annotations:
    iam.gke.io/gcp-service-account: {sa_name}@{project_name}.iam.gserviceaccount.com

And configure pod to use service account. After this vmbackup and vmbackupmanager will automatically use Workload Identity for service account in order to obtain credentials.

Using custom S3 endpoint

It is possible to use vmbackup with S3-compatible storages such as MinIO or Cloudian. Specify the custom endpoint URL via the -customS3Endpoint flag:

  • for MinIO

      -customS3Endpoint=http://localhost:9000
    
  • for aws gov region

      -customS3Endpoint=https://s3-fips.us-gov-west-1.amazonaws.com
    

Permanent deletion of objects in S3 and compatible storages

By default, when using S3 compatible storages, vmbackup and vmbackupmanager will use the basic delete operation, which will delete current version of the object only. In order to enforce removing all versions of an object when object is deleted, you need to use -deleteAllObjectVersions flag. Using this flag will enforce listing all versions of an object and deleting them one by one.

Alternatively, it is possible to use object storage lifecycle rules to remove non-current versions of objects automatically. Refer to the respective documentation for your object storage provider for more details.

Command-line flags

Run vmbackup -help in order to see all the available options:

  -concurrency int
     The number of concurrent workers. Higher concurrency may reduce backup duration (default 10)
  -configFilePath string
     Path to file with S3 configs. Configs are loaded from default location if not set.
     See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
  -configProfile string
     Profile name for S3 configs. If not set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both are not set, DefaultSharedConfigProfile is used
  -credsFilePath string
     Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
     See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
  -customS3Endpoint string
     Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set
  -deleteAllObjectVersions
     Whether to prune previous object versions when deleting an object. By default, when object storage has versioning enabled deleting the file removes only current version. This option forces removal of all previous versions. See: https://docs.victoriametrics.com/vmbackup.html#permanent-deletion-of-objects-in-s3-compatible-storages
  -dst string
     Where to put the backup on the remote storage. Example: gs://bucket/path/to/backup, s3://bucket/path/to/backup, azblob://container/path/to/backup or fs:///path/to/local/backup/dir
     -dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded
  -enableTCP6
     Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP are used
  -envflag.enable
     Whether to enable reading flags from environment variables in addition to the command line. Command line flag values have priority over values from environment vars. Flags are read only from the command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
  -envflag.prefix string
     Prefix for environment variables if -envflag.enable is set
  -eula
     Deprecated, please use -license or -licenseFile flags instead. By specifying this flag, you confirm that you have an enterprise license and accept the ESA https://victoriametrics.com/legal/esa/ . This flag is available only in Enterprise binaries. See https://docs.victoriametrics.com/enterprise.html
  -filestream.disableFadvise
     Whether to disable fadvise() syscall when reading large data files. The fadvise() syscall prevents from eviction of recently accessed data from OS page cache during background merges and backups. In some rare cases it is better to disable the syscall if it uses too much CPU
  -flagsAuthKey string
     Auth key for /flags endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
  -fs.disableMmap
     Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
  -http.connTimeout duration
     Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
  -http.disableResponseCompression
     Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
  -http.idleConnTimeout duration
     Timeout for incoming idle http connections (default 1m0s)
  -http.maxGracefulShutdownDuration duration
     The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
  -http.pathPrefix string
     An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
  -http.shutdownDelay duration
     Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
  -httpAuth.password string
     Password for HTTP server's Basic Auth. The authentication is disabled if -httpAuth.username is empty
  -httpAuth.username string
     Username for HTTP server's Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
  -httpListenAddr string
     TCP address for exporting metrics at /metrics page (default ":8420")
  -internStringCacheExpireDuration duration
     The expiry duration for caches for interned strings. See https://en.wikipedia.org/wiki/String_interning . See also -internStringMaxLen and -internStringDisableCache (default 6m0s)
  -internStringDisableCache
     Whether to disable caches for interned strings. This may reduce memory usage at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringCacheExpireDuration and -internStringMaxLen
  -internStringMaxLen int
     The maximum length for strings to intern. A lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringDisableCache and -internStringCacheExpireDuration (default 500)
  -license string
     License key for VictoriaMetrics Enterprise. See https://victoriametrics.com/products/enterprise/ . Trial Enterprise license can be obtained from https://victoriametrics.com/products/enterprise/trial/ . This flag is available only in Enterprise binaries. The license key can be also passed via file specified by -licenseFile command-line flag
  -license.forceOffline
     Whether to enable offline verification for VictoriaMetrics Enterprise license key, which has been passed either via -license or via -licenseFile command-line flag. The issued license key must support offline verification feature. Contact info@victoriametrics.com if you need offline license verification. This flag is available only in Enterprise binaries
  -licenseFile string
     Path to file with license key for VictoriaMetrics Enterprise. See https://victoriametrics.com/products/enterprise/ . Trial Enterprise license can be obtained from https://victoriametrics.com/products/enterprise/trial/ . This flag is available only in Enterprise binaries. The license key can be also passed inline via -license command-line flag
  -loggerDisableTimestamps
     Whether to disable writing timestamps in logs
  -loggerErrorsPerSecondLimit int
     Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
  -loggerFormat string
     Format for logs. Possible values: default, json (default "default")
  -loggerJSONFields string
     Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
  -loggerLevel string
     Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
  -loggerOutput string
     Output for the logs. Supported values: stderr, stdout (default "stderr")
  -loggerTimezone string
     Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
  -loggerWarnsPerSecondLimit int
     Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
  -maxBytesPerSecond size
     The maximum upload speed. There is no limit if it is set to 0
     Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
  -memory.allowedBytes size
     Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from the OS page cache resulting in higher disk IO usage
     Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
  -memory.allowedPercent float
     Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from the OS page cache which will result in higher disk IO usage (default 60)
  -metricsAuthKey string
     Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
  -origin string
     Optional origin directory on the remote storage with old backup for server-side copying when performing full backup. This speeds up full backups
  -pprofAuthKey string
     Auth key for /debug/pprof/* endpoints. It must be passed via authKey query arg. It overrides httpAuth.* settings
  -pushmetrics.extraLabel array
     Optional labels to add to metrics pushed to -pushmetrics.url . For example, -pushmetrics.extraLabel='instance="foo"' adds instance="foo" label to all the metrics pushed to -pushmetrics.url
     Supports an array of values separated by comma or specified via multiple flags.
  -pushmetrics.interval duration
     Interval for pushing metrics to -pushmetrics.url (default 10s)
  -pushmetrics.url array
     Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
     Supports an array of values separated by comma or specified via multiple flags.
  -s3ForcePathStyle
     Prefixing endpoint with bucket name when set false, true by default. (default true)
  -s3StorageClass string
     The Storage Class applied to objects uploaded to AWS S3. Supported values are: GLACIER, DEEP_ARCHIVE, GLACIER_IR, INTELLIGENT_TIERING, ONEZONE_IA, OUTPOSTS, REDUCED_REDUNDANCY, STANDARD, STANDARD_IA.
     See https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html
  -snapshot.createURL string
     VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set
  -snapshot.deleteURL string
     VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete
  -snapshotName string
     Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set
  -storageDataPath string
     Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage (default "victoria-metrics-data")
  -tls
     Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
  -tlsCertFile string
     Path to file with TLS certificate if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower. The provided certificate file is automatically re-read every second, so it can be dynamically updated
  -tlsCipherSuites array
     Optional list of TLS cipher suites for incoming requests over HTTPS if -tls is set. See the list of supported cipher suites at https://pkg.go.dev/crypto/tls#pkg-constants
     Supports an array of values separated by comma or specified via multiple flags.
  -tlsKeyFile string
     Path to file with TLS key if -tls is set. The provided key file is automatically re-read every second, so it can be dynamically updated
  -tlsMinVersion string
     Optional minimum TLS version to use for incoming requests over HTTPS if -tls is set. Supported values: TLS10, TLS11, TLS12, TLS13
  -version
     Show VictoriaMetrics version

How to build from sources

It is recommended to use binary releases: see vmutils-* archives there.

Development build

  1. Install Go. The minimum supported version is Go 1.20.
  2. Run make vmbackup from the root folder of the repository. It builds vmbackup binary and puts it into the bin folder.

Production build

  1. Install docker.
  2. Run make vmbackup-prod from the root folder of the repository. It builds vmbackup-prod binary and puts it into the bin folder.

Building docker images

Run make package-vmbackup. It builds victoriametrics/vmbackup:<PKG_TAG> docker image locally. <PKG_TAG> is auto-generated image tag, which depends on source code in the repository. The <PKG_TAG> may be manually set via PKG_TAG=foobar make package-vmbackup.

The base docker image is alpine but it is possible to use any other base image by setting it via <ROOT_IMAGE> environment variable. For example, the following command builds the image on top of scratch image:

ROOT_IMAGE=scratch make package-vmbackup