
---
weight: 24
title: FAQ
menu:
  docs:
    parent: victoriametrics
    weight: 24
aliases:
- /FAQ.html
- /faq.html
---

What is the main purpose of VictoriaMetrics?

To provide the best monitoring solution.

Who uses VictoriaMetrics?

See case studies.

Which features does VictoriaMetrics have?

See these docs.

Are there performance comparisons with other solutions?

Yes. See these benchmarks.

How to start using VictoriaMetrics?

See these docs.

How to contribute to VictoriaMetrics?

See these docs.

Does VictoriaMetrics support replication?

Yes. See these docs for details.

Can I use VictoriaMetrics instead of Prometheus?

Yes in most cases. VictoriaMetrics can substitute Prometheus in the following aspects:

  • Prometheus-compatible service discovery and target scraping can be done with vmagent and with single-node VictoriaMetrics. See these docs.
  • Prometheus-compatible alerting rules and recording rules can be processed with vmalert.
  • Prometheus-compatible querying in Grafana is supported by VictoriaMetrics. See these docs.

What is the difference between vmagent and Prometheus?

While both vmagent and Prometheus may scrape Prometheus targets (aka /metrics pages) according to the provided Prometheus-compatible scrape configs and send data to multiple remote storage systems, vmagent offers a number of additional features; most of them are listed in the answer to the next question.

What is the difference between vmagent and Prometheus agent?

Both vmagent and the Prometheus agent serve the same purpose: efficiently scraping Prometheus-compatible targets at the edge. They have the following differences:

  • vmagent usually requires lower amounts of CPU, RAM and disk IO compared to the Prometheus agent.
  • vmagent supports both pull and push data collection: it can accept data via many popular data ingestion protocols such as InfluxDB line protocol, Graphite protocol, OpenTSDB protocol, DataDog protocol, Prometheus protocol, OpenTelemetry metrics protocol, CSV and JSON. See these docs.
  • vmagent doesn't have limitations on backfilling of historical data.
  • vmagent can easily scale horizontally to multiple instances for scraping a big number of targets. See these docs.
  • vmagent supports improved relabeling.
  • vmagent can limit the number of scraped metrics per target. See these docs.
  • vmagent supports loading scrape configs from multiple files. See these docs.
  • vmagent supports data reading and data writing from/to Kafka. See these docs.
  • vmagent can read and update scrape configs from http and https URLs, while the Prometheus agent can read them only from the local file system.
  • vmagent supports the stream aggregation feature for performing aggregates on collected or received samples before sending them to remote storage.
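
As a small illustration of the push-based protocols mentioned above, the sketch below encodes the same sample in InfluxDB line protocol and in Prometheus exposition format; the helper functions and names are illustrative, not part of vmagent.

```python
# Sketch: the same sample encoded in two of the push formats vmagent accepts.
# These helpers are illustrative; see the vmagent docs for the actual endpoints.

def influx_line(measurement: str, tags: dict, value: float) -> str:
    """Encode a sample in InfluxDB line protocol: measurement,tag=val field=value"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_str} value={value}"

def prom_line(name: str, labels: dict, value: float) -> str:
    """Encode a sample in Prometheus exposition format: name{label="val"} value"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

print(influx_line("temperature", {"city": "NY"}, 42.5))
# temperature,city=NY value=42.5
print(prom_line("temperature", {"city": "NY"}, 42.5))
# temperature{city="NY"} 42.5
```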

Is it safe to enable remote write in Prometheus?

Yes. Prometheus continues writing data to local storage after enabling remote write, so all the existing local storage data and new data is available for querying via Prometheus as usual.

It is recommended to use vmagent for scraping Prometheus targets and writing data to VictoriaMetrics.

How does VictoriaMetrics compare to other remote storage solutions for Prometheus such as M3DB, Thanos, Cortex, Mimir, etc.?

  • VictoriaMetrics is easier to configure and operate than competing solutions.
  • VictoriaMetrics is more cost-efficient, since it requires less RAM, disk space, disk IO and network IO than competing solutions.
  • VictoriaMetrics performs typical queries faster than competing solutions.
  • VictoriaMetrics has a simpler architecture, which translates into fewer bugs and more useful features compared to competing TSDBs.

See the following articles and talks for details:

VictoriaMetrics also uses less RAM than Thanos components.

What is the difference between VictoriaMetrics and QuestDB?

What is the difference between VictoriaMetrics and Grafana Mimir?

Grafana Mimir is a Cortex fork, so it has the same differences as Cortex. See what is the difference between VictoriaMetrics and Cortex.

See also Grafana Mimir vs VictoriaMetrics benchmark.

What is the difference between VictoriaMetrics and Cortex?

VictoriaMetrics is similar to Cortex in the following aspects:

The main differences between Cortex and VictoriaMetrics:

  • Cortex re-uses Prometheus source code, while VictoriaMetrics is written from scratch.
  • Cortex heavily relies on third-party services such as Consul, Memcache, DynamoDB, BigTable, Cassandra, etc. This may increase operational complexity and reduce system reliability compared to VictoriaMetrics, which doesn't use any external services. Compare Cortex's architecture to VictoriaMetrics' architecture.
  • VictoriaMetrics provides production-ready single-node solution, which is much easier to set up and operate than a Cortex cluster.
  • Cortex may lose up to 12 hours of recent data on ingester failure. See the corresponding docs. VictoriaMetrics may lose only a few seconds of recent data, which isn't synced to persistent storage yet. See this article for details.
  • Cortex is usually slower and requires more CPU and RAM than VictoriaMetrics. See this talk from adidas at PromCon 2019 and other case studies.
  • VictoriaMetrics accepts data in multiple popular data ingestion protocols in addition to the Prometheus remote_write protocol: InfluxDB, OpenTSDB, Graphite, CSV, JSON, native binary. See these docs for details.
  • VictoriaMetrics provides the MetricsQL query language, while Cortex provides the PromQL query language.
  • VictoriaMetrics can be queried via Graphite's API.

What is the difference between VictoriaMetrics and Thanos?

  • Thanos re-uses Prometheus source code, while VictoriaMetrics is written from scratch.
  • VictoriaMetrics accepts data via the standard remote_write API for Prometheus, while Thanos uses a non-standard sidecar which must run alongside each Prometheus instance.
  • The Thanos sidecar requires disabling data compaction in Prometheus, which may hurt Prometheus performance and increase RAM usage. See these docs for more details.
  • Thanos stores data in object storage (Amazon S3 or Google GCS), while VictoriaMetrics stores data in block storage (GCP persistent disks, Amazon EBS or bare metal HDD). While object storage is usually less expensive, block storage provides much lower latencies and higher throughput. VictoriaMetrics works perfectly with HDD-based block storage; there is no need for using more expensive SSD or NVMe disks in most cases.
  • Thanos may lose up to 2 hours of recent data, which wasn't uploaded yet to object storage. VictoriaMetrics may lose only a few seconds of recent data, which hasn't been synced to persistent storage yet. See this article for details.
  • VictoriaMetrics provides a production-ready single-node solution, which is much easier to set up and operate than Thanos components.
  • Thanos may be harder to set up and operate compared to VictoriaMetrics, since it has more moving parts, which can be connected with less reliable networks. See this article for details.
  • Thanos is usually slower and requires more CPU and RAM than VictoriaMetrics. See this talk from adidas at PromCon 2019.
  • VictoriaMetrics accepts data via multiple popular data ingestion protocols in addition to the Prometheus remote_write protocol: InfluxDB, OpenTSDB, Graphite, CSV, JSON, native binary. See these docs for details.
  • VictoriaMetrics provides the MetricsQL query language, while Thanos provides the PromQL query language.
  • VictoriaMetrics can be queried via Graphite's API.

How does VictoriaMetrics compare to InfluxDB?

  • VictoriaMetrics requires 10x less RAM and works faster.
  • VictoriaMetrics needs lower amounts of storage space than InfluxDB for production data.
  • VictoriaMetrics doesn't support InfluxQL or Flux but provides a better query language MetricsQL. See this tutorial for details.
  • VictoriaMetrics accepts data in multiple popular data ingestion protocols in addition to InfluxDB: Prometheus remote_write, OpenTSDB, Graphite, CSV, JSON, native binary. See these docs for details.
  • VictoriaMetrics can be queried via Graphite's API.

How does VictoriaMetrics compare to TimescaleDB?

Does VictoriaMetrics use Prometheus technologies like other clustered TSDBs built on top of Prometheus such as Thanos or Cortex?

No. VictoriaMetrics core is written in Go from scratch by fasthttp's author. The architecture is optimized for storing and querying large amounts of time series data with high cardinality. VictoriaMetrics storage uses certain ideas from ClickHouse. Special thanks to Alexey Milovidov.

What is the pricing for VictoriaMetrics?

The following versions are open source and free:

We provide commercial support for both versions. Contact us for the pricing.

The following commercial versions of VictoriaMetrics are available:

  • VictoriaMetrics Cloud: the most cost-efficient hosted monitoring platform, operated by the VictoriaMetrics core team.

The following commercial versions of VictoriaMetrics are planned:

  • Cloud monitoring solution based on VictoriaMetrics.

Contact us for more information on our plans.

Why doesn't VictoriaMetrics support the Prometheus remote read API?

The remote read API requires transferring all the raw data for all the requested metrics over the given time range. For instance, if a query covers 1000 metrics with 10K values each, then the remote read API has to return 1000*10K=10M metric values to Prometheus. This is slow and expensive. Prometheus' remote read API isn't intended for querying foreign data (aka global query view). See this issue for details.
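
The arithmetic from the example above can be made concrete. The 16 bytes per sample below is an illustrative assumption for uncompressed raw data, not a measured figure:

```python
# Back-of-the-envelope cost of a remote read query from the example above.
metrics = 1000
values_per_metric = 10_000
bytes_per_sample = 16  # assumed: 8-byte timestamp + 8-byte float, uncompressed

total_values = metrics * values_per_metric
total_bytes = total_values * bytes_per_sample
print(total_values)  # 10_000_000 metric values per query
print(total_bytes)   # 160_000_000 bytes, i.e. ~160 MB of raw data to transfer
```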

So just query VictoriaMetrics directly via vmui, the Prometheus querying API or the Prometheus datasource in Grafana.

Does VictoriaMetrics deduplicate data from Prometheus instances scraping the same targets (aka HA pairs)?

Yes. See these docs for details.

Where is the source code of VictoriaMetrics?

Source code for the following versions is available in the following places:

Is VictoriaMetrics a good fit for data from IoT sensors and industrial sensors?

VictoriaMetrics is able to handle data from hundreds of millions of IoT sensors and industrial sensors. It supports high cardinality data, perfectly scales up on a single node and scales horizontally to multiple nodes.

What is the difference between single-node and cluster versions of VictoriaMetrics?

Both single-node and cluster versions of VictoriaMetrics share the core source code, so they have many common features. They have the following differences though:

  • Single-node VictoriaMetrics runs on a single host, while the cluster version of VictoriaMetrics can scale to many hosts. Single-node VictoriaMetrics scales vertically though, e.g. its capacity and performance scale almost linearly when increasing the available CPU, RAM, disk IO and disk space. See an article about vertical scalability of a single-node VictoriaMetrics.

  • Cluster version of VictoriaMetrics supports multitenancy, while single-node VictoriaMetrics doesn't support it.

  • Cluster version of VictoriaMetrics supports data replication, while single-node VictoriaMetrics relies on the durability of the persistent storage pointed to by the -storageDataPath command-line flag. See these docs for details.

  • Single-node VictoriaMetrics provides higher capacity and performance compared to the cluster version of VictoriaMetrics when running on the same hardware with the same amounts of CPU and RAM, since it has no overhead on transferring data between cluster components over the network.

See also which type of VictoriaMetrics is recommended to use.

Where can I ask questions about VictoriaMetrics?

Questions about VictoriaMetrics can be asked via the following channels:

See the full list of community channels.

Where can I file bugs and feature requests regarding VictoriaMetrics?

File bugs and feature requests here.

Where can I find information about multi-tenancy?

See these docs. Multitenancy is supported only by the cluster version of VictoriaMetrics.

How to set a memory limit for VictoriaMetrics components?

All the VictoriaMetrics components provide command-line flags to control the size of internal buffers and caches: -memory.allowedPercent and -memory.allowedBytes (pass -help to any VictoriaMetrics component in order to see the description for these flags). These limits don't take into account additional memory, which may be needed for processing incoming queries. Hard limits may be enforced only by the OS via cgroups, Docker (see these docs) or Kubernetes (see these docs).
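
As a rough illustration of how the two flags relate, the sketch below models an explicit byte limit taking precedence over a percentage of total RAM. This is a simplified model for intuition only, not VictoriaMetrics' actual implementation, and the 60% default is an assumption:

```python
# Simplified model of -memory.allowedBytes vs -memory.allowedPercent.
# Illustration only; check `-help` output for the real flag semantics.

def allowed_memory(total_bytes: int, allowed_percent: float = 60.0,
                   allowed_bytes: int = 0) -> int:
    """An explicit byte limit wins; otherwise take a percentage of total RAM."""
    if allowed_bytes > 0:
        return allowed_bytes
    return int(total_bytes * allowed_percent / 100)

one_gib = 1 << 30
print(allowed_memory(one_gib))                         # 60% of 1 GiB
print(allowed_memory(one_gib, allowed_bytes=1 << 28))  # explicit 256 MiB limit
```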

Memory usage for VictoriaMetrics components can be tuned according to the following docs:

How can I run VictoriaMetrics on FreeBSD/OpenBSD?

VictoriaMetrics is included in OpenBSD and FreeBSD ports, so just install it from there or use pre-built binaries from the releases page.

Does VictoriaMetrics support the Graphite query language?

Yes. See these docs.

What is an active time series?

A time series is uniquely identified by its name plus a set of its labels. For example, temperature{city="NY",country="US"} and temperature{city="SF",country="US"} are two distinct series, since they differ by the city label. A time series is considered active if it received at least a single sample during the last hour or it has been touched by queries during the last hour. The number of active time series is displayed on the official Grafana dashboard for VictoriaMetrics - see these docs for details.
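
The identity rule above can be sketched in a few lines: a series key is the metric name plus the full set of label name=value pairs, so the two temperature series from the example count as distinct even though they share a name.

```python
# A series is identified by its name plus all of its label name=value pairs.

def series_key(name: str, labels: dict) -> tuple:
    return (name, tuple(sorted(labels.items())))

samples = [
    ("temperature", {"city": "NY", "country": "US"}, 15.0),
    ("temperature", {"city": "SF", "country": "US"}, 20.0),
    ("temperature", {"city": "NY", "country": "US"}, 16.0),  # same series as the first
]
active = {series_key(name, labels) for name, labels, _ in samples}
print(len(active))  # 2 distinct series
```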

What is high churn rate?

If old time series are constantly substituted by new time series at a high rate, then such a state is called high churn rate. High churn rate has the following negative consequences:

  • Increased total number of time series stored in the database.
  • Increased size of inverted index, which is stored at <-storageDataPath>/indexdb, since the inverted index contains entries for every label of every time series with at least a single ingested sample.
  • Slow-down of queries over multiple days.

The main reason for high churn rate is a metric label with frequently changed value. Examples of such labels:

  • queryid, which changes with each query at postgres_exporter.
  • pod, which changes with each new deployment in Kubernetes.
  • A label derived from the current time such as timestamp, minute or hour.
  • A hash or uuid label, which changes frequently.

The solution against high churn rate is to identify and eliminate labels with frequently changed values. Cardinality explorer can help identify these labels. If labels can't be removed, try pre-aggregating data before it gets ingested into the database with stream aggregation.
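
The effect of such a label can be simulated directly. The sketch below uses a hypothetical "minute" label (one of the time-derived labels listed above): keeping it creates a brand-new series every minute, while dropping it keeps the series count constant.

```python
# Simulating churn from a label whose value changes on every scrape.

def series_key(name, labels):
    return (name, tuple(sorted(labels.items())))

seen_with_label, seen_without = set(), set()
for minute in range(60):  # one sample per minute for an hour
    labels = {"job": "app", "minute": str(minute)}
    seen_with_label.add(series_key("requests_total", labels))
    stripped = {k: v for k, v in labels.items() if k != "minute"}
    seen_without.add(series_key("requests_total", stripped))

print(len(seen_with_label))  # 60 new series in one hour -> high churn rate
print(len(seen_without))     # 1 series -> no churn
```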

The official Grafana dashboards for VictoriaMetrics contain graphs for churn rate - see these docs for details.

What is high cardinality?

High cardinality usually means a high number of active time series. High cardinality may lead to high memory usage and/or to a high percentage of slow inserts. The source of high cardinality is usually a label with a large number of unique values, which is present in a big share of the ingested time series. Examples of such labels:

  • user_id
  • url
  • ip

The solution is to identify and remove the source of high cardinality with the help of cardinality explorer.
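
Conceptually, finding the source boils down to counting unique values per label, which is what cardinality explorer automates. A toy sketch with made-up series:

```python
# Count unique values per label to spot the one inflating cardinality.
from collections import defaultdict

series = [
    {"path": "/api", "user_id": "u1"},
    {"path": "/api", "user_id": "u2"},
    {"path": "/home", "user_id": "u3"},
    {"path": "/home", "user_id": "u4"},
]
unique_values = defaultdict(set)
for labels in series:
    for name, value in labels.items():
        unique_values[name].add(value)

worst = max(unique_values, key=lambda name: len(unique_values[name]))
print(worst, len(unique_values[worst]))  # user_id 4 -> the cardinality source
```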

The official Grafana dashboards for VictoriaMetrics contain graphs, which show the number of active time series - see these docs for details.

What is a slow insert?

VictoriaMetrics maintains in-memory cache for mapping of active time series into internal series ids. The cache size depends on the available memory for VictoriaMetrics in the host system. If the information about all the active time series doesn't fit the cache, then VictoriaMetrics needs to read and unpack the information from disk on every incoming sample for time series missing in the cache. This operation is much slower than the cache lookup, so such an insert is named a slow insert. A high percentage of slow inserts on the official dashboard for VictoriaMetrics indicates a memory shortage for the current number of active time series. Such a condition usually leads to a significant slowdown for data ingestion and to significantly increased disk IO and CPU usage. The solution is to add more memory or to reduce the number of active time series.
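
The mechanism can be modeled with a toy LRU cache. The tiny capacity below stands in for a memory shortage: with 3 active series and room for only 2, every insert takes the slow path. This is an illustration of the concept, not VictoriaMetrics' actual cache.

```python
# Toy model of the series-id cache: a cache miss means reading the series id
# from disk -- a "slow insert".
from collections import OrderedDict

cache, capacity = OrderedDict(), 2   # cache fits 2 series; 3 are active
slow_inserts = total = 0
for series in ["a", "b", "c", "a", "b", "c"]:
    total += 1
    if series in cache:
        cache.move_to_end(series)    # fast path: cache hit
    else:
        slow_inserts += 1            # slow path: unpack series info from disk
        cache[series] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used series

print(round(100 * slow_inserts / total))  # 100% slow inserts: cache too small
```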

Cardinality explorer can be helpful for locating the source of high number of active time series.

How to optimize MetricsQL query?

See this article.

VictoriaMetrics also provides query tracer and cardinality explorer, which can help during query optimization.

See also troubleshooting slow queries.

Which type of VictoriaMetrics is recommended to use?

Both single-node VictoriaMetrics and VictoriaMetrics cluster are production-ready.

Single-node VictoriaMetrics is able to handle quite big workloads in production with tens of millions of active time series at an ingestion rate of a million samples per second. See this case study.

Single-node VictoriaMetrics requires lower amounts of CPU and RAM for handling the same workload compared to the cluster version of VictoriaMetrics, since it doesn't need to pass encoded data over the network between cluster components.

The performance of a single-node VictoriaMetrics scales almost perfectly with the available CPU, RAM and disk IO resources on the host where it runs - see this article.

Single-node VictoriaMetrics is easier to set up and operate compared to the cluster version of VictoriaMetrics.

Given the facts above, it is recommended to use single-node VictoriaMetrics in the majority of cases.

Cluster version of VictoriaMetrics may be preferred over single-node VictoriaMetrics in the following relatively rare cases:

  • If multitenancy support is needed, since single-node VictoriaMetrics doesn't support multitenancy. Though it is possible to run multiple single-node VictoriaMetrics instances (one per tenant) and route incoming requests from a particular tenant to the needed VictoriaMetrics instance via vmauth.

  • If the current workload cannot be handled by a single-node VictoriaMetrics. For example, if you are going to ingest hundreds of millions of active time series at ingestion rates exceeding a million samples per second, then it is better to use cluster version of VictoriaMetrics, since its capacity can scale horizontally with the number of nodes in the cluster.
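
For the per-tenant workaround mentioned above, a vmauth routing config could look like the sketch below. All hostnames, ports, tenant names and passwords are placeholders; check the vmauth docs for the authoritative config schema.

```yaml
# Hypothetical vmauth config routing each tenant's requests to its own
# single-node VictoriaMetrics instance (all names and addresses are examples).
users:
- username: tenant-1
  password: secret-1
  url_prefix: "http://victoria-tenant-1:8428"
- username: tenant-2
  password: secret-2
  url_prefix: "http://victoria-tenant-2:8428"
```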

How to migrate data from single-node VictoriaMetrics to cluster version?

Single-node VictoriaMetrics stores data on disk in a slightly different format compared to the cluster version of VictoriaMetrics. So it is impossible to just copy the on-disk data from the -storageDataPath directory of single-node VictoriaMetrics to a vmstorage node in a VictoriaMetrics cluster. If you need to migrate data from single-node VictoriaMetrics to the cluster version, then follow these instructions.

Why isn't MetricsQL 100% compatible with PromQL?

MetricsQL provides a better user experience than PromQL. It fixes a few annoying issues in PromQL, which prevents MetricsQL from being 100% compatible with PromQL. See this article for details.

How to migrate data from Prometheus to VictoriaMetrics?

Please see these docs.

How to migrate data from InfluxDB to VictoriaMetrics?

Please see these docs.

How to migrate data from OpenTSDB to VictoriaMetrics?

Please see these docs.

How to migrate data from Graphite to VictoriaMetrics?

Please use the whisper-to-graphite tool for reading data from Graphite and pushing it to VictoriaMetrics via Graphite's import API.

Why do the same metrics have differences in VictoriaMetrics' and Prometheus' dashboards?

There could be a slight difference in stored values for time series. Due to different compression algorithms, VictoriaMetrics may reduce the precision of float values with more than 12 significant decimal digits. Please see this article.
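
A quick sketch of what "12 significant decimal digits" means in practice: rounding a float that carries more digits changes it, but only beyond the 12th significant digit, so the relative error stays tiny. This is an illustration of the precision claim, not VictoriaMetrics' actual compression code.

```python
# Rounding to 12 significant decimal digits, as an illustration of the
# precision note above.
x = 1.2345678901234567              # more than 12 significant decimal digits
stored = float(f"{x:.12g}")         # keep only 12 significant digits

print(stored)                        # 1.23456789012
print(abs(stored - x) / x < 1e-11)   # True: the relative error is negligible
```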

The query engine may behave differently for some functions. Please see this article.

If downsampling and deduplication are enabled how will this work?

Deduplication is a special case of zero-offset downsampling. So, if both downsampling and deduplication are enabled, then deduplication is replaced by zero-offset downsampling.
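
The idea can be sketched as keeping a single sample (the one with the biggest timestamp) per discrete interval. The 60-second interval below is an arbitrary stand-in for a dedup/downsampling setting, and the function is a simplified illustration, not VictoriaMetrics' implementation.

```python
# Sketch of zero-offset downsampling / deduplication: keep one sample --
# the one with the biggest timestamp -- per discrete interval.

def dedup(samples, interval_ms=60_000):
    """samples: list of (timestamp_ms, value); returns one sample per interval."""
    kept = {}
    for ts, value in sorted(samples):
        kept[ts // interval_ms] = (ts, value)  # later timestamps win
    return [kept[bucket] for bucket in sorted(kept)]

# Two HA scrapers write near-duplicate samples into the same intervals:
samples = [(0, 1.0), (10, 1.1), (60_000, 2.0), (60_010, 2.1)]
print(dedup(samples))  # [(10, 1.1), (60010, 2.1)]
```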

How to upgrade or downgrade VictoriaMetrics without downtime?

Single-node VictoriaMetrics cannot be restarted, upgraded or downgraded without downtime, since it needs to be gracefully shut down and then started again. See how to upgrade VictoriaMetrics.

Cluster version of VictoriaMetrics can be restarted, upgraded or downgraded without downtime according to these instructions.

Why doesn't VictoriaMetrics support automatic data re-balancing between vmstorage nodes?

VictoriaMetrics doesn't rebalance data between vmstorage nodes when new vmstorage nodes are added to the cluster. This means that newly added vmstorage nodes will have less data at -storageDataPath compared to the old vmstorage nodes, until historical data on the old nodes goes outside the configured retention and is removed.

Automatic re-balancing is the process of moving data between vmstorage nodes so that every node eventually holds the same amount of data. It isn't supported because it would consume additional CPU, network bandwidth and disk IO at vmstorage nodes for long periods of time, which, in turn, could negatively impact VictoriaMetrics cluster availability.

Additionally, it is unclear how to handle the automatic re-balancing if cluster configuration changes when the re-balancing is in progress.

The amount of data stored becomes equal among old and new vmstorage nodes after historical data is removed from the old vmstorage nodes because it goes outside of the configured retention.

The data ingestion load becomes even between old and new vmstorage nodes almost immediately after adding new vmstorage nodes to the cluster, since vminsert nodes evenly distribute incoming time series among the nodes specified in the -storageNode command-line flag. The newly added vmstorage nodes may experience increased load during the first couple of minutes because they need to register active time series.

The query load becomes even between old and new vmstorage nodes after most queries are executed over time ranges whose data is covered by the new vmstorage nodes. Usually most queries come from alerting and recording rules, which query data over limited time ranges such as a few hours or a few days at most. This means that the query load between old and new vmstorage nodes should even out within a few hours or days after adding new vmstorage nodes.
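
The claim that ingestion load evens out immediately can be illustrated by distributing series among nodes by hash, so a newly added node receives roughly its fair share of incoming series from the start. The hashing scheme below is a conceptual stand-in, not vminsert's actual distribution algorithm.

```python
# Sketch: spreading incoming series across the nodes listed in -storageNode
# by hashing the series key (illustrative, not the real algorithm).
from collections import Counter
from hashlib import sha256

nodes = ["vmstorage-1", "vmstorage-2", "vmstorage-3"]  # node 3 was just added
counts = Counter()
for i in range(30_000):
    series = f"metric{{instance=\"host{i}\"}}"
    digest = int(sha256(series.encode()).hexdigest(), 16)
    counts[nodes[digest % len(nodes)]] += 1

print(counts)  # each node, including the new one, gets roughly a third
```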

Why doesn't VictoriaMetrics support automatic recovery of the replication factor?

VictoriaMetrics doesn't restore the replication factor when some vmstorage nodes are removed from the cluster for the following reasons:

  • Automatic replication factor recovery requires copying non-trivial amounts of data between the remaining vmstorage nodes. This copying takes additional CPU, disk IO and network bandwidth at vmstorage nodes, which may negatively impact VictoriaMetrics cluster availability during extended periods of time.

  • It is unclear when automatic replication factor recovery must be started. How can the expected temporary unavailability of a vmstorage node because of maintenance, upgrade or config changes be distinguished from permanent loss of data at that node?

It is recommended to read the replication and data safety docs for more details.