mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2025-03-11 15:34:56 +00:00

Aliaksandr Valialkin 202eb429a7 lib/logstorage: refactor storage format to be more efficient for querying wide events

It turned out that VictoriaLogs is frequently used for collecting logs with tens of fields. For example, a standard Kubernetes setup on top of Filebeat generates more than 20 fields per log. Such logs are also known as "wide events".

The previous storage format was optimized for logs with a few fields. When at least a single field was referenced in the query, all the meta-information about all the log fields was unpacked and parsed for each scanned block during the query. This could require a lot of additional disk IO and CPU time when logs contain many fields. Resolve this issue by providing a (field -> metainfo_offset) index for each field in every data block. This index allows reading and extracting only the needed metainfo for fields used in the query. This index is stored in columnsHeaderIndexFilename (columns_header_index.bin). This allows increasing performance for queries over wide events by 10x and more.

Another issue was that the data for bloom filters and field values across all the log fields except _msg was intermixed in two files: fieldBloomFilename (field_bloom.bin) and fieldValuesFilename (field_values.bin). This could result in huge disk read IO overhead when some small field was referenced in the query, since the operating system usually reads more data than requested. It reads the data from disk in blocks of at least 4KiB (usually the block size is much bigger, in the range 64KiB - 512KiB). So, if a 512-byte bloom filter or values block is read from the file, then the operating system reads up to 512KiB of data from disk, which results in 1000x disk read IO overhead. This overhead isn't visible for recently accessed data, since this data is usually stored in RAM (aka the operating system page cache), but it may become very annoying when performing a query over large volumes of data that isn't present in the OS page cache.

The solution for this issue is to split bloom filters and field values across multiple shards. This reduces the worst-case disk read IO overhead by at least Nx, where N is the number of shards, while the disk read IO overhead is completely removed in the best case, when the number of columns doesn't exceed N. Currently the number of shards is 8 - see bloomValuesShardsCount. This solution increases performance for queries over large volumes of newly ingested data by up to 1000x.

The new storage format is versioned as v1, while the old storage format is versioned as v0. The version is stored in partHeader.FormatVersion. Parts with the old storage format are converted into parts with the new storage format during background merge. It is possible to force a merge by querying the /internal/force_merge HTTP endpoint - see https://docs.victoriametrics.com/victorialogs/#forced-merge.		2024-10-16 17:35:07 +02:00
.github	Revert "deployment: build image for vmagent streamaggr benchmark (#6515)"	2024-07-16 13:27:06 +02:00
app	vmui: clarify the info for TotalSeries stat (#7271)	2024-10-16 15:15:28 +02:00
cspell	docs: fixes misspelled typos	2024-09-13 12:14:24 +02:00
dashboards	dashboards: fix description about pending datapoints (#7235)	2024-10-11 13:47:14 +02:00
deployment	deployment/alerts: fix quoting on DiskRunsOutOfSpace (#7234)	2024-10-11 00:44:18 -07:00
docs	lib/logstorage: refactor storage format to be more efficient for querying wide events	2024-10-16 17:35:07 +02:00
lib	lib/logstorage: refactor storage format to be more efficient for querying wide events	2024-10-16 17:35:07 +02:00
package	simplify release process (#3012)	2022-08-31 02:27:24 +03:00
ports/OpenBSD	docs: convert png images to webp in all the docs except of docs/operator/*	2023-11-22 19:21:00 +02:00
vendor	app/vlogscli: add interactive command-line tool for querying VictoriaLogs	2024-10-01 12:23:07 +02:00
.dockerignore	added packer build for DigitalOcean Droplets (#1917)	2021-12-21 12:09:14 +02:00
.gitignore	testing: allow disabling fsync to make tests run faster (#6871)	2024-08-30 10:54:46 +02:00
.golangci.yml	.golangci.yml: properly specify functions to exclude for return values check after the upgrade to v1.59.1 at `239a7b6e6f`	2024-06-11 16:41:01 +02:00
.wwhrd.yml	add MPL-2.0 to approved licenses	2024-08-29 10:35:36 +02:00
CODE_OF_CONDUCT.md	A good change for MD files (#2353)	2022-03-22 13:40:55 +02:00
CONTRIBUTING.md	Move CONTRIBUTING.md to docs/	2024-04-20 23:11:22 +02:00
go.mod	app/vlogscli: add interactive command-line tool for querying VictoriaLogs	2024-10-01 12:23:07 +02:00
go.sum	app/vlogscli: add interactive command-line tool for querying VictoriaLogs	2024-10-01 12:23:07 +02:00
LICENSE	LICENSE: update the current year from 2023 to 2024	2024-01-17 01:48:04 +02:00
Makefile	app/vlogscli: add support for live tailing	2024-10-09 12:30:17 +02:00
README.md	dox: fix anchor in github readme (#7160)	2024-10-03 10:39:57 +02:00
SECURITY.md	add new LTS release v1.102.x	2024-08-02 11:12:20 +02:00
VM_logo.zip	docs: update logos files and usage rules (#6980)	2024-09-24 11:53:58 +02:00
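
The read-overhead arithmetic in the latest commit message above can be checked with a small Go sketch. It is only an illustration, not code from lib/logstorage: the function names are hypothetical, the round-robin assignment of columns to shards is an assumption (the commit message does not say how columns are mapped to shards), and reads are assumed to start at a block boundary. Only the shard count of 8 and the 512-byte / 512KiB figures come from the commit message.

```go
package main

// shardsCount mirrors the value of 8 given for bloomValuesShardsCount
// in the commit message.
const shardsCount = 8

// osReadBytes returns how many bytes are read from disk when `requested`
// bytes are needed and the OS reads in units of `blockSize` bytes.
// Assumption: the requested range starts at a block boundary.
func osReadBytes(requested, blockSize int) int {
	blocks := (requested + blockSize - 1) / blockSize // round up
	return blocks * blockSize
}

// readAmplification returns the ratio of bytes read to bytes requested.
func readAmplification(requested, blockSize int) float64 {
	return float64(osReadBytes(requested, blockSize)) / float64(requested)
}

// shardForColumn is a hypothetical round-robin mapping of a column index
// to a shard file. The real mapping may differ.
func shardForColumn(columnIdx int) int {
	return columnIdx % shardsCount
}

// columnsPerShard counts how many of `columns` columns land in each shard
// under the round-robin assumption above.
func columnsPerShard(columns int) []int {
	counts := make([]int, shardsCount)
	for i := 0; i < columns; i++ {
		counts[shardForColumn(i)]++
	}
	return counts
}
```

Under these assumptions, `readAmplification(512, 512*1024)` evaluates to 1024, which matches the roughly 1000x overhead quoted in the commit message, and `columnsPerShard(8)` places exactly one column in each shard, which corresponds to the best case where unrelated columns no longer share a file.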

README.md

VictoriaMetrics

VictoriaMetrics is a fast, cost-saving, and scalable solution for monitoring and managing time series data. It delivers high performance and reliability, making it an ideal choice for businesses of all sizes.

Here are some resources and information about VictoriaMetrics:

Documentation: docs.victoriametrics.com
Case studies: Grammarly, Roblox, Wix, and others.
Available: Binary releases, Docker images, Source code
Deployment types: Single-node version, Cluster version, and Enterprise version
Changelog: CHANGELOG, and How to upgrade
Community: Slack, Twitter, LinkedIn, YouTube

Yes, we open-source both the single-node VictoriaMetrics and the cluster version.

Prominent features

VictoriaMetrics is optimized for time series data, even when old time series are constantly replaced by new ones at a high rate. It offers a lot of features:

Long-term storage for Prometheus or as a drop-in replacement for Prometheus and Graphite in Grafana.
Powerful stream aggregation: Can be used as a StatsD alternative.
Ideal for big data: Works well with large amounts of time series data from APM, Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data and various Enterprise workloads.
Query language: Supports both PromQL and the more performant MetricsQL.
Easy to set up: No dependencies, a single small binary, and configuration through command-line flags with fine-tuned defaults; backup and restore with instant snapshots.
Global query view: Multiple Prometheus instances or any other data sources may ingest data into VictoriaMetrics, and the data can then be queried via a single query.
Various protocols: Supports metric scraping, ingestion and backfilling in various protocols:
- Prometheus exporters, Prometheus remote write API, Prometheus exposition format.
- InfluxDB line protocol over HTTP, TCP and UDP.
- Graphite plaintext protocol with tags.
- OpenTSDB put message.
- HTTP OpenTSDB /api/put requests.
- JSON line format.
- Arbitrary CSV data.
- Native binary format.
- DataDog agent or DogStatsD.
- NewRelic infrastructure agent.
- OpenTelemetry metrics format.
NFS-based storage: Supports storing data on NFS-based storage such as Amazon EFS and Google Filestore.
And many other features such as metrics relabeling, cardinality limiter, etc.
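
As an illustration of one of the ingestion formats listed above, here is a minimal Go sketch that renders a single sample as a line in the Prometheus text exposition format (`name{label="value",...} value timestamp_ms`). The helper names `escapeLabelValue` and `formatSample` are hypothetical and are not part of VictoriaMetrics. The sketch does not send anything over the network, so check the documentation for the actual import endpoint before using such lines.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// escapeLabelValue escapes backslash, double quote and newline, which is
// what the Prometheus text exposition format requires inside label values.
func escapeLabelValue(v string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)
	return r.Replace(v)
}

// formatSample renders one sample as a single exposition-format line.
// Labels are sorted by name so that the output is deterministic.
// timestampMs is the sample time in milliseconds since the Unix epoch.
func formatSample(name string, labels map[string]string, value float64, timestampMs int64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	if len(keys) > 0 {
		b.WriteByte('{')
		for i, k := range keys {
			if i > 0 {
				b.WriteByte(',')
			}
			b.WriteString(k)
			b.WriteString(`="`)
			b.WriteString(escapeLabelValue(labels[k]))
			b.WriteByte('"')
		}
		b.WriteByte('}')
	}
	b.WriteByte(' ')
	b.WriteString(strconv.FormatFloat(value, 'g', -1, 64))
	b.WriteByte(' ')
	b.WriteString(strconv.FormatInt(timestampMs, 10))
	return b.String()
}
```

For example, `formatSample("up", nil, 1, 1000)` returns `up 1 1000`. Sorting the labels is a design choice made here for reproducible output; the format itself does not require any particular label order.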

Enterprise version

In addition, the Enterprise version includes extra features:

Anomaly detection: Automation and simplification of your alerting rules, covering complex anomalies found in metrics data.
Backup automation: Automates regular backup procedures.
Multiple retentions: Reducing storage costs by specifying different retentions for different datasets.
Downsampling: Reducing storage costs and increasing performance for queries over historical data.
Stable releases with long-term support lines (LTS).
Comprehensive support: First-class consulting, feature requests and technical support provided by the core VictoriaMetrics dev team.
Many other features, which you can read about on the Enterprise page.

Contact us if you need enterprise support for VictoriaMetrics. You can also request a free trial license here. Enterprise binaries are available for download at GitHub Releases.

We strictly apply security measures in everything we do. VictoriaMetrics has achieved security certifications for Database Software Development and Software-Based Monitoring Services. See Security page for more details.

Benchmarks

Some good benchmark results VictoriaMetrics has achieved:

Minimal memory footprint: handling millions of unique timeseries with 10x less RAM than InfluxDB, up to 7x less RAM than Prometheus, Thanos or Cortex.
High scalability and performance for data ingestion and querying: outperforms InfluxDB and TimescaleDB by 20x.
High data compression: 70x more data points may be stored in limited storage than with TimescaleDB, and 7x less storage space is required than with Prometheus, Thanos or Cortex.
Reducing storage costs: 10x more effective than Graphite according to the Grammarly case study.
A single-node VictoriaMetrics can replace medium-sized clusters built with competing solutions such as Thanos, M3DB, Cortex, InfluxDB or TimescaleDB. See VictoriaMetrics vs Thanos, Measuring vertical scalability, Remote write storage wars - PromCon 2019.
Optimized for storage: Works well with high-latency IO and low IOPS (HDD and network storage in AWS, Google Cloud, Microsoft Azure, etc.).

Community and contributions

Feel free to ask any questions regarding VictoriaMetrics:

If you like VictoriaMetrics and want to contribute, then please read these docs.

VictoriaMetrics Logo

The provided ZIP file contains three folders with different logo orientations. Each folder includes the following file types:

JPEG: Preview files
PNG: Preview files with transparent background
AI: Adobe Illustrator files

VictoriaMetrics Logo Usage Guidelines

Font

Font Used: Lato Black
Download here: Lato Font

Color Palette

Black #000000
Purple #4d0e82
Orange #ff2e00
White #ffffff

Logo Usage Rules

Only use the Lato Black font as specified.
Maintain sufficient clear space around the logo for visibility.
Do not modify the spacing, alignment, or positioning of design elements.
You may resize the logo as needed, but ensure all proportions remain intact.

Thank you for your cooperation!