VictoriaMetrics/lib
Aliaksandr Valialkin 202eb429a7
lib/logstorage: refactor storage format to be more efficient for querying wide events
It has been appeared that VictoriaLogs is frequently used for collecting logs with tens of fields.
For example, standard Kuberntes setup on top of Filebeat generates more than 20 fields per each log.
Such logs are also known as "wide events".

The previous storage format was optimized for logs with a few fields. When at least a single field
was referenced in the query, then the all the meta-information about all the log fields was unpacked
and parsed per each scanned block during the query. This could require a lot of additional disk IO
and CPU time when logs contain many fields. Resolve this issue by providing an (field -> metainfo_offset)
index per each field in every data block. This index allows reading and extracting only the needed
metainfo for fields used in the query. This index is stored in columnsHeaderIndexFilename ( columns_header_index.bin ).
This allows increasing performance for queries over wide events by 10x and more.

Another issue was that the data for bloom filters and field values across all the log fields except of _msg
was intermixed in two files - fieldBloomFilename ( field_bloom.bin ) and fieldValuesFilename ( field_values.bin ).
This could result in huge disk read IO overhead when some small field was referred in the query,
since the Operating System usually reads more data than requested. It reads the data from disk
in at least 4KiB blocks (usually the block size is much bigger in the range 64KiB - 512KiB).
So, if 512-byte bloom filter or values' block is read from the file, then the Operating System
reads up to 512KiB of data from disk, which results in 1000x disk read IO overhead. This overhead isn't visible
for recently accessed data, since this data is usually stored in RAM (aka Operating System page cache),
but this overhead may become very annoying when performing the query over large volumes of data
which isn't present in OS page cache.

The solution for this issue is to split bloom filters and field values across multiple shards.
This reduces the worst-case disk read IO overhead by at least Nx where N is the number of shards,
while the disk read IO overhead is completely removed in best case when the number of columns doesn't exceed N.
Currently the number of shards is 8 - see bloomValuesShardsCount . This solution increases
performance for queries over large volumes of newly ingested data by up to 1000x.

The new storage format is versioned as v1, while the old storage format is version as v0.
It is stored in the partHeader.FormatVersion.

Parts with the old storage format are converted into parts with the new storage format during background merge.
It is possible to force merge by querying /internal/force_merge HTTP endpoint - see https://docs.victoriametrics.com/victorialogs/#forced-merge .
2024-10-16 17:35:07 +02:00
..
appmetrics all: add -metrics.exposeMetadata command-line flag, which can be used for adding TYPE and HELP metadata for metrics exposed at /metrics page 2023-12-19 03:20:40 +02:00
auth lib/auth: add NewTokenPossibleMultitenant() for parsing auth token, which can be multitenant 2023-08-30 14:17:55 +02:00
awsapi lib/awsapi: properly assume role with webIdentity token (#5495) 2023-12-20 19:05:39 +02:00
backup lib/backup/s3remote: add retryer configuration (#6747) 2024-08-07 16:55:29 +02:00
blockcache all: consistently use 'any' instead of 'interface{}' 2024-07-10 00:20:37 +02:00
bloomfilter lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:07:53 +02:00
bufferedwriter app/vmselect: move common http functionality from app/vmselect/searchutils to lib/httputils 2023-06-19 22:34:20 -07:00
buildinfo all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
bytesutil lib/bytesutil: smooth buffer growth rate (#6761) 2024-08-07 16:49:43 +02:00
cgroup lib/cgroup: round GOMAXPROCS to the lower integer value of cpuQuota 2024-09-23 16:09:12 +02:00
contextutil lib/contextutil: make golanci-lint happy by substituing unused function arg name with _ 2024-09-26 17:06:48 +02:00
decimal lib/slicesutil: add helper functions for setting slice length and extending its capacity 2024-05-12 11:32:17 +02:00
encoding lib/encoding: optimize UnmarshalVarUint64, UnmarshalVarInt64 and UnmarshalBytes a bit 2024-05-14 01:23:54 +02:00
envflag lib/envflag: do not allow unsupported form for boolean command-line flags in the form -boolFlag value 2023-08-17 13:26:53 +02:00
envtemplate allowed using dashes and dots in environment variables names (#4009) 2023-03-24 15:43:05 -07:00
envutil testing: allow disabling fsync to make tests run faster (#6871) 2024-08-30 10:54:46 +02:00
fastnum lib/fastnum: use unsafe.Slice() instead of deprecated reflect.SliceHeader 2024-02-29 17:17:13 +02:00
fasttime lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:07:53 +02:00
filestream vlinsert: added opentelemetry logs support 2024-09-03 20:12:05 +02:00
flagutil fscore: rollback trailing space trim (#7106) 2024-09-29 10:59:25 +02:00
formatutil app/vmbackupmanager: add metrics for better observability (#488) 2022-12-20 14:18:06 -08:00
fs fscore: rollback trailing space trim (#7106) 2024-09-29 10:59:25 +02:00
htmlcomponents lib/htmlcomponents: use relative links for the top page and for favicon.ico 2023-11-13 20:29:05 +01:00
httpserver app/vlselect: add /select/logsql/stats_query endpoint, which is going to be used by vmalert 2024-09-06 23:06:43 +02:00
httputils app/vlinsert: support _time field without timezone information during data ingestion 2024-09-26 12:49:35 +02:00
influxutils app/{vminsert,vmagent}: add healthcheck for influx ingestion endpoints (#6749) 2024-08-05 09:34:54 +02:00
ingestserver Revert c6c5a5a186 and b2765c45d0 2024-07-03 23:51:56 +02:00
leveledbytebufferpool lib/leveledbytebufferpool: do not pool byte slices bigger than 2^18 bytes 2024-06-13 16:56:25 +02:00
logger app/vlogscli: add interactive command-line tool for querying VictoriaLogs 2024-10-01 12:23:07 +02:00
logstorage lib/logstorage: refactor storage format to be more efficient for querying wide events 2024-10-16 17:35:07 +02:00
lrucache all: consistently use 'any' instead of 'interface{}' 2024-07-10 00:20:37 +02:00
memory all: cleanup: remove // +build ... lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20 2023-11-13 19:12:51 +01:00
mergeset lib/mergeset: fix typos in comments 2024-08-07 15:54:15 +02:00
metricsql all: make fmt via the upcoming Go1.19 2022-07-11 19:22:15 +03:00
netutil lib/promscrape: fixes proxy autorization (#6783) 2024-08-19 22:31:18 +02:00
persistentqueue app/vmagent/remotewrite: follow-up for 87fd400dfc 2024-07-13 02:25:19 +02:00
procutil all: cleanup: remove // +build ... lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20 2023-11-13 19:12:51 +01:00
promauth vmagent: add support of HTTP2 client for Kubernetes SD (#7114) 2024-10-08 10:36:31 +02:00
prompb Revert "Exemplar support (#5982)" 2024-07-03 15:30:21 +02:00
prompbmarshal Revert "Exemplar support (#5982)" 2024-07-03 15:30:21 +02:00
promrelabel lib/promrelabel: follow-up for 8958cecad6 2024-08-27 13:04:26 +02:00
promscrape vmagent: add support of HTTP2 client for Kubernetes SD (#7114) 2024-10-08 10:36:31 +02:00
promutils stream aggregation: fix possible duplicated aggregation results (#7118) 2024-09-30 14:24:59 +02:00
protoparser lib/protoparser/influx: enable batch processing by default (#7165) 2024-10-15 11:48:40 +02:00
proxy lib/promscrape: fixes proxy autorization (#6783) 2024-08-19 22:31:18 +02:00
pushmetrics lib/pushmetrics: wait until the background goroutines, which push metrics, are stopped at pushmetrics.Stop() 2024-01-15 13:50:36 +02:00
querytracer make go vet happy 2024-08-19 21:15:33 +02:00
ratelimiter app/vmagent: properly shutdown when -maxIngestionRate limit is reached 2024-03-30 06:43:48 +02:00
regexutil lib/logstorage: work-in-progress 2024-05-25 22:59:13 +02:00
slicesutil lib/slicesutil: add helper functions for setting slice length and extending its capacity 2024-05-12 11:32:17 +02:00
snapshot lib/httputils: parse URL before creating HTTP transport (#6820) 2024-08-16 11:32:04 +02:00
storage (app|lib)/vmstorage: do not increment vm_rows_ignored_total on NaNs (#7166) 2024-10-02 12:37:27 +02:00
streamaggr tests: fix slice init length (#6897) 2024-08-30 10:55:25 +02:00
stringsutil all: consistently use stringsutil.JSONString() for formatting JSON strings with fmt.* functions instead of using "%q" formatter 2024-07-17 13:52:13 +02:00
syncwg all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
tenantmetrics lib/encoding/zstd: switch back from atomic.Pointer to atomic.Value for map[...]... 2023-07-20 20:56:11 -07:00
timerpool lib/timerpool: use timer pool in concurrency limiters 2019-05-28 17:20:10 +03:00
timeutil app/vlinsert: support _time field without timezone information during data ingestion 2024-09-26 12:49:35 +02:00
uint64set lib/uint64set: optimize Set.Has() for nil Set - it should be inlined now 2024-07-15 23:59:20 +02:00
workingsetcache lib: consistently use atomic.* types instead of atomic.* functions 2024-02-24 02:07:53 +02:00
writeconcurrencylimiter app/vmagent/remotewrite: clarify the reason behind the default value for -remoteWrite.queues in the same way as the reason for -maxConcurrentInserts is defined at 73f5fb0f0c 2024-03-06 13:43:08 +02:00