VictoriaMetrics/lib
Aliaksandr Valialkin e1cf962bad
lib/storage: switch from global to per-day index for MetricName -> TSID mapping
Previously all newly ingested time series were registered in the global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of the metric name plus all its labels sorted by label name).
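
As an illustration only, the canonical metric name described above can be sketched as follows. The `Label` type, the `canonicalName` function and the textual encoding are hypothetical and chosen for readability; they are not the encoding lib/storage actually uses.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Label is a hypothetical name=value pair attached to a time series.
type Label struct {
	Name, Value string
}

// canonicalName builds an illustrative canonical form: the metric name
// followed by all labels sorted by label name. The text encoding is an
// assumption made for this sketch.
func canonicalName(metric string, labels []Label) string {
	sorted := append([]Label(nil), labels...) // do not mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var sb strings.Builder
	sb.WriteString(metric)
	sb.WriteByte('{')
	for i, l := range sorted {
		if i > 0 {
			sb.WriteByte(',')
		}
		fmt.Fprintf(&sb, "%s=%q", l.Name, l.Value)
	}
	sb.WriteByte('}')
	return sb.String()
}

func main() {
	// The same series with labels given in a different order
	// maps to the same canonical name.
	a := canonicalName("http_requests_total", []Label{{"job", "api"}, {"instance", "host1"}})
	b := canonicalName("http_requests_total", []Label{{"instance", "host1"}, {"job", "api"}})
	fmt.Println(a)
	fmt.Println(a == b)
}
```

Sorting the labels means a series resolves to the same index key regardless of the order in which the client sent its labels.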

The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.

The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses an in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If the capacity of this cache is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.

VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:

- If `storage/tsid` cache capacity isn't enough for active time series.
  In this case either increase the memory available to VictoriaMetrics or reduce the number of active time series
  ingested into VictoriaMetrics.

- If a new time series is ingested into VictoriaMetrics. In this case it cannot find
  the needed entry in the `storage/tsid` cache, so it needs to consult the on-disk `MetricName -> TSID` index,
  since it doesn't know that the index also lacks the corresponding entry.
  This is a typical event under high churn rate, when old time series are constantly substituted
  with new time series.

Reading the data from the `MetricName -> TSID` index is slow, so inserts that lead to reading this index
are counted as slow inserts, and they can be monitored via the `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
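
The lookup path and the slow-insert accounting described above can be modelled with a toy sketch. Everything here is hypothetical: the `store` type and its maps only stand in for the `storage/tsid` cache and the on-disk index, and the `slowInserts` field only plays the role of `vm_slow_row_inserts_total`. It is not the lib/storage implementation.

```go
package main

import "fmt"

// TSID is a stand-in for the internal series id.
type TSID uint64

// store is a toy model of the ingestion lookup path.
type store struct {
	tsidCache   map[string]TSID // stands in for the in-memory storage/tsid cache
	diskIndex   map[string]TSID // stands in for the on-disk MetricName -> TSID index
	nextTSID    TSID
	slowInserts uint64 // plays the role of vm_slow_row_inserts_total
}

func newStore() *store {
	return &store{
		tsidCache: make(map[string]TSID),
		diskIndex: make(map[string]TSID),
	}
}

// getOrCreateTSID returns the TSID for the given canonical metric name.
// A cache hit is the fast path. A cache miss is counted as a slow insert,
// because the (slow) index must be consulted even for a brand-new series.
func (s *store) getOrCreateTSID(name string) TSID {
	if tsid, ok := s.tsidCache[name]; ok {
		return tsid
	}
	s.slowInserts++
	tsid, ok := s.diskIndex[name]
	if !ok {
		// New series: register it in the index.
		s.nextTSID++
		tsid = s.nextTSID
		s.diskIndex[name] = tsid
	}
	s.tsidCache[name] = tsid
	return tsid
}

func main() {
	s := newStore()
	s.getOrCreateTSID(`cpu{host="a"}`) // new series: slow insert
	s.getOrCreateTSID(`cpu{host="a"}`) // cache hit: fast insert
	s.getOrCreateTSID(`cpu{host="b"}`) // new series: slow insert
	fmt.Println("slow inserts:", s.slowInserts)
}
```

In this model both cases listed above (an entry evicted from the cache and a never-seen series) take the same slow path, which is why high churn rate shows up as a growing slow-insert counter.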

Prior to this commit the `MetricName -> TSID` index was global, i.e. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured `-retentionPeriod`.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy a significant share of available memory for storing
recently accessed blocks of the `MetricName -> TSID` index when searching for newly ingested time series.

This commit switches from the global `MetricName -> TSID` index to a per-day index. This significantly
reduces the amount of data that needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when a new time series is ingested into it.
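
A rough sketch of the per-day idea, under stated assumptions: the `perDayIndex` type, its methods and the "unix seconds divided by 86400" day number are hypothetical and only illustrate keying the mapping by day. The real index is stored on disk in a different layout.

```go
package main

import "fmt"

// TSID is a stand-in for the internal series id.
type TSID uint64

const secondsPerDay = 24 * 3600

// perDayIndex maps a day number to that day's MetricName -> TSID entries.
type perDayIndex map[uint64]map[string]TSID

// register stores the name -> tsid entry under the day of the sample timestamp.
func (idx perDayIndex) register(unixSec uint64, name string, tsid TSID) {
	day := unixSec / secondsPerDay
	if idx[day] == nil {
		idx[day] = make(map[string]TSID)
	}
	idx[day][name] = tsid
}

// lookup consults only the entries for the day of the given timestamp.
func (idx perDayIndex) lookup(unixSec uint64, name string) (TSID, bool) {
	tsid, ok := idx[unixSec/secondsPerDay][name]
	return tsid, ok
}

// entries returns the total number of stored name -> tsid entries.
func (idx perDayIndex) entries() int {
	n := 0
	for _, m := range idx {
		n += len(m)
	}
	return n
}

func main() {
	idx := perDayIndex{}
	// A static series that receives samples on three consecutive days
	// is registered once per day.
	for day := uint64(0); day < 3; day++ {
		idx.register(day*secondsPerDay+60, `up{job="node"}`, 1)
	}
	fmt.Println("entries:", idx.entries())

	// On a day without a registration the series is not found,
	// so it has to be registered again for that day.
	_, ok := idx.lookup(3*secondsPerDay, `up{job="node"}`)
	fmt.Println("found on day 3:", ok)
}
```

The sketch shows both sides of the trade-off: a lookup touches a single day's entries, while an unchanged series is stored once for every day it receives samples.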

The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do not change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.

This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .

At the same time the change fixes the issue, which could result in lost access to time series,
which stop receiving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698

The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685

This is a follow-up for 1f28b46ae9
2023-07-13 17:03:50 -07:00
..
appmetrics all: add ability to push internal metrics to remote storage system specified via -pushmetrics.url 2022-07-21 20:15:29 +03:00
auth app/vminsert: allows parsing tenant id from labels (#3009) 2022-09-30 17:28:35 +03:00
awsapi .golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:19:58 -08:00
backup fix removing storage data dir before restoring from backup (#598) 2023-07-06 22:32:12 -07:00
blockcache all: remove explicit "xxhash" name when importing github.com/cespare/xxhash/v2 package 2022-06-21 20:24:28 +03:00
bloomfilter lib/promscrape: add the ability to limit the number of unique series per each scrape target 2021-09-01 16:08:12 +03:00
bufferedwriter app/vmselect: move common http functionality from app/vmselect/searchutils to lib/httputils 2023-07-06 17:22:23 -07:00
buildinfo all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
bytesutil lib/bytesutil: substitute parentheses with slashes in ByteBuffer.Path() output, so it can be passed to path manipulating functions 2023-07-06 17:23:52 -07:00
cgroup lib/cgroup: add SetGOGC() function 2023-07-06 17:24:31 -07:00
consts app/vminsert: reduce the max packet size, which vminsert can send to vmstorage 2022-04-05 15:39:58 +03:00
decimal lib/decimal: use consistent randomizer in tests 2023-01-23 19:24:05 -08:00
encoding lib/encoding: add MarshalBool/UnmarshalBool and GetUint32s/PutUint32s functions 2023-07-06 17:24:52 -07:00
envflag lib/envflag: small refactoring after 518c340ae3 and 02096e06d0 2022-10-29 02:29:19 +03:00
envtemplate allowed using dashes and dots in environment variables names (#4009) 2023-03-24 17:57:19 -07:00
fastnum Makefile: add build and test rules with enabled race detector. These rules have -race suffix 2020-03-05 12:05:16 +02:00
fasttime lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:06:50 +03:00
filestream lib/filestream: change Create() to MustCreate() 2023-04-14 15:14:24 -07:00
flagutil lib/flagutil: ArrayString: support commas inside quoted strings and inside [], {} and () braces 2023-03-28 21:25:07 -07:00
formatutil app/vmbackupmanager: add metrics for better observability (#488) 2022-12-20 14:18:43 -08:00
fs lib/fs: add ReaderAt.Path() function 2023-07-06 17:25:19 -07:00
handshake lib/handshake: do not pollute logs with cannot read hello messages on TCP health checks 2023-05-18 10:37:59 -07:00
htmlcomponents app/vmselect: remove dependency on lib/promscrape from app/vmselect 2023-01-03 23:27:36 -08:00
httpserver docs: make httpAuth.* flags description less ambiguous (#4588) 2023-07-09 12:36:14 -07:00
httputils lib/httputils: fix test after b49d04b3dc 2023-07-06 22:21:43 -07:00
influxutils lib/flagutil: rename Array to ArrayString 2022-10-01 18:28:19 +03:00
ingestserver lib/netutil: init implimentation of proxy protocol (#3687) 2023-01-26 23:25:22 -08:00
leveledbytebufferpool all: make fmt via the upcoming Go1.19 2022-07-11 19:23:25 +03:00
logger add error handler for parsing prometheus text format to vmagent and v… (#3693) 2023-01-23 22:36:23 -08:00
logjson app/vlinsert/jsonline: code prettifying 2023-07-06 21:35:55 -07:00
logstorage lib/logstorage: fix panic (#4620) 2023-07-13 12:04:59 -07:00
lrucache all: remove explicit "xxhash" name when importing github.com/cespare/xxhash/v2 package 2022-06-21 20:24:28 +03:00
memory max value for memory.allowedPercent changed from 200 to 100 (#4171) (#4251) 2023-05-08 23:20:56 -07:00
mergeset lib/mergeset: simplify fulsuhInmemoryParts() a bit 2023-07-13 12:33:43 -07:00
metricsql all: make fmt via the upcoming Go1.19 2022-07-11 19:23:25 +03:00
netutil lib/netutil: ignore arificial timeout generated by net/http.Server 2023-07-06 17:26:15 -07:00
persistentqueue app/vmagent,lib/persistentqueue: show warning message if --remoteWrite.maxDiskUsagePerURL flag lower than 500MB (#4196) 2023-05-08 15:45:21 -07:00
procutil lib/procutil: stop immediately after receiving the second SIGINT or SIGTERM signal 2022-10-20 21:58:49 +03:00
promauth lib/promscrape: disable support for service discovery and metrics scrape via http2 2023-07-06 16:04:31 -07:00
prompb app/vminsert: moved -maxInsertRequestSize command-line flag out of lib/prompb in order to prevent its inclusion in vmselect and vmstorage apps 2020-01-28 22:53:50 +02:00
prompbmarshal all: use %w instead of %s for wrapping errors in fmt.Errorf 2020-06-30 23:33:46 +03:00
promrelabel lib/promrelabel: use monospace font at textarea for writing relabel configs on /metric-relabel-debug and /target-relabel-debug pages 2023-05-18 20:49:47 -07:00
promscrape fixed service name detection for consulagent service discovery in case of a difference in service name and service id (#4390) (#4439) 2023-07-06 16:53:29 -07:00
promutils fix parse for invalid partial RFC3339 format (#4539) 2023-07-06 22:09:35 -07:00
protoparser lib/promutils: properly return error when incorrect Prometheus label names are passed to NewLabelsFromString() 2023-05-12 17:02:06 -07:00
proxy lib/promauth: add ability to send additional http headers in requests to scrape targets 2022-06-22 20:40:50 +03:00
pushmetrics fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 02:22:06 -07:00
querytracer lib/querytracer: fix remaining tests after 49ebc48809 2022-12-08 18:18:50 -08:00
regexutil app,lib: fix typos in comments (#3804) 2023-02-13 09:32:35 -08:00
snapshot app/vmbackup: prevent password leaks (#3672) 2023-01-18 11:40:52 -08:00
storage lib/storage: switch from global to per-day index for MetricName -> TSID mapping 2023-07-13 17:03:50 -07:00
streamaggr Revert "lib/streamaggr: discard samples with timestamps outside of aggregation interval (#4199)" 2023-05-08 21:50:19 -07:00
syncwg all: open-sourcing single-node version 2019-05-23 00:18:06 +03:00
tenantmetrics app/vminsert: allows parsing tenant id from labels (#3009) 2022-09-30 17:28:35 +03:00
timerpool lib/timerpool: use timer pool in concurrency limiters 2019-05-28 17:30:10 +03:00
uint64set lib/uint64set: use repeatable randomizer in tests 2023-01-23 19:24:05 -08:00
vmselectapi lib/vmselectapi: move the code for checking the expected client errors into a isExpectedError() function 2023-07-06 16:37:59 -07:00
workingsetcache lib/workingsetcache: expose -cacheExpireDuration command-line flag for fine-tuning of the cache expiration 2022-11-17 21:55:11 +02:00
writeconcurrencylimiter lib/writeconcurrencylimiter: initialize concurrencyLimitCh before exporting vm_concurrent_insert_capacity and vm_concurrent_insert_current metrics 2023-02-07 11:08:39 -08:00