github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	73f0a805e2	lib/{storage,mergeset}: convert beffered items into searchable in-memory parts exactly once per the given flush interval Previously the interval between item addition and its conversion to searchable in-memory part could vary significantly because of too coarse per-second precision. Switch from fasttime.UnixTimestamp() to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking the time when buffered items must be converted to searchable in-memory parts, since time.Now() calls aren't located in hot paths. Increase the flush interval for converting buffered samples to searchable in-memory parts from one second to two seconds. This should reduce the number of blocks, which are needed to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage. While at it, hardcode the maximum size of rawRows shard to 8Mb, since this size gives the optimal data ingestion pefromance according to load tests. This reduces memory usage and CPU usage on systems with big amounts of RAM under high data ingestion rate.	2024-02-22 20:21:14 +02:00
Aliaksandr Valialkin	463bc27312	lib/storage: avoid superflouos copy of block header data	2024-02-22 20:21:14 +02:00
Aliaksandr Valialkin	8d9d7a8a12	app/vmstorage: expose vm_snapshots metric, which shows the current number of snapshots While at it, refresh docs about snapshots - https://docs.victoriametrics.com/#how-to-work-with-snapshots	2024-02-22 18:32:57 +02:00
Aliaksandr Valialkin	aec9cd4316	lib/storage: do not pool rawRowsBlock when flushing rawRows to in-memory blocks The pooled rawRowsBlock objects occupies big amounts of memory between flushes, and the flushes are relatively rare. So it is better to don't use the pool and to allocate rawRow blocks on demand. This should reduce the average memory usage between flushes.	2024-02-22 17:37:48 +02:00
Aliaksandr Valialkin	b7dfe9894c	lib/storage: do not keep rawRows buffer across flush() calls The buffer can be quite big under high ingestion rate (e.g. more than 100MB). This leads to increased memory usage between buffer flushes. So it is better to re-create the buffer on every flush in order to reduce memory usage between buffer flushes.	2024-02-22 17:22:26 +02:00
Aliaksandr Valialkin	6b9bedd0f9	app/vmstorage: expose vm_last_partition_parts metrics, which may help identifying performance issues related to the increased number of parts in the last partition	2024-02-15 14:51:19 +02:00
Aliaksandr Valialkin	95222b2079	all: upgrade Go builder from Go1.21.7 to Go1.22.0 See https://go.dev/doc/go1.22	2024-02-12 21:59:51 +02:00
Aliaksandr Valialkin	a49a50701a	lib/mergeset: do not panic on too long items passed to Table.AddItems() Instead, log a sample of these long items once per 5 seconds into error log, so users could notice and fix the issue with too long labels or too many labels. Previously this panic could occur in production when ingesting samples with too long labels.	2024-02-12 19:32:18 +02:00
Aliaksandr Valialkin	2eb967231b	lib/storage: do not append headerData to bsw.indexData if its size exceeds maxBlockSize This is a follow-up optimization after `3c246cdf00`	2024-02-12 18:04:37 +02:00
Aliaksandr Valialkin	39e0007e14	lib/snapshot: move Time, Validate and NewName into lib/snapshot/snapshotutil package This allows removing importing unneeded command-line flags into binaries, which import lib/storage, which, in turn, was importing lib/snapshot in order to use Time, Validate and NewName functions. This is a follow-up for `83e55456e2` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738	2024-02-09 04:18:45 +02:00
Aliaksandr Valialkin	3c246cdf00	lib/{storage,mergeset}: do not create index blocks with sizes exceeding 64Kb in common case This should reduce memory fragmentation and memory usage for indexdb/indexBlocks and storage/indexBlocks caches	2024-02-08 13:52:24 +02:00
Aliaksandr Valialkin	9e8e4cca6a	lib/storage: move fixupTimestamps() call to Block.Init() This is a follow-up for `0bf7921721`	2024-02-06 22:43:48 +02:00
Zakhar Bessarab	0bf7921721	lib/storage/raw_row: properly initialize TS for tmp blocks (#5762 ) Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-02-06 20:42:32 +00:00
Aliaksandr Valialkin	431aa16c8d	lib/storage: keep (date, metricID) entries only for the last two dates Entries for the previous dates is usually not used, so there is little sense in keeping them in memory. This should reduce the size of storage/date_metricID cache, which can be monitored via vm_cache_entries{type="storage/date_metricID"} metric.	2024-01-29 18:43:59 +01:00
Aliaksandr Valialkin	5d66ee88bd	lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests This limit has little sense for these APIs, since: - Thses APIs frequently result in scanning of all the time series on the given time range. For example, if extra_filters={datacenter="some_dc"} . - Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit, which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests. Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API. This limit shouldn't affect typical use cases for these APIs: - Grafana dashboard load when dashboard labels should be loaded - Auto-suggestion list load when editing the query in Grafana or vmui Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055	2024-01-29 16:45:12 +01:00
Aliaksandr Valialkin	bb7a419cc3	lib/{mergeset,storage}: make background merge more responsive and scalable - Maintain a separate worker pool per each part type (in-memory, file, big and small). Previously a shared pool was used for merging all the part types. A single merge worker could merge parts with mixed types at once. For example, it could merge simultaneously an in-memory part plus a big file part. Such a merge could take hours for big file part. During the duration of this merge the in-memory part was pinned in memory and couldn't be persisted to disk under the configured -inmemoryDataFlushInterval . Another common issue, which could happen when parts with mixed types are merged, is uncontrolled growth of in-memory parts or small parts when all the merge workers were busy with merging big files. Such growth could lead to significant performance degradataion for queries, since every query needs to check ever growing list of parts. This could also slow down the registration of new time series, since VictoriaMetrics searches for the internal series_id in the indexdb for every new time series. The third issue is graceful shutdown duration, which could be very long when a background merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted, since it merges in-memory parts. A separate pool of merge workers per every part type elegantly resolves both issues: - In-memory parts are merged to file-based parts in a timely manner, since the maximum size of in-memory parts is limited. - Long-running merges for big parts do not block merges for in-memory parts and small parts. - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files. Merging for file parts is instantly canceled on graceful shutdown now. - Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm should automatically self-tune according to the number of available CPU cores. - Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly. It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge - Tune the number of shards for pending rows and items before the data goes to in-memory parts and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate for registration of new time series. This should reduce the duration of data ingestion slowdown in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily unavailable. - Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown. This is a follow-up for `fa566c68a6` . Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2024-01-26 22:27:47 +01:00
Aliaksandr Valialkin	c3a585cfe5	lib/storage: rename AssistedMerges to AssistedMergesCount in order to make these field names less misleading These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code	2024-01-25 10:19:32 +02:00
Aliaksandr Valialkin	ae643ef1f1	lib/{storage,mergeset}: reduce the maxium compression level for the stored data This reduces CPU usage a bit, while doesn't increase resulting file sizes according to synthetic tests.	2024-01-23 17:46:50 +02:00
Aliaksandr Valialkin	6c214397ed	lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average. The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.	2024-01-23 16:09:55 +02:00
Aliaksandr Valialkin	4d78954158	lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting	2024-01-23 16:08:38 +02:00
Aliaksandr Valialkin	3449d563bd	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin	9b4294e53e	lib/storage: reduce the contention on dateMetricIDCache mutex when new time series are registered at high rate The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex. The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache. Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores. The mutabe cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache. This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds because of bottleneck at the mutex. Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2 cache hits under the mutex, e.g. when the entry is found in the mutable cache. This should automatically adjust intervals between merges depending on the addition rate for new time series (aka churn rate): - The interval will be much smaller than 10 seconds under high churn rate. This should reduce the mutex contention for mutable cache. - The interval will be bigger than 10 seconds under low churn rate. This should reduce the uneeded work on merging of mutable cache into immutable cache.	2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin	1c7f990fad	app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery() This is a follow-up for `cf03e11d89` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630	2024-01-21 23:45:31 +02:00
Aliaksandr Valialkin	0b2ea1a7c7	all: call atomic.Load* in front of atomic.CompareAndSwap* at places where the atomic.CompareAndSwap* returns false most of the time This allows avoiding slow inter-CPU synchornization induced by atomic.CompareAndSwap*	2024-01-21 14:04:54 +02:00
Aliaksandr Valialkin	90768aa418	lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410	2024-01-21 04:49:35 +02:00
Aliaksandr Valialkin	5bdf62de5b	lib/storage: do not prefetch metric names for small number of metricIDs This eliminates prefetchedMetricIDsLock lock contention for queries, which return less than 500 time series. This is a follow-up for `9d886a2eb0`	2024-01-17 13:48:15 +02:00
Aliaksandr Valialkin	9d886a2eb0	lib/storage: follow-up for `4b8088e377` - Clarify the bugfix description at docs/CHANGELOG.md - Simplify the code by accessing prefetchedMetricIDs struct under the lock instead of using lockless access to immutable struct. This shouldn't worsen code scalability too much on busy systems with many CPU cores, since the code executed under the lock is quite small and fast. This allows removing cloning of prefetchedMetricIDs struct every time new metric names are pre-fetched. This should reduce load on Go GC, since the cloning of uin64set.Set struct allocates many new objects.	2024-01-16 15:29:57 +02:00
Roman Khavronenko	4b8088e377	lib/storage: properly check for `storage/prefetchedMetricIDs` cache expiration deadline (#5607 ) Before, this cache was limited only by size. Cache invalidation by time happens with jitter to prevent thundering herd problem. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-15 10:03:06 +01:00
Aliaksandr Valialkin	f2229c2e42	lib/prompb: change type of Label.Name and Label.Value from []byte to string This makes it more consistent with lib/prompbmarshal.Label	2024-01-14 22:33:21 +02:00
Artem Navoiev	d374595e31	docs: mention staleNaN handling during deduplication See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5587 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2024-01-11 11:53:58 +01:00
hagen1778	d493da562e	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 11:20:43 +01:00
hagen1778	e96b4410a1	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 10:52:53 +01:00
Aliaksandr Valialkin	230230cf0b	lib/logger: add `-loggerMaxArgLen` command-line flag for fine-tuning the maximum length of logged args	2023-11-11 12:30:08 +01:00
Aliaksandr Valialkin	7ac49162c6	lib/storage: follow-up for `29cebd82fb` Use atomic.CompareAndSwapUint32() instead of atomic.LoadUint32() followed by atomic.StoreUint32(). This makes the code more clear. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159	2023-10-31 16:08:54 +01:00
Roman Khavronenko	29cebd82fb	lib/storage: log warning about RO mode only on state change (#5191 ) Before, vmstorage would log the same message each second producing excessive amount of logs. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-30 10:52:57 +01:00
Aliaksandr Valialkin	42dd71bb63	all: consistently use %w instead of %s in when error is passed to fmt.Errorf() This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.	2023-10-25 21:24:03 +02:00
hagen1778	fd2d07ba33	lib/storage: follow-up after `188cfe3a85` `188cfe3a85` See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159 Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-17 15:45:14 +02:00
Ilya Trefilov	188cfe3a85	lib/storage: do not create tsid if metric contains stale marker(#5069 ) (#5174 ) https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069	2023-10-17 15:30:58 +02:00
Haleygo	8b6ccad41d	fix ingesting stale point, follow up `fe8cc573d1` (#5179 )	2023-10-16 09:05:37 +02:00
hagen1778	fe8cc573d1	docs: remove extra `/` in the end of the link Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-10-14 07:43:40 +02:00
Aliaksandr Valialkin	d41841c0c9	lib/{mergeset,storage}: consistently reset isInMerge field in parts passed to mergeParts() before returning from the function While at it consistently check that the isInMerge field is set in all the parts passed to mergeParts()	2023-10-02 08:05:29 +02:00
Aliaksandr Valialkin	3ca6fea858	lib/{mergeset,storage}: perform at most one assisted merge per each call to addRows/addItems This should reduce tail latency during data ingestion. This shouldn't slow down data ingestion in the worst case, since assisted merges are spread among distinct addRows/addItems calls after this change.	2023-10-01 22:19:46 +02:00
Aliaksandr Valialkin	223ef96198	lib/storage: remove unused atomicSetBool function after `717c53af27`	2023-09-25 17:37:24 +02:00
Aliaksandr Valialkin	15dfd94f3b	lib/storage: make it clear that the number of big merge workers always equals to 4 See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4915#issuecomment-1733922830	2023-09-25 17:15:45 +02:00
Aliaksandr Valialkin	717c53af27	lib/storage: stop exposing vm_merge_need_free_disk_space metric This metric confuses users and has no any useful information. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128	2023-09-25 16:52:39 +02:00
Aliaksandr Valialkin	3140ef7261	lib/storage: log fatal error inside searchMetricName() instead of propagating it to the caller This simplifies the code a bit at searchMetricName() and searchMetricNameWithCache() call sites This is a result of investigating https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4972	2023-09-22 11:41:06 +02:00
Zakhar Bessarab	bea3431ed1	lib/storage/partition: add check to ensure parts exist on disk (#5017 ) * lib/storage/partition: add check to ensure parts exist on disk If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name". This change forces verification of part existence and throws a correct error in case it is missing on disk. Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: use filepath.Join instead of string concatenation Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: add action points for error message Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * all: add a check for missing part in lib/mergeset and lib/logstorage --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-09-19 11:17:41 +02:00
faceair	b6ad581b45	lib/storage: remove ForceMergeAllParts internal loop (#4999 ) Signed-off-by: faceair <git@faceair.me>	2023-09-15 19:04:54 +02:00
Aliaksandr Valialkin	a09c680170	lib/storage: handle fatal errors inside indexSearch.getTSIDByMetricID() instead of returning them to the caller This simplifies the code a bit at caller side	2023-09-15 11:55:42 +02:00
Dima Lazerka	0c7d46d637	flagutil: Make .Msecs private (#4906 ) * Introduce flagutil.Duration To avoid conversion bugs * Fix tests * Clarify documentation re. month=31 days * Add fasttime.UnixTime() to obtain time.Time The goal is to refactor out the last usage of `.Msecs`. * Use fasttime for time.Now() * wip - Remove fasttime.UnixTime(), since it doesn't improve code readability and maintainability - Run `make docs-sync` for syncing changes from README.md to docs/ folder - Make lib/flagutil.Duration.Msec private - Rename msecsPerMonth const to msecsPer31Days in order to be consistent with retention31Days --------- Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-09-03 10:33:37 +02:00

1 2 3 4 5 ...

726 commits