github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-11 14:53:49 +00:00

Author	SHA1	Message	Date
Zakhar Bessarab	517bd9392c	lib/storage/partition: prevent panic in case resulting in-memory part is empty after merge (#7329 ) It is possible for in-memory part to be empty if ingested samples are removed by retention filters. In this case, data will not be discarded due to retention before creating in memory part. After in-memory parts merge samples will be removed resulting in creating completely empty part at destination. This commit checks for resulting part and skips it, if it's empty. --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-10-27 20:42:42 +01:00
Aliaksandr Valialkin	d2a825279b	Revert "refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629)" This reverts commit `e280d90e9a`. Reason for revert: the updated code doesn't improve the performance of table.MustAddRows for the typical case when rows contain timestamps belonging to ptws[0]. The performance may be improved in theory for the case when all the rows belong to a partition other than ptws[0], but this partition is automatically moved to ptws[0] by the code at lines `6aad1d43e9/lib/storage/table.go (L287-L298)`, so the next time the typical case will work. Also the updated code makes the code harder to follow, since it introduces an additional level of indirection with non-trivial semantics inside table.MustAddRows - the partition.TimeRangeInPartition() function. This function needs to be inspected and understood when reading the code at table.MustAddRows(). This function depends on minTsInRows and maxTsInRows vars, which are defined and initialized many lines above the partition.TimeRangeInPartition() call. This complicates reading and understanding the code even more. The previous code was using a clearer loop over rows with a clear call to partition.HasTimestamp() for every timestamp in the row. The partition.HasTimestamp() call is used in the table.MustAddRows() function multiple times. This makes the use of partition.HasTimestamp() more consistent, easier to understand and easier to maintain compared to the mix of partition.HasTimestamp() and partition.TimeRangeInPartition() calls. Also, there is no need to document some hardcore software engineering refactoring at docs/CHANGELOG.md, since the docs/CHANGELOG.md is intended for VictoriaMetrics users, who may not know software engineering. The docs/CHANGELOG.md must document user-visible changes, and the docs must be concise and clear for VictoriaMetrics users. See https://docs.victoriametrics.com/contributing/#pull-request-checklist for more details. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6629	2024-07-25 14:43:00 +02:00
Ruixiang Tan	8e2ff15203	refactor(vmstorage): Refactor the code to reduce the time complexity of `MustAddRows` and improve readability (#6629 ) ### Describe Your Changes The original logic is not only highly complex but also poorly readable, so it can be modified to increase readability and reduce time complexity. --------- Co-authored-by: Zhu Jiekun <jiekun@victoriametrics.com>	2024-07-25 13:52:54 +02:00
Aliaksandr Valialkin	4878152678	lib/{storage,mergeset}: do not allow setting dataFlushInterval to values smaller than pending{Items,Rows}FlushInterval Pending rows and items unconditionally remain in memory for up to pending{Items,Rows}FlushInterval, so there is no sense in setting dataFlushInterval (the interval for guaranteed flush of in-memory data to disk) to values smaller than pending{Items,Rows}FlushInterval, since this doesn't affect the interval for flushing pending rows and items from memory to disk. This is a follow-up for `4c80b17027` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6221	2024-07-15 10:11:23 +02:00
Hui Wang	ec56f4625e	storage: correctly apply `-inmemoryDataFlushInterval` when it's set to minimum supported value 1s (#6221) pendingRowsFlushInterval was bumped to 2s in `73f0a805e2` (cherry picked from commit `4c80b17027`)	2024-05-13 16:50:02 +02:00
Aliaksandr Valialkin	9607902289	lib/storage: remove outdated misleading comments	2024-05-12 10:25:06 +02:00
Aliaksandr Valialkin	5320cc3198	lib/{mergeset,storage}: log deleting directories inside partitions if they are missing in parts.json This should improve debuggability of unexpected deletion of directories inside partitions. While at it, log the proper path to parts.json when the directory for big part is missing in the partition. parts.json is located inside directory with small parts, and there is no parts.json file inside directory with big parts.	2024-04-17 12:00:10 +02:00
Aliaksandr Valialkin	1ffad3a182	lib/storage: consistently use stopCh instead of stop	2024-04-03 02:54:51 +03:00
Aliaksandr Valialkin	b6d1d6982e	lib/storage/partition.go: reduce code difference a bit with enterprise branch	2024-04-03 02:36:49 +03:00
Nikolay	c457f7de69	lib/storage: adds metrics for downsampling (#382) * lib/storage: adds metrics for downsampling vm_downsampling_partitions_scheduled - shows the number of parts that must be downsampled vm_downsampling_partitions_scheduled_size_bytes - shows the total size in bytes of parts that must be downsampled These two metrics answer the questions: is downsampling running? How many parts are scheduled for downsampling and how many of them are currently downsampled? How much storage space do they occupy? https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612 * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2024-04-03 02:36:05 +03:00
Zakhar Bessarab	7c1ee69205	lib/storage/table: wait for merges to be completed when closing a table (#5965 ) * lib/storage/table: properly wait for force merges to be completed during shutdown Properly keep track of running background merges and wait for merges completion when closing the table. Previously, force merge was not in sync with overall storage shutdown which could lead to holding ptw ref. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * docs: add changelog entry Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>	2024-04-02 21:25:30 +03:00
Aliaksandr Valialkin	a1baf25c2e	lib/storage: consistently use atomic.* types instead of atomic.* function calls on ordinary types See `ea9e2b19a5`	2024-02-24 00:33:07 +02:00
Aliaksandr Valialkin	d0538d11d3	lib/mergeset: consistently use atomic.* types instead of atomic.* function calls on ordinary types See `ea9e2b19a5`	2024-02-24 00:29:12 +02:00
Aliaksandr Valialkin	e7dfcdfff6	lib/storage: consistently use atomic.* type for refCount and mustDrop fields in indexDB, table and partition structs See `ea9e2b19a5`	2024-02-24 00:26:26 +02:00
Aliaksandr Valialkin	1eb3346ecc	lib/{storage,mergeset}: properly fix 'unaligned 64-bit atomic operation' panic on 32-bit architectures The issue has been introduced in `bace9a2501` The improper fix was in the `d4c0615dcd`, since it fixed the issue just by accident, because the Go compiler aligned the rawRowsShards field by 4-byte boundary inside partition struct. The proper fix is to use atomic.Int64 field - this guarantees that the access to this field won't result in unaligned 64-bit atomic operation. See https://github.com/golang/go/issues/50860 and https://github.com/golang/go/issues/19057	2024-02-24 00:25:08 +02:00
hagen1778	ab4fae9dc2	lib/storage: cleanup after `d4c0615dcd` Signed-off-by: hagen1778 <roman@victoriametrics.com> (cherry picked from commit `c8d1d2ab72`)	2024-02-23 18:55:40 +01:00
Dmytro Kozlov	eb22083924	lib/storage: fix aligning (#5860 ) (cherry picked from commit `d4c0615dcd`)	2024-02-23 18:55:39 +01:00
Aliaksandr Valialkin	19032f9913	lib/{mergeset,storage}: convert buffered items to searchable parts more optimally Do not convert shard items to part when a shard becomes full. Instead, collect multiple full shards and then convert them to a searchable part at once. This reduces the number of searchable parts, which, in turn, should increase query performance, since queries need to scan a smaller number of parts.	2024-02-23 01:21:03 +02:00
Aliaksandr Valialkin	08c5250a7b	lib/storage: handle common case when the number of rows passed to flushRowsToInmemoryParts() doesn't exceed maxRawRowsPerShard	2024-02-23 01:12:18 +02:00
Aliaksandr Valialkin	8669584e9f	lib/{storage,mergeset}: convert buffered items into searchable in-memory parts exactly once per the given flush interval Previously the interval between item addition and its conversion to searchable in-memory part could vary significantly because of too coarse per-second precision. Switch from fasttime.UnixTimestamp() to time.Now().UnixMilli() for millisecond precision. It is OK to use time.Now() for tracking the time when buffered items must be converted to searchable in-memory parts, since time.Now() calls aren't located in hot paths. Increase the flush interval for converting buffered samples to searchable in-memory parts from one second to two seconds. This should reduce the number of blocks, which are needed to be processed during high-frequency alerting queries. This, in turn, should reduce CPU usage. While at it, hardcode the maximum size of rawRows shard to 8Mb, since this size gives the optimal data ingestion performance according to load tests. This reduces memory usage and CPU usage on systems with big amounts of RAM under high data ingestion rate.	2024-02-23 01:11:57 +02:00
Aliaksandr Valialkin	3f9022bc08	lib/storage: do not pool rawRowsBlock when flushing rawRows to in-memory blocks The pooled rawRowsBlock objects occupy big amounts of memory between flushes, and the flushes are relatively rare. So it is better not to use the pool and to allocate rawRow blocks on demand. This should reduce the average memory usage between flushes.	2024-02-23 01:06:28 +02:00
Aliaksandr Valialkin	bf07e2ac87	lib/storage: do not keep rawRows buffer across flush() calls The buffer can be quite big under high ingestion rate (e.g. more than 100MB). This leads to increased memory usage between buffer flushes. So it is better to re-create the buffer on every flush in order to reduce memory usage between buffer flushes.	2024-02-23 01:06:09 +02:00
Aliaksandr Valialkin	7a8b92b590	lib/{mergeset,storage}: make background merge more responsive and scalable - Maintain a separate worker pool per each part type (in-memory, file, big and small). Previously a shared pool was used for merging all the part types. A single merge worker could merge parts with mixed types at once. For example, it could merge simultaneously an in-memory part plus a big file part. Such a merge could take hours for big file part. During the duration of this merge the in-memory part was pinned in memory and couldn't be persisted to disk under the configured -inmemoryDataFlushInterval . Another common issue, which could happen when parts with mixed types are merged, is uncontrolled growth of in-memory parts or small parts when all the merge workers were busy with merging big files. Such growth could lead to significant performance degradation for queries, since every query needs to check ever growing list of parts. This could also slow down the registration of new time series, since VictoriaMetrics searches for the internal series_id in the indexdb for every new time series. The third issue is graceful shutdown duration, which could be very long when a background merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted, since it merges in-memory parts. A separate pool of merge workers per every part type elegantly resolves these issues: - In-memory parts are merged to file-based parts in a timely manner, since the maximum size of in-memory parts is limited. - Long-running merges for big parts do not block merges for in-memory parts and small parts. - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files. Merging for file parts is instantly canceled on graceful shutdown now. - Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm should automatically self-tune according to the number of available CPU cores. - Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly. It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge - Tune the number of shards for pending rows and items before the data goes to in-memory parts and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate for registration of new time series. This should reduce the duration of data ingestion slowdown in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily unavailable. - Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown. This is a follow-up for `fa566c68a6` . Thanks to @misutoth for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2024-01-26 22:19:52 +01:00
Aliaksandr Valialkin	0715f1efcd	lib/storage: rename AssistedMerges to AssistedMergesCount in order to make these field names less misleading These fields are counters, not gauges, so adding the Count suffix to them makes this easier to understand while reading the code	2024-01-25 10:21:13 +02:00
Aliaksandr Valialkin	3199558da9	lib/{storage,mergeset}: reduce the maximum compression level for the stored data This reduces CPU usage a bit without increasing the resulting file sizes, according to synthetic tests.	2024-01-23 17:47:40 +02:00
Aliaksandr Valialkin	d52fd73f18	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:39:16 +02:00
Aliaksandr Valialkin	2f94bef59c	lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410	2024-01-22 01:11:36 +02:00
hagen1778	91e365acb6	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 12:10:34 +02:00
Aliaksandr Valialkin	f55d114785	lib/{mergeset,storage}: consistently reset isInMerge field in parts passed to mergeParts() before returning from the function While at it consistently check that the isInMerge field is set in all the parts passed to mergeParts()	2023-10-02 20:34:52 +02:00
Aliaksandr Valialkin	8b1d6b995e	lib/{mergeset,storage}: perform at most one assisted merge per each call to addRows/addItems This should reduce tail latency during data ingestion. This shouldn't slow down data ingestion in the worst case, since assisted merges are spread among distinct addRows/addItems calls after this change.	2023-10-02 20:33:51 +02:00
Aliaksandr Valialkin	9ae92ff2ee	lib/storage: remove unused atomicSetBool function after `717c53af27`	2023-09-25 17:37:45 +02:00
Aliaksandr Valialkin	60fe63df07	lib/storage: make it clear that the number of big merge workers always equals to 4 See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4915#issuecomment-1733922830	2023-09-25 17:17:40 +02:00
Aliaksandr Valialkin	a421db5977	lib/storage: stop exposing vm_merge_need_free_disk_space metric This metric confuses users and carries no useful information. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128	2023-09-25 17:00:14 +02:00
Zakhar Bessarab	47d9e82b52	lib/storage/partition: add check to ensure parts exist on disk (#5017 ) * lib/storage/partition: add check to ensure parts exist on disk If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name". This change forces verification of part existence and throws a correct error in case it is missing on disk. Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: use filepath.Join instead of string concatenation Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: add action points for error message Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * all: add a check for missing part in lib/mergeset and lib/logstorage --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-09-19 11:18:21 +02:00
faceair	609c76eec9	lib/storage: remove ForceMergeAllParts internal loop (#4999 ) Signed-off-by: faceair <git@faceair.me>	2023-09-18 16:35:37 +02:00
Aliaksandr Valialkin	d8afd7fe98	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:25:49 +02:00
Aliaksandr Valialkin	eea088d87f	docs/CHANGELOG.md: clarify description for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 bugfix This is a follow-up for `5eb5df96e2`	2023-07-06 22:42:02 -07:00
Nikolay	dd7ebd6779	lib/storage: creates parts.json on start-up if it does not exist. (#4450) * lib/storage: creates parts.json on start-up if it does not exist. It fixes migrations from versions below v1.90.0. Previously parts.json was created only after successful merge. But if merge was interrupted for some reason (OOM or shutdown), parts.json wasn't created and partitions left after the interrupted merge weren't properly deleted, since VM cannot check whether they must be removed or not. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 * Apply suggestions from code review Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> * Update lib/storage/partition.go Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-07-06 17:10:26 -07:00
Roman Khavronenko	09c05608f2	lib/storage: add comment for how `mustBeDeleted` field should be used (#4454 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-07-06 17:02:44 -07:00
Aliaksandr Valialkin	bc98ea9a8d	lib/storage: reduce the unimportant logging during Storage start / stop This should improve the visibility of potentially important logs	2023-05-16 15:32:35 -07:00
Aliaksandr Valialkin	8b15f93426	lib/{mergeset,storage}: make mustReadPartNames() code more clear	2023-04-14 23:17:08 -07:00
Aliaksandr Valialkin	d739511f5b	lib/storage: replace OpenStorage() with MustOpenStorage() Callers of OpenStorage() log the returned error and exit. The error logging and exit can be performed inside MustOpenStorage() alongside with printing the stack trace for better debuggability. This simplifies the code at caller side.	2023-04-14 23:04:42 -07:00
Aliaksandr Valialkin	cf4701db65	lib/fs: add MustReadDir() function Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity. The fs.MustReadDir() logs the error with the directory name and the call stack on error before exit. This information should be enough for debugging the cause of the error.	2023-04-14 22:11:40 -07:00
Aliaksandr Valialkin	0a11c46cd2	lib/storage: validate rows in partition.AddRows() only during tests	2023-04-14 20:53:05 -07:00
Aliaksandr Valialkin	e2de5bf763	lib/{storage,mergeset}: convert InitFromFilePart to MustInitFromFilePart Callers of InitFromFilePart log the error and exit. It is better to log the error with the path to the part and the call stack directly inside the MustInitFromFilePart() function. This simplifies the code at callers' side while leaving the same level of debuggability.	2023-04-14 15:47:20 -07:00
Aliaksandr Valialkin	df99965564	lib/filestream: change Create() to MustCreate() Callers of this function log the returned error and exit. It is better logging the error together with the path to the filename and call stack directly inside the function. This simplifies the code at callers' side without reducing the level of debuggability	2023-04-14 15:14:24 -07:00
Aliaksandr Valialkin	67df75484f	lib/{mergeset,storage}: remove isInMerge flag from parts only when they weren't removed yet from the list of active parts This prevents from possible panic during access to pw.p when it is set to nil at partWrapper.decRef() called inside swapSrcWithDstParts()	2023-04-14 00:16:18 -07:00
Aliaksandr Valialkin	7fb2b14ca0	docs/CHANGELOG.md: run at least 4 background mergers on systems with less than 4 CPU cores This reduces the probability of sudden spike in the number of small parts when all the background mergers are busy with big merges.	2023-04-13 23:37:05 -07:00
Aliaksandr Valialkin	8846ce5f1d	lib/{mergeset,storage}: make sure that getFlushToDiskDeadline() takes into account only in-memory parts	2023-04-13 23:17:24 -07:00
Aliaksandr Valialkin	f75b1b7a53	lib/fs: add Must prefix to CopyDirectory and CopyFile functions Callers of these functions log the returned error and then exit. Let's log the error with the call stack inside the function itself. This simplifies the code at callers' side, while leaving the same level of debuggability in case of errors.	2023-04-13 23:04:37 -07:00


189 commits