github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-01 14:47:38 +00:00

Author	SHA1	Message	Date
Aliaksandr Valialkin	bb7a419cc3	lib/{mergeset,storage}: make background merge more responsive and scalable - Maintain a separate worker pool per each part type (in-memory, file, big and small). Previously a shared pool was used for merging all the part types. A single merge worker could merge parts with mixed types at once. For example, it could merge simultaneously an in-memory part plus a big file part. Such a merge could take hours for big file part. During the duration of this merge the in-memory part was pinned in memory and couldn't be persisted to disk under the configured -inmemoryDataFlushInterval . Another common issue, which could happen when parts with mixed types are merged, is uncontrolled growth of in-memory parts or small parts when all the merge workers were busy with merging big files. Such growth could lead to significant performance degradataion for queries, since every query needs to check ever growing list of parts. This could also slow down the registration of new time series, since VictoriaMetrics searches for the internal series_id in the indexdb for every new time series. The third issue is graceful shutdown duration, which could be very long when a background merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted, since it merges in-memory parts. A separate pool of merge workers per every part type elegantly resolves both issues: - In-memory parts are merged to file-based parts in a timely manner, since the maximum size of in-memory parts is limited. - Long-running merges for big parts do not block merges for in-memory parts and small parts. - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files. Merging for file parts is instantly canceled on graceful shutdown now. - Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm should automatically self-tune according to the number of available CPU cores. - Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly. It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge - Tune the number of shards for pending rows and items before the data goes to in-memory parts and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate for registration of new time series. This should reduce the duration of data ingestion slowdown in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily unavailable. - Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown. This is a follow-up for `fa566c68a6` . Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291	2024-01-26 22:27:47 +01:00
Aliaksandr Valialkin	c3a585cfe5	lib/storage: rename AssistedMerges to AssistedMergesCount in order to make these field names less misleading These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code	2024-01-25 10:19:32 +02:00
Aliaksandr Valialkin	ae643ef1f1	lib/{storage,mergeset}: reduce the maxium compression level for the stored data This reduces CPU usage a bit, while doesn't increase resulting file sizes according to synthetic tests.	2024-01-23 17:46:50 +02:00
Aliaksandr Valialkin	3449d563bd	all: add up to 10% random jitter to the interval between periodic tasks performed by various components This should smooth CPU and RAM usage spikes related to these periodic tasks, by reducing the probability that multiple concurrent periodic tasks are performed at the same time.	2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin	90768aa418	lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410	2024-01-21 04:49:35 +02:00
hagen1778	e96b4410a1	lib/storage: fix typo Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-11-21 10:52:53 +01:00
Aliaksandr Valialkin	d41841c0c9	lib/{mergeset,storage}: consistently reset isInMerge field in parts passed to mergeParts() before returning from the function While at it consistently check that the isInMerge field is set in all the parts passed to mergeParts()	2023-10-02 08:05:29 +02:00
Aliaksandr Valialkin	3ca6fea858	lib/{mergeset,storage}: perform at most one assisted merge per each call to addRows/addItems This should reduce tail latency during data ingestion. This shouldn't slow down data ingestion in the worst case, since assisted merges are spread among distinct addRows/addItems calls after this change.	2023-10-01 22:19:46 +02:00
Aliaksandr Valialkin	223ef96198	lib/storage: remove unused atomicSetBool function after `717c53af27`	2023-09-25 17:37:24 +02:00
Aliaksandr Valialkin	15dfd94f3b	lib/storage: make it clear that the number of big merge workers always equals to 4 See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4915#issuecomment-1733922830	2023-09-25 17:15:45 +02:00
Aliaksandr Valialkin	717c53af27	lib/storage: stop exposing vm_merge_need_free_disk_space metric This metric confuses users and has no any useful information. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128	2023-09-25 16:52:39 +02:00
Zakhar Bessarab	bea3431ed1	lib/storage/partition: add check to ensure parts exist on disk (#5017 ) * lib/storage/partition: add check to ensure parts exist on disk If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name". This change forces verification of part existence and throws a correct error in case it is missing on disk. Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption. Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: use filepath.Join instead of string concatenation Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * lib/storage/partition: add action points for error message Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> * all: add a check for missing part in lib/mergeset and lib/logstorage --------- Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com> Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2023-09-19 11:17:41 +02:00
faceair	b6ad581b45	lib/storage: remove ForceMergeAllParts internal loop (#4999 ) Signed-off-by: faceair <git@faceair.me>	2023-09-15 19:04:54 +02:00
Aliaksandr Valialkin	edee262ecc	Makefile: update golangci-lint from v1.51.2 to v1.54.2 See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2	2023-09-01 10:16:42 +02:00
Aliaksandr Valialkin	152ca00fb8	docs/CHANGELOG.md: clarify description for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 bugfix This is a follow-up for `5eb5df96e2`	2023-07-06 17:09:03 -07:00
Nikolay	5eb5df96e2	lib/storage: creates parts.json on start-up if it not exists. (#4450 ) * lib/storage: creates parts.json on start-up if it not exists. It fixes migrations from versions below v1.90.0. Previously parts.json was created only after successful merge. But if merge was interruped for some reason (OOM or shutdown), parts.json wasn't created and partitions left after interruped merge weren't properly deleted. Since VM cannot check if it must be removed or not. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 * Apply suggestions from code review Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> * Update lib/storage/partition.go Co-authored-by: Roman Khavronenko <roman@victoriametrics.com> --------- Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>	2023-06-15 11:19:22 +02:00
Roman Khavronenko	f50f35a8e0	lib/storage: add comment for how `mustBeDeleted` field should be used (#4454 ) Signed-off-by: hagen1778 <roman@victoriametrics.com>	2023-06-15 11:17:45 +02:00
Aliaksandr Valialkin	278278af95	lib/storage: reduce the unimportant logging during Storage start / stop This should improve the visibility of potentially important logs	2023-05-16 15:14:21 -07:00
Aliaksandr Valialkin	2a4c48c59d	lib/{mergeset,storage}: make mustReadPartNames() code more clear	2023-04-14 23:16:59 -07:00
Aliaksandr Valialkin	52006149b2	lib/storage: replace OpenStorage() with MustOpenStorage() Callers of OpenStorage() log the returned error and exit. The error logging and exit can be performed inside MustOpenStorage() alongside with printing the stack trace for better debuggability. This simplifies the code at caller side.	2023-04-14 23:02:40 -07:00
Aliaksandr Valialkin	3727251910	lib/fs: add MustReadDir() function Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity. The fs.MustReadDir() logs the error with the directory name and the call stack on error before exit. This information should be enough for debugging the cause of the error.	2023-04-14 22:10:46 -07:00
Aliaksandr Valialkin	60d92894c5	lib/storage: validate rows in partition.AddRows() only during tests	2023-04-14 20:52:36 -07:00
Aliaksandr Valialkin	c0b852d50d	lib/{storage,mergeset}: convert InitFromFilePart to MustInitFromFilePart Callers of InitFromFilePart log the error and exit. It is better to log the error with the path to the part and the call stack directly inside the MustInitFromFilePart() function. This simplifies the code at callers' side while leaving the same level of debuggability.	2023-04-14 15:46:12 -07:00
Aliaksandr Valialkin	9183a439c7	lib/filestream: change Create() to MustCreate() Callers of this function log the returned error and exit. It is better logging the error together with the path to the filename and call stack directly inside the function. This simplifies the code at callers' side without reducing the level of debuggability	2023-04-14 15:12:48 -07:00
Aliaksandr Valialkin	e0595af2bf	lib/{mergeset,storage}: remove isInMerge flag from parts only when they werent removed yet from the list of active parts This prevents from possible panic during access to pw.p when it is set to nil at partWrapper.decRef() called inside swapSrcWithDstParts()	2023-04-14 00:08:11 -07:00
Aliaksandr Valialkin	9f8209d593	docs/CHANGELOG.md: run at least 4 background mergers on systems with less than 4 CPU cores This reduces the probability of sudden spike in the number of small parts when all the background mergers are busy with big merges.	2023-04-13 23:43:17 -07:00
Aliaksandr Valialkin	550d5c7ea4	lib/{mergeset,storage}: make sure that getFlushToDiskDeadline() takes into account only in-memory parts	2023-04-13 23:43:17 -07:00
Aliaksandr Valialkin	809fbaeaac	lib/fs: add Must prefix to CopyDirectory and CopyFile functions Callers of these functions log the returned error and then exit. Let's log the error with the call stack inside the function itself. This simplifies the code at callers' side, while leaving the same level of debuggability in case of errors.	2023-04-13 23:02:59 -07:00
Aliaksandr Valialkin	5f487ed996	lib/fs: rename HardLinkFiles to MustHardLinkFiles Callers of this function log the returned error and then exit. Let's log the error with the call stack inside the function itself. This simplifies the code at callers' side, while leaving the same level of debuggability in case of errors.	2023-04-13 22:48:07 -07:00
Aliaksandr Valialkin	30425ca81a	lib/fs: rename WriteFileAtomically to MustWriteAtomic Callers of this function log the returned error and exit. So let's just log the error with the given filepath and the call stack inside the function itself and then exit. This simplifies the code at callers' place while leaves the same level of debuggability in case of errors.	2023-04-13 22:41:15 -07:00
Aliaksandr Valialkin	036a7b7365	lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist Callers of these functions log the returned error and then exit. The returned error already contains the path to directory, which was failed to be created. So let's just log the error together with the call stack inside these functions. This leaves the debuggability of the returned error at the same level while allows simplifying the code at callers' side. While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk(). It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.	2023-04-13 22:11:59 -07:00
Aliaksandr Valialkin	2a8395be05	lib/fs: replace WriteFileAndSync with MustWriteAndSync When WriteFileAndSync fails, then the caller eventually logs the error message and exits. The error message returned by WriteFileAndSync already contains the path to the file, which couldn't be created. This information alongside the call stack is enough for debugging the issue. So just use log.Panicf("FATAL: ...") inside MustWriteAndSync(). This simplifies error handling at caller side a bit.	2023-04-13 21:33:19 -07:00
Aliaksandr Valialkin	42bba64aa7	lib/{mergeset,storage}: explicitly fsync the created part directory listing Previously the created part directory listing was fsynced implicitly when storing metadata.json file in it. Also remove superflouous fsync for part directory listing, which was called at blockStreamWriter.MustClose(). After that the metadata.json file is created, so an additional fsync for the directory contents is needed.	2023-04-13 21:03:08 -07:00
Aliaksandr Valialkin	e1211a1187	app/vmstorage: deprecate -bigMergeConcurrency command-line flag Improperly configured -bigMergeConcurrency command-line flag usually leads to uncontrolled growth of unmerged parts, which, in turn, increases CPU usage and query durations. So it is better deprecating this flag. In rare cases -smallMergeConcurrency command-line flag can be used instead for controlling the concurrency of background merges.	2023-04-13 20:40:24 -07:00
Aliaksandr Valialkin	c8f2febaa1	lib/storage: consistently use OS-independent separator in file paths This is needed for Windows support, which uses `\` instead of `/` as file separator Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-25 14:33:58 -07:00
Aliaksandr Valialkin	b14d96618c	all: follow-up after `34634ec357` - Use windows.FlushFileBuffers() instead of windows.Fsync() at streamTracker.adviseDontNeed() for consistency with implementations for other architectures. - Use filepath.Base() instead of filepath.Split(), since the dir part isn't used. This simplifies the code a bit. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-25 11:57:39 -07:00
Nikolay	34634ec357	lib/fs: adds memory map for windows (#3988 ) This is a follow-up for `43b24164ef` * lib/fs: adds memory map for windows it should improve performance for file reading * lib/storage: replace '/' with os specific separator it must fix an errors for windows * lib/fs: mention windows fsync support * lib/filestream: adds fdatasync for windows writes Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-25 11:43:19 -07:00
Dmytro Kozlov	5c92022cc6	lib/storage: fix collect downsampling metrics (#489 ) * lib/storage: fix downsampling * lib/storage: update logic * lib/storage: fix comments, removed unneeded check	2023-03-19 23:34:46 -07:00
Aliaksandr Valialkin	43b24164ef	all: add Windows build for VictoriaMetrics This commit changes background merge algorithm, so it becomes compatible with Windows file semantics. The previous algorithm for background merge: 1. Merge source parts into a destination part inside tmp directory. 2. Create a file in txn directory with instructions on how to atomically swap source parts with the destination part. 3. Perform instructions from the file. 4. Delete the file with instructions. This algorithm guarantees that either source parts or destination part is visible in the partition after unclean shutdown at any step above, since the remaining files with instructions is replayed on the next restart, after that the remaining contents of the tmp directory is deleted. Unfortunately this algorithm doesn't work under Windows because it disallows removing and moving files, which are in use. So the new algorithm for background merge has been implemented: 1. Merge source parts into a destination part inside the partition directory itself. E.g. now the partition directory may contain both complete and incomplete parts. 2. Atomically update the parts.json file with the new list of parts after the merge, e.g. remove the source parts from the list and add the destination part to the list before storing it to parts.json file. 3. Remove the source parts from disk when they are no longer used. This algorithm guarantees that either source parts or destination part is visible in the partition after unclean shutdown at any step above, since incomplete partitions from step 1 or old source parts from step 3 are removed on the next startup by inspecting parts.json file. This algorithm should work under Windows, since it doesn't remove or move files in use. This algorithm has also the following benefits: - It should work better for NFS. - It fits object storage semantics. The new algorithm changes data storage format, so it is impossible to downgrade to the previous versions of VictoriaMetrics after upgrading to this algorithm. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70	2023-03-19 01:36:51 -07:00
Aliaksandr Valialkin	6460475e3b	lib/{mergeset,storage}: prevent from long wait time when creating a snapshot under high data ingestion rate Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3873	2023-03-19 00:15:30 -07:00
Aliaksandr Valialkin	a26c6628fd	lib/{fs,mergeset,storage}: substitute os.Open()+os.File.Readdir() with os.ReadDir() This simplifies code a bit	2023-03-17 21:03:37 -07:00
Nikolay	6bfe9cc733	lib{mergset,storage}: prevent possible race condition with logging st… (#3900 ) lib{mergset,storage}: prevent possible race condition with logging stats for merges Previously partwrapper could be release by background process and reference for part may be invalid during logging stats. It will lead to panic at vmstorage https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897	2023-03-03 12:33:42 +01:00
Aliaksandr Valialkin	0d3f31f60e	lib/storage: follow-up for `39cdc546dd` - Use flag.Duration instead of flagutil.Duration for -snapshotCreateTimeout, since the flagutil.Duration is intended mostly for big durations, e.g. days, months and years, while the -snapshotCreateTimeout is usually smaller than one hour. - Add links to https://docs.victoriametrics.com/#how-to-work-with-snapshots in docs/CHANGELOG.md, so readers could easily find the corresponding docs when reading the changelog. - Properly remove all the created directories on unsuccessful attempt to create snapshot in Storage.CreateSnapshot(). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551	2023-02-27 13:07:38 -08:00
Zakhar Bessarab	d8eaa511b0	lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858 ) (#3867 )	2023-02-24 12:38:42 -08:00
Oleksandr Redko	9fff48c3e3	app,lib: fix typos in comments (#3804 )	2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin	3ec8a4dc80	lib/{mergeset,storage}: allow at least 3 concurrent flushes during background merges on systems with 1 or 2 CPU cores This should prevent from data ingestion slowdown and query performance degradation on systems with small number of CPU cores (1 or 2), when big merge is performed. This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337	2023-02-11 12:08:52 -08:00
Nikolay	9254e494f9	lib/storage: fixes finalDedup for backfilled data (#3737 ) previously historical data backfilling may trigger force merge for previous month every hour it consumes cpu, disk io and decrease cluster performance. Following commit fixes it by applying deduplication for InMemoryParts	2023-02-01 09:54:21 -08:00
Nikolay	465a285324	lib/storage: properly release parts inMerge lock (#3711 ) if storage doesn't have enough disk space, finalDedupWatcher holds inMerge lock for all parts and never release it until storage restart	2023-01-26 08:05:20 -08:00
Aliaksandr Valialkin	2ac530eb28	lib/{storage,mergeset}: wake up background merges as soon as there is a potential work for them Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647	2023-01-18 01:10:18 -08:00
Aliaksandr Valialkin	b8409d6600	lib/{storage,mergeset}: do not run assisted merges when flushing pending samples to parts Assisted merges are intended to be performed by goroutines, which accept the incoming samples, in order to limit the data ingestion rate. The worker, which converts pending samples to parts, shouldn't be penalized by assisted merges, since this may result in increased number of pending rows as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647#issuecomment-1385039142 when the assisted merge takes too much time.	2023-01-18 00:20:58 -08:00

1 2 3 4

167 commits