Commit graph

209 commits

Author SHA1 Message Date
Aliaksandr Valialkin
077f84964a
lib/mergeset: do not store commonPrefix in blockHeader if the block contains only a single item
There is no sense in storing commonPrefix for blockHeader containing only a single item,
since this only increases blockHeader size without any benefits.
2024-02-08 13:47:24 +02:00
Aliaksandr Valialkin
e1bf8440eb
lib/mergeset: prevent from possible too big indexBlockSize panic
This panic could occur when samples with too long label values are ingested into VictoriaMetrics.
This could result in too long fistItem and commonPrefix values at blockHeader (up to 64kb each).
This may inflate the maximum index block size by 4 * maxIndexBlockSize.
2024-02-08 12:54:10 +02:00
Aliaksandr Valialkin
bb7a419cc3
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
Aliaksandr Valialkin
e84c877503
lib/mergeset: remove inmemoryBlock pooling, since it wasn't effecitve
This should reduce memory usage a bit when new time series are ingested at high rate (aka high churn rate)
2024-01-26 21:34:57 +01:00
Aliaksandr Valialkin
18df07e824
lib/mergeset: start assisted merge for file parts only if the number of file parts is bigger than maxFileParts
The maxFileParts usage has been accidentally removed in fa566c68a6

While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading.
Previously their names were falsely suggesting that these are gauges, which show the number of concurrently
executed assisted merges.
2024-01-24 15:08:42 +02:00
Aliaksandr Valialkin
fa566c68a6
lib/mergeset: really limit the number of in-memory parts to 15
It has been appeared that the registration of new time series slows down linearly
with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part
when it searches for TSID by newly ingested metric name.

The number of in-memory parts grows when new time series are registered
at high rate. The number of in-memory parts grows faster on systems with big number
of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries
for the indexdb, and every such entry is transformed eventually into a separate in-memory part.

The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
by @misutoth - to limit the number of in-memory parts with buffered channel.
This solution is implemented in this commit. Additionally, this commit merges per-CPU parts
into a single part before adding it to the list of in-memory parts. This reduces CPU load
when searching for TSID by newly ingested metric name.

The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number
of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency
on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
2024-01-24 03:38:12 +02:00
Aliaksandr Valialkin
ae643ef1f1
lib/{storage,mergeset}: reduce the maxium compression level for the stored data
This reduces CPU usage a bit, while doesn't increase resulting file sizes according to synthetic tests.
2024-01-23 17:46:50 +02:00
Aliaksandr Valialkin
1c5163ae51
lib/mergeset: make sure that the first and the last items are in the original range after prepareBlock()
Previously the checks were to strict by requiring to leave the same first and last items by prepareBlock()

Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5655
2024-01-23 12:58:32 +02:00
Aliaksandr Valialkin
980338861f
lib/mergeset: skip comparison for every item in the block during merge if the last item in the block is smaller than the first item in the next block
Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5651
2024-01-23 03:15:52 +02:00
Aliaksandr Valialkin
3449d563bd
all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin
0b2ea1a7c7
all: call atomic.Load* in front of atomic.CompareAndSwap* at places where the atomic.CompareAndSwap* returns false most of the time
This allows avoiding slow inter-CPU synchornization induced by atomic.CompareAndSwap*
2024-01-21 14:04:54 +02:00
Aliaksandr Valialkin
d41841c0c9
lib/{mergeset,storage}: consistently reset isInMerge field in parts passed to mergeParts() before returning from the function
While at it consistently check that the isInMerge field is set in all the parts passed to mergeParts()
2023-10-02 08:05:29 +02:00
Aliaksandr Valialkin
3ca6fea858
lib/{mergeset,storage}: perform at most one assisted merge per each call to addRows/addItems
This should reduce tail latency during data ingestion.

This shouldn't slow down data ingestion in the worst case, since assisted merges are spread among
distinct addRows/addItems calls after this change.
2023-10-01 22:19:46 +02:00
Zakhar Bessarab
bea3431ed1
lib/storage/partition: add check to ensure parts exist on disk (#5017)
* lib/storage/partition: add check to ensure parts exist on disk

If part exists in parts.json but is missing on disk there will be a misleading error similar to "unexpected number of substrings in the part name".

This change forces verification of part existence and throws a correct error in case it is missing on disk.

Such issue can be result of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005 or disk corruption.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: use filepath.Join instead of string concatenation

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage/partition: add action points for error message

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* all: add a check for missing part in lib/mergeset and lib/logstorage

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-19 11:17:41 +02:00
Aliaksandr Valialkin
edee262ecc
Makefile: update golangci-lint from v1.51.2 to v1.54.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.54.2
2023-09-01 10:16:42 +02:00
Aliaksandr Valialkin
0c49552849
lib/mergeset: skip common prefix in binarySearchKey() function
This should improve performance a bit when the search if performed among items with long common prefix
2023-07-13 22:04:59 -07:00
Aliaksandr Valialkin
4ba19f6b32
lib/mergeset: simplify fulsuhInmemoryParts() a bit 2023-07-13 12:33:30 -07:00
Aliaksandr Valialkin
152ca00fb8
docs/CHANGELOG.md: clarify description for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 bugfix
This is a follow-up for 5eb5df96e2
2023-07-06 17:09:03 -07:00
Aliaksandr Valialkin
298aab3f54
lib/mergeset: do not create flock.lock file at mergeset table, since it is created at the lib/storage.Storage level 2023-06-19 22:45:31 -07:00
Nikolay
5eb5df96e2
lib/storage: creates parts.json on start-up if it not exists. (#4450)
* lib/storage: creates parts.json on start-up if it not exists.
It fixes migrations from versions below v1.90.0.
Previously parts.json was created only after successful merge.
But if merge was interruped for some reason (OOM or shutdown), parts.json wasn't created and partitions left after interruped merge weren't properly deleted.
Since VM cannot check if it must be removed or not.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336

* Apply suggestions from code review

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Update lib/storage/partition.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-06-15 11:19:22 +02:00
Roman Khavronenko
f71cc99a8c
lib/mergeset: add comment for how mustBeDeleted field should be used (#4449)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-06-14 18:13:16 +02:00
Aliaksandr Valialkin
d330c7e6fc
lib/mergeset: remove superflouos logging when opening and closing the Table
The logged messages had little useful info, while they were polluting log output during VictoriaMetrics start/stop
2023-05-16 15:01:25 -07:00
Aliaksandr Valialkin
3cbc0975f6
lib/mergeset: close and open the table before making snapshots at TestTableCreateSnapshotAt()
This gives guarantees that all the in-memory data is written to disk at the snapshot time.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4272
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4316
2023-05-16 14:55:11 -07:00
Aliaksandr Valialkin
09b403d38a
lib/{mergeset,storage}: make it clear that DebugFlush() doesn't store all the recently ingested data to disk
DebugFlush() makes sure that the recently ingested data becomes visible to search.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4272
2023-05-16 11:50:17 -07:00
Aliaksandr Valialkin
2a4c48c59d
lib/{mergeset,storage}: make mustReadPartNames() code more clear 2023-04-14 23:16:59 -07:00
Aliaksandr Valialkin
3727251910
lib/fs: add MustReadDir() function
Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity.
The fs.MustReadDir() logs the error with the directory name and the call stack on error
before exit. This information should be enough for debugging the cause of the error.
2023-04-14 22:10:46 -07:00
Aliaksandr Valialkin
df619bdff0
all: consistently use fs.MustClose() for closing lock files 2023-04-14 20:14:21 -07:00
Aliaksandr Valialkin
2a3b19e1d2
lib/fs: convert CreateFlockFile to MustCreateFlockFile
Callers of CreateFlockFile log the returned err and exit.
It is better to log the error inside the MustCreateFlockFile together with the path
to the specified directory and the call stack. This simplifies
the code at the callers' side while leaving the debuggability at the same level.
2023-04-14 19:50:01 -07:00
Aliaksandr Valialkin
c0b852d50d
lib/{storage,mergeset}: convert InitFromFilePart to MustInitFromFilePart
Callers of InitFromFilePart log the error and exit.
It is better to log the error with the path to the part and the call stack
directly inside the MustInitFromFilePart() function.
This simplifies the code at callers' side while leaving the same level of debuggability.
2023-04-14 15:46:12 -07:00
Aliaksandr Valialkin
9183a439c7
lib/filestream: change Create() to MustCreate()
Callers of this function log the returned error and exit.
It is better logging the error together with the path to the filename
and call stack directly inside the function. This simplifies
the code at callers' side without reducing the level of debuggability
2023-04-14 15:12:48 -07:00
Aliaksandr Valialkin
5eb163a08a
lib/filestream: transform Open() -> MustOpen()
Callers of this function log the returned error and exit.
Let's log the error with the path to the filename and call stack
inside the function. This simplifies the code at callers' side
without reducing the level of debuggability.
2023-04-14 15:03:42 -07:00
Aliaksandr Valialkin
f341b7b3f8
lib/fs: substitute ReadFullData with MustReadData
Callers of ReadFullData() log the error and then exit.
So let's log the error with the path to the filename and the call stack
inside MustReadData(). This simplifies the code at callers' side,
while leaving the debuggability at the same level.
2023-04-14 14:39:29 -07:00
Aliaksandr Valialkin
e0595af2bf
lib/{mergeset,storage}: remove isInMerge flag from parts only when they werent removed yet from the list of active parts
This prevents from possible panic during access to pw.p when it is set to nil at partWrapper.decRef() called inside swapSrcWithDstParts()
2023-04-14 00:08:11 -07:00
Aliaksandr Valialkin
9f8209d593
docs/CHANGELOG.md: run at least 4 background mergers on systems with less than 4 CPU cores
This reduces the probability of sudden spike in the number of small parts when all the background mergers
are busy with big merges.
2023-04-13 23:43:17 -07:00
Aliaksandr Valialkin
550d5c7ea4
lib/{mergeset,storage}: make sure that getFlushToDiskDeadline() takes into account only in-memory parts 2023-04-13 23:43:17 -07:00
Aliaksandr Valialkin
5f487ed996
lib/fs: rename HardLinkFiles to MustHardLinkFiles
Callers of this function log the returned error and then exit.
Let's log the error with the call stack inside the function itself.
This simplifies the code at callers' side, while leaving the same
level of debuggability in case of errors.
2023-04-13 22:48:07 -07:00
Aliaksandr Valialkin
30425ca81a
lib/fs: rename WriteFileAtomically to MustWriteAtomic
Callers of this function log the returned error and exit.
So let's just log the error with the given filepath and the call stack
inside the function itself and then exit. This simplifies the code
at callers' place while leaves the same level of debuggability in case of errors.
2023-04-13 22:41:15 -07:00
Aliaksandr Valialkin
036a7b7365
lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist
Callers of these functions log the returned error and then exit. The returned error already contains the path
to directory, which was failed to be created. So let's just log the error together with the call stack
inside these functions. This leaves the debuggability of the returned error at the same level
while allows simplifying the code at callers' side.

While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk().
It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.
2023-04-13 22:11:59 -07:00
Aliaksandr Valialkin
344209e5e6
lib/fs: rename MustWriteFileAndSync to MustWriteSync in order to improve readability a bit
This is a follow-up for 2a8395be05
2023-04-13 21:43:32 -07:00
Aliaksandr Valialkin
b15c5961ab
lib/{mergeset,storage}: remove unused path field from blockStreamWriter
This is a follow-up after 42bba64aa7
2023-04-13 21:39:59 -07:00
Aliaksandr Valialkin
2a8395be05
lib/fs: replace WriteFileAndSync with MustWriteAndSync
When WriteFileAndSync fails, then the caller eventually logs the error message
and exits. The error message returned by WriteFileAndSync already contains the path
to the file, which couldn't be created. This information alongside the call stack
is enough for debugging the issue. So just use log.Panicf("FATAL: ...") inside MustWriteAndSync().
This simplifies error handling at caller side a bit.
2023-04-13 21:33:19 -07:00
Aliaksandr Valialkin
25f089de9d
lib/{mergeset,storage}: properly fsync part directory listing after writing in-memory part to disk
This is a follow-up after 42bba64aa7

Previously the part directory listing was fsync'ed implicitly inside partHeader.WriteMetadata()
by calling fs.WriteFileAtomically(). Now it must be fsync'ed explicitly.

There is no need in fsync'ing the parent directory, since it is fsync'ed by the caller
when updating parts.json file.
2023-04-13 21:19:04 -07:00
Aliaksandr Valialkin
42bba64aa7
lib/{mergeset,storage}: explicitly fsync the created part directory listing
Previously the created part directory listing was fsynced implicitly
when storing metadata.json file in it.

Also remove superflouous fsync for part directory listing,
which was called at blockStreamWriter.MustClose().
After that the metadata.json file is created, so an additional fsync
for the directory contents is needed.
2023-04-13 21:03:08 -07:00
Aliaksandr Valialkin
c8f2febaa1
lib/storage: consistently use OS-independent separator in file paths
This is needed for Windows support, which uses `\` instead of `/` as file separator

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 14:33:58 -07:00
Aliaksandr Valialkin
36bbdd7d4b
lib/mergeset: consistently use OS-independent separator in file paths
This is needed for Windows support, which uses `\` instead of `/` as file separator

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 13:39:41 -07:00
Aliaksandr Valialkin
43b24164ef
all: add Windows build for VictoriaMetrics
This commit changes background merge algorithm, so it becomes compatible with Windows file semantics.

The previous algorithm for background merge:

1. Merge source parts into a destination part inside tmp directory.
2. Create a file in txn directory with instructions on how to atomically
   swap source parts with the destination part.
3. Perform instructions from the file.
4. Delete the file with instructions.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since the remaining files with instructions is replayed on the next restart,
after that the remaining contents of the tmp directory is deleted.

Unfortunately this algorithm doesn't work under Windows because
it disallows removing and moving files, which are in use.

So the new algorithm for background merge has been implemented:

1. Merge source parts into a destination part inside the partition directory itself.
   E.g. now the partition directory may contain both complete and incomplete parts.
2. Atomically update the parts.json file with the new list of parts after the merge,
   e.g. remove the source parts from the list and add the destination part to the list
   before storing it to parts.json file.
3. Remove the source parts from disk when they are no longer used.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since incomplete partitions from step 1 or old source parts from step 3 are removed
on the next startup by inspecting parts.json file.

This algorithm should work under Windows, since it doesn't remove or move files in use.
This algorithm has also the following benefits:

- It should work better for NFS.
- It fits object storage semantics.

The new algorithm changes data storage format, so it is impossible to downgrade
to the previous versions of VictoriaMetrics after upgrading to this algorithm.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-19 01:36:51 -07:00
Aliaksandr Valialkin
6460475e3b
lib/{mergeset,storage}: prevent from long wait time when creating a snapshot under high data ingestion rate
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3873
2023-03-19 00:15:30 -07:00
Aliaksandr Valialkin
a26c6628fd
lib/{fs,mergeset,storage}: substitute os.Open()+os.File.Readdir() with os.ReadDir()
This simplifies code a bit
2023-03-17 21:03:37 -07:00
Nikolay
6bfe9cc733
lib{mergset,storage}: prevent possible race condition with logging st… (#3900)
lib{mergset,storage}: prevent possible race condition with logging stats for merges

Previously partwrapper could be release by background process and reference for part may be invalid 
during logging stats. It will lead to panic at vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
2023-03-03 12:33:42 +01:00
Aliaksandr Valialkin
0d3f31f60e
lib/storage: follow-up for 39cdc546dd
- Use flag.Duration instead of flagutil.Duration for -snapshotCreateTimeout,
  since the flagutil.Duration is intended mostly for big durations, e.g. days, months and years,
  while the -snapshotCreateTimeout is usually smaller than one hour.
- Add links to https://docs.victoriametrics.com/#how-to-work-with-snapshots in docs/CHANGELOG.md,
  so readers could easily find the corresponding docs when reading the changelog.
- Properly remove all the created directories on unsuccessful attempt to create
  snapshot in Storage.CreateSnapshot().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
2023-02-27 13:07:38 -08:00