Roman Khavronenko
d59d829cdb
lib/storage: bump max merge concurrency for small parts to 15 ( #2997 )
...
* lib/storage: bump max merge concurrency for small parts to 15
The change is based on feedback from users on GitHub.
Their examples show that the limit of 8 sometimes becomes a
bottleneck. Users report that without the limit, concurrency
can climb to 15-20 merges at once.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* Update lib/storage/partition.go
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-08-21 23:32:08 +03:00
Roman Khavronenko
a0e7432e42
lib/storage: prevent excessive loops when storage is in RO ( #2962 )
...
* lib/storage: prevent excessive loops when storage is in RO
Returning a nil error when storage is in RO mode results
in excessive loops and function calls, which could
result in CPU exhaustion. Returning an error instead
will trigger delays in the for loop and save some resources.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* document the change
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-08-09 12:17:00 +03:00
Aliaksandr Valialkin
134751e43e
all: locate throttled loggers via logger.WithThrottler() only once and then use them
...
This reduces the contention on logThrottlerRegistryMu mutex when logger.WithThrottler()
is called frequently from concurrent goroutines.
2022-06-27 13:45:50 +03:00
Roman Khavronenko
1ee1e986da
lib/storage: limit max mergeConcurrency value for systems with high number of CPUs ( #2673 )
...
The number of merge workers affects the max part size during merges. This behaviour
protects storage from running out of disk space in the scenario when all workers
are merging parts with the max size.
This works very well for most cases. But for systems where a high number of CPUs
is allocated to vmstorage components, this could significantly impact the max
part size and result in more unmerged parts than expected.
While checking multiple highly loaded production setups, it was discovered that
`max_over_time(vm_active_merges{type="storage/big"}[1h])` rarely exceeds 2,
and `max_over_time(vm_active_merges{type="storage/small"}[1h])` rarely exceeds 4.
The change in this commit limits the max value for concurrency accordingly.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-07 14:55:09 +03:00
Aliaksandr Valialkin
ea06d2fd3c
lib/storage: stop background merge when storage enters read-only mode
...
This should prevent `no space left on device` errors when VictoriaMetrics
under-estimates the additional disk space needed for background merge.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2603
2022-06-01 14:36:45 +03:00
Aliaksandr Valialkin
57143e9435
lib/storage: increase the number of rawRowsShard shards on systems with more than 4 CPU cores
...
This should improve data ingestion scalability on systems with many CPU cores
2022-04-06 19:49:20 +03:00
Aliaksandr Valialkin
50cf74ce4b
lib/storage: reuse sync.WaitGroup objects
...
This reduces GC load by up to 10% according to memory profiling
2022-04-06 13:34:04 +03:00
Aliaksandr Valialkin
59877d9f32
lib/{mergeset,storage}: tune compression levels for small blocks
...
This should reduce CPU usage spent on compression
2022-02-25 15:33:40 +02:00
Aliaksandr Valialkin
145337792d
lib/{mergeset,storage}: properly limit cache sizes for indexdb
...
Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`,
since limits were set independently for each data part. If the number of data parts was big, the limits could be exceeded,
which could result in out-of-memory errors.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-20 18:37:17 +02:00
Aliaksandr Valialkin
ce333f28d8
all: use logger.WithThrottler() where appropriate
2021-12-21 17:03:25 +02:00
Aliaksandr Valialkin
8a7f08ded3
lib/storage: properly update per-part min_dedup_interval file contents after merge
...
Previously 0s was always written even if -dedup.minScrapeInterval was set to a non-zero value.
This is a follow-up for 4ff647137a
2021-12-17 20:13:24 +02:00
Aliaksandr Valialkin
4ff647137a
lib/storage: deduplicate samples more thoroughly
...
Previously some duplicate samples may be left on disk for time series with high churn rate.
This may result in higher disk space usage.
2021-12-15 15:59:58 +02:00
Aliaksandr Valialkin
7275ebf91a
app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches
...
The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query:
vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9
2021-12-02 10:30:43 +02:00
Aliaksandr Valialkin
2fb5a6ca78
lib/storage: do not take into account -storage.minFreeDiskSpaceBytes during background merges
2021-12-01 11:02:36 +02:00
Aliaksandr Valialkin
d666755159
lib/storage: take into account -storage.minFreeDiskSpaceBytes when performing big merges
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-11-30 12:56:35 +02:00
Aliaksandr Valialkin
ffc0ab1774
lib/{mergeset,storage}: improve the detection of the needed free space for background merge
...
This should prevent possible out-of-disk-space crashes during big merges.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1560
2021-08-25 09:35:44 +03:00
Aliaksandr Valialkin
c2deee9911
lib/storage: yet another attempt to properly determine disk space shortage, which prevents optimal merges
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-27 12:04:50 +03:00
Aliaksandr Valialkin
9a83e9018d
lib/storage: properly detect free disk space shortage during data merge
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-02 17:40:54 +03:00
Aliaksandr Valialkin
dcbc22552f
lib/storage: fix infinite loop introduced in aa9b56a046
2021-06-17 14:28:10 +03:00
Aliaksandr Valialkin
aa9b56a046
lib/{mergeset,storage}: reduce the number of fsync calls on data ingestion path on systems with many cpu cores
...
VictoriaMetrics maintains a buffer per CPU core for the ingested data. These buffers are flushed to disk every second.
These buffers are flushed to disk in parallel starting from the commit 56b6b893ce.
This resulted in increased write disk IO usage on systems with many cpu cores
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-863046999 .
This commit merges the per-CPU buffers into bigger in-memory buffers before flushing them to disk.
This should reduce the rate of fsync syscalls and, consequently, the write disk IO on systems with many CPU cores.
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
2021-06-17 13:52:08 +03:00
Aliaksandr Valialkin
69b1482bdb
lib/storage: consistency renaming: getMaxRawRowsPerPartition -> getMaxRawRowsPerShard
2021-06-11 10:57:23 +03:00
Aliaksandr Valialkin
044ab46824
lib/storage: reduce the amounts of memory which can be occupied by rawRow items during data ingestion on a system with many CPU cores
2021-06-11 10:57:23 +03:00
Aliaksandr Valialkin
a4ff4b8e65
lib/storage: allow filling all the rows up to their capacity in rawRowsShard.addRows
...
This should reduce memory usage a bit on data ingestion path
2021-05-24 15:22:59 +03:00
Aliaksandr Valialkin
ec79abc382
lib/{mergeset,storage}: reduce the number of INFO log messages like `merged ... items across ... blocks in ... seconds`
...
Log these messages if the merge takes more than 30 seconds instead of 10 seconds.
2021-05-23 14:03:21 +03:00
Nikolay
477369b62f
adds stalePartsRemover ( #1261 )
...
for newly created partitions
2021-05-03 11:34:00 +03:00
Aliaksandr Valialkin
87179c6839
lib/{storage,mergeset}: fix unaligned 64-bit atomic operation panic for 32-bit architectures
...
The panic has been introduced in 56b6b893ce
2021-04-27 16:41:32 +03:00
Aliaksandr Valialkin
56b6b893ce
lib/mergeset: split rows ingestion among multiple shards
...
This improves rows ingestion on systems with many CPU cores by reducing lock contention.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
Thanks to @waldoweng for the original idea and draft implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1243
2021-04-27 15:36:34 +03:00
Aliaksandr Valialkin
f89c1f7f49
lib/storage: typo fix in info message when deleting the part outside the configured retention
...
Previously the message was displaying incorrect retention time
2021-04-27 13:32:46 +03:00
Aliaksandr Valialkin
bbebdf9ba1
lib/{storage,mergeset}: remove empty directories on startup. Such directories can be left after unclean shutdown on NFS storage
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1142
2021-04-22 13:02:44 +03:00
Aliaksandr Valialkin
48656dcc38
lib/{mergeset,storage}: allow merging smaller number of small parts
...
While this may increase CPU and disk IO usage needed for background merge,
this also reduces CPU usage during queries in production. This is because
such queries tend to read recently added data and it is better to have lower number
of parts for such data in order to reduce CPU usage.
This partially reverts ebf8da3730
2021-02-21 21:28:36 +02:00
Aliaksandr Valialkin
cb96a1865b
app/vmstorage: export missing vm_cache_size_bytes metrics for indexdb and data caches
2021-02-09 00:47:00 +02:00
Aliaksandr Valialkin
9dcb18e03d
app/vmstorage: disable final merge by default, since it may result in high disk IO and CPU usage without measurable benefits such as increased query performance and reduced disk space usage
2021-01-08 00:16:05 +02:00
Aliaksandr Valialkin
490c69c64e
lib/storage: wait for pending transactions before closing and dropping the partition
...
This deflakes `make test-full-386` test
2020-12-25 11:45:53 +02:00
Aliaksandr Valialkin
cab7e936a3
lib/storage: physically remove stale parts
...
Previously they were removed from partition struct, but the corresponding directories weren't removed.
This is a follow-up for 46dba00756
2020-12-24 16:51:36 +02:00
Aliaksandr Valialkin
9e4ed5e591
lib/storage: do not remove parts outside the configured retention if they are currently merged
...
These parts are automatically removed after the merge is complete.
2020-12-24 08:51:28 +02:00
Aliaksandr Valialkin
46dba00756
lib/storage: remove stale parts as soon as they go outside the configured retention
...
Previously such parts could remain undeleted for long durations until they are merged with other parts.
This should help for `-retentionPeriod` values smaller than one month.
2020-12-22 19:54:31 +02:00
Aliaksandr Valialkin
d65c03c004
lib/storage: properly determine max rows for output part when merging small parts
2020-12-18 23:14:38 +02:00
Aliaksandr Valialkin
ebf8da3730
lib/{storage,mergeset}: tune background merge process in order to reduce CPU usage and disk IO usage
2020-12-18 20:01:08 +02:00
Aliaksandr Valialkin
4146fc4668
all: properly handle CPU limits set on the host system/container
...
This can reduce memory usage on systems with enabled CPU limits.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946
2020-12-08 21:07:29 +02:00
Aliaksandr Valialkin
f3f62ab04e
lib/storage: do not report about the need of free disk space if parts cannot be merged due to too big write amplification
2020-11-03 15:32:02 +02:00
Aliaksandr Valialkin
5bfd4e6218
app/vmstorage: support for -retentionPeriod smaller than one month
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/17
2020-10-20 14:31:44 +03:00
Aliaksandr Valialkin
af90b3121c
app/vmstorage: add -finalMergeDelay command-line flag for configuring the delay before final merge for per-month partitions after no new data is ingested to it
2020-10-07 17:35:44 +03:00
Aliaksandr Valialkin
bec9b31b81
lib/storage: allow setting values higher than 1 for vm_merge_need_free_disk_space if there are multiple partitions with deferred merges due to disk space shortage
2020-09-29 22:51:43 +03:00
Aliaksandr Valialkin
a9db81c4ab
app/vmstorage: add metrics for determining whether background merges need additional disk space to complete
...
These metrics are:
* vm_small_merge_need_free_disk_space
* vm_big_merge_need_free_disk_space
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686
2020-09-29 21:48:33 +03:00
Aliaksandr Valialkin
9739283dad
lib/storage: reduce CPU load for idle VictoriaMetrics by reducing the frequency for the need for background merges
2020-09-21 15:54:11 +03:00
Aliaksandr Valialkin
1f33dd717f
lib/storage: add /internal/force_merge handler for running forced compactions on historical per-month partitions
...
This may be useful for freeing up storage space after time series deletion.
See https://victoriametrics.github.io/#force-merge for more details.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686
2020-09-17 12:20:40 +03:00
Aliaksandr Valialkin
8beb0da6ad
lib/{mergeset,storage}: compare errors with errors.Is()
2020-09-17 03:03:02 +03:00
Aliaksandr Valialkin
067d7c1ea1
lib/{mergeset,storage}: code prettifying
2020-09-17 02:06:31 +03:00
Aliaksandr Valialkin
020bd8685e
lib/storage: removed duplicate checks for empty parts during merge - another check is in the beginning of mergeParts functions
2020-09-17 01:49:03 +03:00
Aliaksandr Valialkin
5e71fab8a6
lib/storage: reduce the maximum number of concurrent merge workers to GOMAXPROCS/2
...
Previously the limit was raised to GOMAXPROCS, but it turned out that this
increases query latencies, since more CPUs are busy with merges.
While at it, substitute `*MergeConcurrencyLimitCh` channels with simple integer limits.
2020-07-31 17:46:56 +03:00
Aliaksandr Valialkin
e7959094f6
lib/storage: remove prioritizing of merging small parts over merging big parts, since it doesn't work as expected
...
The prioritizing could lead to big merge starvation, which could end up in too big number of parts that must be merged into big parts.
Multiple big merges may be initiated after the migration from v1.39.0 or v1.39.1. It is OK - these merges should be finished soon,
which should return CPU and disk IO usage to normal levels.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/618
2020-07-30 19:57:27 +03:00
Aliaksandr Valialkin
6f05c4d351
lib/storage: improve prioritizing of data ingestion over querying
...
Prioritize also small merges over big merges.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
2020-07-23 13:23:36 +03:00
Aliaksandr Valialkin
61c611f5ad
lib/storage: properly calculate global metrics in UpdateStats()
2020-07-23 00:35:15 +03:00
Aliaksandr Valialkin
228d137936
lib/storage: reorder mergeBlockStreams() args in order to make them more consistent
2020-07-22 21:58:10 +03:00
Aliaksandr Valialkin
d5dddb0953
all: use %w instead of %s for wrapping errors in fmt.Errorf
...
This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode.
See https://blog.golang.org/go1.13-errors for details.
2020-06-30 23:05:11 +03:00
Tristan Su
ac3700ed1e
lib/storage: set big/small merge concurrency ( #568 )
...
fixed #567
Co-authored-by: Tristan Su <suqing.sq@alibaba-inc.com>
2020-06-19 01:25:48 +03:00
Aliaksandr Valialkin
3d4008263f
lib/fs: optimize MustGetFreeSpace performance by caching the results for up to 2 seconds
2020-06-04 13:15:47 +03:00
Aliaksandr Valialkin
0eacea1de1
lib/{storage,mergeset}: further tuning of compression levels depending on block size
...
This should improve performance for querying newly added data, since it can be unpacked faster.
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
0afd48d2ee
lib: extract common code for returning fast unix timestamp into lib/fasttime
2020-05-14 23:02:07 +03:00
Aliaksandr Valialkin
7d73623c69
lib/{storage,mergeset}: make sure that requests and misses cache counters never go down
2020-04-10 14:45:01 +03:00
Aliaksandr Valialkin
eceaf13e5e
lib/{storage,mergeset}: use time.Ticker instead of time.Timer where appropriate
...
It turned out that time.Timer was used in places where time.Ticker must be used instead.
This could result in blocked goroutines as in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/316 .
2020-02-13 13:10:07 +02:00
Aliaksandr Valialkin
680080887d
all: consistently log durations in seconds with millisecond precision
...
This should improve logs readability
2020-01-22 18:28:27 +02:00
Aliaksandr Valialkin
fc71602039
lib/storage: limit maxRawRowsPerPartition by 500K for any number of rawRowsShardsPerPartition
...
This should reduce write amplification for high ingestion rate on multi-CPU systems
2020-01-04 23:57:31 +02:00
Aliaksandr Valialkin
1825893eef
lib/storage: scale ingestion performance by sharding rawRows on systems with more than 8 CPU cores
2019-12-19 18:18:29 +02:00
Aliaksandr Valialkin
0ed9258545
lib/{mergeset,storage}: log info message when both source and destination part paths from txn are missing during startup
...
This is expected condition after unclean shutdown (OOM, hard reset, `kill -9`) on NFS disk.
2019-12-09 15:44:53 +02:00
Aliaksandr Valialkin
72345eb5bd
lib/{mergeset,storage}: make sure pending transaction deletions are finished before and after runTransactions call.
...
`runTransactions` call issues async deletions for transaction files. The previously issued transaction deletions
can race with the next call to `runTransactions`. Prevent this by waiting until all the pending transaction
deletions are finished in the beginning of `runTransactions`. Also make sure that all the pending transaction
deletions are finished before returning from `runTransactions`.
2019-12-04 21:40:30 +02:00
Aliaksandr Valialkin
638a5cbb16
lib/{mergeset,storage}: remove transaction files only after the mentioned dirs are really removed
...
This should fix the issue on NFS when incompletely removed dirs may be left
after unclean shutdown (OOM, kill -9, hard reset, etc.), while the corresponding transaction
files are already removed.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
2019-12-02 21:36:31 +02:00
Aliaksandr Valialkin
5c2099ecfe
lib/storage: return finalPartsToMerge from 2 back to 3 in order to prevent excessive merges in old partitions
2019-11-05 17:27:48 +02:00
Aliaksandr Valialkin
c62399eb3e
lib/{storage,mergeset}: create missing partition directories after restoring from backups
...
Backup tools could skip empty directories. So re-create such directories on the first run.
2019-11-02 02:27:11 +02:00
Aliaksandr Valialkin
d18ea0c95b
app/vmstorage: add -bigMergeConcurrency and -smallMergeConcurrency flags for tuning the maximum number of CPU cores used during merges
2019-10-31 16:19:13 +02:00
Aliaksandr Valialkin
26d570bb3a
lib/storage: get parts to merge after applying the limit on the number of concurrent merges
...
This should reduce write amplification under high ingestion rate.
2019-10-30 02:04:56 +02:00
Aliaksandr Valialkin
2e2eff90d5
lib/{mergeset,storage}: limit the maximum number of concurrent merges; leave smaller number of parts during final merge
2019-10-29 12:45:28 +02:00
Aliaksandr Valialkin
e83fe938c8
all: make fmt
2019-10-17 20:04:34 +03:00
Aliaksandr Valialkin
97ce4e03a5
all: add support for GOARCH=386 and fix all the issues related to 32-bit architectures such as GOARCH=arm
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212
2019-10-17 18:23:23 +03:00
Aliaksandr Valialkin
2abd5154e0
lib/storage: typo fix in comment to maxRowsPerSmallPart.
2019-10-08 18:51:20 +03:00
Aliaksandr Valialkin
b986516fbe
lib/storage: create and use lib/uint64set instead of map[uint64]struct{}
...
This should improve inverted index search performance for filters matching big number of time series,
since `lib/uint64set.Set` is faster than `map[uint64]struct{}` for both `Add` and `Has` calls.
See the corresponding benchmarks in `lib/uint64set`.
2019-09-24 21:17:55 +03:00
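One way a specialized `uint64` set can avoid a map entry per element is to group values into bitmap buckets. The sketch below is a toy under that assumption and is not the actual `lib/uint64set` implementation; the names `bitmapSet` and `bucket` and the 65536-value bucket size are invented for illustration.

```go
package main

import "fmt"

// bitsPerBucket is the number of consecutive values covered by one bucket.
const bitsPerBucket = 1 << 16

// bucket is a bitmap for 65536 consecutive uint64 values.
type bucket [bitsPerBucket / 64]uint64

// bitmapSet is a toy sparse set: the high 48 bits of a value select
// a bucket, and the low 16 bits select a bit inside that bucket.
type bitmapSet struct {
	buckets map[uint64]*bucket
}

// Add inserts x into the set.
func (s *bitmapSet) Add(x uint64) {
	if s.buckets == nil {
		s.buckets = make(map[uint64]*bucket)
	}
	hi, lo := x>>16, x&(bitsPerBucket-1)
	b := s.buckets[hi]
	if b == nil {
		b = &bucket{}
		s.buckets[hi] = b
	}
	b[lo/64] |= 1 << (lo % 64)
}

// Has reports whether x is in the set.
func (s *bitmapSet) Has(x uint64) bool {
	b := s.buckets[x>>16]
	if b == nil {
		return false
	}
	lo := x & (bitsPerBucket - 1)
	return b[lo/64]&(1<<(lo%64)) != 0
}

func main() {
	var s bitmapSet
	s.Add(42)
	s.Add(1 << 40)
	fmt.Println(s.Has(42), s.Has(43), s.Has(1<<40)) // true false true
}
```

Values that are close to each other share a bucket, so only one map lookup per 65536-value range is needed. Whether this matches the performance of the real package is not claimed here; the benchmarks in `lib/uint64set` are the reference.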
Aliaksandr Valialkin
0686ac52c3
lib/{storage,mergeset}: merge tag->metricID rows into tag->metricIDs rows for common tag values
...
This should improve lookup performance if the same `label=value` pair exists
in big number of time series.
This should also reduce memory usage for mergeset data cache, since `tag->metricIDs` rows
occupy less space than the original `tag->metricID` rows.
2019-09-20 22:06:41 +03:00
Aliaksandr Valialkin
bad53e4207
lib/mergeset: dynamically calculate the maximum number of items per part, which can be cached in OS page cache
2019-09-11 14:53:45 +03:00
Aliaksandr Valialkin
9196c085a7
all: port to FreeBSD on GOARCH=amd64
2019-08-28 01:19:23 +03:00
Aliaksandr Valialkin
4b688fffee
lib/storage: calculate the maximum number of rows per small part from -memory.allowedPercent
...
This should improve query speed over recent data on machines with big amounts of RAM
2019-08-25 14:41:12 +03:00
Aliaksandr Valialkin
1402a6b981
lib/storage: properly limit the number of output rows in small and big parts storage
...
Previously small parts storage didn't take into account the available disk space for big parts.
2019-08-25 14:41:12 +03:00
Aliaksandr Valialkin
3308279c4e
lib/storage: remove outdated comment on maxRowsPerSmallPart
...
The comment became outdated after the commit ed6ac1a5df027f0dfc22448e3b27c26b6f77c67a,
which stops merging of small parts on graceful shutdown instead of waiting
for their completion.
2019-08-25 13:47:32 +03:00
Aliaksandr Valialkin
5d8d110010
lib/fs: atomically create file with the given contents on WriteFileAtomically
...
This should prevent corruption of `transaction` and `metadata.json` files
on unclean shutdown such as OOM, `kill -9`, power loss, etc.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/148
2019-08-12 15:02:55 +03:00
Aliaksandr Valialkin
56c154f45b
all: add vm_data_size_bytes metrics for easy monitoring of on-disk data size and on-disk inverted index size
2019-07-04 19:42:30 +03:00
Aliaksandr Valialkin
419197ba08
lib/fs: consolidate *RemoveAll* funcs into a single MustRemoveAll func
...
The func syncs parent dir in order to persist directory removal
in the event of power loss
2019-06-12 01:53:46 +03:00
Aliaksandr Valialkin
935bfd7a18
lib/fs: consistency renaming SyncPath -> MustSyncPath, since it doesn't return error
2019-06-11 23:13:49 +03:00
Aliaksandr Valialkin
ac7b186f13
all: try hard removing directory with contents
...
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/61
2019-06-11 01:57:59 +03:00
Aliaksandr Valialkin
54fb8b21f9
all: fix misspellings
2019-05-25 21:51:11 +03:00
Aliaksandr Valialkin
1836c415e6
all: open-sourcing single-node version
2019-05-23 00:18:06 +03:00