Commit graph

494 commits

Author SHA1 Message Date
Dmytro Kozlov
018d2303c4
Cardinality explorer (#2625)
* Cardinality explorer

* vmui, vmselect: updated field name, added description to spinner

* make vmui-update

* updated const name, make vmui-update

* lib/storage: changes calculation for totalSeries values

* added static files

* wip

* wip

* wip

* wip

* docs/CHANGELOG.md: document cardinality explorer feature

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2233

Co-authored-by: f41gh7 <nik@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-06-08 18:43:05 +03:00
Roman Khavronenko
1ee1e986da
lib/storage: limit max mergeConcurrency value for systems with high number of CPUs (#2673)
Workers count for merges affects the max part size during merges. Such behaviour
protects storage from running out of disk space for scenario when all workers
are merging parts with the max size.

This works very well for most cases. But for systems where high number of CPUs
is allocated for vmstorage components this could significantly impact the max
part size and result in more unmerged parts than expected.

While checking multiple production highly loaded setups it was discovered that
`max_over_time(vm_active_merges{type="storage/big}[1h]}"` rarely exceeds 2,
and `max_over_time(vm_active_merges{type="storage/small}[1h]}"` rarely exceeds 4.
The change in this commit limits the max value for concurrency accordingly.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-06-07 14:55:09 +03:00
Aliaksandr Valialkin
ea06d2fd3c
lib/storage: stop background merge when storage enters read-only mode
This should prevent from `no space left on device` errors when VictoriaMetrics
under-estimates the additional disk space needed for background merge.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2603
2022-06-01 14:36:45 +03:00
Roman Khavronenko
642eb1c534
lib/storage: make indexdb/tagFilters cache size configurable (#2667)
The default size of `indexdb/tagFilters` now can be overridden via
`storage.cacheSizeIndexDBTagFilters` flag.
Please, be careful with changing default size since it may
lead to inefficient work of the vmstorage or OOM exceptions.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2663
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Co-authored-by: Nikolay <nik@victoriametrics.com>
2022-06-01 10:07:53 +02:00
Aliaksandr Valialkin
41958ed5dd
all: add initial support for query tracing
See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403
2022-06-01 02:29:23 +03:00
Aliaksandr Valialkin
a1add5c2c7
lib/storage: make fmt 2022-05-31 12:54:37 +03:00
Aliaksandr Valialkin
bac75ea8a2
lib/storage: do not take into account series from the next day when match[] filter is passed to /api/v1/status/tsdb 2022-05-31 12:15:26 +03:00
Aliaksandr Valialkin
f6d11a49aa
lib/storage: add ability to change the indexdb rotation time offset with -retentionTimezoneOffset command-line flag
This is a follow-up for 0fbf59199a

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2574
2022-05-25 16:05:29 +03:00
阳明
0fbf59199a
lib/storage: Remove the effect of time zone on next retention period (#2568) (#2574) 2022-05-25 15:08:24 +03:00
Dmytro Kozlov
7dd9f3b98e
{vmbackup, vmbackup/snapshot}: fixed problem with snapshot backup in another snapshot folder (#2535)
* {vmbackup, vmbackup/snapshot}: validate snapshot name

* vmbackup/snapshot: added another checks

* backup/actions: added check that we ignore backup_complete.ignore file

* vmbackup: moved snapshot to lib directory

* lib/snapshot: added functions description

* lib/snapshot: fixed typo

* vmbackup: code cleanup

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-05-04 22:12:03 +03:00
Aliaksandr Valialkin
0d86644d65
lib/storage: leave the last sample per each discrete interval during the deduplicaton
This aligns better with staleness logic in Prometheus - https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness
2022-05-02 21:50:45 +03:00
Artem Navoiev
37cf509c3a
lib/{storage,flagutil} - Add option for snapshot autoremoval (#2487)
* lib/{storage,flagutil} - Add option for snapshot autoremoval

- add prometheus-like duration as command flag
- add option to delete stale snapshots
- update duration.go flag to re-use own code

* wip

* lib/flagutil: re-use Duration.Set() call in NewDuration

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-05-02 11:00:15 +03:00
Aliaksandr Valialkin
54de0531a4
app/vmstorage: properly handle maxSeries limit passed from vmselect to vmstorage 2022-04-12 11:23:04 +03:00
Aliaksandr Valialkin
57143e9435
lib/storage: increase the number of rawRowsShard shards on systems with more than 4 CPU cores
This should improve data ingestion scalability on systems with many CPU cores
2022-04-06 19:49:20 +03:00
Aliaksandr Valialkin
50cf74ce4b
lib/storage: reuse sync.WaitGroup objects
This reduces GC load by up to 10% according to memory profiling
2022-04-06 13:34:04 +03:00
Nikolay
9a88c1a91e
lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache (#2293)
* lib/{storage,regexpcache}: replaces regexpCacheMap with LRU cache

It should decrease memory usage for regexp caching
with storing cacheEntry by pointer - golang map should be able to effectivly shrink it's size
original issue with this case - unexpected map grows and storage OOM

Apply suggestions from code review

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

Adds missing metrics for regexp cache and regexpPrefixes cache

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-03-26 12:54:50 +02:00
Aliaksandr Valialkin
6e364e19ef
app/vmselect: add fine-grained limits for the number of returned/scanned time series for various APIs 2022-03-26 11:29:49 +02:00
Aliaksandr Valialkin
2ae3a9a8a3
lib/storage: reduce the interval for checking for free disk space from 30 seconds to 1 second
This should reduce the probability of out of disk space panics when -storage.minFreeDiskSpaceBytes is set to low values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2305
2022-03-18 16:52:27 +02:00
jduncan0000
e5868b9c29
Fix for issue #2255 - matchTagFilters for positive empty-match filters (#2304)
* fix for issue 2255 - matchTagFilters for positive empty-match filters

* add example to comments

* formatting

* add test for positive empty match

* formatting
2022-03-18 12:58:22 +02:00
Aliaksandr Valialkin
3eef1ddc7d
lib/storage: trashing -> thrashing typo in docs
This is a follow-up for 918ed5cb32
2022-03-16 13:05:26 +02:00
Aliaksandr Valialkin
59877d9f32
lib/{mergeset,storage}: tune compression levels for small blocks
This should reduce CPU usage spent on compression
2022-02-25 15:33:40 +02:00
Aliaksandr Valialkin
7e99bbb967
lib/storage: document why job-like and instance-like labels must be stored at mn.Tags[0] and mn.Tags[1]
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2244
2022-02-25 13:21:07 +02:00
Aliaksandr Valialkin
8bf3fb917a
lib/storage: add a comment to indexSearch.containsTimeRange() on why it allows false positives
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2239
2022-02-24 12:47:27 +02:00
Aliaksandr Valialkin
a16f1ae565
lib/storage: properly handle series selector matching multiple metric names plus a negative filter
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2238

This is a follow-up for 00cbb099b6
2022-02-24 12:15:54 +02:00
Aliaksandr Valialkin
62b46007c5
lib/workingsetcache: reduce the default cache rotation period from hour to 20 minutes
This should reduce memory usage under high time series churn rate
2022-02-23 13:41:45 +02:00
Aliaksandr Valialkin
f72b35665f
lib/storage: optimize /api/v1/status/tsdb call by skipping all the artificially created tag entries at once
This is a follow-up for b71be42d90
2022-02-21 18:23:35 +02:00
Roman Khavronenko
b6ed9afd6d
lib: allow to configure cache size by type (#2206)
* lib: allow to configure cache size by type

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 13:50:34 +02:00
Aliaksandr Valialkin
2b87b4d183
lib/storage: typo fix after c3affb0c4f 2022-02-17 12:55:54 +02:00
Aliaksandr Valialkin
c3affb0c4f
lib/storage: simplify code for searching for label values
This is a follow-up after 9dd191b27c
2022-02-17 12:29:38 +02:00
Aliaksandr Valialkin
9dd191b27c
lib/storage: properly skip composite tag entries when searching for tag names or tag values
This is a follow-up for b71be42d90

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 23:01:19 +02:00
Aliaksandr Valialkin
6ff71474a6
lib/storage: document why tsid cache is reset before saving it to disk
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2205
2022-02-16 18:37:56 +02:00
Aliaksandr Valialkin
b71be42d90
lib/storage: use binary search instead of full scan for skipping artificial tags when searching for tag names or tag values
This should improve performance for /api/v1/labels and /api/v1/label/<label_name>/values

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 18:15:41 +02:00
Aliaksandr Valialkin
96dce63dbd
lib/storage: tune the logic for pre-populating of the per-day inverted index for the next day
- Postpone the pre-poulation to the last hour of the current day. This should reduce the number
  of useless entries in the next per-day index, which shouldn't be created there,
  when the corresponding time series are stopped to be pushed during the current day.

- Make the pre-population more smooth in time by using the hash of MetricID instead of MetricID itself
  when calculating the need for for the given MetricID pre-population.

- Sync the logic for pre-population of the next day inverted index with the logic of pre-populating tsid cache
  after indexdb rotation. This should improve code maintainability.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
2022-02-12 16:33:16 +02:00
Roman Khavronenko
cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.

The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.

To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.

The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin
08428464e9
lib/storage: fix broken BenchmarkHeadPostingForMatchers for {i=~".*"} after f4dead529f
The commit f4dead529f makes such query to return nothing instead of all the time series.
This aligns more with Prometheus behaviour.
2022-02-12 00:27:10 +02:00
Roman Khavronenko
e3adcbec6e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:00 +02:00
Aliaksandr Valialkin
4bdd10ab90
lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations
Follow-up for f4989edd96

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-01 00:18:42 +02:00
Aliaksandr Valialkin
a8509c112a
lib/storage: avoid allocations of tsidPrev on every blockStreamReader.NextBlock() call
This is a follow-up for 00b7c97d2a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-31 22:46:53 +02:00
Aliaksandr Valialkin
96aa3761fc
lib/storage/table.go: add missing tb.ptwsLock.Unlock() before the return
This is a follow-up for a1083d0531

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2103
2022-01-28 14:15:42 +02:00
匠心零度
1999bbfe82
optimized code (#2103)
* optimized code ,because only the first error,so no need var errors []error

* optimized code ,because only the first error,so no need var errors []error

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2022-01-28 14:15:41 +02:00
Aliaksandr Valialkin
f4989edd96
lib/bytesutil: split Resize() into ResizeNoCopy() and ResizeWithCopy() functions
Previously bytesutil.Resize() was copying the original byte slice contents to a newly allocated slice.
This wasted CPU cycles and memory bandwidth in some places, where the original slice contents wasn't needed
after slize resizing. Switch such places to bytesutil.ResizeNoCopy().

Rename the original bytesutil.Resize() function to bytesutil.ResizeWithCopy() for the sake of improved readability.

Additionally, allocate new slice with `make()` instead of `append()`. This guarantees that the capacity of the allocated slice
exactly matches the requested size. The `append()` could return a slice with bigger capacity as an optimization for further `append()` calls.
This could result in excess memory usage when the returned byte slice was cached (for instance, in lib/blockcache).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-25 15:24:44 +02:00
Aliaksandr Valialkin
ede93469ea
lib/mergeset: tune caches size limits for indexdb/dataBlocks and indexdb/indexBlocks
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-21 12:45:43 +02:00
Aliaksandr Valialkin
5f84b17ed6
lib/storage: properly limit cardinality when ingesting multiple samples for the same time series in a single request 2022-01-21 12:38:09 +02:00
Aliaksandr Valialkin
00b7c97d2a
lib/storage: verify that blocks in a single part are sorted by TSID when reading sequential blocks from the part
This may help narrowing down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-20 20:36:37 +02:00
Aliaksandr Valialkin
ea87f21e23
lib/storage: set bsm.Block to nil on error, so the previous block couldn't be used.
This may help nailing down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-20 20:13:14 +02:00
Aliaksandr Valialkin
145337792d
lib/{mergeset,storage}: properly limit cache sizes for indexdb
Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`,
since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded,
which could result to out of memory errors.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-20 18:37:17 +02:00
Aliaksandr Valialkin
178dd87e26
lib/storage: follow-up for 38bf5fc136 2022-01-05 16:00:11 +02:00
weng zhao
38bf5fc136
vmstorage: fix query like {foo=~"bar|"} return extra timeseries cause by negative filter transformation malfunction (#2032)
1. L2749 make kb.B remain the value of comonPrefix instead of tf.prefix
2. L2762 avoid change tf.value from "bar|" to ".+r|"
2022-01-05 15:59:15 +02:00
Nikolay
8ff7da7202
adds restore.lock (#1988)
* adds restore.lock
it must prevent from running storage after incomplete restore process
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

* return back flock file deletion

* Apply suggestions from code review

* wip

* docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-22 13:10:15 +02:00
Aliaksandr Valialkin
ce333f28d8
all: use logger.WithThrottler() where appropriate 2021-12-21 17:03:25 +02:00