Commit graph

1358 commits

Author SHA1 Message Date
Aliaksandr Valialkin
e3a10b327c
lib/blockcache: properly remove references to deleted parts
Previously, references to deleted parts could remain active as cache.m keys.
This could prevent proper memory de-allocation.
This could lead to increased memory usage for the following caches starting from v1.73.0:

* indexdb/indexBlocks
* indexdb/dataBlocks
* storage/indexBlocks

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2242
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007

This is a follow-up for 88605a7ea2
2022-03-18 17:07:59 +02:00
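A minimal Go sketch of the idea. It assumes that cache keys embed a pointer to the owning part. The names `part`, `key`, `byPart` and `RemoveBlocksForPart` are illustrative, not the actual lib/blockcache API. A secondary part-to-offsets index lets every key that references a deleted part be dropped at once. After that, `cache.m` no longer keeps the part reachable, so the garbage collector can free it.

```go
package main

import (
	"fmt"
	"sync"
)

// part stands in for a storage part (hypothetical type).
type part struct{ name string }

// key identifies a cached block by its owning part and offset (hypothetical layout).
type key struct {
	p      *part
	offset uint64
}

// cache is a simplified block cache. Besides the key->block map it keeps
// a per-part index so that every key referencing a part can be found.
type cache struct {
	mu     sync.Mutex
	m      map[key][]byte
	byPart map[*part]map[uint64]struct{}
}

func newCache() *cache {
	return &cache{m: map[key][]byte{}, byPart: map[*part]map[uint64]struct{}{}}
}

func (c *cache) Put(k key, b []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = b
	offs := c.byPart[k.p]
	if offs == nil {
		offs = map[uint64]struct{}{}
		c.byPart[k.p] = offs
	}
	offs[k.offset] = struct{}{}
}

// RemoveBlocksForPart deletes every entry whose key points at p, so neither
// map keeps p reachable after the part itself is deleted.
func (c *cache) RemoveBlocksForPart(p *part) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for off := range c.byPart[p] {
		delete(c.m, key{p, off})
	}
	delete(c.byPart, p)
}

func (c *cache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.m)
}

func main() {
	c := newCache()
	p1, p2 := &part{"p1"}, &part{"p2"}
	c.Put(key{p1, 0}, []byte("a"))
	c.Put(key{p1, 4096}, []byte("b"))
	c.Put(key{p2, 0}, []byte("c"))
	c.RemoveBlocksForPart(p1)
	fmt.Println(c.Len()) // 1
}
```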
Aliaksandr Valialkin
2ae3a9a8a3
lib/storage: reduce the interval for checking for free disk space from 30 seconds to 1 second
This should reduce the probability of out-of-disk-space panics when -storage.minFreeDiskSpaceBytes is set to low values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2305
2022-03-18 16:52:27 +02:00
Aliaksandr Valialkin
88605a7ea2
lib/blockcache: properly release memory occupied by deleted entries
Previously, the deleted entries could remain referenced via lastAccessHeap for a long time.
This could lead to increased memory usage for the following caches starting from v1.73.0:

* indexdb/indexBlocks
* indexdb/dataBlocks
* storage/indexBlocks

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2242
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-03-18 16:52:27 +02:00
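The leak described above happens when an entry is deleted from the map but is still held by a side structure. The following Go sketch assumes a `container/heap` min-heap ordered by last access time. Each entry tracks its heap index, so deletion can call `heap.Remove`. `Pop` also clears the vacated slot so that the backing array no longer pins the entry. Type and field names are hypothetical.

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry is a cached block; heapIdx is its current position in lastAccessHeap.
type entry struct {
	key        string
	lastAccess uint64
	heapIdx    int
}

// lastAccessHeap is a min-heap ordered by lastAccess (oldest entry at index 0).
type lastAccessHeap []*entry

func (h lastAccessHeap) Len() int           { return len(h) }
func (h lastAccessHeap) Less(i, j int) bool { return h[i].lastAccess < h[j].lastAccess }
func (h lastAccessHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].heapIdx = i
	h[j].heapIdx = j
}

func (h *lastAccessHeap) Push(x any) {
	e := x.(*entry)
	e.heapIdx = len(*h)
	*h = append(*h, e)
}

func (h *lastAccessHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	old[n-1] = nil // clear the vacated slot so the backing array stops referencing e
	*h = old[:n-1]
	return e
}

// cache indexes entries both by key and by last access time.
type cache struct {
	m map[string]*entry
	h lastAccessHeap
}

func (c *cache) put(k string, ts uint64) {
	if e, ok := c.m[k]; ok {
		e.lastAccess = ts
		heap.Fix(&c.h, e.heapIdx)
		return
	}
	e := &entry{key: k, lastAccess: ts}
	c.m[k] = e
	heap.Push(&c.h, e)
}

// remove drops the entry from both the map and the heap. Deleting it from
// the map alone would leave the heap holding a pointer to it.
func (c *cache) remove(k string) {
	e, ok := c.m[k]
	if !ok {
		return
	}
	delete(c.m, k)
	heap.Remove(&c.h, e.heapIdx)
}

func main() {
	c := &cache{m: map[string]*entry{}}
	c.put("a", 1)
	c.put("b", 2)
	c.put("c", 3)
	c.remove("b")
	fmt.Println(len(c.h), c.h[0].key) // 2 a
}
```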
jduncan0000
e5868b9c29
Fix for issue #2255 - matchTagFilters for positive empty-match filters (#2304)
* fix for issue 2255 - matchTagFilters for positive empty-match filters

* add example to comments

* formatting

* add test for positive empty match

* formatting
2022-03-18 12:58:22 +02:00
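In Prometheus semantics, a label matcher that matches an empty value also selects series that lack the label entirely. For example, `{env=~"prod|"}` matches series without an `env` label. The Go sketch below models that rule for a single regexp filter. It is a simplified illustration, not lib/storage's matchTagFilters. `label`, `tagFilter` and `newTagFilter` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// label is a single name/value pair of a series.
type label struct{ name, value string }

// tagFilter is a simplified regexp filter such as {env=~"prod|"} or {env!~"dev"}.
type tagFilter struct {
	key      string
	re       *regexp.Regexp
	negative bool
}

func newTagFilter(key, expr string, negative bool) tagFilter {
	// Anchor the expression: PromQL regexp matchers must match the whole value.
	return tagFilter{key: key, re: regexp.MustCompile("^(?:" + expr + ")$"), negative: negative}
}

// match treats a missing label as an empty value, so a positive filter whose
// regexp matches "" also selects series without the label.
func (tf tagFilter) match(labels []label) bool {
	value := ""
	for _, l := range labels {
		if l.name == tf.key {
			value = l.value
			break
		}
	}
	// XOR with negative: positive filters return the match, negative ones its inverse.
	return tf.re.MatchString(value) != tf.negative
}

func main() {
	f := newTagFilter("env", "prod|", false)
	fmt.Println(f.match([]label{{"job", "api"}})) // true: env is absent
	fmt.Println(f.match([]label{{"env", "dev"}})) // false
}
```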
Aliaksandr Valialkin
3eef1ddc7d
lib/storage: trashing -> thrashing typo in docs
This is a follow-up for 918ed5cb32
2022-03-16 13:05:26 +02:00
Vic (Shihang) Li
918ed5cb32
fix: change thrashing typo (#2317) 2022-03-15 07:05:52 +00:00
Aliaksandr Valialkin
0a4aadffac
lib/mergeset: remove aux buffers from inmemoryPart
This should reduce the size of inmemoryPart items and may slightly improve performance when registering new time series

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2247
2022-03-03 17:08:44 +02:00
Aliaksandr Valialkin
c84a8b34cc
lib/mergeset: eliminate copying of itemsData and lensData from storageBlock to inmemoryBlock
This should improve performance when registering new time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2247
2022-03-03 16:46:37 +02:00
Aliaksandr Valialkin
7da4068f48
lib/mergeset: consistency renaming: ip->mp for inmemoryPart vars 2022-03-03 15:48:22 +02:00
Aliaksandr Valialkin
e8fdb27625
lib/mergeset: move storageBlock from inmemoryPart to a sync.Pool
The lifetime of storageBlock is much shorter than the lifetime of inmemoryPart,
so using sync.Pool should reduce overall memory usage and improve performance
thanks to better locality of reference when marshaling inmemoryBlock to inmemoryPart.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2247
2022-03-03 15:44:02 +02:00
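The pattern behind this change is to borrow short-lived scratch buffers from a `sync.Pool` for the duration of one marshaling call instead of embedding them in every long-lived part. The Go sketch below shows that pattern. `storageBlock` fields, `getStorageBlock`, `putStorageBlock` and the one-byte length encoding are hypothetical simplifications. The encoding assumes each item is shorter than 256 bytes.

```go
package main

import (
	"fmt"
	"sync"
)

// storageBlock holds scratch buffers used only while marshaling.
type storageBlock struct {
	itemsData []byte
	lensData  []byte
}

func (sb *storageBlock) Reset() {
	sb.itemsData = sb.itemsData[:0]
	sb.lensData = sb.lensData[:0]
}

var storageBlockPool sync.Pool

func getStorageBlock() *storageBlock {
	v := storageBlockPool.Get()
	if v == nil {
		return &storageBlock{}
	}
	return v.(*storageBlock)
}

func putStorageBlock(sb *storageBlock) {
	sb.Reset()
	storageBlockPool.Put(sb)
}

// marshalItems borrows a storageBlock only for this call. The result is
// copied into dst, so it never aliases pooled memory.
// Lengths are stored as one byte each (assumes items shorter than 256 bytes).
func marshalItems(items []string, dst []byte) []byte {
	sb := getStorageBlock()
	defer putStorageBlock(sb)
	for _, it := range items {
		sb.itemsData = append(sb.itemsData, it...)
		sb.lensData = append(sb.lensData, byte(len(it)))
	}
	dst = append(dst, sb.lensData...)
	return append(dst, sb.itemsData...)
}

func main() {
	out := marshalItems([]string{"ab", "c"}, nil)
	fmt.Printf("%q\n", out) // "\x02\x01abc"
}
```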
Aliaksandr Valialkin
59877d9f32
lib/{mergeset,storage}: tune compression levels for small blocks
This should reduce CPU usage spent on compression
2022-02-25 15:33:40 +02:00
Aliaksandr Valialkin
7e99bbb967
lib/storage: document why job-like and instance-like labels must be stored at mn.Tags[0] and mn.Tags[1]
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2244
2022-02-25 13:21:07 +02:00
Aliaksandr Valialkin
8bf3fb917a
lib/storage: add a comment to indexSearch.containsTimeRange() on why it allows false positives
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2239
2022-02-24 12:47:27 +02:00
Aliaksandr Valialkin
a16f1ae565
lib/storage: properly handle series selector matching multiple metric names plus a negative filter
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2238

This is a follow-up for 00cbb099b6
2022-02-24 12:15:54 +02:00
Aliaksandr Valialkin
af5bdb9254
lib/mergeset: remove superfluous sorting of inmemoryBlock.data at inmemoryBlock.sort()
There is no need to sort the underlying data according to sorted items there.
This should reduce CPU usage when registering new time series in `indexdb`.

Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2245
2022-02-24 11:20:32 +02:00
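If items are stored as offsets into a shared byte buffer, sorting only needs to reorder the small offset descriptors. The buffer itself can stay in insertion order, because readers always reach the bytes through the items. The Go sketch below shows this under that assumption. `item`, `block` and their methods are hypothetical, not the real inmemoryBlock layout.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// item addresses a byte range inside the block's shared data buffer.
type item struct{ start, end uint32 }

func (it item) value(data []byte) []byte { return data[it.start:it.end] }

// block is a simplified in-memory block: items point into data.
type block struct {
	data  []byte
	items []item
}

func (b *block) add(s string) {
	start := uint32(len(b.data))
	b.data = append(b.data, s...)
	b.items = append(b.items, item{start, uint32(len(b.data))})
}

// sortItems reorders only the item descriptors; data keeps insertion order.
func (b *block) sortItems() {
	sort.Slice(b.items, func(i, j int) bool {
		return bytes.Compare(b.items[i].value(b.data), b.items[j].value(b.data)) < 0
	})
}

// values returns the items in their current order.
func (b *block) values() []string {
	out := make([]string, 0, len(b.items))
	for _, it := range b.items {
		out = append(out, string(it.value(b.data)))
	}
	return out
}

func main() {
	var b block
	for _, s := range []string{"charlie", "alpha", "bravo"} {
		b.add(s)
	}
	b.sortItems()
	fmt.Println(b.values())     // [alpha bravo charlie]
	fmt.Println(string(b.data)) // charliealphabravo
}
```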
Aliaksandr Valialkin
3f49bdaeff
lib/promrelabel: add support for conditional relabeling via if filter
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1998
2022-02-24 02:27:26 +02:00
Aliaksandr Valialkin
d128a5bf99
lib/workingsetcache: do not rotate the cache if it is in the `whole` state
This should reduce the maximum memory usage for the cache in the `whole` state
2022-02-23 22:55:18 +02:00
Aliaksandr Valialkin
62b46007c5
lib/workingsetcache: reduce the default cache rotation period from one hour to 20 minutes
This should reduce memory usage under high time series churn rate
2022-02-23 13:41:45 +02:00
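A toy model of a working-set cache with `curr`/`prev` halves helps with the two commits above. Each rotation discards entries that were not touched during the previous period, so a shorter period frees memory from churned series sooner. This Go sketch assumes that `whole` means a single cache that has not yet been split into halves, and it skips rotation in that mode. `workingSetCache`, `mode` and `rotate` are hypothetical names, not the lib/workingsetcache API.

```go
package main

import (
	"fmt"
	"time"
)

// rotationPeriod mirrors the new default from the commit above.
const rotationPeriod = 20 * time.Minute

type mode int

const (
	whole mode = iota // assumed: a single cache not yet split into curr/prev
	split             // curr/prev halves that are rotated periodically
)

// workingSetCache is a toy model; the real implementation differs.
type workingSetCache struct {
	mode       mode
	curr, prev map[string]string
}

func (c *workingSetCache) Set(k, v string) { c.curr[k] = v }

func (c *workingSetCache) Get(k string) (string, bool) {
	if v, ok := c.curr[k]; ok {
		return v, true
	}
	if v, ok := c.prev[k]; ok {
		c.curr[k] = v // an entry still in use survives the next rotation
		return v, true
	}
	return "", false
}

// rotate runs once per rotationPeriod. Entries not touched since the
// previous rotation are dropped together with prev. In whole mode it is a no-op.
func (c *workingSetCache) rotate() {
	if c.mode == whole {
		return
	}
	c.prev = c.curr
	c.curr = map[string]string{}
}

func main() {
	c := &workingSetCache{mode: split, curr: map[string]string{}, prev: map[string]string{}}
	c.Set("hot", "1")
	c.Set("cold", "2")
	c.rotate()
	c.Get("hot") // promotes "hot" back into curr
	c.rotate()
	_, hotOK := c.Get("hot")
	_, coldOK := c.Get("cold")
	fmt.Println(rotationPeriod, hotOK, coldOK) // 20m0s true false
}
```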
Aliaksandr Valialkin
f72b35665f
lib/storage: optimize /api/v1/status/tsdb call by skipping all the artificially created tag entries at once
This is a follow-up for b71be42d90
2022-02-21 18:23:35 +02:00
Aliaksandr Valialkin
ed12c60826
lib/mergeset: typo fix after b6ed9afd6d 2022-02-21 17:58:22 +02:00
Aliaksandr Valialkin
5d45ea1003
lib/blockcache: evict entries from the cache in LRU order
This should improve hit rate for smaller caches
2022-02-21 17:44:24 +02:00
Roman Khavronenko
69d1893f4c
Consul SD - update services on the watcher's start (#2202)
* lib/discovery/consul: update services on the watcher's start

Previously, the watcher's start only initialized goroutines for discovery
but did not wait for the first iteration to finish. As a result, the first Consul
discovery didn't return any targets until the next iteration.

The change makes the watcher's start block until the first discovery
iteration is done and all registries are updated.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: remove workarounds for consul SD

Now that the consul SD lib properly updates services
on the first start, we don't need workarounds in vmalert.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/discovery/consul: update after review

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 15:32:45 +02:00
Roman Khavronenko
b6ed9afd6d
lib: allow configuring cache size by type (#2206)
* lib: allow configuring cache size by type

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 13:50:34 +02:00
Aliaksandr Valialkin
2b87b4d183
lib/storage: typo fix after c3affb0c4f 2022-02-17 12:55:54 +02:00
Aliaksandr Valialkin
c3affb0c4f
lib/storage: simplify code for searching for label values
This is a follow-up after 9dd191b27c
2022-02-17 12:29:38 +02:00
Aliaksandr Valialkin
9dd191b27c
lib/storage: properly skip composite tag entries when searching for tag names or tag values
This is a follow-up for b71be42d90

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 23:01:19 +02:00
Aliaksandr Valialkin
5366d9be73
lib/blockcache: fix TestCache by ensuring that the cache size can be divided by the number of cache shards
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2204
2022-02-16 18:47:35 +02:00
Aliaksandr Valialkin
6ff71474a6
lib/storage: document why tsid cache is reset before saving it to disk
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2205
2022-02-16 18:37:56 +02:00
Aliaksandr Valialkin
b71be42d90
lib/storage: use binary search instead of full scan for skipping artificial tags when searching for tag names or tag values
This should improve performance for /api/v1/labels and /api/v1/label/<label_name>/values

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 18:15:41 +02:00
Roman Khavronenko
d91c1d4eee
vmagent: fix js error on CollapseAll/ExpandAll buttons click (#2192)
* vmagent: fix js error on CollapseAll/ExpandAll buttons click

`Uncaught TypeError: Cannot read properties of null (reading 'style')`

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-15 12:52:48 +02:00
Corporte Gadfly
ad6bdd78d0
match fileSDCheckInterval with prometheus file_sd_config default (#2188) 2022-02-15 12:04:26 +02:00
Aliaksandr Valialkin
1215f51043
docs/CHANGELOG.md: document 3d890e89f1 2022-02-14 17:39:12 +02:00
Nikolay
3d890e89f1
Adds server certificate reload for lib/http (#2186)
* Adds server certificate reload for lib/http
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171

* Update lib/httpserver/httpserver.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:32:13 +02:00
Nikolay
c90c1c4d54
fixes all_tenants query option usage for openstack service discovery (#2184)
explicitly use the configuration parameter instead of adding it conditionally
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2182
2022-02-14 13:07:30 +02:00
Aliaksandr Valialkin
f10c38b827
lib/promscrape: add expand all and collapse all buttons to /targets page 2022-02-12 18:41:29 +02:00
Aliaksandr Valialkin
96dce63dbd
lib/storage: tune the logic for pre-populating the per-day inverted index for the next day
- Postpone the pre-population to the last hour of the current day. This should reduce the number
  of useless entries in the next per-day index, which shouldn't be created there
  when the corresponding time series stop being pushed during the current day.

- Make the pre-population smoother in time by using the hash of MetricID instead of MetricID itself
  when deciding whether the given MetricID needs pre-population.

- Sync the logic for pre-population of the next day inverted index with the logic of pre-populating tsid cache
  after indexdb rotation. This should improve code maintainability.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
2022-02-12 16:33:16 +02:00
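The first two bullets above can be sketched as a linear ramp during the last hour of the day. Hashing the MetricID spreads the decisions evenly over that hour. This Go sketch assumes UTC day boundaries and a ramp of the form `hash % 3600 <= secondsIntoLastHour`. FNV-1a is an arbitrary hash choice here. `shouldPrepopulate` and `hashMetricID` are hypothetical names, not the lib/storage code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const secondsPerDay = 24 * 3600
const prepopulateWindow = 3600 // last hour of the day

// hashMetricID mixes the bits of a MetricID (FNV-1a is an arbitrary choice).
func hashMetricID(id uint64) uint64 {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], id)
	h := fnv.New64a()
	h.Write(b[:])
	return h.Sum64()
}

// shouldPrepopulate reports whether the next-day index entry for id should be
// created at unix time ts. Nothing happens before the last hour; inside it the
// share of eligible MetricIDs grows linearly until every ID qualifies.
func shouldPrepopulate(id uint64, ts uint64) bool {
	secInDay := ts % secondsPerDay
	start := uint64(secondsPerDay - prepopulateWindow)
	if secInDay < start {
		return false
	}
	elapsed := secInDay - start // 0..3599
	return hashMetricID(id)%prepopulateWindow <= elapsed
}

func main() {
	// Halfway through the last hour of the day.
	ts := uint64(secondsPerDay - prepopulateWindow/2)
	n := 0
	for id := uint64(0); id < 10000; id++ {
		if shouldPrepopulate(id, ts) {
			n++
		}
	}
	fmt.Printf("%d of 10000 MetricIDs eligible\n", n) // roughly half
}
```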
artifactori
ea153e5f90
Show gce sdconfig zone on vmagent:8429/config (#2178)
* vmagent: add test for marshalling gce sdconfig with ZoneYAML

* vmagent: implement MarshalYAML for ZoneYAML on gce sdconfig
2022-02-12 00:39:23 +02:00
Roman Khavronenko
cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSIDs - IDs used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately from data parts and is global for all stored data.
It can't be deleted partially the way VM deletes data parts. Instead, indexDB is
rotated once per `retention` interval.

The rotation procedure means that the `current` indexDB becomes `previous`,
and a freshly created indexDB struct becomes `current`. So at any time,
VM holds indexDB for the current and previous retention periods.
When a time series is ingested or queried, VM checks if its TSID is present
in the `current` indexDB. If it is missing, VM checks the `previous` indexDB.
If the TSID is found there, it gets copied to the `current` indexDB. In this way
the `current` indexDB stores only series that were active during the retention
period.

To speed up indexDB lookups, VM uses a cache layer called `tsidCache`. Both
the write and read paths consult `tsidCache`, and on a miss the real lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for the ingestion
path to trigger re-population of the `current` indexDB. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While this may be unnoticeable in most cases,
for systems with a very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR attempts to smooth out resource usage after the rotation.
The changes are as follows:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` now records the indexDB to which
it belongs;
3. On the ingestion path after the rotation we check whether the requested TSID was
found in `tsidCache`. Then there are 3 branches:
3.1 Fast path. It was found and belongs to the `current` indexDB. Return the TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add it to the `current` indexDB and add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on the time passed since the last rotation, with some threshold.
The more time has passed since the rotation, the higher the chance to re-populate the `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
the `previous` index are supposed to slowly re-populate the `current` index.

The new metric `vm_timeseries_repopulated_total` was added to show how many TSIDs
were moved from the `previous` indexDB to the `current` indexDB. This metric is supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin
08428464e9
lib/storage: fix broken BenchmarkHeadPostingForMatchers for {i=~".*"} after f4dead529f
The commit f4dead529f makes such a query return nothing instead of all the time series.
This aligns better with Prometheus behaviour.
2022-02-12 00:27:10 +02:00
Roman Khavronenko
e3adcbec6e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows specifying duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:00 +02:00
Aliaksandr Valialkin
3cb72ccc2a
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpointslice_{label,annotation}* labels to be consistent with other role values for Kubernetes service discovery 2022-02-11 14:54:47 +02:00
Nikolay
4e7f7f3302
fixes service discovery for kubernetes (#2173)
* fixes service discovery for kubernetes
now it takes into account all pods that belong to the discovered endpoint and endpointslice
adds a simple test for endpoints
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2134

* wip

* docs/CHANGELOG.md: document the change

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 13:34:22 +02:00
Aliaksandr Valialkin
f9a17cb5fe
lib/mergeset: tune indexdb/{indexBlocks,dataBlocks} cache sizes further according to production stats
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-10 19:09:46 +02:00
Aliaksandr Valialkin
a9bb22b213
lib/blockcache: use higher number of shards for higher number of CPU cores
This should reduce mutex contention and increase performance

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-10 19:06:12 +02:00
Aliaksandr Valialkin
db8c4054e5
lib/promscrape: fix errors in test config
The errors were discovered after enabling strict parse mode by default.
See 9bb60ab00f
2022-02-08 19:56:37 +02:00
Aliaksandr Valialkin
4507b111a9
lib/blockcache: split the cache into multiple shards
This should reduce contention on cache mutex on hosts with many CPU cores,
which, in turn, should increase overall throughput for the cache.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-08 19:44:29 +02:00
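The sharding described here, together with the later commit that scales the shard count with CPU cores, follows a common Go pattern. Keys are hashed onto independently locked shards, so goroutines working on different keys rarely contend for the same mutex. In this sketch, `runtime.GOMAXPROCS(0)` stands in for whatever CPU-count rule the real cache uses. `shardedCache` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/maphash"
	"runtime"
	"sync"
)

// shard is one independently locked part of the cache.
type shard struct {
	mu sync.Mutex
	m  map[string][]byte
}

// shardedCache spreads keys over shards to reduce mutex contention.
type shardedCache struct {
	seed   maphash.Seed
	shards []*shard
}

func newShardedCache(n int) *shardedCache {
	if n < 1 {
		n = 1
	}
	c := &shardedCache{seed: maphash.MakeSeed(), shards: make([]*shard, n)}
	for i := range c.shards {
		c.shards[i] = &shard{m: map[string][]byte{}}
	}
	return c
}

// shardFor picks a shard by hashing the key.
func (c *shardedCache) shardFor(k string) *shard {
	return c.shards[maphash.String(c.seed, k)%uint64(len(c.shards))]
}

func (c *shardedCache) Put(k string, v []byte) {
	s := c.shardFor(k)
	s.mu.Lock()
	s.m[k] = v
	s.mu.Unlock()
}

func (c *shardedCache) Get(k string) ([]byte, bool) {
	s := c.shardFor(k)
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[k]
	return v, ok
}

func main() {
	// One shard per usable CPU, as a stand-in for a CPU-count based rule.
	c := newShardedCache(runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Put(fmt.Sprintf("k%d-%d", g, i), []byte{byte(i)})
			}
		}(g)
	}
	wg.Wait()
	v, ok := c.Get("k3-7")
	fmt.Println(len(c.shards), ok, v)
}
```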
Aliaksandr Valialkin
2455a988e4
lib/mergeset: tune sizes for indexdb/dataBlocks and indexdb/indexBlocks according to production workload
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007#issuecomment-1032308742
2022-02-08 17:58:49 +02:00
Aliaksandr Valialkin
9bb60ab00f
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-lived silent errors in -promscrape.config
2022-02-08 15:41:43 +02:00
Aliaksandr Valialkin
a19e7f8c5b
lib/blockcache: make fmt 2022-02-08 15:24:11 +02:00
Aliaksandr Valialkin
d0f785defd
lib/blockcache: eliminate possible race when Cache.Put is called for the same entry from multiple goroutines
The race could result in incorrect cache size tracking, which, in turn, could result in too frequent cache cleaning
2022-02-08 01:10:43 +02:00