Aliaksandr Valialkin
8b262d4ba7
lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate
2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin
a7694092b8
Revert "lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method"
...
This reverts commit 7c6d3981bf
.
Reason for revert: high contention at bucket16Pool on systems with big number of CPU cores.
This slows down query processing significantly.
2021-07-06 18:21:35 +03:00
Aliaksandr Valialkin
8aa9bba9bd
lib/{mergeset,storage}: switch from sync.Pool to chan-based pool for inmemoryPart objects
...
This should reduce memory usage on systems with big number of CPU cores,
since every inmemoryPart object occupies at least 64KB of memory and sync.Pool maintains
a separate pool inmemoryPart objects per each CPU core.
Though the new scheme for the pool worsens per-cpu cache locality, this should be amortized
by big sizes of inmemoryPart objects.
2021-07-06 16:28:41 +03:00
Aliaksandr Valialkin
7c6d3981bf
lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method
...
This reduces the load on memory allocator in Go runtime in production workload.
2021-07-06 15:35:03 +03:00
Aliaksandr Valialkin
78c9174682
lib/mergeset: increase pool capacity for inmemoryBlock according to collected profiles from production workload
...
CPU and memory profiles show that the pool capacity for inmemoryBlock objects is too small.
This results in the increased load on memory allocation code in Go runtime.
Increase the pool capacity in order to reduce the load on Go runtime.
2021-07-06 13:41:34 +03:00
Aliaksandr Valialkin
f71e4d1853
lib/mergeset: limit the frequency for flushCallback calls to once per 10 seconds
...
This should improve hit ratio for tagFiltersCache when big number of new time series are constantly registered
(aka high churn rate). This, in turn, should reduce CPU usage for queries over such time series.
2021-07-06 12:17:17 +03:00
Aliaksandr Valialkin
f3acf065c9
lib/storage: consistency renaming: tagCache -> tagFiltersCache
...
This improves code readability
2021-07-06 11:03:51 +03:00
Aliaksandr Valialkin
0020b9f904
lib/workingsetcache: properly update stats for requests and cache misses
...
Previously the stats for cache misses could be improperly counted, because it had inflated cache misses
if the entry was missing in the curr cache, but was existing in the prev cache.
The same applies to cache requests - they were inflated if the entry was missing in the curr cache.
2021-07-06 10:53:32 +03:00
Aliaksandr Valialkin
4cf47163c1
lib/workingsetcache: fix cache capacity calculations after 4f0003f182
2021-07-05 17:11:57 +03:00
Aliaksandr Valialkin
4f0003f182
lib/workingsetcache: typo fixes after d0c830039d
2021-07-05 15:35:37 +03:00
Aliaksandr Valialkin
d0c830039d
lib/storage: tune cache sizes according to production workload
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
8f973e34fb
lib/workingsetcache: properly switch to whole
mode
...
Previously the switch from `split` to `whole` mode had been performed too early,
e.g. when the current cache size became bigger than 1/4 of the allowed cache size.
Now it is performed when the current cache size becomes bigger than 1/2 of the allowed cache size.
This change can reduce memory usage for data ingestion path when big number of active time series are ingested.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
43103be011
lib/{storage,mergeset}: increase cache timeout for data and index blocks from a minute to two minutes
...
One minute cache timeout result in slower queries in some production workloads where the interval
between query execution is in the range 1 minute - 2 minutes.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
54b9e1d3cb
lib/cgroup: set GOGC to 50 by default if it isn't set
...
This should reduce memory usage for typical VictoriaMetrics workloads by up to 50%
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
9a83e9018d
lib/storage: properly detect free disk space shortage during data merge
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-02 17:40:54 +03:00
Aliaksandr Valialkin
7088f17494
lib/promscrape/discovery/consul: use case-insensitive comparison for service names
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1424
2021-07-02 14:49:27 +03:00
Aliaksandr Valialkin
6e406083f2
lib/promauth: cache the client TLS certificate for up to a second
...
This should reduce CPU usage when TLS connections are established at a high rate.
2021-07-02 13:21:51 +03:00
Aliaksandr Valialkin
158c50c0ee
lib/promauth: reload TLS certificates from disk on every mTLS connection as Prometheus does
...
This allows updating client certificates without the need to restart vmagent and/or single-node VictoriaMetrics.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2021-07-01 15:27:47 +03:00
Aliaksandr Valialkin
c25b839078
lib/workingsetcache: reset the cache mode when the cache is reset
...
This should reduce memory usage if the working set is reduced after the cache reset.
2021-07-01 11:50:11 +03:00
Nikolay
ae485c2bfd
fixes /targets button style ( #1423 )
...
* fixes /targets button style
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1422
* updates boostrap version
2021-07-01 11:48:07 +03:00
Aliaksandr Valialkin
c93cee8de8
lib/{mergeset,storage}: reduce the maximum lifetime for cached indexdb and data blocks from 2 minutes to a minute
...
This should reduce memory usage on a system with high number of active time series and a high churn rate.
One minute is enough for caching the blocks needed for repeated queries (e.g. alerting rules, recording rules and dashboard refreshes).
2021-06-29 19:57:07 +03:00
Aliaksandr Valialkin
fc12484734
lib/mergeset: switch from sync.Pool to a channel for a pool for inmemoryBlock structs
...
This should reduce memory usage for the pool on systems with big number of CPU cores.
The sync.Pool maintains per-CPU pools, so the total number of objects in the pool
is proportional to the number of available CPU cores. The channel limits the number
of pooled objects by its own capacity. This means smaller number of pooled objects on average.
2021-06-29 19:56:59 +03:00
Aliaksandr Valialkin
9ce211a514
lib/promscrape/discovery/docker: fix golint warning: struct field Id should be ID
2021-06-29 13:12:28 +03:00
Aliaksandr Valialkin
5506cff76e
lib/storage: put indexDBName into the key for dateTagFilter cache and for uselessTagFilters cache
...
This should prevent from stats overwriting when the previous indexdb is queried.
2021-06-29 12:40:05 +03:00
Aliaksandr Valialkin
1b0501a09e
lib/promscrape: typo fix in /targets
output
...
The typo has been introduced in fb72a2133f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1408
2021-06-28 21:26:37 +03:00
Aliaksandr Valialkin
cb5453953f
lib/promscrape: split docker and dockerswarm service discovery code bases, since they have very little in common
...
This is a follow up after c85a5b7fcb
2021-06-25 13:20:20 +03:00
Aliaksandr Valialkin
a69045e440
lib/promscrape: consistently sort service discovery routines
...
This should simplify further maintenance of the code
2021-06-25 12:10:46 +03:00
Lu Jiajing
c85a5b7fcb
Support Docker ServiceDiscovery ( #1402 )
...
* add docker discovery
* add test
* add labels test and add scrape work
* remove TODO
* refactor to merge apiConfig and sdConfig
* apply suggestion
2021-06-25 11:42:47 +03:00
Nikolay
434e33da9b
adds missing MustStop call to do and http sd ( #1404 )
2021-06-25 11:39:18 +03:00
Aliaksandr Valialkin
c22114c6f0
lib/storage: tune tag filters search logic
...
Tune the logic according to the logs provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-864293624
The previous logic had a race when multiple concurrent queries execute the same tag filter without prior stats.
This could result in incorrectly stored stats for such tag filter, which then could result in non-optimal sorting of tag filters
for further queries.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-23 13:29:39 +03:00
Aliaksandr Valialkin
e8a5bb92b7
lib/promscrape/discovery/consul: properly pass namespace to Consul watcher
...
Follow-up for 58a2989fe7
2021-06-22 17:42:41 +03:00
Aliaksandr Valialkin
ac54f34f9e
lib/promscrape/discovery/http: follow up after e307bbb29a
2021-06-22 13:40:33 +03:00
Aliaksandr Valialkin
755040a171
lib/promscrape/discovery: support generic auth configs in Consul service discovery in the same way as Prometheus 2.28 does
2021-06-22 13:34:02 +03:00
Nikolay
e307bbb29a
adds http_sd ( #1399 )
...
* adds http_sd
* adds X-Prometheus-Refresh-Interval-Seconds header
* Update lib/promscrape/discovery/http/api.go
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 13:33:37 +03:00
Nikolay
58a2989fe7
adds consul enterprise namespace support ( #1400 )
...
* adds consul enterprise namespace support
* Update lib/promscrape/discovery/consul/consul.go
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 12:49:44 +03:00
Aliaksandr Valialkin
fb72a2133f
lib/promscrape: show jobs with empty scrape targets on /targets page
2021-06-18 10:53:52 +03:00
Nikolay
6c434b260e
fixes DO service discovery labels ( #1389 )
...
adds test for digitalocean sd
2021-06-17 15:12:20 +03:00
Aliaksandr Valialkin
dcbc22552f
lib/storage: fix infinite loop introduced in aa9b56a046
2021-06-17 14:28:10 +03:00
Aliaksandr Valialkin
aa9b56a046
lib/{mergeset,storage}: reduce the number of fsync calls on data ingestion path on systems with many cpu cores
...
VictoriaMetrics maintains a buffer per CPU core for the ingested data. These buffers are flushed to disk every second.
These buffers are flushed to disk in parallel starting from the commit 56b6b893ce
.
This resulted in increased write disk IO usage on systems with many cpu cores
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-863046999 .
This commit merges the per-CPU buffers into bigger in-memory buffers before flushing them to disk.
This should reduce the rate of fsync syscalls and, consequently, the write disk IO on systems with many CPU cores.
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
2021-06-17 13:52:08 +03:00
Aliaksandr Valialkin
84fb59b0ba
lib/storage: move deletedMetricIDs set from indexDB to Storage
...
This makes consitent the list of deleted metricIDs when it is used from both the current indexDB and the previous indexDB (aka extDB).
This should fix the issue, which could lead to storing new samples under deleted metricIDs after indexDB rotation.
See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 .
Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .
This commit resolves the issue in more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .
The downside of the commit is the deletedMetricIDs set isn't cleaned from the metricIDs outside the retention. It needs app restart.
This should be OK in most cases.
2021-06-15 15:04:30 +03:00
Aliaksandr Valialkin
e028ad241a
lib/protoparser: stop reading the input stream as soon as the callback provided by the caller returns error
...
This is a follow-up for af90c3c43b
2021-06-14 15:18:49 +03:00
faceair
af90c3c43b
lib/protoparser: stop read when callback error ( #1380 )
2021-06-14 15:10:58 +03:00
Aliaksandr Valialkin
36d55bff66
lib/promscrape: show the number of samples collected during the last scrape at /targets and /api/v1/targets pages
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1377
2021-06-14 14:04:00 +03:00
Nikolay
729c4eeb9c
adds digital ocean sd ( #1376 )
...
* adds digital ocean sd config
* adds digital ocean sd
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367
* typo fix
2021-06-14 13:15:04 +03:00
Aliaksandr Valialkin
06b8e7d148
lib/promscrape: increase the duration for reading the full response in stream parsing mode
...
Increase the duration from 10x to 30x of the configured `scrape_interval'.
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:28:09 +03:00
Aliaksandr Valialkin
48210130ac
lib/protoparser: measure the duration for reading the whole block of data instead of a single read operation
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:25:52 +03:00
Aliaksandr Valialkin
3c4366806c
lib/protoparser/common: log the duration for reading a block of data in ReadLinesBlockExt on error
...
This may help debugging issues like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:22:04 +03:00
Aliaksandr Valialkin
ed83558646
app/vmauth: properly handle http.ErrAbortHandler panic
...
This panic can be raised by the reverseProxy on aborted request to the backend.
So handle it (e.g. suppress) at reverseProxy.ServeHTTP call.
Do not suppress the panic at lib/httpserver generic HTTP handler,
since it may result in an inconsistent state left after the panicking handler.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
2021-06-11 12:50:25 +03:00
Aliaksandr Valialkin
c4f3fbfa5d
lib/storage: reset cache on disk during series deletion and during indexdb rotation
...
This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin
69b1482bdb
lib/storage: consistency renaming: getMaxRawRowsPerPartition -> getMaxRawRowsPerShard
2021-06-11 10:57:23 +03:00