Commit graph

1174 commits

Author SHA1 Message Date
Aliaksandr Valialkin
327034b54f lib/promscrape/discovery/kubernetes: use v1 API instead of v1beta1 API for role: ingress and role: endpointslices
This should fix service discovery for these roles in Kubernetes v1.22 and newer versions.
See https://kubernetes.io/docs/reference/using-api/deprecation-guide/#ingress-v122

The corresponding change in Prometheus - https://github.com/prometheus/prometheus/pull/9205
2021-08-29 11:23:58 +03:00
Aliaksandr Valialkin
7fdb4db73d lib/promscrape: add ability to load scrape configs from multiple files
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1559
2021-08-26 08:51:53 +03:00
Aliaksandr Valialkin
4a2d7aec7f lib/promscrape: expose promscrape_discovery_http_errors_total metric for tracking errors per each http_sd config 2021-08-25 13:05:29 +03:00
Aliaksandr Valialkin
b885bd9b7d lib/{mergeset,storage}: improve the detection of the needed free space for background merge
This should prevent from possible out of disk space crashes during big merges.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1560
2021-08-25 10:01:09 +03:00
Aliaksandr Valialkin
67bc407747 lib/promscrape: reduce memory and CPU usage when Prometheus staleness tracking is enabled for metrics from deleted / disappeared scrape targets
Store the scraped response body instead of storing the parsed and relabeld metrics.
This should reduce memory usage, since the response body takes less memory than the parsed and relabeled metrics.
This is especially true for Kubernetes service discovery, which adds many long labels for all the scraped metrics.

This should also reduce CPU usage, since the marshaling of the parsed
and relabeld metrics has been substituted by response body copying.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-21 21:24:07 +03:00
Aliaksandr Valialkin
c3b24882a7 lib/promscrape: use scrapeTimestamp when storing stale markers for failed scrape
This will make timestamps for stale markers more consistent for timestamps for other samples
2021-08-19 14:19:54 +03:00
Aliaksandr Valialkin
8ee575dee9 lib/promscrape: send stale markers for the previously scraped metrics on failed scrapes like Prometheus does 2021-08-18 22:00:46 +03:00
Aliaksandr Valialkin
5d92fafc40 app/vmselect: add -search.noStaleMarkers command-line flag for disabling stale markers handling in queries
This option allows reducing CPU usage a bit when VictoriaMetrics is used
for collecting and processing non-Prometheus data. For example, InfluxDB line protocol, Graphite, OpenTSDB, CSV, etc.
2021-08-18 13:58:06 +03:00
Aliaksandr Valialkin
f21fad53b4 lib/promscrape: add ability to disable sending Prometheus staleness markers with -promscrape.disableStaleMarkers command-line flag
This option can be useful when vmagent consumes too much additional memory
for staleness markers functionality and when staleness markers aren't needed.
2021-08-18 13:58:05 +03:00
Aliaksandr Valialkin
db34c40aec lib/promscrape: stop scrapers for the removed targets before starting scrapers for the added targets
This should prevent from possible time series overlap when old target is substituted by new target (for example, during Kubernetes deployments).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
2021-08-17 01:00:40 +03:00
Aliaksandr Valialkin
5f13c519ee lib/promscrape: restore red highlighting for DOWN targets at /targets page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1461
2021-08-15 16:04:33 +03:00
Aliaksandr Valialkin
c1f81f08d4 all: add support for Prometheus staleness markers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:13:15 +03:00
Aliaksandr Valialkin
90efb5831b lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:32:40 +03:00
Aliaksandr Valialkin
b877538622 app/vmagent: follow-up after fe445f753b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491
2021-08-05 09:51:00 +03:00
Omar Ghader
fe445f753b
feature: Add multitenant for vmagent (#1505)
* feature: Add multitenant for vmagent

* Minor fix

* Fix rcs index out of range

* Minor fix

* Fix multi Init

* Fix multi Init

* Fix multi Init

* Add default multi

* Adjust naming

* Add TenantInserted metrics

* Add TenantInserted metrics

* fix: remove unused metrics for vmagent

* fix: remove unused metrics for vmagent

Co-authored-by: mghader <marc.ghader@ubisoft.com>
Co-authored-by: Sebastian YEPES <syepes@gmail.com>
2021-08-05 09:44:29 +03:00
Aliaksandr Valialkin
77bb9e1656 lib/promscrape/discovery/gce: add __meta_gce_interface_ipv4_<name> labels as in Prometheus 2.29
See https://github.com/prometheus/prometheus/pull/8978
2021-08-03 15:51:45 +03:00
Aliaksandr Valialkin
336a2aa2e0 lib/promscrape/discovery/ec2: add __meta_ec2_availability_zone_id label as Prometheus 2.29 does 2021-08-03 13:28:13 +03:00
Aliaksandr Valialkin
c473d8ffe1 li/storage: re-use the per-day inverted index search code for searching in global index
This allows removing a big pile of outdated code for global index search.

This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486
2021-07-30 10:28:20 +03:00
Nikolay
6d47e750be adds check for region with custom s3 endpoint (#1465) 2021-07-27 12:39:10 +03:00
Aliaksandr Valialkin
1950f57316 lib/storage: yet another attempt to properly determine disk space shortage, which prevents from optimal merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-27 12:03:31 +03:00
Aliaksandr Valialkin
92628f9f07 lib/promrelabel: add tests for verifying that regex works as expected in single quotes and double quotes 2021-07-27 10:53:03 +03:00
Aliaksandr Valialkin
5d255846ac all: add go:build lines for Go1.17
See https://tip.golang.org/doc/go1.17#gofmt for more details
2021-07-26 15:50:46 +03:00
Aliaksandr Valialkin
c857e05604 lib/promscrape: add missing whitespace at /targets page before up word 2021-07-26 12:23:06 +03:00
Aliaksandr Valialkin
376af3c956 lib/workingsetcache: switch from split cache to full cache after the cache size exceeds 95% of split capacity
Previously the switch occurred when the cache size becomes 100% of its capacity. The cache size could never reach 100% capacity.
This could prevent from switching from the split cache to full cache, thus reducing the cache effectiveness.
2021-07-15 16:53:35 +03:00
Aliaksandr Valialkin
9d3f9da5ad lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesnt change samples 2021-07-15 12:18:38 +03:00
Aliaksandr Valialkin
e992754e79 lib/storage: remove cache directory if it contains reset_cache_on_startup file
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447
2021-07-13 17:59:51 +03:00
Aliaksandr Valialkin
e6edb85fa2 lib/httpserver: add is_set label to flag metrics
This label allows determining the set flags with the query `flag{is_set="true"}`
2021-07-13 15:10:18 +03:00
Aliaksandr Valialkin
51cd19d2e3 lib/storage: reset perKeyMisses stats less frequently
This should reduce CPU usage for queries executed with intervals higher than 30 seconds
2021-07-12 14:34:54 +03:00
Aliaksandr Valialkin
3f705fe8d7 lib/storage: properly limit the size of storage/date_metricID cache 2021-07-12 14:25:28 +03:00
Aliaksandr Valialkin
ef3c58d7a3 lib/storage: properly determine when the deduplication is needed in needsDedup
Previously needsDedup() could return true if the de-duplication wasn't needed for the following case:

         d < interval
           /     \
   |        v | v        |
     interval   interval

Now it properly returns false for this case
2021-07-12 10:54:51 +03:00
Aliaksandr Valialkin
41754e12f8 lib/mergeset: cache indexBlock items only on the second request
This should reduce the indexdb/indexBlocks cache size, since it won't contain one-time-wonders items.
2021-07-07 15:24:37 +03:00
Aliaksandr Valialkin
ceda2b1df4 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:11:29 +03:00
Aliaksandr Valialkin
9826f7c1be lib/storage: do not cache inmemoryBlock entries requested only once (aka one-time-wonder items)
This should reduce the cache size and memory usage for the indexdb/dataBlocks cache
2021-07-07 10:59:45 +03:00
Aliaksandr Valialkin
74ace9340d lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate 2021-07-07 10:59:39 +03:00
Aliaksandr Valialkin
a846febc89 Revert "lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method"
This reverts commit 7c6d3981bf.

Reason for revert: high contention at bucket16Pool on systems with big number of CPU cores.
This slows down query processing significantly.
2021-07-06 18:26:56 +03:00
Aliaksandr Valialkin
b805a675f3 lib/{mergeset,storage}: switch from sync.Pool to chan-based pool for inmemoryPart objects
This should reduce memory usage on systems with big number of CPU cores,
since every inmemoryPart object occupies at least 64KB of memory and sync.Pool maintains
a separate pool inmemoryPart objects per each CPU core.

Though the new scheme for the pool worsens per-cpu cache locality, this should be amortized
by big sizes of inmemoryPart objects.
2021-07-06 16:33:25 +03:00
Aliaksandr Valialkin
d8e7c1ef27 lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method
This reduces the load on memory allocator in Go runtime in production workload.
2021-07-06 16:33:24 +03:00
Aliaksandr Valialkin
db6bd69475 lib/mergeset: increase pool capacity for inmemoryBlock according to collected profiles from production workload
CPU and memory profiles show that the pool capacity for inmemoryBlock objects is too small.
This results in the increased load on memory allocation code in Go runtime.
Increase the pool capacity in order to reduce the load on Go runtime.
2021-07-06 13:44:27 +03:00
Aliaksandr Valialkin
fd32855a6c lib/mergeset: limit the frequency for flushCallback calls to once per 10 seconds
This should improve hit ratio for tagFiltersCache when big number of new time series are constantly registered
(aka high churn rate). This, in turn, should reduce CPU usage for queries over such time series.
2021-07-06 12:20:15 +03:00
Aliaksandr Valialkin
22c6e64bbc lib/storage: consistency renaming: tagCache -> tagFiltersCache
This improves code readability
2021-07-06 11:03:30 +03:00
Aliaksandr Valialkin
21abf487c3 lib/workingsetcache: properly update stats for requests and cache misses
Previously the stats for cache misses could be improperly counted, because it had inflated cache misses
if the entry was missing in the curr cache, but was existing in the prev cache.

The same applies to cache requests - they were inflated if the entry was missing in the curr cache.
2021-07-06 10:54:38 +03:00
Aliaksandr Valialkin
e5031d9aee lib/workingsetcache: fix cache capacity calculations after 4f0003f182 2021-07-05 17:16:35 +03:00
Aliaksandr Valialkin
bd71f102e8 lib/workingsetcache: typo fixes after d0c830039d 2021-07-05 15:35:51 +03:00
Aliaksandr Valialkin
4b25e627f8 lib/workingsetcache: properly switch to whole mode
Previously the switch from `split` to `whole` mode had been performed too early,
e.g. when the current cache size became bigger than 1/4 of the allowed cache size.

Now it is performed when the current cache size becomes bigger than 1/2 of the allowed cache size.

This change can reduce memory usage for data ingestion path when big number of active time series are ingested.
2021-07-05 15:15:39 +03:00
Aliaksandr Valialkin
51516b96e6 lib/storage: tune cache sizes according to production workload 2021-07-05 15:14:45 +03:00
Aliaksandr Valialkin
f12f97daa1 lib/{storage,mergeset}: increase cache timeout for data and index blocks from a minute to two minutes
One minute cache timeout result in slower queries in some production workloads where the interval
between query execution is in the range 1 minute - 2 minutes.
2021-07-05 14:25:59 +03:00
Aliaksandr Valialkin
377bb06b47 lib/cgroup: set GOGC to 50 by default if it isn't set
This should reduce memory usage for typical VictoriaMetrics workloads by up to 50%
2021-07-05 12:34:01 +03:00
Aliaksandr Valialkin
8055439fe4 lib/storage: properly detect free disk space shortage during data merge
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-02 17:42:23 +03:00
Aliaksandr Valialkin
6fc3696260 lib/promscrape/discovery/consul: use case-insensitive comparison for service names
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1424
2021-07-02 14:49:22 +03:00
Aliaksandr Valialkin
61e483a01c lib/protoparser/clusternative: remove unused field - unmarshalWork.lastResetTime
This is a follow-up for b84aea1e6e
2021-07-02 13:32:59 +03:00