github-mirrors/VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-11-21 14:44:00 +00:00

Author	SHA1	Message	Date
artifactori	ea153e5f90	Show gce sdconfig zone on vmagent:8429/config (#2178) * vmagent: add test for marshalling gce sdconfig with ZoneYAML * vmagent: implement MarshalYAML for ZoneYAML on gce sdconfig	2022-02-12 00:39:23 +02:00
Roman Khavronenko	cf1a8bce6b	lib/index: reduce read/write load after indexDB rotation (#2177) * lib/index: reduce read/write load after indexDB rotation IndexDB in VM is responsible for storing TSIDs, the IDs used for identifying time series. The index is stored on disk and used by both the ingestion and read paths. IndexDB is stored separately from data parts and is global for all stored data. It can't be deleted partially as VM deletes data parts. Instead, indexDB is rotated once per `retention` interval. The rotation procedure means that the `current` indexDB becomes `previous`, and a freshly created indexDB struct becomes `current`. So at any time, VM holds indexDB for the current and previous retention periods. When a time series is ingested or queried, VM checks if its TSID is present in the `current` indexDB. If it is missing, it checks the `previous` indexDB. If the TSID is found there, it gets copied to the `current` indexDB. In this way the `current` indexDB stores only series which were active during the retention period. To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both write and read paths consult `tsidCache`, and on a miss the real lookup happens. When rotation happens, VM resets the `tsidCache`. This is needed for the ingestion path to trigger `current` indexDB re-population. Since index re-population requires additional resources, every index rotation event may cause some extra load on CPU and disk. While it may be unnoticeable in most cases, for systems with a very high number of unique series each rotation may lead to performance degradation for some period of time. This PR makes an attempt to smooth out resource usage after the rotation. The changes are as follows: 1. `tsidCache` is no longer reset after the rotation; 2. Instead, each entry in `tsidCache` gains a notion of the indexDB to which it belongs; 3. On the ingestion path after the rotation we check if the requested TSID was found in `tsidCache`. Then we have 3 branches: 3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID. 3.2 Slow path. It wasn't found, so we generate it from scratch, add it to the `current` indexDB, and add it to `tsidCache`. 3.3 Smooth path. It was found but does not belong to the `current` indexDB. In this case, we add it to the `current` indexDB with some probability. The probability is based on the time passed since the last rotation with some threshold. The more time has passed since the rotation, the higher the chance to re-populate the `current` indexDB. The default re-population interval in this PR is set to `1h`, during which entries from the `previous` index are supposed to slowly re-populate the `current` index. The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs were moved from the `previous` indexDB to the `current` indexDB. This metric is supposed to grow only during the first `1h` after the last rotation. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin	08428464e9	lib/storage: fix broken BenchmarkHeadPostingForMatchers for `{i=~".*"}` after `f4dead529f` The commit `f4dead529f` makes such query to return nothing instead of all the time series. This aligns more with Prometheus behaviour.	2022-02-12 00:27:10 +02:00
Roman Khavronenko	e3adcbec6e	lib/promscrape: support prometheus-like duration in scrape configs (#2169) * lib/promscrape: support prometheus-like duration in scrape configs The change allows specifying duration values like `1d`, `1w` for fields `scrape_interval`, `scrape_timeout`, etc. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766 Signed-off-by: hagen1778 <roman@victoriametrics.com> * lib/blockcache: make linter happy Signed-off-by: hagen1778 <roman@victoriametrics.com> * lib/promscrape: support prometheus-like duration in scrape configs * add support for extra fields `scrape_align_interval` and `scrape_offset`; * support Prometheus duration parsing for `__scrape_interval__` and `__scrape_duration__` labels; Signed-off-by: hagen1778 <roman@victoriametrics.com> * wip * wip * docs/CHANGELOG.md: document the feature Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-11 16:17:00 +02:00
Aliaksandr Valialkin	3cb72ccc2a	lib/promscrape/discovery/kubernetes: add `__meta_kubernetes_endpointslice_{label,annotation}*` labels to be consistent with other `role` values for Kubernetes service discovery	2022-02-11 14:54:47 +02:00
Nikolay	4e7f7f3302	fixes service discovery for kubernetes (#2173) * fixes service discovery for kubernetes now it must take into account all pods that belong to the discovered endpoint and endpointslice adds simple test for endpoints https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2134 * wip * docs/CHANGELOG.md: document the change Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2022-02-11 13:34:22 +02:00
Aliaksandr Valialkin	f9a17cb5fe	lib/mergeset: tune indexdb/{indexBlocks,dataBlocks} cache sizes further according to production stats Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-02-10 19:09:46 +02:00
Aliaksandr Valialkin	a9bb22b213	lib/blockcache: use higher number of shards for higher number of CPU cores This should reduce mutex contention and increase performance Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-02-10 19:06:12 +02:00
Aliaksandr Valialkin	db8c4054e5	lib/promscrape: fix errors in test config The errors were discovered after enabling strict parse mode by default. See `9bb60ab00f`	2022-02-08 19:56:37 +02:00
Aliaksandr Valialkin	4507b111a9	lib/blockcache: split the cache into multiple shards This should reduce contention on cache mutex on hosts with many CPU cores, which, in turn, should increase overall throughput for the cache. This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-02-08 19:44:29 +02:00
Aliaksandr Valialkin	2455a988e4	lib/mergeset: tune sizes for `indexdb/dataBlocks` and `indexdb/indexBlocks` according to production workload This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007#issuecomment-1032308742	2022-02-08 17:58:49 +02:00
Aliaksandr Valialkin	9bb60ab00f	lib/promscrape: set `-promscrape.config.strictParse` to true by default This allows detecting long-living silent errors in -promscrape.config	2022-02-08 15:41:43 +02:00
Aliaksandr Valialkin	a19e7f8c5b	lib/blockcache: `make fmt`	2022-02-08 15:24:11 +02:00
Aliaksandr Valialkin	d0f785defd	lib/blockcache: eliminate possible race when Cache.Put is called for the same entry from multiple goroutines The race could result in incorrect cache size tracking, which, in turn, could result in too frequent cache cleaning	2022-02-08 01:10:43 +02:00
Aliaksandr Valialkin	46bd2c4d6d	lib/blockcache: increase the lifetime for rarely accessed blocks from 2 minutes to 5 minutes This should improve data ingestion speed if time series samples are ingested with interval bigger than 2 minutes. The actual interval could exceed 2 minutes even if the original interval between samples doesn't exceed 2 minutes in the case of slow inserts. Slow inserts may appear in the following cases: * A big number of new time series are pushed to VictoriaMetrics, so they couldn't be registered in 2 minutes. * MetricName->tsid cache reset on indexdb rotation or due to unclean shutdown. In this case VictoriaMetrics needs to load MetricName->tsid entries for all the incoming series from IndexDB. IndexDB uses the block cache for increasing lookup performance. If the cache doesn't contain the needed block, then IndexDB reads and unpacks the block from disk. This requires an extra disk read IO and CPU. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007 This also should increase performance for periodically executed queries with intervals from 2 minutes to 5 minutes. See the previous similar commit - `43103be011` It is possible that the timeout can be increased further. Let's collect production numbers for this change so the timeout could be adjusted further.	2022-02-08 00:15:56 +02:00
Aliaksandr Valialkin	e86b7cc9a5	lib/workingsetcache: use the original cache size limits when rotating caches Previously limits for new caches were taken from cache stats. These limits could mismatch the original limits. This could result in failed cache load if the stored cache has been created with the limits obtained from cache stats.	2022-02-08 00:10:14 +02:00
Aliaksandr Valialkin	cde4664f0d	lib/blockcache: return proper number of entries from the cache This has been broken in `0d7374ad2f`	2022-02-07 19:28:42 +02:00
Aliaksandr Valialkin	b5b3c585b3	lib/promscrape: show the total number of scrapes and the total number of scrape errors per target at /targets page This information may be useful when debugging unreliable scrape targets	2022-02-03 20:22:41 +02:00
Aliaksandr Valialkin	2968779f16	lib/promscrape: provide the ability to fetch target responses on behalf of vmagent or single-node VictoriaMetrics This feature may be useful when debugging metrics for the given target located in isolated environment	2022-02-03 19:00:55 +02:00
Aliaksandr Valialkin	9c62b25ad6	lib/mergeset: pre-allocate data and items for inmemoryBlock in order to reduce memory allocations under high churn rate Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-02-01 00:57:14 +02:00
Aliaksandr Valialkin	4bdd10ab90	lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations Follow-up for `f4989edd96` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-02-01 00:18:42 +02:00
Aliaksandr Valialkin	e13ce2ee98	lib/encoding: substitute `64-bits.LeadingZeros64()` with `bits.Len64()`	2022-01-31 23:36:48 +02:00
Aliaksandr Valialkin	a8509c112a	lib/storage: avoid allocations of tsidPrev on every blockStreamReader.NextBlock() call This is a follow-up for `00b7c97d2a` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082	2022-01-31 22:46:53 +02:00
Aliaksandr Valialkin	f50cf60534	lib/cgroup: fall back to runtime.NumCPU() when determining process_cpu_cores_available metric if it is impossible to determine cpu quota via cgroups Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107	2022-01-31 20:30:14 +02:00
Aliaksandr Valialkin	ead66155ef	lib/cgroup: expose `process_cpu_cores_available` metric This metric shows the number of CPU cores available to the process. This allows creating alerting rules on CPU saturation with the following query: rate(process_cpu_seconds_total[5m]) / process_cpu_cores_available > 0.9 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107	2022-01-31 20:24:41 +02:00
Aliaksandr Valialkin	96aa3761fc	lib/storage/table.go: add missing `tb.ptwsLock.Unlock()` before the return This is a follow-up for `a1083d0531` See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2103	2022-01-28 14:15:42 +02:00
匠心零度	1999bbfe82	optimized code (#2103) * optimized code: only the first error is used, so there is no need for `var errors []error` * optimized code: only the first error is used, so there is no need for `var errors []error` Co-authored-by: lirenzuo <lirenzuo@shein.com>	2022-01-28 14:15:41 +02:00
Aliaksandr Valialkin	f4989edd96	lib/bytesutil: split Resize() into ResizeNoCopy() and ResizeWithCopy() functions Previously bytesutil.Resize() was copying the original byte slice contents to a newly allocated slice. This wasted CPU cycles and memory bandwidth in some places, where the original slice contents weren't needed after slice resizing. Switch such places to bytesutil.ResizeNoCopy(). Rename the original bytesutil.Resize() function to bytesutil.ResizeWithCopy() for the sake of improved readability. Additionally, allocate new slice with `make()` instead of `append()`. This guarantees that the capacity of the allocated slice exactly matches the requested size. The `append()` could return a slice with bigger capacity as an optimization for further `append()` calls. This could result in excess memory usage when the returned byte slice was cached (for instance, in lib/blockcache). Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-25 15:24:44 +02:00
Aliaksandr Valialkin	91f2af2d7a	lib/mergeset: allocate the needed amounts of memory when unmarshaling inmemoryBlock This should reduce the memory required for indexdb/dataBlocks cache. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-24 18:50:40 +02:00
Aliaksandr Valialkin	4c13bae1cf	lib/logger: removed broken test after `746ee191e8`	2022-01-24 12:14:32 +02:00
Aliaksandr Valialkin	746ee191e8	lib/logger/throttler.go: show the original location of the error and warning message Previously the location inside LogThrottler implementation was shown. This could complicate debugging.	2022-01-23 13:55:00 +02:00
Aliaksandr Valialkin	0d7374ad2f	lib/blockcache: optimize blockcache a bit - Optimize Cache.RemoveBlocksFromPart(), so it doesn't need to iterate over all the cached blocks. - Cache blocks if there were no cache misses during the last 2 minutes. This may be the case when new blocks are added simultaneously to the storage and to the cache. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-23 13:13:45 +02:00
Aliaksandr Valialkin	ede93469ea	lib/mergeset: tune caches size limits for `indexdb/dataBlocks` and `indexdb/indexBlocks` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-21 12:45:43 +02:00
Aliaksandr Valialkin	5f84b17ed6	lib/storage: properly limit cardinality when ingesting multiple samples for the same time series in a single request	2022-01-21 12:38:09 +02:00
Aliaksandr Valialkin	00b7c97d2a	lib/storage: verify that blocks in a single part are sorted by TSID when reading sequential blocks from the part This may help narrowing down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082	2022-01-20 20:36:37 +02:00
Aliaksandr Valialkin	ea87f21e23	lib/storage: set bsm.Block to nil on error, so the previous block couldn't be used. This may help nailing down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082	2022-01-20 20:13:14 +02:00
Aliaksandr Valialkin	9797c928ef	lib/blockcache: add missing dependency after `145337792d` Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-20 18:50:44 +02:00
Aliaksandr Valialkin	145337792d	lib/{mergeset,storage}: properly limit cache sizes for indexdb Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`, since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded, which could result in out of memory errors. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007	2022-01-20 18:37:17 +02:00
Aliaksandr Valialkin	1d05444b33	lib/promscrape: expose promscrape_stale_samples_created_total metric for monitoring the number of created stale samples	2022-01-14 01:00:46 +02:00
Aliaksandr Valialkin	80f03177c4	lib/promscrape/discovery/kubernetes: add `__meta_kubernetes_node_provider_id` label for discovered Kubernetes nodes in the same way as Prometheus does See https://github.com/prometheus/prometheus/pull/9603	2022-01-13 23:16:02 +02:00
Aliaksandr Valialkin	355a63733d	lib/promscrape/discovery/kubernetes: add the ability to limit service discovery to the current namespace See https://github.com/prometheus/prometheus/issues/9782 and https://github.com/prometheus/prometheus/pull/9881	2022-01-13 22:44:35 +02:00
Aliaksandr Valialkin	17eb86a689	lib/promscrape/discovery/dockerswarm: follow up after `68a117a25a` - Document the bugfix at docs/CHANGELOG.md - Set __address__ field after copying commonLabels to the resulting map of discovered labels. This makes sure that the correct __address__ label is used.	2022-01-11 09:20:10 +02:00
Alexander Shtuchkin	68a117a25a	Fix for #2038: Make correct __address__ value for dockerswarm promscrape (#2041)	2022-01-11 08:59:06 +02:00
Aliaksandr Valialkin	e4e36383e2	lib/promscrape: do not send staleness markers on graceful shutdown This follows Prometheus behavior. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2013#issuecomment-1006994079	2022-01-07 01:17:57 +02:00
Aliaksandr Valialkin	178dd87e26	lib/storage: follow-up for `38bf5fc136`	2022-01-05 16:00:11 +02:00
weng zhao	38bf5fc136	vmstorage: fix query like `{foo=~"bar\|"}` returning extra timeseries caused by negative filter transformation malfunction (#2032) 1. L2749 make kb.B remain the value of comonPrefix instead of tf.prefix 2. L2762 avoid changing tf.value from "bar\|" to ".+r\|"	2022-01-05 15:59:15 +02:00
Aliaksandr Valialkin	cbaa2af280	lib/promscrape: scrape replicated targets at different offsets in vmagent replicated clustering mode This guarantees that the deduplication consistently leaves samples from the same vmagent replica. See https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets	2021-12-23 00:20:39 +02:00
Nikolay	8ff7da7202	adds restore.lock (#1988) * adds restore.lock it must prevent from running storage after incomplete restore process https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958 * return back flock file deletion * Apply suggestions from code review * wip * docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958 Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2021-12-22 13:10:15 +02:00
Aliaksandr Valialkin	ce333f28d8	all: use logger.WithThrottler() where appropriate	2021-12-21 17:03:25 +02:00
Roman Khavronenko	34fdc8881b	vmagent: add error log for skipped data block when rejected by receiv… (#1956) * vmagent: add error log for skipped data block when rejected by receiving side Previously, rejected data blocks were silently dropped - only metrics were updated. From an operational perspective, having additional logging for such cases is preferable. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911 Signed-off-by: hagen1778 <roman@victoriametrics.com> * vmagent: throttle log messages about skipped blocks The new type of logger was added to the logger package. This new type is supposed to control the number of logged messages by time. Signed-off-by: hagen1778 <roman@victoriametrics.com> * lib/logger: make LogThrottler public, so its methods can be inspected by external packages Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>	2021-12-21 16:36:09 +02:00


1322 commits
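
Note on commit `f4989edd96`: the message distinguishes resizing a byte slice with and without preserving its contents, and allocating with `make()` so that capacity exactly matches the requested size. The sketch below illustrates that distinction under stated assumptions. The lowercase functions `resizeNoCopy` and `resizeWithCopy` are hypothetical stand-ins written for this note and are not the `lib/bytesutil` implementations.

```go
package main

import "fmt"

// resizeNoCopy returns a slice of length n. When a new slice has to be
// allocated, the old contents are NOT carried over.
// Hypothetical helper; not the lib/bytesutil code.
func resizeNoCopy(b []byte, n int) []byte {
	if n <= cap(b) {
		return b[:n]
	}
	// make() yields a slice whose capacity equals n exactly.
	return make([]byte, n)
}

// resizeWithCopy returns a slice of length n and preserves the existing
// contents when a new slice has to be allocated.
// Hypothetical helper; not the lib/bytesutil code.
func resizeWithCopy(b []byte, n int) []byte {
	if n <= cap(b) {
		return b[:n]
	}
	bNew := make([]byte, n)
	copy(bNew, b)
	return bNew
}

func main() {
	b := make([]byte, 3)
	copy(b, "abc")

	grown := resizeWithCopy(b, 10)
	fmt.Println(len(grown), cap(grown), string(grown[:3]))

	// append() is allowed to over-allocate, so the capacity here may exceed 10.
	viaAppend := append(b, make([]byte, 7)...)
	fmt.Println(len(viaAppend), cap(viaAppend) >= 10)
}
```

The exact-capacity property matters when the returned slice is kept in a cache, since any spare capacity is memory the cache holds but never uses.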