Commit graph

1619 commits

Author SHA1 Message Date
Aliaksandr Valialkin
1caee74235
lib/blockcache: eliminate possible race when Cache.Put is called for the same entry from multiple goroutines
The race could result in incorrect cache size tracking, which, in turn, could result in too frequent cache cleaning
2022-02-08 01:18:27 +02:00
Aliaksandr Valialkin
10476738a8
lib/blockcache: increase the lifetime for rarely accessed blocks from 2 minutes to 5 minutes
This should improve data ingestion speed if time series samples are ingested with interval bigger than 2 minutes.
The actual interval could exceed 2 minutes even if the original interval between samples doesn't exceed 2 minutes,
for example in the case of slow inserts. Slow inserts may occur in the following cases:

* A big number of new time series is pushed to VictoriaMetrics, so they cannot be registered within 2 minutes.
* MetricName->tsid cache reset on indexdb rotation or due to unclean shutdown.
  In this case VictoriaMetrics needs to load MetricName->tsid entries for all the incoming series from IndexDB.
  IndexDB uses the block cache for increasing lookup performance. If the cache doesn't contain the needed block,
  then IndexDB reads and unpacks the block from disk. This requires an extra disk read IO and CPU.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007

This also should increase performance for periodically executed queries with intervals from 2 minutes to 5 minutes.
See the previous similar commit - 43103be011

It is possible that the timeout can be increased further. Let's collect production numbers for this change
so the timeout could be adjusted further.
2022-02-08 01:18:27 +02:00
Aliaksandr Valialkin
b7cefff7b0
lib/workingsetcache: use the original cache size limits when rotating caches
Previously limits for new caches were taken from cache stats.
These limits could mismatch the original limits. This could result in failed cache load
if the stored cache has been created with the limits obtained from cache stats.
2022-02-08 01:18:27 +02:00
Aliaksandr Valialkin
87071640a7
lib/blockcache: return proper number of entries from the cache
This has been broken in 0d7374ad2f
2022-02-08 01:18:27 +02:00
Aliaksandr Valialkin
34d14c4940
all: substitute zeroTime with time.Time{}, since this generates more optimal binary code 2022-02-07 14:36:41 +02:00
Aliaksandr Valialkin
e2d12a25e0
lib/netutil: increase dial timeout from 1 second to 5 seconds
There are real-world cases when TCP connection needs more than 1 second to be established.
2022-02-07 12:33:40 +02:00
Aliaksandr Valialkin
d24e5d9efd
lib/promscrape: show the total number of scrapes and the total number of scrape errors per target at /targets page
This information may be useful when debugging unreliable scrape targets
2022-02-03 20:23:27 +02:00
Aliaksandr Valialkin
678b3e71db
lib/promscrape: provide the ability to fetch target responses on behalf of vmagent or single-node VictoriaMetrics
This feature may be useful when debugging metrics for the given target located in isolated environment
2022-02-03 19:02:12 +02:00
Aliaksandr Valialkin
5f266370c5
all: follow-up after 4bdd10ab90
Properly use new bytesutil.Resize* functions
2022-02-01 17:49:28 +02:00
Aliaksandr Valialkin
d8d59ff760
lib/mergeset: pre-allocate data and items for inmemoryBlock in order to reduce memory allocations under high churn rate
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-01 11:20:20 +02:00
Aliaksandr Valialkin
02b2bfcff3
lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations
Follow-up for f4989edd96

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-01 11:20:20 +02:00
Aliaksandr Valialkin
084664d780
lib/encoding: substitute 64 - bits.LeadingZeros64() with bits.Len64() 2022-02-01 11:20:20 +02:00
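The substitution above relies on the two expressions being equal for every `uint64` value. A short check using the standard `math/bits` package (the helper names are made up for this example):

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitLenViaLeadingZeros computes the number of significant bits the long way.
func bitLenViaLeadingZeros(x uint64) int {
	return 64 - bits.LeadingZeros64(x)
}

// bitLen computes the same value with a single standard library call.
func bitLen(x uint64) int {
	return bits.Len64(x)
}

func main() {
	for _, x := range []uint64{0, 1, 2, 255, 256, 1 << 63} {
		fmt.Println(x, bitLenViaLeadingZeros(x), bitLen(x))
	}
}
```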
Aliaksandr Valialkin
0fbfa8c245
lib/storage: avoid allocations of tsidPrev on every blockStreamReader.NextBlock() call
This is a follow-up for 00b7c97d2a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-31 22:47:16 +02:00
Aliaksandr Valialkin
a02dde6cc7
lib/cgroup: fall back to runtime.NumCPU() when determining process_cpu_cores_available metric if it is impossible to determine cpu quota via cgroups
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107
2022-01-31 20:31:12 +02:00
Aliaksandr Valialkin
566e12874d
lib/cgroup: expose process_cpu_cores_available metric
This metric shows the number of CPU cores available to the process.
This allows creating alerting rules on CPU saturation with the following query:

    rate(process_cpu_seconds_total[5m]) / process_cpu_cores_available > 0.9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107
2022-01-31 20:25:15 +02:00
Aliaksandr Valialkin
776b7bc9f8
lib/storage/table.go: add missing tb.ptwsLock.Unlock() before the return
This is a follow-up for a1083d0531

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2103
2022-01-28 12:12:58 +02:00
匠心零度
a1083d0531
optimized code (#2103)
* optimized code: only the first error is used, so there is no need for `var errors []error`

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2022-01-28 12:10:47 +02:00
Aliaksandr Valialkin
6232eaa938
lib/bytesutil: split Resize() into ResizeNoCopy() and ResizeWithCopy() functions
Previously bytesutil.Resize() was copying the original byte slice contents to a newly allocated slice.
This wasted CPU cycles and memory bandwidth in some places, where the original slice contents weren't needed
after slice resizing. Switch such places to bytesutil.ResizeNoCopy().

Rename the original bytesutil.Resize() function to bytesutil.ResizeWithCopy() for the sake of improved readability.

Additionally, allocate new slice with `make()` instead of `append()`. This guarantees that the capacity of the allocated slice
exactly matches the requested size. The `append()` could return a slice with bigger capacity as an optimization for further `append()` calls.
This could result in excess memory usage when the returned byte slice was cached (for instance, in lib/blockcache).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-25 15:28:42 +02:00
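A rough sketch of the two resize flavors described above. The functions `resizeNoCopy` and `resizeWithCopy` are simplified stand-ins written for this example and are not the actual lib/bytesutil implementation. Both allocate with `make()`, so the capacity of a newly allocated slice equals the requested size.

```go
package main

import "fmt"

// resizeNoCopy returns a byte slice of length n.
// The contents of b may be lost if a new slice has to be allocated.
func resizeNoCopy(b []byte, n int) []byte {
	if n <= cap(b) {
		return b[:n]
	}
	// make() allocates exactly n bytes of capacity.
	return make([]byte, n)
}

// resizeWithCopy returns a byte slice of length n, which starts with the contents of b.
func resizeWithCopy(b []byte, n int) []byte {
	if n <= cap(b) {
		return b[:n]
	}
	bNew := make([]byte, n)
	copy(bNew, b)
	return bNew
}

func main() {
	b := []byte("abc")
	fmt.Println(cap(resizeNoCopy(b, 1000)))
	fmt.Println(string(resizeWithCopy(b, 1000)[:3]))
}
```

The no-copy variant fits callers that overwrite the whole slice afterwards; the copying variant is needed only when the old contents must survive the resize.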
Aliaksandr Valialkin
7ec0705b98
lib/mergeset: allocate the needed amounts of memory when unmarshaling inmemoryBlock
This should reduce the memory required for indexdb/dataBlocks cache.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-24 18:52:22 +02:00
Aliaksandr Valialkin
65176726b3
lib/logger: removed broken test after 746ee191e8 2022-01-24 12:15:11 +02:00
Aliaksandr Valialkin
49650fe6aa
lib/logger/throttler.go: show the original location of the error and warning message
Previously the location inside LogThrottler implementation was shown. This could complicate debugging.
2022-01-23 13:55:48 +02:00
Aliaksandr Valialkin
233101137d
lib/blockcache: optimize blockcache a bit
- Optimize Cache.RemoveBlocksFromPart(), so it doesn't need to iterate over all the cached blocks.
- Cache blocks if there were no cache misses during the last 2 minutes.
  This may be the case when new blocks are added simultaneously to the storage and to the cache.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-23 13:08:55 +02:00
Aliaksandr Valialkin
9edf407144
lib/mergeset: tune caches size limits for indexdb/dataBlocks and indexdb/indexBlocks
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-21 12:46:05 +02:00
Aliaksandr Valialkin
4e05298756
lib/storage: properly limit cardinality when ingesting multiple samples for the same time series in a single request 2022-01-21 12:38:22 +02:00
Aliaksandr Valialkin
e3277918e4
lib/storage: verify that blocks in a single part are sorted by TSID when reading sequential blocks from the part
This may help narrow down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-20 20:37:28 +02:00
Aliaksandr Valialkin
54ee71e16d
lib/storage: set bsm.Block to nil on error, so the previous block couldn't be used.
This may help nail down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-20 20:37:24 +02:00
Aliaksandr Valialkin
5159a9451f
lib/blockcache: add missing dependency after 145337792d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-20 18:51:03 +02:00
Aliaksandr Valialkin
6ae584b9b3
lib/{mergeset,storage}: properly limit cache sizes for indexdb
Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`,
since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded,
which could result in out of memory errors.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-20 18:45:03 +02:00
Aliaksandr Valialkin
da95516a1f
lib/promscrape: expose promscrape_stale_samples_created_total metric for monitoring the number of created stale samples 2022-01-14 01:00:40 +02:00
Aliaksandr Valialkin
dd91759f1f
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_node_provider_id label for discovered Kubernetes nodes in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/9603
2022-01-13 23:17:24 +02:00
Aliaksandr Valialkin
bc18368c15
lib/promscrape/discovery/kubernetes: add the ability to limit service discovery to the current namespace
See https://github.com/prometheus/prometheus/issues/9782 and https://github.com/prometheus/prometheus/pull/9881
2022-01-13 22:44:59 +02:00
Aliaksandr Valialkin
de8299f465
lib/promscrape/discovery/dockerswarm: follow up after 68a117a25a
- Document the bugfix at docs/CHANGELOG.md
- Set __address__ field after copying commonLabels to the resulting map of discovered labels.
  This makes sure that the correct __address__ label is used.
2022-01-11 09:22:03 +02:00
Alexander Shtuchkin
45a92e6ce1
Fix for #2038: Make correct __address__ value for dockerswarm promscrape (#2041) 2022-01-11 09:22:02 +02:00
Aliaksandr Valialkin
fa89f3e5a5
lib/promscrape: do not send staleness markers on graceful shutdown
This follows Prometheus behavior.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2013#issuecomment-1006994079
2022-01-07 01:19:06 +02:00
Aliaksandr Valialkin
80fc3fda07
lib/storage: follow-up for 38bf5fc136 2022-01-05 16:02:17 +02:00
weng zhao
1e0fe615ad
vmstorage: fix queries like {foo=~"bar|"} returning extra time series because of a malfunction in negative filter transformation (#2032)
1. L2749 make kb.B remain the value of comonPrefix instead of tf.prefix
2. L2762 avoid change tf.value from "bar|" to ".+r|"
2022-01-05 15:57:54 +02:00
Aliaksandr Valialkin
c1722003a2
lib/promscrape: scrape replicated targets at different offsets in vmagent replicated clustering mode
This guarantees that the deduplication consistently leaves samples from the same vmagent replica.

See https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets
2021-12-23 00:21:41 +02:00
Nikolay
6cdc934c3d
adds restore.lock (#1988)
* adds restore.lock
it must prevent the storage from running after an incomplete restore process
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

* return back flock file deletion

* Apply suggestions from code review

* wip

* docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-22 13:10:56 +02:00
Aliaksandr Valialkin
727797a6fd
all: use logger.WithThrottler() where appropriate 2021-12-21 17:10:54 +02:00
Aliaksandr Valialkin
4dbf12254d
lib/promscrape: take into account the original job_name when creating a unique key per each scrape target
This should handle the case when the original job_name has been changed in -promscrape.config ,
while the resulting job label remains the same because it is overridden via relabeling.
2021-12-21 16:42:42 +02:00
Roman Khavronenko
23e1de06ee
vmagent: add error log for skipped data block when rejected by receiv… (#1956)
* vmagent: add error log for skipped data block when rejected by receiving side

Previously, rejected data blocks were silently dropped - only metrics were updated.
From an operational perspective, having additional logging for such cases is preferable.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmagent: throttle log messages about skipped blocks

A new type of logger was added to the logger package.
This new type is supposed to limit the number of logged messages
over time.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/logger: make LogThrottler public, so its methods can be inspected by external packages

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-21 16:42:38 +02:00
Aliaksandr Valialkin
053e85ff3d
all: typo fix: unexected -> unexpected 2021-12-20 17:40:13 +02:00
Aliaksandr Valialkin
406cb06f8c
lib/persistentqueue: check that readerOffset doesn't exceed writerOffset after each readerOffset increase
This should help detecting the source of the panic from https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1981
2021-12-20 17:26:07 +02:00
Aliaksandr Valialkin
f22aab411b
lib/storage: properly update per-part min_dedup_interval file contents after merge
Previously 0s was always written even if -dedup.minScrapeInterval was set to non-zero value

This is a follow-up for 4ff647137a
2021-12-17 20:12:18 +02:00
Aliaksandr Valialkin
5bd4e47a9e
lib/promscrape: allow up to 5 redirects when scraping a target by default
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
2021-12-16 00:14:45 +02:00
Aliaksandr Valialkin
d36fdbe537
lib/storage: deduplicate samples more thoroughly
Previously some duplicate samples could be left on disk for time series with high churn rate.
This may result in higher disk space usage.
2021-12-15 16:00:30 +02:00
Aliaksandr Valialkin
bc3923111b
lib/storage: return dedup interval in milliseconds from GetDedupInterval()
This removes duplicate .Milliseconds() calls after GetDedupInterval() calls.
2021-12-15 13:27:27 +02:00
Aliaksandr Valialkin
cdfe854c9b
lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge()
This improves the code readability and debuggability, since the output of these functions
stops depending on global state.
2021-12-14 20:52:29 +02:00
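To illustrate why passing the interval explicitly helps, here is a small sketch of a deduplication function whose output depends only on its arguments. The `sample` type and the rule used (keep the last sample in each interval-aligned bucket, for samples sorted by non-negative timestamps) are assumptions made for this example and may differ from the actual lib/storage logic.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair.
type sample struct {
	timestampMs int64
	value       float64
}

// deduplicate keeps the last sample in every dedupIntervalMs-aligned bucket.
// samples must be sorted by timestamp. The result depends only on the arguments,
// so the function can be tested without touching any global state.
func deduplicate(samples []sample, dedupIntervalMs int64) []sample {
	if dedupIntervalMs <= 0 || len(samples) < 2 {
		return samples
	}
	var dst []sample
	for i, s := range samples {
		isLast := i == len(samples)-1
		if isLast || samples[i+1].timestampMs/dedupIntervalMs != s.timestampMs/dedupIntervalMs {
			dst = append(dst, s)
		}
	}
	return dst
}

func main() {
	samples := []sample{{0, 1}, {500, 2}, {1000, 3}, {1999, 4}, {2000, 5}}
	for _, s := range deduplicate(samples, 1000) {
		fmt.Println(s.timestampMs, s.value)
	}
}
```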
Aliaksandr Valialkin
c922c7af9a
lib/storage: convert alternate regexps into Graphite wildcards inside __graphite__ pseudo-label
For example, `{__graphite__=~"foo.(bar|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution.
This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.
2021-12-14 19:55:59 +02:00
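The conversion in the example above can be sketched with a simple character scan. This is a simplified illustration, not the actual lib/storage code: the function name is made up, and escaping and other regexp syntax are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// regexpAlternationToGraphite rewrites "(a|b)" groups into Graphite "{a,b}" wildcards.
// It is a simplified sketch, which doesn't handle escaped characters.
func regexpAlternationToGraphite(s string) string {
	var sb strings.Builder
	depth := 0
	for _, r := range s {
		switch {
		case r == '(':
			depth++
			sb.WriteRune('{')
		case r == ')' && depth > 0:
			depth--
			sb.WriteRune('}')
		case r == '|' && depth > 0:
			// Alternation inside a group becomes a comma-separated list.
			sb.WriteRune(',')
		default:
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(regexpAlternationToGraphite("foo.(bar|baz)"))
}
```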
Aliaksandr Valialkin
38f5bc7451
lib/httpserver: add missing 127.0.0.1 hostname to the logged address for http and pprof server if the address starts with ':'
This allows copy-pasting the url to http server from logs.
2021-12-08 16:15:12 +02:00
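A minimal sketch of the address fix described above, with a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// loggableAddr prepends 127.0.0.1 to addresses such as ":8428",
// so the logged url can be copy-pasted into a browser or curl.
func loggableAddr(addr string) string {
	if strings.HasPrefix(addr, ":") {
		return "127.0.0.1" + addr
	}
	return addr
}

func main() {
	fmt.Printf("http://%s/\n", loggableAddr(":8428"))
	fmt.Printf("http://%s/\n", loggableAddr("10.0.0.1:8428"))
}
```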
Aliaksandr Valialkin
9aa9b081a4
app/vminsert: add -maxLabelValueLen command-line flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1908
2021-12-06 11:42:24 +02:00
Aliaksandr Valialkin
51f2eb3c46
lib/workingsetcache: fix unaligned 64-bit atomic operation panic on 32-bit architectures
The panic has been introduced in 7275ebf91a
2021-12-03 01:22:30 +02:00
Aliaksandr Valialkin
d40441947a
app: allow specifying http and https urls in the following command-line flags
* -promscrape.config
* -relabelConfig
* -remoteWrite.relabelConfig
* -remoteWrite.urlRelabelConfig
2021-12-03 00:11:47 +02:00
Aliaksandr Valialkin
daaea1eb2c
app/vmauth: follow-up for 13368bed18
* Document the ability to specify http or https urls in `-auth.config` at docs/CHANGELOG.md
* Move the ReadFileOrHTTP to lib/fs, so it can be re-used in other places where a file
  should be read from the given path. For example, in `-promscrape.config` at `vmagent`.
2021-12-02 23:34:15 +02:00
Aliaksandr Valialkin
b885a3b6e9
lib/httpserver: expose /-/healthy and /-/ready endpoints as Prometheus does
This improves integration with third-party solutions, which rely on these endpoints.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1833
2021-12-02 14:37:50 +02:00
Aliaksandr Valialkin
c540235470
app: use relative paths instead of absolute paths for the supported http handlers on the main page
This allows hiding VictoriaMetrics components behind proxies, which serve pages at different path prefixes

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1858
2021-12-02 13:54:15 +02:00
Aliaksandr Valialkin
d1289383eb
lib/protoparser/graphite: allow multiple separators between metric name, value and timestamp 2021-12-02 13:44:01 +02:00
Aliaksandr Valialkin
37a2bea072
lib/protoparser/graphite: properly parse Graphite line with whitespace after the timestamp
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1865
2021-12-02 13:33:50 +02:00
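The two Graphite parser changes above (multiple separators between fields and whitespace after the timestamp) can be illustrated with `strings.Fields`, which splits on runs of whitespace and ignores leading and trailing whitespace. This is a sketch and not the actual lib/protoparser/graphite parser; `graphiteRow` and `parseGraphiteLine` are hypothetical names, and the sketch requires all three fields.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// graphiteRow is a hypothetical parsed Graphite plaintext line.
type graphiteRow struct {
	Metric    string
	Value     float64
	Timestamp int64
}

// parseGraphiteLine parses "metric value timestamp" separated by any amount of whitespace.
func parseGraphiteLine(line string) (graphiteRow, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return graphiteRow{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	value, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return graphiteRow{}, fmt.Errorf("cannot parse value %q: %w", fields[1], err)
	}
	ts, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return graphiteRow{}, fmt.Errorf("cannot parse timestamp %q: %w", fields[2], err)
	}
	return graphiteRow{Metric: fields[0], Value: value, Timestamp: ts}, nil
}

func main() {
	row, err := parseGraphiteLine("foo.bar   123.5\t1638444830  ")
	fmt.Println(row, err)
}
```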
Aliaksandr Valialkin
2b7dee15dd
app/{vmbackup,vmrestore}: export internal metrics at /metrics http handler 2021-12-02 11:56:34 +02:00
Aliaksandr Valialkin
ab4be24397
app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches
The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query:

   vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9
2021-12-02 10:30:01 +02:00
Aliaksandr Valialkin
d4655beae8
lib/fs: add vm_filestream_read_duration_seconds_total and vm_filestream_write_duration_seconds_total metrics
These metrics help determine persistent disk saturation with `rate(vm_filestream_read_duration_seconds_total) > 0.9`
2021-12-02 09:13:20 +02:00
Aliaksandr Valialkin
2e43cd9d62
lib/storage: do not take into account -storage.minFreeDiskSpaceBytes during background merges 2021-12-01 12:30:03 +02:00
Nikolay
cf1d2f289b
removes FileSize from backup part key (#1872)
* removes FileSize from backup part key
it should fix download restoration for backups

* Update lib/backup/common/part.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-01 12:30:03 +02:00
Aliaksandr Valialkin
71c0f7cce3
lib/storage: take into account -storage.minFreeDiskSpaceBytes when performing big merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-11-30 12:56:53 +02:00
guidao
6fa7ad69fc
fix #1830 (#1861)
Co-authored-by: wangfeng <wangfeng@zhihu.com>
2021-11-30 01:16:15 +02:00
Aliaksandr Valialkin
975498d402
lib/protoparser/prometheus: follow-up for 8e338632a3
Do not spend CPU time on error message formatting if error logger is disabled
2021-11-30 00:51:15 +02:00
Nikolay
40f0726147
Changes unmarshallRow logger to noop for getRowsDiff (#1835) 2021-11-30 00:51:14 +02:00
Aliaksandr Valialkin
4ad397188e
lib/protoparser: do not log connection reset by peer error when reading the data via InfluxDB, Graphite and OpenTSDB protocols over plain TCP connections
This error is expected, so there is no need to spam the log with it.
2021-11-29 21:58:11 +02:00
Aliaksandr Valialkin
e93f46187d
lib/persistentqueue: add vm_persistentqueue_read_duration_seconds_total and vm_persistentqueue_write_duration_seconds_total metrics for determining disk usage saturation at vmagent 2021-11-17 16:42:12 +02:00
Lan
6662714c6c
Add flag of S3ForcePathStyle (#1802) 2021-11-17 01:10:22 +02:00
Aliaksandr Valialkin
4fb19fe34b
all: consistently return application/json content-type without charset=utf-8
The `application/json` content-type has utf-8 encoding by default.
See https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2021-11-09 18:07:22 +02:00
Aliaksandr Valialkin
f41c02e475
lib/promscrape: improve logging for scrape_config_files parse errors
Log the actual file path, which led to the parse error.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1789
2021-11-08 13:34:26 +02:00
Aliaksandr Valialkin
847004fa77
app/{vminsert,vmagent}: hide passwords and auth tokens by default at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764
2021-11-05 14:42:13 +02:00
Aliaksandr Valialkin
bc72b83102
lib/promauth: do not show empty values in oauth2 config section at /config page 2021-11-05 12:54:10 +02:00
Aliaksandr Valialkin
d445d22c0c
lib/promscrape: add -promscrape.maxResponseHeadersSize command-line flag for tuning the maximum http response headers size from Prometheus scrape targets 2021-11-03 22:27:55 +02:00
Aliaksandr Valialkin
6873d6d893
lib/protoparser/influx: automatically detect timestamp precision depending on the number of decimal digits in the timestamp 2021-10-28 12:48:34 +03:00
Aliaksandr Valialkin
105deb164c
lib/logger: show only explicitly set command-line flags in logs
This reduces initial verbosity in logs
2021-10-28 11:03:21 +03:00
Aliaksandr Valialkin
b626d6d606
lib/promscrape: add collapse and expand buttons per each group of targets from the same scrape job 2021-10-27 20:04:03 +03:00
Aliaksandr Valialkin
2ebee4e741
app/{vmalert,vmagent}: improve the distribution of scrape offsets among targets / rules
Previously only the lower part of 64-bit hash was used for calculating the offset.
This may give uneven distribution in some cases. So let's use all the available 64 bits from the hash
for calculating the offset.
2021-10-27 20:04:02 +03:00
Aliaksandr Valialkin
92d01db85a
lib/protoparser/prometheus: optimize GetRowsDiff() function
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1745 ,
since the provided profile shows that the majority of CPU and memory is spent in this function
during `streamParse` when `-promscrape.noStaleMarkers` wasn't set.
2021-10-27 18:55:25 +03:00
Aliaksandr Valialkin
16f1aaf0b5
lib/protoparser/prometheus: add a benchmark for GetRowsDiff 2021-10-27 18:55:23 +03:00
Aliaksandr Valialkin
99784b21c1
all: fix build issues and tests for Apple M1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1653
2021-10-27 15:07:19 +03:00
Aliaksandr Valialkin
ad445a06cd
lib/promscrape: properly show proxy_url option value at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1755
2021-10-26 21:24:22 +03:00
Aliaksandr Valialkin
b08f51f5d3
lib/promscrape: do not populate response body to memory in stream parsing mode if -promscrape.noStaleMarkers is set
The response body isn't used if -promscrape.noStaleMarkers is set after the commit 2876137c92 ,
so there is no point in populating it in memory. This should reduce memory usage when scraping big responses.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1728#issuecomment-949630694
2021-10-22 16:49:21 +03:00
Aliaksandr Valialkin
6bc10f0623
lib/promscrape: do not sort original labels and do not intern label string for the original labels before the sharding code is executed
This should reduce CPU and memory usage in shard mode when service discovery finds a big number of scrape targets with many long labels.
See https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets

This is a follow-up after 9882cda8b9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1728
2021-10-22 13:55:39 +03:00
Aliaksandr Valialkin
a8bcc3c276
lib/promscrape: reduce memory usage if -promscrape.noStaleMarkers command-line flag is passed
Do not store in memory the response from the last scrape per each target if -promscrape.noStaleMarkers option is enabled.
This should reduce memory usage when the scraped targets return large responses.
2021-10-22 13:22:08 +03:00
Nikolay
83e1dfccba
adds tab as second separator for graphite text protocol (#1733)
* adds tab as second separator for graphite text protocol

* changes indexFunc for indexAny

* Update lib/protoparser/graphite/parser_test.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-10-22 12:29:27 +03:00
Aliaksandr Valialkin
d56e676d71
lib/flagutil: do not expose sensitive info (passwords, keys and urls) at /flags page 2021-10-20 00:51:15 +03:00
Aliaksandr Valialkin
5705f4b6d1
lib/httpserver: expose command-line flags at /flags page
This should simplify debugging.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1695
2021-10-20 00:46:54 +03:00
Aliaksandr Valialkin
a105b71116
lib/envflag: use flag.Set for setting the flags from env vars
This should make the set flags visible at flag.Visit(), which is used later for logging
and exporting the `is_set` label for these flags at /metrics page
2021-10-20 00:46:53 +03:00
Aliaksandr Valialkin
93511b4be7
lib/storage: log a warning when the -storageDataPath has less than -storage.minFreeDiskSpaceBytes
This should improve the debuggability of the readonly feature.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1727
2021-10-19 23:58:09 +03:00
Aliaksandr Valialkin
ea69eef375
lib/promscrape/discovery/kubernetes: log a warning if role: endpoints discovers more than 1000 targets per a single endpoint
In this case `role: endpointslice` must be used instead.

See the following references:

* https://kubernetes.io/docs/reference/labels-annotations-taints/#endpoints-kubernetes-io-over-capacity
* https://github.com/kubernetes/kubernetes/pull/99975
* https://github.com/prometheus/prometheus/issues/7572#issuecomment-934779398
2021-10-19 13:22:28 +03:00
Nikolay
e84a063209
changes job source for /target api (#1723)
use jobNameOriginal instead of relabeled as prometheus does

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1707
2021-10-19 09:00:05 +03:00
Aliaksandr Valialkin
fbcc8b5c7d
lib/promscrape: set honor_timestamps: true by default if this option isn't set explicitly in scrape configs
This aligns the behavior to Prometheus - see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
2021-10-16 20:48:53 +03:00
Aliaksandr Valialkin
0a9be5ef9d
lib/promscrape: expose promscrape_series_limit_max_series and promscrape_series_limit_current_series metrics per each scrape target with the enabled unique series limiter 2021-10-16 19:14:13 +03:00
Aliaksandr Valialkin
99011c6b63
lib/promscrape: always initialize http client for stream parsing mode
Stream parsing mode can be automatically enabled when scraping targets with big response bodies
exceeding the -promscrape.minResponseSizeForStreamParse , so it must be always initialized.
2021-10-16 13:19:48 +03:00
Aliaksandr Valialkin
0f4fda1bda
lib/promscrape: store the last scraped response in compressed form if its size exceeds -promscrape.minResponseSizeForStreamParse
This should reduce memory usage when scraping targets with big response bodies.
2021-10-16 13:00:11 +03:00
Aliaksandr Valialkin
0452a8d4e8
lib/promscrape: store the full response in stream parsing mode in scrapeWork.lastScrape byte slice
This allows sending staleness marks and properly calculating scrape_series_added metric in stream parsing mode
at the cost of the increased memory usage, since now the potentially big response is kept
in the lastScrape byte slice per each scrapeWork.

In practice the memory usage increase shouldn't be big, since the response size
is usually much smaller than the parsed metrics from this response after the relabeling,
which usually adds a big pile of target-specific labels per each metric.
2021-10-15 15:26:24 +03:00
Aliaksandr Valialkin
3e9beb0f8d
lib/promscrape/discovery/kubernetes: rename endpointslices.go -> endpointslice.go in order to be consistent with EndpointSlice struct name
This is a follow-up for 31b42b30b6
2021-10-15 12:27:31 +03:00
Aliaksandr Valialkin
25421fa2ae
lib/promscrape: add -promscrape.minResponseSizeForStreamParse command-line option for automatic switching to stream parsing mode when scraping targets with big responses
This should reduce memory usage when vmagent scrapes targets with non-uniform response sizes.
This is a common case in Kubernetes monitoring.
2021-10-14 12:30:55 +03:00
Aliaksandr Valialkin
bee130cc78
lib/promscrape: return error if sample_limit or series_limit options are set when stream parsing mode is enabled 2021-10-14 12:30:54 +03:00
Aliaksandr Valialkin
5b7d90d178
lib/promscrape: add ability to show the original labels for discovered targets at /targets page
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1698
2021-10-13 16:44:34 +03:00
Roman Khavronenko
5dab25e8ad
lib/promscrape: make errcheck happy (#1703) 2021-10-13 15:11:45 +03:00
Aliaksandr Valialkin
c3a729d458
lib/promscrape: shard targets among cluster nodes after relabeling is applied
This guarantees that targets with the same set of labels go to the same vmagent node.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1687#issuecomment-940629495
2021-10-12 17:06:37 +03:00
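A sketch of label-based sharding in the spirit of the commit above. It assumes the shard is derived from a hash over the sorted label set after relabeling; the function name, the FNV-1a hash and the byte separators are choices made for this example, not the actual lib/promscrape code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// shardForLabels returns the shard number in [0, shardsCount) for the given label set.
// Labels are hashed in sorted key order, so equal label sets always map to the same shard.
// shardsCount must be positive.
func shardForLabels(labels map[string]string, shardsCount int) int {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator between name and value
		h.Write([]byte(labels[k]))
		h.Write([]byte{0}) // separator between labels
	}
	return int(h.Sum64() % uint64(shardsCount))
}

func main() {
	labels := map[string]string{"job": "node", "instance": "host-1:9100"}
	fmt.Println(shardForLabels(labels, 3))
}
```

Hashing the labels after relabeling means that two discovered targets which end up with identical labels are assigned to the same node.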
Aliaksandr Valialkin
aeedfe2fe2
app/vmagent: expose -promscrape.config contents at /config page as Prometheus does
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1695
2021-10-12 16:27:37 +03:00
Aliaksandr Valialkin
84aa08d93a
lib/promscrape: use Prometheus format for target labels at /targets page
This should simplify copy-pasting the labels to/from PromQL / MetricsQL
2021-10-11 12:42:18 +03:00
Aliaksandr Valialkin
a7a1305395
lib/storage: fix unaligned access on 32-bit architectures.
The bug has been introduced at a171916ef5
2021-10-08 19:38:20 +03:00
Aliaksandr Valialkin
a47754b689
lib/protoparser/clusternative: typo fix after 4fddcf4c83 2021-10-08 15:38:47 +03:00
Aliaksandr Valialkin
4fddcf4c83
app/{vminsert,vmstorage}: follow-up after a171916ef5
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-10-08 14:09:51 +03:00
Nikolay
a171916ef5
Adds read-only mode for vmstorage node (#1680)
* adds read-only mode for vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269

* changes order a bit

* moves isFreeDiskLimitReached var to storage struct
renames functions to be consistent
changes protoparser api - with optional storage limit check for the given opened storage

* renames freeSpaceLimit to ReadOnly
2021-10-08 12:52:56 +03:00
Ziqi Zhao
1db3aeab36
fix some typos (#1678)
Co-authored-by: 柘远 <zzq237937@alibaba-inc.com>
2021-10-06 14:43:56 +03:00
Aliaksandr Valialkin
522a404b79
lib/promscrape: reduce memory allocations in mergeLabels() after 48e3e6c8df 2021-09-30 16:56:43 +03:00
Aliaksandr Valialkin
7b69d478ec
lib/protoparser: go fmt 2021-09-29 21:17:49 +03:00
Aliaksandr Valialkin
6167890d0e
lib/protoparser/prometheus: compare invalid Prometheus lines in full 2021-09-29 19:41:23 +03:00
Aliaksandr Valialkin
8dcf814c48
app/{vmbackup,vmrestore}: switch from gcs://... to gs://... urls for backups to GCS
The `gs://` urls are commonly used, so prefer them instead of `gcs://` urls,
while leaving support for `gcs://` urls for backwards compatibility.
2021-09-29 12:12:37 +03:00
Nikolay
9be5689b3f
changes auth validation for openstack (#1663)
* changes auth validation for openstack
must fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1655

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-09-29 00:33:38 +03:00
Aliaksandr Valialkin
4e65bfcc00
app/{vminsert,vmagent}: add ability to ingest data via DataDog "submit metrics" API
See https://docs.datadoghq.com/api/latest/metrics/#submit-metrics

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
2021-09-29 00:12:26 +03:00
Aliaksandr Valialkin
d15d036a5a
lib/storage: properly handle {__name__=~"prefix(suffix1|suffix2)",other_label="..."} queries
They were broken in the commit 00cbb099b6

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1644
2021-09-23 21:52:31 +03:00
Aliaksandr Valialkin
d8de26bbfd
lib/promscrape: add vm_promscrape_max_scrape_size_exceeded_errors_total metric for counting of the failed scrapes due to the exceeded response size
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1639
2021-09-23 14:48:16 +03:00
Aliaksandr Valialkin
86bafe796c
lib/httpserver: add -enterprise and/or -cluster suffixes to short_version label of vm_app_version metric
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1635
2021-09-21 23:12:50 +03:00
Aliaksandr Valialkin
c3e1f87048
lib/promrelabel: fix parsing regex: true in relabeling rules 2021-09-21 23:01:40 +03:00
Nikolay
dd53abf36d changes protoparser apis for accepting reading from io.Reader (#1624)
adds InsertHandlerForReader apis to vmagent
2021-09-20 14:54:20 +03:00
Nikolay
1ab2f844a2 makes filters optional for ec2 api requests (#1627)
filters can be applied only for DescribeInstances requests, like prometheus does.
related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1626
2021-09-17 18:12:25 +03:00
Aliaksandr Valialkin
1493461244 lib/storage: follow up after 00cbb099b6 2021-09-14 14:23:02 +03:00
faceair
61a51f7c15 lib/storage: optimize convert multiple values regexp filter to composite tag filter (#1610)
* lib/storage: optimize convert multiple values regexp filter to composite tag filter

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-09-14 14:23:01 +03:00
Aliaksandr Valialkin
184e145570 docs: consistency renaming: Influx -> InfluxDB 2021-09-13 17:14:45 +03:00
Aliaksandr Valialkin
b684624f67 lib/promscrape/discovery/docker: support host networking mode
See https://github.com/prometheus/prometheus/issues/9116
2021-09-13 13:30:55 +03:00
Aliaksandr Valialkin
6ed9f10da5 lib/promscrape/discovery/kubernetes: properly use https scheme for wildcard TLS certificates in ingress target discovery
See https://github.com/prometheus/prometheus/issues/8902
2021-09-13 13:04:43 +03:00
Aliaksandr Valialkin
d90834da70 lib/promscrape: generate scrape_timeout_seconds metric per each scrape target in the same way as Prometheus 2.30 does
See https://github.com/prometheus/prometheus/pull/9247
2021-09-12 15:21:26 +03:00
Aliaksandr Valialkin
279f37c9e7 lib/promscrape: make fmt 2021-09-12 13:35:21 +03:00
Aliaksandr Valialkin
6c97388dde lib/promscrape: add ability to configure scrape_timeout and scrape_interval via relabeling
See https://github.com/prometheus/prometheus/pull/8911
2021-09-12 13:35:20 +03:00
Aliaksandr Valialkin
09670479cd lib/promscrape: reduce CPU usage for common case when calculating scrape_series_added metric
Also reduce CPU usage when applying `series_limit` to scrape targets with constant set of metrics.

The main idea is to perform the calculations on scrape_series_added and series_limit
only if the set of metrics exposed by the target has been changed.
Scrape targets rarely change the set of exposed metrics,
so this optimization should reduce CPU usage in general case.
2021-09-12 12:53:45 +03:00
Aliaksandr Valialkin
c339642858 lib/promscrape: add the actual job name to the labels of promscrape_series_limit_rows_dropped_total metric 2021-09-11 11:03:38 +03:00
Aliaksandr Valialkin
6d6cf1b6e0 lib/storage: verify that the tsidsFound contain the needed tsids in tests added at f4dead529f 2021-09-11 11:02:56 +03:00
Aliaksandr Valialkin
5aaaa686a4 lib/promscrape: send stale markers for disappeared metrics like Prometheus does 2021-09-11 11:02:56 +03:00
Aliaksandr Valialkin
c2f37f049b lib/storage: properly search series by multiple tag filters matching empty labels such as foo{bar=~"baz|",x=~"y|"}
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1601
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395
2021-09-09 21:12:53 +03:00
Aliaksandr Valialkin
d613a018c8 lib/promscrape: add ability to set series_limit and stream_parse options via relabeling
This allows managing these options on a per-target basis.

Typical use case: to manage these options for pods via Kubernetes annotations.
2021-09-09 18:51:23 +03:00
Aliaksandr Valialkin
b64866e64c lib/promscrape: add scrape_ prefix to job and target labels exported by promscrape_series_limit_rows_dropped_total metric
This is needed in order to prevent from possible clash with the corresponding (job, target) labels for the job, which scrapes this metric.
2021-09-09 17:31:04 +03:00
Aliaksandr Valialkin
75c3514c5c lib/promrelabel: add keep_metrics and drop_metrics actions to relabeling rules
These actions simplify metrics filtering. For example,

- action: keep_metrics
  regex: 'foo|bar|baz'

would leave only metrics with `foo`, `bar` and `baz` names, while the rest of metrics will be deleted.

The commit also makes it possible to split long regexps into multiple lines. For example, the following config is equivalent to the config above:

- action: keep_metrics
  regex:
  - foo
  - bar
  - baz
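
The equivalence of the two forms can be illustrated with a small sketch. It assumes the list items are joined with `|` and that the resulting regex is anchored to the whole metric name, as relabeling regexes are in Prometheus; `compileAlternatives` and `keepMetrics` are hypothetical helpers, not the lib/promrelabel API.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileAlternatives joins the alternatives with "|" and anchors the result,
// so the whole metric name must match one of them.
// Illustrative helper, not the lib/promrelabel implementation.
func compileAlternatives(alts []string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?:" + strings.Join(alts, "|") + ")$")
}

// keepMetrics returns only the names matching re, preserving their order.
func keepMetrics(names []string, re *regexp.Regexp) []string {
	var kept []string
	for _, name := range names {
		if re.MatchString(name) {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	re, err := compileAlternatives([]string{"foo", "bar", "baz"})
	if err != nil {
		panic(err)
	}
	// "foobar" and "up" are dropped, since the match is anchored.
	fmt.Println(keepMetrics([]string{"foo", "foobar", "baz", "up"}, re))
}
```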
2021-09-09 16:25:09 +03:00
mxlxm
42e07cfaea
reset deadline, fix #1562. (#1597)
* reset deadline, fix #1562.
reset deadline before we put it back to pool.

* make errcheck happy
2021-09-07 20:54:17 +03:00
Aliaksandr Valialkin
c4df601f43 lib/promscrape: add the ability to limit the number of unique series per each scrape target
The number of series per target can be limited with the following options:

* Global limit with `-promscrape.maxSeriesPerTarget` command-line option.
* Per-target limit with `max_series: N` option in `scrape_config` section.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561
2021-09-01 16:08:12 +03:00
Aliaksandr Valialkin
146c14d879 lib/promscrape/discovery/kubernetes: return back support role: endpointslices, since it is used by VictoriaMetrics operator
This is a follow up commit after 31b42b30b6
2021-08-29 12:37:36 +03:00
Aliaksandr Valialkin
18d7adf731 lib/protoparser/opentsdb: follow-up after 8ee75ca45a 2021-08-29 11:50:01 +03:00
envzhu
00dddfe02f lib/protoparser/opentsdb: accept multiple spaces between fields in a row as a delimiter. (#1575) 2021-08-29 11:50:00 +03:00
Aliaksandr Valialkin
ca61d7c82b lib/promscrape/discovery/kubernetes: rename role: endpointslices to role: endpointslice to be consistent with Prometheus
See 2ec6c7dbb8/discovery/kubernetes/kubernetes.go (L99)
2021-08-29 11:23:59 +03:00
Aliaksandr Valialkin
327034b54f lib/promscrape/discovery/kubernetes: use v1 API instead of v1beta1 API for role: ingress and role: endpointslices
This should fix service discovery for these roles in Kubernetes v1.22 and newer versions.
See https://kubernetes.io/docs/reference/using-api/deprecation-guide/#ingress-v122

The corresponding change in Prometheus - https://github.com/prometheus/prometheus/pull/9205
2021-08-29 11:23:58 +03:00
Aliaksandr Valialkin
7fdb4db73d lib/promscrape: add ability to load scrape configs from multiple files
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1559
2021-08-26 08:51:53 +03:00
Aliaksandr Valialkin
4a2d7aec7f lib/promscrape: expose promscrape_discovery_http_errors_total metric for tracking errors per each http_sd config 2021-08-25 13:05:29 +03:00
Aliaksandr Valialkin
b885bd9b7d lib/{mergeset,storage}: improve the detection of the needed free space for background merge
This should prevent from possible out of disk space crashes during big merges.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1560
2021-08-25 10:01:09 +03:00
Aliaksandr Valialkin
67bc407747 lib/promscrape: reduce memory and CPU usage when Prometheus staleness tracking is enabled for metrics from deleted / disappeared scrape targets
Store the scraped response body instead of storing the parsed and relabeled metrics.
This should reduce memory usage, since the response body takes less memory than the parsed and relabeled metrics.
This is especially true for Kubernetes service discovery, which adds many long labels for all the scraped metrics.

This should also reduce CPU usage, since the marshaling of the parsed
and relabeled metrics has been substituted by response body copying.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-21 21:24:07 +03:00
Aliaksandr Valialkin
c3b24882a7 lib/promscrape: use scrapeTimestamp when storing stale markers for failed scrape
This will make timestamps for stale markers more consistent with the timestamps of other samples
2021-08-19 14:19:54 +03:00
Aliaksandr Valialkin
8ee575dee9 lib/promscrape: send stale markers for the previously scraped metrics on failed scrapes like Prometheus does 2021-08-18 22:00:46 +03:00
Aliaksandr Valialkin
5d92fafc40 app/vmselect: add -search.noStaleMarkers command-line flag for disabling stale markers handling in queries
This option allows reducing CPU usage a bit when VictoriaMetrics is used
for collecting and processing non-Prometheus data. For example, InfluxDB line protocol, Graphite, OpenTSDB, CSV, etc.
2021-08-18 13:58:06 +03:00
Aliaksandr Valialkin
f21fad53b4 lib/promscrape: add ability to disable sending Prometheus staleness markers with -promscrape.disableStaleMarkers command-line flag
This option can be useful when vmagent consumes too much additional memory
for staleness markers functionality and when staleness markers aren't needed.
2021-08-18 13:58:05 +03:00
Aliaksandr Valialkin
db34c40aec lib/promscrape: stop scrapers for the removed targets before starting scrapers for the added targets
This should prevent from possible time series overlap when old target is substituted by new target (for example, during Kubernetes deployments).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
2021-08-17 01:00:40 +03:00
Aliaksandr Valialkin
5f13c519ee lib/promscrape: restore red highlighting for DOWN targets at /targets page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1461
2021-08-15 16:04:33 +03:00
Aliaksandr Valialkin
c1f81f08d4 all: add support for Prometheus staleness markers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:13:15 +03:00
Aliaksandr Valialkin
90efb5831b lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:32:40 +03:00
Aliaksandr Valialkin
b877538622 app/vmagent: follow-up after fe445f753b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491
2021-08-05 09:51:00 +03:00
Omar Ghader
fe445f753b
feature: Add multitenant for vmagent (#1505)
* feature: Add multitenant for vmagent

* Minor fix

* Fix rcs index out of range

* Minor fix

* Fix multi Init

* Fix multi Init

* Fix multi Init

* Add default multi

* Adjust naming

* Add TenantInserted metrics

* Add TenantInserted metrics

* fix: remove unused metrics for vmagent

* fix: remove unused metrics for vmagent

Co-authored-by: mghader <marc.ghader@ubisoft.com>
Co-authored-by: Sebastian YEPES <syepes@gmail.com>
2021-08-05 09:44:29 +03:00
Aliaksandr Valialkin
77bb9e1656 lib/promscrape/discovery/gce: add __meta_gce_interface_ipv4_<name> labels as in Prometheus 2.29
See https://github.com/prometheus/prometheus/pull/8978
2021-08-03 15:51:45 +03:00
Aliaksandr Valialkin
336a2aa2e0 lib/promscrape/discovery/ec2: add __meta_ec2_availability_zone_id label as Prometheus 2.29 does 2021-08-03 13:28:13 +03:00
Aliaksandr Valialkin
c473d8ffe1 lib/storage: re-use the per-day inverted index search code for searching in global index
This allows removing a big pile of outdated code for global index search.

This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486
2021-07-30 10:28:20 +03:00
Nikolay
6d47e750be adds check for region with custom s3 endpoint (#1465) 2021-07-27 12:39:10 +03:00
Aliaksandr Valialkin
1950f57316 lib/storage: yet another attempt to properly determine disk space shortage, which prevents from optimal merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-27 12:03:31 +03:00
Aliaksandr Valialkin
92628f9f07 lib/promrelabel: add tests for verifying that regex works as expected in single quotes and double quotes 2021-07-27 10:53:03 +03:00
Aliaksandr Valialkin
5d255846ac all: add go:build lines for Go1.17
See https://tip.golang.org/doc/go1.17#gofmt for more details
2021-07-26 15:50:46 +03:00
Aliaksandr Valialkin
c857e05604 lib/promscrape: add missing whitespace at /targets page before up word 2021-07-26 12:23:06 +03:00
Aliaksandr Valialkin
376af3c956 lib/workingsetcache: switch from split cache to full cache after the cache size exceeds 95% of split capacity
Previously the switch occurred when the cache size becomes 100% of its capacity. The cache size could never reach 100% capacity.
This could prevent from switching from the split cache to full cache, thus reducing the cache effectiveness.
2021-07-15 16:53:35 +03:00
Aliaksandr Valialkin
9d3f9da5ad lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesn't change samples 2021-07-15 12:18:38 +03:00
Aliaksandr Valialkin
e992754e79 lib/storage: remove cache directory if it contains reset_cache_on_startup file
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447
2021-07-13 17:59:51 +03:00
Aliaksandr Valialkin
e6edb85fa2 lib/httpserver: add is_set label to flag metrics
This label allows determining the set flags with the query `flag{is_set="true"}`
2021-07-13 15:10:18 +03:00
Aliaksandr Valialkin
51cd19d2e3 lib/storage: reset perKeyMisses stats less frequently
This should reduce CPU usage for queries executed with intervals higher than 30 seconds
2021-07-12 14:34:54 +03:00
Aliaksandr Valialkin
3f705fe8d7 lib/storage: properly limit the size of storage/date_metricID cache 2021-07-12 14:25:28 +03:00
Aliaksandr Valialkin
ef3c58d7a3 lib/storage: properly determine when the deduplication is needed in needsDedup
Previously needsDedup() could return true if the de-duplication wasn't needed for the following case:

         d < interval
           /     \
   |        v | v        |
     interval   interval

Now it properly returns false for this case
2021-07-12 10:54:51 +03:00
Aliaksandr Valialkin
41754e12f8 lib/mergeset: cache indexBlock items only on the second request
This should reduce the indexdb/indexBlocks cache size, since it won't contain one-time-wonders items.
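
A rough sketch of the "cache on the second request" policy follows. `secondHitCache` is a hypothetical type; the sketch is not safe for concurrent use and never evicts anything, unlike a production cache.

```go
package main

import "fmt"

// secondHitCache admits a value only after its key has been requested twice,
// so one-time-wonder items never occupy cache space.
type secondHitCache struct {
	seen  map[string]struct{} // keys requested once; their values are not stored yet
	items map[string]string   // admitted entries
}

func newSecondHitCache() *secondHitCache {
	return &secondHitCache{
		seen:  make(map[string]struct{}),
		items: make(map[string]string),
	}
}

// Get returns the cached value for key, loading it with load on a miss.
// The loaded value is stored only if key has already been requested before.
func (c *secondHitCache) Get(key string, load func(string) string) string {
	if v, ok := c.items[key]; ok {
		return v
	}
	v := load(key)
	if _, ok := c.seen[key]; ok {
		delete(c.seen, key)
		c.items[key] = v
	} else {
		c.seen[key] = struct{}{}
	}
	return v
}

func main() {
	loads := 0
	load := func(k string) string { loads++; return "block:" + k }
	c := newSecondHitCache()
	for i := 0; i < 3; i++ {
		c.Get("a", load) // loaded twice, then served from the cache
	}
	c.Get("b", load) // requested once, so it is not cached
	fmt.Println("loads:", loads, "cached items:", len(c.items))
}
```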
2021-07-07 15:24:37 +03:00
Aliaksandr Valialkin
ceda2b1df4 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:11:29 +03:00
Aliaksandr Valialkin
9826f7c1be lib/storage: do not cache inmemoryBlock entries requested only once (aka one-time-wonder items)
This should reduce the cache size and memory usage for the indexdb/dataBlocks cache
2021-07-07 10:59:45 +03:00
Aliaksandr Valialkin
74ace9340d lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate 2021-07-07 10:59:39 +03:00
Aliaksandr Valialkin
a846febc89 Revert "lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method"
This reverts commit 7c6d3981bf.

Reason for revert: high contention at bucket16Pool on systems with big number of CPU cores.
This slows down query processing significantly.
2021-07-06 18:26:56 +03:00
Aliaksandr Valialkin
b805a675f3 lib/{mergeset,storage}: switch from sync.Pool to chan-based pool for inmemoryPart objects
This should reduce memory usage on systems with big number of CPU cores,
since every inmemoryPart object occupies at least 64KB of memory and sync.Pool maintains
a separate pool of inmemoryPart objects per each CPU core.

Though the new scheme for the pool worsens per-cpu cache locality, this should be amortized
by big sizes of inmemoryPart objects.
2021-07-06 16:33:25 +03:00
Aliaksandr Valialkin
d8e7c1ef27 lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method
This reduces the load on memory allocator in Go runtime in production workload.
2021-07-06 16:33:24 +03:00
Aliaksandr Valialkin
db6bd69475 lib/mergeset: increase pool capacity for inmemoryBlock according to collected profiles from production workload
CPU and memory profiles show that the pool capacity for inmemoryBlock objects is too small.
This results in the increased load on memory allocation code in Go runtime.
Increase the pool capacity in order to reduce the load on Go runtime.
2021-07-06 13:44:27 +03:00
Aliaksandr Valialkin
fd32855a6c lib/mergeset: limit the frequency for flushCallback calls to once per 10 seconds
This should improve hit ratio for tagFiltersCache when big number of new time series are constantly registered
(aka high churn rate). This, in turn, should reduce CPU usage for queries over such time series.
2021-07-06 12:20:15 +03:00
Aliaksandr Valialkin
22c6e64bbc lib/storage: consistency renaming: tagCache -> tagFiltersCache
This improves code readability
2021-07-06 11:03:30 +03:00
Aliaksandr Valialkin
21abf487c3 lib/workingsetcache: properly update stats for requests and cache misses
Previously the stats for cache misses could be improperly counted, because it had inflated cache misses
if the entry was missing in the curr cache, but was existing in the prev cache.

The same applies to cache requests - they were inflated if the entry was missing in the curr cache.
2021-07-06 10:54:38 +03:00
Aliaksandr Valialkin
e5031d9aee lib/workingsetcache: fix cache capacity calculations after 4f0003f182 2021-07-05 17:16:35 +03:00
Aliaksandr Valialkin
bd71f102e8 lib/workingsetcache: typo fixes after d0c830039d 2021-07-05 15:35:51 +03:00
Aliaksandr Valialkin
4b25e627f8 lib/workingsetcache: properly switch to whole mode
Previously the switch from `split` to `whole` mode had been performed too early,
e.g. when the current cache size became bigger than 1/4 of the allowed cache size.

Now it is performed when the current cache size becomes bigger than 1/2 of the allowed cache size.

This change can reduce memory usage for data ingestion path when big number of active time series are ingested.
2021-07-05 15:15:39 +03:00
Aliaksandr Valialkin
51516b96e6 lib/storage: tune cache sizes according to production workload 2021-07-05 15:14:45 +03:00
Aliaksandr Valialkin
f12f97daa1 lib/{storage,mergeset}: increase cache timeout for data and index blocks from a minute to two minutes
One minute cache timeout results in slower queries in some production workloads where the interval
between query execution is in the range 1 minute - 2 minutes.
2021-07-05 14:25:59 +03:00
Aliaksandr Valialkin
377bb06b47 lib/cgroup: set GOGC to 50 by default if it isn't set
This should reduce memory usage for typical VictoriaMetrics workloads by up to 50%
2021-07-05 12:34:01 +03:00
Aliaksandr Valialkin
8055439fe4 lib/storage: properly detect free disk space shortage during data merge
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-02 17:42:23 +03:00
Aliaksandr Valialkin
6fc3696260 lib/promscrape/discovery/consul: use case-insensitive comparison for service names
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1424
2021-07-02 14:49:22 +03:00
Aliaksandr Valialkin
61e483a01c lib/protoparser/clusternative: remove unused field - unmarshalWork.lastResetTime
This is a follow-up for b84aea1e6e
2021-07-02 13:32:59 +03:00
Aliaksandr Valialkin
72de54f93e lib/promauth: cache the client TLS certificate for up to a second
This should reduce CPU usage when TLS connections are established at a high rate.
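
The caching idea can be sketched as below. A string stands in for the certificate and the clock is injected so the behavior can be simulated; `cachedLoader` and `simulate` are hypothetical names, and a real implementation would also need a mutex for concurrent connections.

```go
package main

import (
	"fmt"
	"time"
)

// cachedLoader re-uses the last loaded value until ttl expires.
type cachedLoader struct {
	ttl      time.Duration
	now      func() time.Time       // injected clock
	load     func() (string, error) // stand-in for reading the certificate from disk
	value    string
	deadline time.Time
}

// Get returns the cached value while it is fresh and reloads it otherwise.
func (c *cachedLoader) Get() (string, error) {
	now := c.now()
	if now.Before(c.deadline) {
		return c.value, nil
	}
	v, err := c.load()
	if err != nil {
		return "", err
	}
	c.value = v
	c.deadline = now.Add(c.ttl)
	return v, nil
}

// simulate calls Get at 0s, 0.5s and 1.1s on a fake clock
// and returns the number of loads performed.
func simulate() int {
	clock := time.Unix(0, 0)
	loads := 0
	c := &cachedLoader{
		ttl: time.Second,
		now: func() time.Time { return clock },
		load: func() (string, error) {
			loads++
			return fmt.Sprintf("cert-v%d", loads), nil
		},
	}
	c.Get()
	clock = clock.Add(500 * time.Millisecond)
	c.Get() // still within one second: served from the cache
	clock = clock.Add(600 * time.Millisecond)
	c.Get() // the cached value has expired: loaded again
	return loads
}

func main() {
	fmt.Println("loads:", simulate())
}
```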
2021-07-02 13:20:18 +03:00
Aliaksandr Valialkin
1c12c0f79c lib/promauth: reload TLS certificates from disk on every mTLS connection as Prometheus does
This allows updating client certificates without the need to restart vmagent and/or single-node VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2021-07-01 15:43:43 +03:00
Nikolay
6bd2309449 fixes /targets button style (#1423)
* fixes /targets button style
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1422

* updates bootstrap version
2021-07-01 11:52:47 +03:00
Aliaksandr Valialkin
71c856beb8 lib/workingsetcache: reset the cache mode when the cache is reset
This should reduce memory usage if the working set is reduced after the cache reset.
2021-07-01 11:52:47 +03:00
Aliaksandr Valialkin
bced9ee666 lib/{mergeset,storage}: reduce the maximum lifetime for cached indexdb and data blocks from 2 minutes to a minute
This should reduce memory usage on a system with high number of active time series and a high churn rate.
One minute is enough for caching the blocks needed for repeated queries (e.g. alerting rules, recording rules and dashboard refreshes).
2021-06-29 19:57:53 +03:00
Aliaksandr Valialkin
b7c0b3dde3 lib/mergeset: switch from sync.Pool to a channel for a pool for inmemoryBlock structs
This should reduce memory usage for the pool on systems with big number of CPU cores.

The sync.Pool maintains per-CPU pools, so the total number of objects in the pool
is proportional to the number of available CPU cores. The channel limits the number
of pooled objects by its own capacity. This means smaller number of pooled objects on average.
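
A channel-based pool of this kind might look as follows. This is a sketch, not the lib/mergeset code: `block` and `chanPool` are hypothetical, and the capacity of 2 is chosen only to keep the demo short.

```go
package main

import "fmt"

// block stands in for a large pooled object such as inmemoryBlock.
type block struct{ data []byte }

// chanPool keeps at most cap(ch) idle objects regardless of the number of CPU cores.
type chanPool struct{ ch chan *block }

func newChanPool(capacity int) *chanPool {
	return &chanPool{ch: make(chan *block, capacity)}
}

// Get returns an idle object if one is available and allocates a new one otherwise.
func (p *chanPool) Get() *block {
	select {
	case b := <-p.ch:
		return b
	default:
		return &block{}
	}
}

// Put resets b and returns it to the pool; b is dropped if the pool is full.
func (p *chanPool) Put(b *block) {
	b.data = b.data[:0]
	select {
	case p.ch <- b:
	default: // the pool is full; leave b to the garbage collector
	}
}

func main() {
	p := newChanPool(2)
	for i := 0; i < 5; i++ {
		p.Put(&block{data: make([]byte, 0, 64*1024)})
	}
	// Only 2 of the 5 objects are retained.
	fmt.Println("idle objects:", len(p.ch))
}
```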
2021-06-29 19:57:52 +03:00
Aliaksandr Valialkin
2edfea8c36 lib/promscrape/discovery/docker: fix golint warning: struct field Id should be ID 2021-06-29 13:11:33 +03:00
Aliaksandr Valialkin
609ad6d9bf lib/storage: put indexDBName into the key for dateTagFilter cache and for uselessTagFilters cache
This should prevent from stats overwriting when the previous indexdb is queried.
2021-06-29 13:11:32 +03:00
Aliaksandr Valialkin
0c4c630839 lib/promscrape: typo fix in /targets output
The typo has been introduced in fb72a2133f

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1408
2021-06-28 21:27:22 +03:00
Aliaksandr Valialkin
97d1ccfc8e lib/promscrape: split docker and dockerswarm service discovery code bases, since they have very little in common
This is a follow up after c85a5b7fcb
2021-06-25 13:22:16 +03:00
Aliaksandr Valialkin
4461e20e7d lib/promscrape: consistently sort service discovery routines
This should simplify further maintenance of the code
2021-06-25 13:22:16 +03:00
Lu Jiajing
12b4cbb68f Support Docker ServiceDiscovery (#1402)
* add docker discovery

* add test

* add labels test and add scrape work

* remove TODO

* refactor to merge apiConfig and sdConfig

* apply suggestion
2021-06-25 13:22:16 +03:00
Nikolay
501429c3ff adds missing MustStop call to do and http sd (#1404) 2021-06-25 11:43:32 +03:00
Aliaksandr Valialkin
b84aea1e6e lib/protoparser/clusternative: do not pool unmarshalWork structs, since they can occupy big amounts of memory (more than 100MB per each struct)
This should reduce memory usage for vmstorage under high ingestion rate when the vmstorage runs on a system with big number of CPU cores
2021-06-23 15:45:08 +03:00
Aliaksandr Valialkin
a22f37599b lib/storage: tune tag filters search logic
Tune the logic according to the logs provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-864293624

The previous logic had a race when multiple concurrent queries execute the same tag filter without prior stats.
This could result in incorrectly stored stats for such tag filter, which then could result in non-optimal sorting of tag filters
for further queries.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-23 13:30:36 +03:00
Aliaksandr Valialkin
f10fa0d1d7 lib/promscrape/discovery/consul: properly pass namespace to Consul watcher
Follow-up for 58a2989fe7
2021-06-22 17:43:20 +03:00
Aliaksandr Valialkin
4adf6c9766 lib/promscrape/discovery/http: follow up after e307bbb29a 2021-06-22 13:42:10 +03:00
Nikolay
e03a3d3a36 adds http_sd (#1399)
* adds http_sd

* adds X-Prometheus-Refresh-Interval-Seconds header

* Update lib/promscrape/discovery/http/api.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 13:42:09 +03:00
Aliaksandr Valialkin
3ab3902f17 lib/promscrape/discovery: support generic auth configs in Consul service discovery in the same way as Prometheus 2.28 does 2021-06-22 13:18:51 +03:00
Nikolay
827a2396d2 adds consul enterprise namespace support (#1400)
* adds consul enterprise namespace support

* Update lib/promscrape/discovery/consul/consul.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 12:56:11 +03:00
Aliaksandr Valialkin
f9069ba32a lib/promscrape: show jobs with empty scrape targets on /targets page 2021-06-18 10:54:12 +03:00
Nikolay
9ea1dca3dd fixes DO service discovery labels (#1389)
adds test for digitalocean sd
2021-06-17 17:21:10 +03:00
Aliaksandr Valialkin
a207be3ffb lib/storage: fix infinite loop introduced in aa9b56a046
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
2021-06-17 14:27:30 +03:00
Aliaksandr Valialkin
0efd37cec1 lib/{mergeset,storage}: reduce the number of fsync calls on data ingestion path on systems with many cpu cores
VictoriaMetrics maintains a buffer per CPU core for the ingested data. These buffers are flushed to disk every second.
These buffers are flushed to disk in parallel starting from the commit 56b6b893ce .
This resulted in increased write disk IO usage on systems with many cpu cores
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-863046999 .

This commit merges the per-CPU buffers into bigger in-memory buffers before flushing them to disk.
This should reduce the rate of fsync syscalls and, consequently, the write disk IO on systems with many CPU cores.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
2021-06-17 13:51:42 +03:00
Aliaksandr Valialkin
b133de1e37 lib/storage: move deletedMetricIDs set from indexDB to Storage
This makes the list of deleted metricIDs consistent when it is used from both the current indexDB and the previous indexDB (aka extDB).
This should fix the issue, which could lead to storing new samples under deleted metricIDs after indexDB rotation.
See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 .

Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

This commit resolves the issue in more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

The downside of the commit is the deletedMetricIDs set isn't cleaned from the metricIDs outside the retention. It needs app restart.
This should be OK in most cases.
2021-06-15 15:07:54 +03:00
Aliaksandr Valialkin
ebaf68bcb0 lib/protoparser: stop reading the input stream as soon as the callback provided by the caller returns error
This is a follow-up for af90c3c43b
2021-06-14 15:20:38 +03:00
faceair
2ea187e801 lib/protoparser: stop read when callback error (#1380) 2021-06-14 15:20:37 +03:00
Aliaksandr Valialkin
5f91a701fa lib/promscrape: show the number of samples collected during the last scrape at /targets and /api/v1/targets pages
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1377
2021-06-14 14:04:35 +03:00
Nikolay
e42da47608 adds digital ocean sd (#1376)
* adds digital ocean sd config

* adds digital ocean sd
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367

* typo fix
2021-06-14 13:19:29 +03:00
Aliaksandr Valialkin
df057177a0 lib/promscrape: increase the duration for reading the full response in stream parsing mode
Increase the duration from 10x to 30x of the configured `scrape_interval`.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:29:46 +03:00
Aliaksandr Valialkin
074b11fa69 lib/protoparser: measure the duration for reading the whole block of data instead of a single read operation
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:29:45 +03:00
Aliaksandr Valialkin
87d221f78a lib/protoparser/common: log the duration for reading a block of data in ReadLinesBlockExt on error
This may help debugging issues like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:21:21 +03:00
Aliaksandr Valialkin
0672cfffa2 app/vmauth: properly handle http.ErrAbortHandler panic
This panic can be raised by the reverseProxy on aborted request to the backend.
So handle it (e.g. suppress) at reverseProxy.ServeHTTP call.

Do not suppress the panic at lib/httpserver generic HTTP handler,
since it may result in an inconsistent state left after the panicking handler.
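
A narrowly scoped suppression of this panic could be sketched like this. `callSuppressingAbort` and `abort` are hypothetical names; `abort` stands in for a proxied call that panics on an aborted backend request. Only `http.ErrAbortHandler` is swallowed, while every other panic is re-raised.

```go
package main

import (
	"fmt"
	"net/http"
)

// callSuppressingAbort runs f and swallows only the http.ErrAbortHandler panic.
// Any other panic is re-raised. It reports whether an abort has been suppressed.
func callSuppressingAbort(f func()) (aborted bool) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		if err, ok := r.(error); ok && err == http.ErrAbortHandler {
			aborted = true
			return
		}
		panic(r) // not an abort: propagate to the caller
	}()
	f()
	return false
}

// abort stands in for a handler call that panics on an aborted request.
func abort() { panic(http.ErrAbortHandler) }

func main() {
	fmt.Println("abort suppressed:", callSuppressingAbort(abort))
	fmt.Println("abort suppressed:", callSuppressingAbort(func() {}))
}
```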

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
2021-06-11 12:54:37 +03:00
Aliaksandr Valialkin
ce10bdc82a lib/storage: reset cache on disk during series deletion and during indexdb rotation
This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
2021-06-11 12:54:36 +03:00
Aliaksandr Valialkin
eb335d2c29 lib/storage: consistency renaming: getMaxRawRowsPerPartition -> getMaxRawRowsPerShard 2021-06-11 10:52:31 +03:00
Aliaksandr Valialkin
d06c0e7a94 lib/storage: reduce the amounts of memory which can be occupied by rawRow items during data ingestion on a system with many CPU cores 2021-06-11 10:49:02 +03:00
Nikolay
2c1611d316 disables panic for net/httpAbortHandler (#1355) 2021-06-09 12:12:45 +03:00
Aliaksandr Valialkin
1e4a64844d lib/storage: properly account the number of loops spent when matching for or suffixes
This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-08 13:07:14 +03:00
Aliaksandr Valialkin
e7d353ee6a lib/promrelabel: add tests for labelsToString() function 2021-06-04 20:42:14 +03:00
Aliaksandr Valialkin
269e35d676 app/{vmagent,vminsert}: follow-up after 2fe045e2a4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1343
2021-06-04 20:33:22 +03:00
jelmd
d8b46908db new feature: debug relabeling (#1344)
* new feature: relabel logging

Use scrape_configs[x].relabel_debug = true to log metric names incl.
labels before and after relabeling. After relabeling related metrics
get dropped, i.e. not submitted to servers.

* vminsert wants relabel logging, too.
2021-06-04 20:33:21 +03:00
Nikolay
3d89c01d07 fixes solaris build (#1345) 2021-06-04 11:56:06 +03:00
Hason Chan
439c2ed510 fix eureka_sd_configs HTTPClientConfig incorrect parsing (#1350) 2021-06-04 11:56:06 +03:00
Aliaksandr Valialkin
fc2565b4ee lib/storage: reduce memory allocations when syncing dateMetricIDCache 2021-06-03 16:20:02 +03:00
Aliaksandr Valialkin
0b9f0de0a1 lib/promscrape: fix tests after f0c21b6300 2021-05-28 01:33:28 +03:00
Aliaksandr Valialkin
6865f3b497 Revert "lib/mergeset: remove a pool for inmemoryBlock structs"
This reverts commit 793fe39921.

Reason to revert: production testing revealed possible slowdown when registering big number of new time series
2021-05-28 01:11:22 +03:00
Aliaksandr Valialkin
7b33bc67a1 lib/mergeset: remove a pool for inmemoryBlock structs
The pool for inmemoryBlock struct doesn't give any performance gains in production workloads,
while it may result in excess memory usage for inmemoryBlock structs inside the pool during
background merge of indexdb.
2021-05-27 22:00:50 +03:00
Aliaksandr Valialkin
97de72054e docs: document f0c21b6300 2021-05-27 15:04:13 +03:00
faceair
b801b299f0 lib/promscrape: apply body size & sample limit to stream parse (#1331)
* lib/promscrape: apply body size limit to stream parse

Signed-off-by: faceair <git@faceair.me>

* lib/promscrape: apply sample limit to stream parse

Signed-off-by: faceair <git@faceair.me>
2021-05-27 15:04:11 +03:00
Aliaksandr Valialkin
49490ae5a7 lib/protoparser/clusternative: remove duplicate cannot read packet size phrase from the log message
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1336
2021-05-27 12:09:17 +03:00
Aliaksandr Valialkin
c85084b659 lib/handshake: pass io.EOF unmodified to the caller for BufferedConn.Read, so it could properly detect the end of stream 2021-05-27 12:09:17 +03:00
Aliaksandr Valialkin
10b2855949 lib/storage: fix spelling typo: borken->broken
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1336
2021-05-27 12:09:17 +03:00
Aliaksandr Valialkin
6b90570ed3 lib/uint64set: store pointers to bucket16 instead of bucket16 objects in bucket32
This speeds up bucket32.addBucketAtPos() when bucket32.buckets contains big number of items,
since the copying of bucket16 pointers is much faster than the copying of bucket16 objects.

This is a cpu profile for copying bucket16 objects:

      10ms     13.43s (flat, cum) 32.01% of Total
      10ms      120ms    650:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    651:	b.b16his[pos] = hi
         .     13.31s    652:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .          .    653:	b16 := &b.buckets[pos]
         .          .    654:	*b16 = bucket16{}
         .          .    655:	return b16
         .          .    656:}

This is a cpu profile for copying pointers to bucket16:

      10ms      1.14s (flat, cum)  2.19% of Total
         .      100ms    647:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    648:	b.b16his[pos] = hi
      10ms      700ms    649:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .      330ms    650:	b16 := &bucket16{}
         .          .    651:	b.buckets[pos] = b16
         .          .    652:	return b16
         .          .    653:}
2021-05-25 14:27:52 +03:00
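The difference between the two profiles above can be illustrated with a minimal Go sketch. It is not the lib/uint64set code: the `bucket` type, its size and the `insertPtrAt` helper are hypothetical stand-ins, chosen only to show that inserting into a slice of pointers shifts 8-byte pointers instead of whole structs.

```go
package main

import "fmt"

// bucket is a hypothetical stand-in for bucket16: a struct big enough
// that copying it on every insert would be expensive.
type bucket struct {
	bits [1024]uint64
}

// insertPtrAt inserts a freshly allocated *bucket at position pos.
// Only pointers are shifted, so the cost does not depend on sizeof(bucket).
func insertPtrAt(buckets []*bucket, pos int) ([]*bucket, *bucket) {
	buckets = append(buckets, nil)       // grow the slice by one slot
	copy(buckets[pos+1:], buckets[pos:]) // shift the tail by one pointer
	b := &bucket{}
	buckets[pos] = b
	return buckets, b
}

func main() {
	var buckets []*bucket
	buckets, first := insertPtrAt(buckets, 0)
	first.bits[0] = 1
	buckets, second := insertPtrAt(buckets, 0)
	second.bits[0] = 2
	fmt.Println(len(buckets), buckets[0].bits[0], buckets[1].bits[0])
}
```

The trade-off is one extra heap allocation per bucket, which shows up as the `&bucket16{}` line in the second profile.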
Aliaksandr Valialkin
1c16cbacf5 lib/storage: do not stop data ingestion on the first error in Storage.AddRows
Continue data ingestion for the rest of blocks.
2021-05-24 15:32:24 +03:00
Aliaksandr Valialkin
2601844de3 lib/storage: limit the number of rows per each block in Storage.AddRows()
This should reduce memory usage when ingesting big blocks of rows.
2021-05-24 15:32:24 +03:00
Aliaksandr Valialkin
95b735a883 lib/storage: allow filling all the rows up to their capacity in rawRowsShard.addRows
This should reduce memory usage a bit on data ingestion path
2021-05-24 15:32:24 +03:00
Aliaksandr Valialkin
0f84503880 lib/bloomfilter: fix TestLimiterConcurrent 2021-05-24 05:18:29 +03:00
Aliaksandr Valialkin
745eda9e87 lib/fs: do not pass done callback to tryRemoveAll() func
This improves code readability a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-24 05:00:53 +03:00
Aliaksandr Valialkin
402a8ca710 lib/storage: do not populate MetricID->MetricName cache during data ingestion
This cache isn't needed during data ingestion, so there is no need in spending RAM on it.

This reduces RAM usage on data ingestion path by 30%
2021-05-24 03:06:40 +03:00
Aliaksandr Valialkin
0fc857d363 lib/{mergeset,storage}: reduce the number of INFO log messages like merged ... items across ... blocks in ... seconds
Log these messages if the merge takes more than 30 seconds instead of 10 seconds.
2021-05-23 14:15:49 +03:00
Aliaksandr Valialkin
71ff7ee18d lib/promauth: follow-up after 5b8176c68e 2021-05-22 18:02:03 +03:00
Nikolay
2780d6dbcd basic OAuth2 support for remoteWrite and scrape targets (#1316)
* adds OAuth2 support for remoteWrite and scraping

* adds tests
changes init
2021-05-22 18:02:01 +03:00
Aliaksandr Valialkin
89e1a45cdb lib/fs: concurrently remove up to 1024 blocked NFS directories
Previously the blocked directories were removed sequentially by a single goroutine.
This may be insufficient for a highly loaded VictoriaMetrics that accepts millions of samples per second,
when a big number of LSM parts is created and removed at a high rate.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:58:08 +03:00
Aliaksandr Valialkin
23355ca34c lib/fs: wait for a while before giving up on NFS file removal if the removal queue is full
This should reduce the probability of the panic on a highly loaded VictoriaMetrics
accepting millions of samples per second.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:21:35 +03:00
Aliaksandr Valialkin
d77db9d813 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:38:20 +03:00
Aliaksandr Valialkin
69e365cd48 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:30:24 +03:00
Aliaksandr Valialkin
da0b32c31a app/vmagent/remotewrite: expose metrics with the current number of active series per day and per hour
These numbers are exposed via the following metrics:

- vmagent_hourly_series_limit_current_series
- vmagent_daily_series_limit_current_series

Expose also the limits via the following metrics:

- vmagent_hourly_series_limit_max_series
- vmagent_daily_series_limit_max_series
2021-05-20 15:31:57 +03:00
Aliaksandr Valialkin
165a9f9200 app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 15:31:57 +03:00
Aliaksandr Valialkin
7aad5c3f76 app/vmagent: add ability to limit series cardinality on a per-hour and per-day basis 2021-05-20 15:31:57 +03:00
Aliaksandr Valialkin
110a888e39 lib/promscrape/discovery/kubernetes: make golangci-lint happy by removing empty branches 2021-05-20 12:00:17 +03:00
Aliaksandr Valialkin
e228f479a5 lib/storage: remove possible data race when logging dropped labels 2021-05-20 11:54:06 +03:00
Aliaksandr Valialkin
9d97f44772 lib/promscrape/discovery/kubernetes: reload objects on object parse error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 23:27:24 +03:00
Aliaksandr Valialkin
74ef40034c lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:25:27 +03:00
Aliaksandr Valialkin
c507faec0b lib/promscrape/discovery/kubernetes: simplify the reload logic for urlWatcher.objectsByKey 2021-05-18 15:41:51 +03:00
Aliaksandr Valialkin
0f54c0121b lib/promscrape/discovery/kubernetes: properly update vm_promscrape_discovery_kubernetes_scrape_works metric
Previously it wasn't decreased during config update.
2021-05-18 15:41:51 +03:00
Aliaksandr Valialkin
9f62d348db lib/promscrape/discovery/kubernetes: log errors and stop service discovery when unexpected updates are received from Kubernetes API server
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 15:41:51 +03:00
Aliaksandr Valialkin
6ea191d196 docs: dealay -> delay 2021-05-18 01:07:32 +03:00
Aliaksandr Valialkin
c4ed50ae54 lib/promrelabel: add tests for conditional removal of label on another label match
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1294
2021-05-18 00:23:23 +03:00
Aliaksandr Valialkin
8764b0ae21 lib/promscrape/discovery/kubernetes: key ScrapeWork objects by urlWatcher instead of namespace
This makes the code less fragile if urlWatcher starts depending on properties other than the namespace.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-17 23:49:48 +03:00
Aliaksandr Valialkin
e08287f017 lib/promscrape: reload auth tokens from files every second
Previously auth tokens were loaded at startup and couldn't be updated without vmagent restart.
Now there is no need in vmagent restart.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1297
2021-05-14 20:03:35 +03:00
Aliaksandr Valialkin
a6cb4f10a7 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:13:34 +03:00
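A minimal Go sketch of the setting described in the commit above. The limit of 100 and the timeouts are illustrative assumptions, not the values used by vmalert or vmauth, and `newTransport` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newTransport returns a transport that keeps up to maxIdlePerHost idle
// keep-alive connections per host instead of the net/http default of 2.
func newTransport(maxIdlePerHost int) *http.Transport {
	return &http.Transport{
		MaxIdleConnsPerHost: maxIdlePerHost,
		IdleConnTimeout:     90 * time.Second, // assumed value for illustration
	}
}

func main() {
	tr := newTransport(100)
	client := &http.Client{Transport: tr, Timeout: 10 * time.Second}
	_ = client // the client would be used for outgoing requests
	fmt.Println("default:", http.DefaultMaxIdleConnsPerHost, "configured:", tr.MaxIdleConnsPerHost)
}
```

With the default limit, a client that sends many concurrent requests to one backend closes most connections after use and has to re-establish them later.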
Aliaksandr Valialkin
e3f61d540b lib/promscrape: limit scrape_timeout by scrape_interval like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1281
2021-05-13 16:10:42 +03:00
匠心零度
d5285ecaf0 fix vmagent imbalance problem (#1292)
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2021-05-13 11:19:30 +03:00
Aliaksandr Valialkin
f13585dc5d vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.14 to v1.0.15
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:47:09 +03:00
Aliaksandr Valialkin
d13906bf1f lib/promscrape: exponentially increase retry interval on unsuccessful requests to scrape targets or to service discovery services
This should reduce CPU load at vmagent and at remote side when the remote side doesn't accept HTTP requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:47:07 +03:00
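The idea of an exponentially growing retry interval can be sketched in Go as follows. This is an assumption-laden illustration rather than the lib/promscrape code: `nextRetryInterval`, the one-second start and the ten-second cap are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// nextRetryInterval doubles the current interval and caps it at maxInterval,
// so repeated failures do not turn into a tight retry loop.
func nextRetryInterval(cur, maxInterval time.Duration) time.Duration {
	next := cur * 2
	if next > maxInterval {
		next = maxInterval
	}
	return next
}

func main() {
	interval := time.Second // assumed initial interval
	for attempt := 1; attempt <= 6; attempt++ {
		fmt.Printf("attempt %d failed, retrying in %s\n", attempt, interval)
		interval = nextRetryInterval(interval, 10*time.Second)
	}
}
```

A real implementation would reset the interval to its initial value after the first successful request.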
Aliaksandr Valialkin
66c6976723 lib/cgroup: document the ability to detect cgroup v2 memory and cpu limits. This is follow-up for b50024812e 2021-05-13 09:27:35 +03:00
Nikolay
8743bf541f adds cgroupsv2 support (#1283)
* adds cgroupv2 limits support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1269

* small fix

* changes Atoi to ParseUint
2021-05-13 09:27:33 +03:00
Aliaksandr Valialkin
2839055513 lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss 2021-05-13 09:01:05 +03:00
Aliaksandr Valialkin
008ae25b3a lib/storage: merge getTSDBStatusForDate with getTSDBStatusWithFiltersForDate
These functions are non-trivial, while their code has minimal differences.
It is better from maintainability PoV to merge these functions into a single function.
2021-05-12 18:01:08 +03:00
Nikolay
be87be34a4 Adds tsdb match filters (#1282)
* init work on filters

* init propose for status filters

* fixes tsdb status
adds test

* fix bug

* removes checks from test
2021-05-12 17:16:58 +03:00
Aliaksandr Valialkin
027607db3e lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices scrape targets every 5 seconds, since they may depend on changed service and pod objects
This should make endpoints and endpointslices scrape targets eventually consistent with the maximum delay of 5 seconds after the related service or pod object changes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-12 14:12:43 +03:00
Aliaksandr Valialkin
1d32b008c6 lib/httpserver: add new X-Server-Hostname header instead of overwriting already existing header
This makes it possible to track the origins of chained requests over multiple hops.
2021-05-11 23:47:19 +03:00
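The difference between overwriting and adding a header can be sketched with the standard net/http API. The helper names and hostnames below are hypothetical; only `http.Header.Add` and `http.Header.Values` are real library calls.

```go
package main

import (
	"fmt"
	"net/http"
)

const serverHostnameHeader = "X-Server-Hostname"

// appendHostname records one more hop. Header.Add keeps the values that are
// already present, while Header.Set would replace them.
func appendHostname(h http.Header, hostname string) {
	h.Add(serverHostnameHeader, hostname)
}

// hops simulates a response passing through the given hosts and returns
// every recorded hostname in order.
func hops(hostnames ...string) []string {
	h := make(http.Header)
	for _, name := range hostnames {
		appendHostname(h, name)
	}
	return h.Values(serverHostnameHeader)
}

func main() {
	// Hypothetical hostnames, used only for illustration.
	fmt.Println(hops("vmstorage-3", "vmselect-2", "vmauth-1"))
}
```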
Aliaksandr Valialkin
f1317f7c6c lib/httpserver: return X-Server-Hostname http header in all the responses for better debuggability 2021-05-11 22:04:41 +03:00
Aliaksandr Valialkin
4e59cf4380 lib/storage: properly apply time range when matching an empty filter
It must match all the time series on the given time range.
Previously it was matched to all the time series without the restriction on the given time range.
2021-05-11 01:09:35 +03:00
Aliaksandr Valialkin
326cf83eb4 lib/storage: remove dead code after the commit 3ccf7ea20c 2021-05-08 20:15:59 +03:00
Aliaksandr Valialkin
9c505d27dd lib/ingestserver: properly close incoming connections during graceful shutdown 2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin
4a5f45c77e app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:53:45 +03:00
Aliaksandr Valialkin
e6c19cb09d lib/promscrape/discovery/kubernetes: start watchers for pods and services before starting watchers for endpoints
This should eliminate possible race when an update on endpoints depends on pods and/or services, which are missing in the cache yet.
This could result in missing targets based on endpoints or endpointslices.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 12:23:16 +03:00
Aliaksandr Valialkin
43c52ff77a lib/storage: use WARNING instead of INFO level for logging dropped labels 2021-05-03 13:57:28 +03:00
Aliaksandr Valialkin
ec6becd3f5 lib/httpserver: stop the process on panics in request handlers
Panics may leave the process in inconsistent state. That's why it is better to stop the process after the panic
instead of recovering from the panic. Unfortunately, the standard net/http.Server recovers panics in request handlers.
See https://github.com/golang/go/issues/16542 . That's why lib/httpserver must stop the process by itself after the panic.
2021-05-03 12:00:44 +03:00
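One way to get this behavior on top of net/http is a wrapper that intercepts the panic before the standard server does. The sketch below is an assumption about the general technique, not the lib/httpserver implementation; `exitOnPanic` and `boom` are hypothetical names, and the exit function is injected so the wrapper can be exercised without killing the process.

```go
package main

import (
	"log"
	"net/http"
	"os"
)

// exitOnPanic wraps h so that a panic in the handler is logged and then
// passed to exit instead of being silently recovered by net/http.Server.
func exitOnPanic(h http.Handler, exit func(code int)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if err := recover(); err != nil {
				log.Printf("panic in request handler: %v", err)
				exit(1)
			}
		}()
		h.ServeHTTP(w, r)
	})
}

// boom is a handler that always panics; it exists only for demonstration.
var boom = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	panic("boom")
})

func main() {
	mux := http.NewServeMux()
	mux.Handle("/boom", boom)
	// In a real service the wrapped handler would be passed to http.Server.
	handler := exitOnPanic(mux, os.Exit)
	_ = handler
}
```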
Nikolay
62d58324dd adds stalePartsRemover (#1261)
for newly created partitions
2021-05-03 11:34:33 +03:00
Aliaksandr Valialkin
60ffbcbb99 lib/promrelabel: add tests for removing the specified {label="value"} pair 2021-05-03 11:26:58 +03:00
Aliaksandr Valialkin
b43ba6d85f lib/storage: log dropped labels if the number of labels in a metric exceeds -maxLabelsPerTimeseries command-line flag value
This should improve debuggability for this case.
2021-05-01 09:29:56 +03:00
Aliaksandr Valialkin
8be1cb297b app/vmagent: list user-visible endpoints at http://vmagent:8429/
While at it, use common WriteAPIHelp function for the listing in vmagent, vmalert and victoria-metrics
2021-04-30 09:38:23 +03:00
Aliaksandr Valialkin
421a92983a lib/promscrape/discovery/kubernetes: remove a mutex at urlWatcher - use groupWatcher mutex for accessing all the urlWatcher children
This simplifies the code a bit and reduces the probability of improper mutex handling and deadlocks.
2021-04-29 10:17:45 +03:00
Nikolay
535b3ff618 vmagent kubernetes_sd tests (#1253)
* first part of tests for kubernetes sd

* makes linter happy

* added more test cases

* adds pub/sub for tests
2021-04-29 10:17:45 +03:00
Aliaksandr Valialkin
e37e1b1e34 lib/{storage,mergeset}: fix unaligned 64-bit atomic operation panic for 32-bit architectures
The panic has been introduced in 56b6b893ce
2021-04-27 16:42:19 +03:00
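This class of panic comes from a documented sync/atomic restriction: on 32-bit platforms such as 386 and ARM the caller must ensure that 64-bit values accessed atomically are 64-bit aligned, and only the first word of an allocated struct is guaranteed to be. A common remedy, shown here as a sketch with a hypothetical `stats` struct (the commit itself may have fixed it differently), is to put such fields first.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// stats is a hypothetical struct. The atomically accessed uint64 field is
// placed first, so it is 64-bit aligned even on 32-bit architectures.
type stats struct {
	requests uint64 // must stay the first field
	enabled  bool
}

// inc atomically increments the counter and returns the new value.
func (s *stats) inc() uint64 {
	return atomic.AddUint64(&s.requests, 1)
}

// requestsOffset reports the offset of the counter inside the struct.
func requestsOffset() uintptr {
	var s stats
	return unsafe.Offsetof(s.requests)
}

func main() {
	s := &stats{enabled: true}
	s.inc()
	fmt.Println("offset:", requestsOffset(), "requests:", s.inc())
}
```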
Aliaksandr Valialkin
2d1d60118d lib/mergeset: split rows ingestion among multiple shards
This improves rows ingestion on systems with many CPU cores by reducing lock contention.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244

Thanks to @waldoweng for the original idea and draft implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1243
2021-04-27 15:45:11 +03:00
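The sharding idea can be sketched in Go as follows. This is a simplified illustration, not the lib/mergeset code: the `shardedRows` type, the round-robin shard selection and the string rows are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shard holds its own lock, so writers hitting different shards do not contend.
type shard struct {
	mu   sync.Mutex
	rows []string
}

// shardedRows spreads incoming rows across several independently locked shards.
type shardedRows struct {
	next   uint32
	shards []shard
}

func newShardedRows(n int) *shardedRows {
	return &shardedRows{shards: make([]shard, n)}
}

// add picks a shard in round-robin order and appends the row under that shard's lock.
func (sr *shardedRows) add(row string) {
	idx := atomic.AddUint32(&sr.next, 1) % uint32(len(sr.shards))
	s := &sr.shards[idx]
	s.mu.Lock()
	s.rows = append(s.rows, row)
	s.mu.Unlock()
}

// count returns the total number of rows across all shards.
func (sr *shardedRows) count() int {
	n := 0
	for i := range sr.shards {
		s := &sr.shards[i]
		s.mu.Lock()
		n += len(s.rows)
		s.mu.Unlock()
	}
	return n
}

// addConcurrently starts the given number of workers, each adding perWorker rows.
func addConcurrently(sr *shardedRows, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				sr.add("row")
			}
		}()
	}
	wg.Wait()
}

func main() {
	sr := newShardedRows(4)
	addConcurrently(sr, 8, 1000)
	fmt.Println(sr.count())
}
```

With a single mutex every writer serializes on the same lock; with N shards the expected contention per lock drops roughly in proportion to N.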
Aliaksandr Valialkin
b3da457629 lib/promscrape/discovery/kubernetes: fix a deadlock introduced in eddba29664
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240

Thanks to @f41gh7 for providing the initial idea for deadlock fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1248
2021-04-27 14:59:56 +03:00
Aliaksandr Valialkin
cba2d13456 lib/storage: typo fix in info message when deleting the part outside the configured retention
Previously the message was displaying incorrect retention time
2021-04-27 13:33:36 +03:00
Aliaksandr Valialkin
f14412321b lib/persistentqueue: eliminate possible data race when obtaining vm_persistentqueue_bytes_pending metric value 2021-04-27 00:26:32 +03:00
Aliaksandr Valialkin
320983f650 lib/promscrape: apply scrape_timeout on receiving the first response byte for stream_parse: true scrape targets
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017#issuecomment-767235047
2021-04-23 22:05:00 +03:00
Aliaksandr Valialkin
34321e5f8d lib/promscrape/discovery/kubernetes: refresh role: endpoints targets on service object removal as Prometheus does
This is a follow-up for ae37cfd528

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:27:29 +03:00
Aliaksandr Valialkin
db27dbab5e lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices targets on service object update like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:12:22 +03:00
Aliaksandr Valialkin
ab8008d6d7 lib/{storage,mergeset}: remove empty directories on startup. Such directories can be left after unclean shutdown on NFS storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1142
2021-04-22 13:03:29 +03:00
Aliaksandr Valialkin
6dc5d3b357 all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:20:01 +03:00
Aliaksandr Valialkin
1088113110 lib/uint64set: remove memory allocation in getOrCreateSmallPool()
This partially reverts fb82c4b9fa

It turned out that the additional memory allocation may result in higher GC pauses.
It is better to spend CPU time on copying bigger bucket16 structs instead of increasing query latencies due to higher GC pauses
2021-04-14 12:31:53 +03:00
Aliaksandr Valialkin
72c41323fa lib/storage: code clarification: remove caching the found metricName in searchMetricName 2021-04-13 10:20:35 +03:00
Artem Navoiev
c3dcfdef8c improve docs for cli flags (#1202)
* improve docs for cli flags

* improve docs for cli flags.2
2021-04-12 12:28:36 +03:00
Aliaksandr Valialkin
3b1f0cb3f6 lib/promscrape: create a single swosFunc per scrape_config 2021-04-08 09:31:38 +03:00
Aliaksandr Valialkin
276dbc2133 lib/promscrape: do not spend CPU time on constructing scrapeWork key if clustering is disabled 2021-04-07 21:55:06 +03:00
Aliaksandr Valialkin
59ccc43e3a lib/storage: properly handle big time ranges passed to /api/v1/labels and /api/v1/label/<labelName>/values
It should be faster querying all the labels and/or all the values instead of querying per-day labels/values on time ranges exceeding maxDaysForPerDaySearch
2021-04-07 13:33:10 +03:00
Aliaksandr Valialkin
02b83e0957 lib/promscrape/discovery: remove superfluous check in registerPendingAPIWatchers
The check `_, ok := uw.aws[aw]; !ok` isn't needed, since aw cannot exist in uw.aws
because of the check inside subscribeAPIWatcher
2021-04-07 13:10:04 +03:00
Aliaksandr Valialkin
db56ee0e28 lib/promscrape/discovery/kubernetes: register pending apiWatchers in uw.aws 2021-04-06 11:11:53 +03:00
Aliaksandr Valialkin
edd66b7e82 lib/promscrape/discovery/kubernetes: remove superfluous mustStart and mustStop functions 2021-04-05 22:43:49 +03:00
Lu Jiajing
4ee6def68b fix access to nil *url.URL (#1180)
* fix access to nil *url.URL

Signed-off-by: Megrez Lu <lujiajing1126@gmail.com>

* Update lib/promscrape/discovery/kubernetes/api_watcher.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-05 22:26:43 +03:00
Aliaksandr Valialkin
7eca60694e lib/promscrape/discovery/kubernetes: reduce CPU time spent on registering big number of Kubernetes objects shared among big number of scrape jobs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 22:05:02 +03:00
Aliaksandr Valialkin
9da2ef3d8f lib/promscrape/discovery/kubernetes: load objects missing in local cache from API server in getObjectByRole()
This should fix possible race for `role: endpoints` and `role: endpointslices` service discovery,
when the referred `pod` and `service` objects aren't propagated to urlWatcher cache yet.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182#issuecomment-813353359 for details.
2021-04-05 20:31:22 +03:00
Aliaksandr Valialkin
4d6a3aba4d lib/persistentqueue: delete corrupted persistent queue instead of throwing a fatal error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1030
2021-04-05 19:26:51 +03:00
Aliaksandr Valialkin
fe084fdd33 lib/promscrape/discovery/kubernetes: synchronously load Kubernetes objects on first access
Remove async registration of apiWatchers, since it breaks discovering `role: endpoints` and `role: endpointslices` targets,
which depend on pod and service objects.

There is no need in reloading `endpoints` and `endpointslices` targets if the referenced `pod` or `service` objects change,
since in this case the corresponding `endpoints` and `endpointslices` objects should also change because they contain
ResourceVersion of the referenced `pod` or `service` objects, which is modified on object update.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 14:37:07 +03:00
Aliaksandr Valialkin
c8b7c32e23 lib/proxy: typo fix after a5c5b54c22 2021-04-05 14:37:06 +03:00
Aliaksandr Valialkin
e0b687171b lib/proxy: add support for socks5 over tls proxy 2021-04-05 13:00:42 +03:00
Aliaksandr Valialkin
90da332ea0 lib/promscrape: pass X-Prometheus-Scrape-Timeout-Seconds header to scrape targets as Prometheus does 2021-04-05 12:15:14 +03:00
Nikolay
d0b664454b adds socks5 support for fasthttp client (#1178)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1177

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-04 01:43:59 +03:00
Aliaksandr Valialkin
dd19fab7c9 lib/promscrape: properly send full url in GET request via simple HTTP proxy
This is a follow-up for a0ae0f86666a75ec57b45eab2429da7ab4a7b250

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 01:20:43 +03:00
Aliaksandr Valialkin
ab9e1eb41f lib/promscrape: support for simple HTTP proxies without CONNECT method support such as https://github.com/prometheus-community/PushProx
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 00:40:58 +03:00
Aliaksandr Valialkin
27d8a5d2c0 lib/promscrape: add tests for authorization config, which has been added in df148f48b7 2021-04-03 22:14:03 +03:00
Aliaksandr Valialkin
ae5c20a34c lib/proxy: log response body on non-200 response code
This should improve debuggability for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-03 03:03:41 +03:00
Aliaksandr Valialkin
87700f1259 lib/promscrape: add support for authorization config in -promscrape.config as Prometheus 2.26 does
See https://github.com/prometheus/prometheus/pull/8512
2021-04-02 21:20:37 +03:00
Aliaksandr Valialkin
2825a1e86d lib/promscrape: add follow_redirect option to scrape_configs section like Prometheus does
See https://github.com/prometheus/prometheus/pull/8546
2021-04-02 21:20:37 +03:00
Aliaksandr Valialkin
245eba8896 lib/promscrape/discovery/kubernetes: properly track objects with the same names in multiple namespaces
This is a follow-up for 12e4785fe8

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:46:34 +03:00
Aliaksandr Valialkin
eee860f83d lib/promscrape/discovery/kubernetes: properly discover targets in multiple namespaces
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:29:24 +03:00
Nikolay
00157f3ec6 Adds aws ECS credentials support (#1175) 2021-04-02 13:11:03 +03:00
Aliaksandr Valialkin
512addc608 app/{vminsert,vmagent}: add -sortLabels command-line option for sorting time series labels before ingesting them in the storage
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
2021-03-31 23:27:21 +03:00
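The effect of sorting labels can be shown with a short Go sketch. The `label` type and the `seriesKey` helper are hypothetical and do not reflect how vminsert or vmagent represent series internally; the point is only that both label orders from the example above collapse into one key once the labels are sorted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// label is a hypothetical name/value pair attached to a metric.
type label struct {
	Name, Value string
}

// sortLabels sorts labels in place by name.
func sortLabels(labels []label) {
	sort.Slice(labels, func(i, j int) bool {
		return labels[i].Name < labels[j].Name
	})
}

// seriesKey sorts the labels in place and builds a canonical series identifier,
// so the same set of labels always yields the same key.
func seriesKey(metric string, labels []label) string {
	sortLabels(labels)
	var sb strings.Builder
	sb.WriteString(metric)
	sb.WriteByte('{')
	for i, l := range labels {
		if i > 0 {
			sb.WriteByte(',')
		}
		fmt.Fprintf(&sb, "%s=%q", l.Name, l.Value)
	}
	sb.WriteByte('}')
	return sb.String()
}

func main() {
	a := seriesKey("metric", []label{{"k1", "v1"}, {"k2", "v2"}})
	b := seriesKey("metric", []label{{"k2", "v2"}, {"k1", "v1"}})
	fmt.Println(a == b, a)
}
```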
Aliaksandr Valialkin
ae1c653d55 lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels 2021-03-31 21:22:40 +03:00
Aliaksandr Valialkin
887e67f13b lib/uint64set: improve Set.Has() performance scalability on multi-CPU systems
Do not update bucket32.hint on Set.Has() call, since it leads to memory ping-pong between CPU cores on multi-CPU systems
2021-03-29 12:34:19 +03:00
Aliaksandr Valialkin
940a547116 lib/storage: do not update b.nextIdx if no samples are removed because of retention 2021-03-29 12:13:38 +03:00
Aliaksandr Valialkin
7d87d42a91 lib/promscrape/discovery/kubernetes: typo fix in error message 2021-03-26 12:46:33 +02:00
Aliaksandr Valialkin
a920e71809 lib/promscrape/discovery/kubernetes: properly handle too old resource version error message from Kubernetes watch API 2021-03-26 12:28:35 +02:00
Aliaksandr Valialkin
9c2be144cf app/vmselect: log the metric which triggers rollup result cache reset
This should help finding the source of stale metrics
2021-03-25 21:32:28 +02:00
Aliaksandr Valialkin
f971fe86cd lib/storage: tune loopsCountPerMetricNameMatch according to production workload 2021-03-25 13:48:17 +02:00
Aliaksandr Valialkin
6b1f807418 app/vmagent: add -promscrape.consul.waitTime command-line flag for configuring Consul service discovery wait time
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1144
2021-03-23 19:34:12 +02:00
Aliaksandr Valialkin
9947c65df3 lib/storage: do not reload metricName for the same metricID in Search.NextMetricBlock
This should speed up Search.NextMetricBlock a bit
2021-03-23 17:59:34 +02:00
Nikolay
19a40faf8e changes consul_service label value (#1143)
according to prometheus discovery.
It should mitigate the issue with case-sensitive services
https://github.com/hashicorp/consul/issues/5707
2021-03-23 15:37:06 +02:00
Aliaksandr Valialkin
12ca0efc19 lib/storage: respect the deadline passed to Storage.SearchMetricNames 2021-03-22 23:03:00 +02:00