Commit graph

1581 commits

Author SHA1 Message Date
Aliaksandr Valialkin
6ff71474a6
lib/storage: document why tsid cache is reset before saving it to disk
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2205
2022-02-16 18:37:56 +02:00
Aliaksandr Valialkin
b71be42d90
lib/storage: use binary search instead of full scan for skipping artificial tags when searching for tag names or tag values
This should improve performance for /api/v1/labels and /api/v1/label/<label_name>/values

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2200
2022-02-16 18:15:41 +02:00
Roman Khavronenko
d91c1d4eee
vmagent: fix js error on CollapseAll/ExpandAll buttons click (#2192)
* vmagent: fix js error on CollapseAll/ExpandAll buttons click

`Uncaught TypeError: Cannot read properties of null (reading 'style')`

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-15 12:52:48 +02:00
Corporte Gadfly
ad6bdd78d0
match fileSDCheckInterval with prometheus file_sd_config default (#2188) 2022-02-15 12:04:26 +02:00
Aliaksandr Valialkin
1215f51043
docs/CHANGELOG.md: document 3d890e89f1 2022-02-14 17:39:12 +02:00
Nikolay
3d890e89f1
Adds server certificate reload for lib/http (#2186)
* Adds server certificate reload for lib/http
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2171

* Update lib/httpserver/httpserver.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-14 17:32:13 +02:00
Nikolay
c90c1c4d54
fixes all_tenants query option usage for openstack service discovery (#2184)
explicit use configuration parametr instead of conditional add
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2182
2022-02-14 13:07:30 +02:00
Aliaksandr Valialkin
f10c38b827
lib/promscrape: add expand all and collapse all buttons to /targets page 2022-02-12 18:41:29 +02:00
Aliaksandr Valialkin
96dce63dbd
lib/storage: tune the logic for pre-populating of the per-day inverted index for the next day
- Postpone the pre-poulation to the last hour of the current day. This should reduce the number
  of useless entries in the next per-day index, which shouldn't be created there,
  when the corresponding time series are stopped to be pushed during the current day.

- Make the pre-population more smooth in time by using the hash of MetricID instead of MetricID itself
  when calculating the need for for the given MetricID pre-population.

- Sync the logic for pre-population of the next day inverted index with the logic of pre-populating tsid cache
  after indexdb rotation. This should improve code maintainability.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
2022-02-12 16:33:16 +02:00
artifactori
ea153e5f90
Show gce sdconfig zone on vmagent:8429/config (#2178)
* vmagent: add test for marshalling gce sdconfig with ZoneYAML

* vmagent: implement MarshalYAML for ZoneYAML on gce sdconfig
2022-02-12 00:39:23 +02:00
Roman Khavronenko
cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.

The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.

To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.

The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin
08428464e9
lib/storage: fix broken BenchmarkHeadPostingForMatchers for {i=~".*"} after f4dead529f
The commit f4dead529f makes such query to return nothing instead of all the time series.
This aligns more with Prometheus behaviour.
2022-02-12 00:27:10 +02:00
Roman Khavronenko
e3adcbec6e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:00 +02:00
Aliaksandr Valialkin
3cb72ccc2a
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpointslice_{label,annotation}* labels to be consistent with other role values for Kubernetes service discovery 2022-02-11 14:54:47 +02:00
Nikolay
4e7f7f3302
fixes service discovery for kubernetes (#2173)
* fixes service discovery for kubernetes
now it must take in account all pods that belong to the discovered endpoint and endpointslice
adds simple test for endpoints
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2134

* wip

* docs/CHANGELOG.md: document the change

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 13:34:22 +02:00
Aliaksandr Valialkin
f9a17cb5fe
lib/mergeset: tune indexdb/{indexBlocks,dataBlocks} cache sizes further according to production stats
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-10 19:09:46 +02:00
Aliaksandr Valialkin
a9bb22b213
lib/blockcache: use higher number of shards for higher number of CPU cores
This should reduce mutex contention and increase performance

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-10 19:06:12 +02:00
Aliaksandr Valialkin
db8c4054e5
lib/promscrape: fix errors in test config
The errors were discovered after enabling strict parse mode by default.
See 9bb60ab00f
2022-02-08 19:56:37 +02:00
Aliaksandr Valialkin
4507b111a9
lib/blockcache: split the cache into multiple shards
This should reduce contention on cache mutex on hosts with many CPU cores,
which, in turn, should increase overall throughput for the cache.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-08 19:44:29 +02:00
Aliaksandr Valialkin
2455a988e4
lib/mergeset: tune sizes for indexdb/dataBlocks and indexdb/indexBlocks according to production workload
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007#issuecomment-1032308742
2022-02-08 17:58:49 +02:00
Aliaksandr Valialkin
9bb60ab00f
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-living silent errors in -promscrape.config
2022-02-08 15:41:43 +02:00
Aliaksandr Valialkin
a19e7f8c5b
lib/blockcache: make fmt 2022-02-08 15:24:11 +02:00
Aliaksandr Valialkin
d0f785defd
lib/blockcache: eliminate possible race when Cache.Put is called for the same entry from multiple goroutines
The race could result in incorrect cache size tracking, which, in turn, could result in too frequent cache cleaning
2022-02-08 01:10:43 +02:00
Aliaksandr Valialkin
46bd2c4d6d
lib/blockcache: increase the lifetime for rarely accessed blocks from 2 minutes to 5 minutes
This should improve data ingestion speed if time series samples are ingested with interval bigger than 2 minutes.
The actual interval could exceed 2 minutes if the original interval between samples doesn't exceed 2 minutes
in the case of slow inserts. Slow inserts may appear in the following cases:

* Big number of new time series are pushed to VictoriaMetrics, so they couldn't be registered in 2 minutes.
* MetricName->tsid cache reset on indexdb rotation or due to unclean shutdown.
  In this case VictoriaMetrics needs to load MetricName->tsid entries for all the incoming series from IndexDB.
  IndexDB uses the block cache for increasing lookup performance. If the cache has no the needed block,
  then IndexDB reads and unpacks the block from disk. This requires an extra disk read IO and CPU.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007

This also should increase performance for periodically executed queries with intervals from 2 minutes to 5 minutes.
See the previous similar commit - 43103be011

It is possible that the timeout can be increased further. Let's collect production numbers for this change
so the timeout could be adjusted further.
2022-02-08 00:15:56 +02:00
Aliaksandr Valialkin
e86b7cc9a5
lib/workingsetcache: use the original cache size limits when rotating caches
Previously limits for new caches were taken from cache stats.
These limits could mismatch the original limits. This could result in failed cache load
if the stored cache has been created with the limits obtained from cache stats.
2022-02-08 00:10:14 +02:00
Aliaksandr Valialkin
cde4664f0d
lib/blockcache: return proper number of entries from the cache
This has been broken in 0d7374ad2f
2022-02-07 19:28:42 +02:00
Aliaksandr Valialkin
b5b3c585b3
lib/promscrape: show the total number of scrapes and the total number of scrape errors per target at /targets page
This information may be useful when debugging unreliable scrape targets
2022-02-03 20:22:41 +02:00
Aliaksandr Valialkin
2968779f16
lib/promscrape: provide the ability to fetch target responses on behalf of vmagent or single-node VictoriaMetrics
This feature may be useful when debugging metrics for the given target located in isolated environment
2022-02-03 19:00:55 +02:00
Aliaksandr Valialkin
9c62b25ad6
lib/mergeset: pre-allocate data and items for inmemoryBlock in order to reduce memory allocations under high churn rate
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-01 00:57:14 +02:00
Aliaksandr Valialkin
4bdd10ab90
lib/bytesutil: split Resize* funcs to MayOverallocate and NoOverallocate for more fine-grained control over memory allocations
Follow-up for f4989edd96

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-02-01 00:18:42 +02:00
Aliaksandr Valialkin
e13ce2ee98
lib/encoding: substitute 64-bits.LeadingZeros64() with bits.Len64() 2022-01-31 23:36:48 +02:00
Aliaksandr Valialkin
a8509c112a
lib/storage: avoid allocations of tsidPrev on every blockStreamReader.NextBlock() call
This is a follow-up for 00b7c97d2a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-31 22:46:53 +02:00
Aliaksandr Valialkin
f50cf60534
lib/cgroup: fall back to runtime.NumCPU() when determining process_cpu_cores_available metric if it is impossible to determine cpu quota via cgroups
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107
2022-01-31 20:30:14 +02:00
Aliaksandr Valialkin
ead66155ef
lib/cgroup: expose process_cpu_cores_available metric
This metric shows the number of CPU cores available to the process.
This allows creating alerting rules on CPU saturation with the following query:

    rate(process_cpu_seconds_total[5m]) / process_cpu_cores_available > 0.9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107
2022-01-31 20:24:41 +02:00
Aliaksandr Valialkin
96aa3761fc
lib/storage/table.go: add missing tb.ptwsLock.Unlock() before the return
This is a follow-up for a1083d0531

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2103
2022-01-28 14:15:42 +02:00
匠心零度
1999bbfe82
optimized code (#2103)
* optimized code ,because only the first error,so no need var errors []error

* optimized code ,because only the first error,so no need var errors []error

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2022-01-28 14:15:41 +02:00
Aliaksandr Valialkin
f4989edd96
lib/bytesutil: split Resize() into ResizeNoCopy() and ResizeWithCopy() functions
Previously bytesutil.Resize() was copying the original byte slice contents to a newly allocated slice.
This wasted CPU cycles and memory bandwidth in some places, where the original slice contents wasn't needed
after slize resizing. Switch such places to bytesutil.ResizeNoCopy().

Rename the original bytesutil.Resize() function to bytesutil.ResizeWithCopy() for the sake of improved readability.

Additionally, allocate new slice with `make()` instead of `append()`. This guarantees that the capacity of the allocated slice
exactly matches the requested size. The `append()` could return a slice with bigger capacity as an optimization for further `append()` calls.
This could result in excess memory usage when the returned byte slice was cached (for instance, in lib/blockcache).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-25 15:24:44 +02:00
Aliaksandr Valialkin
91f2af2d7a
lib/mergeset: allocate the needed amounts of memory when unmarshaling inmemoryBlock
This should reduce the memory required for indexdb/dataBlocks cache.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-24 18:50:40 +02:00
Aliaksandr Valialkin
4c13bae1cf
lib/logger: removed broken test after 746ee191e8 2022-01-24 12:14:32 +02:00
Aliaksandr Valialkin
746ee191e8
lib/logger/throttler.go: show the original location of the error and warning message
Previously the location inside LogThrottler implementation was shown. This could complicate debugging.
2022-01-23 13:55:00 +02:00
Aliaksandr Valialkin
0d7374ad2f
lib/blockcache: optimize blockcache a bit
- Optimize Cache.RemoveBlocksFromPart(), so it doesn't need to iterate over all the cached blocks.
- Cache blocks if there were no cache misses during the last 2 minutes.
  This may be the case when new blocks are added simultaneously to the storage and to the cache.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-23 13:13:45 +02:00
Aliaksandr Valialkin
ede93469ea
lib/mergeset: tune caches size limits for indexdb/dataBlocks and indexdb/indexBlocks
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-21 12:45:43 +02:00
Aliaksandr Valialkin
5f84b17ed6
lib/storage: properly limit cardinality when ingesting multiple samples for the same time series in a single request 2022-01-21 12:38:09 +02:00
Aliaksandr Valialkin
00b7c97d2a
lib/storage: verify that blocks in a single part are sorted by TSID when reading sequential blocks from the part
This may help narrowing down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-20 20:36:37 +02:00
Aliaksandr Valialkin
ea87f21e23
lib/storage: set bsm.Block to nil on error, so the previous block couldn't be used.
This may help nailing down the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2082
2022-01-20 20:13:14 +02:00
Aliaksandr Valialkin
9797c928ef
lib/blockcache: add missing dependency after 145337792d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-20 18:50:44 +02:00
Aliaksandr Valialkin
145337792d
lib/{mergeset,storage}: properly limit cache sizes for indexdb
Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`,
since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded,
which could result to out of memory errors.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-20 18:37:17 +02:00
Aliaksandr Valialkin
1d05444b33
lib/promscrape: expose promscrape_stale_samples_created_total metric for monitoring the number of created stale samples 2022-01-14 01:00:46 +02:00
Aliaksandr Valialkin
80f03177c4
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_node_provider_id label for discovered Kubernetes nodes in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/9603
2022-01-13 23:16:02 +02:00
Aliaksandr Valialkin
355a63733d
lib/promscrape/discovery/kubernetes: add the ability to limit service discovery to the current namespace
See https://github.com/prometheus/prometheus/issues/9782 and https://github.com/prometheus/prometheus/pull/9881
2022-01-13 22:44:35 +02:00
Aliaksandr Valialkin
17eb86a689
lib/promscrape/discovery/dockerswarm: follow up after 68a117a25a
- Document the bugfix at docs/CHANGELOG.md
- Set __address__ field after copying commonLabels to the resulting map of discovered labels.
  This makes sure that the correct __address__ label is used.
2022-01-11 09:20:10 +02:00
Alexander Shtuchkin
68a117a25a
Fix for #2038: Make correct __address__ value for dockerswarm promscrape (#2041) 2022-01-11 08:59:06 +02:00
Aliaksandr Valialkin
e4e36383e2
lib/promscrape: do not send staleness markers on graceful shutdown
This follows Prometheus behavior.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2013#issuecomment-1006994079
2022-01-07 01:17:57 +02:00
Aliaksandr Valialkin
178dd87e26
lib/storage: follow-up for 38bf5fc136 2022-01-05 16:00:11 +02:00
weng zhao
38bf5fc136
vmstorage: fix query like {foo=~"bar|"} return extra timeseries cause by negative filter transformation malfunction (#2032)
1. L2749 make kb.B remain the value of comonPrefix instead of tf.prefix
2. L2762 avoid change tf.value from "bar|" to ".+r|"
2022-01-05 15:59:15 +02:00
Aliaksandr Valialkin
cbaa2af280
lib/promscrape: scrape replicated targets at different offsets in vmagent replicated clustering mode
This guarantees that the deduplication consistently leaves samples from the same vmagent replica.

See https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets
2021-12-23 00:20:39 +02:00
Nikolay
8ff7da7202
adds restore.lock (#1988)
* adds restore.lock
it must prevent from running storage after incomplete restore process
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

* return back flock file deletion

* Apply suggestions from code review

* wip

* docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-22 13:10:15 +02:00
Aliaksandr Valialkin
ce333f28d8
all: use logger.WithThrottler() where appropriate 2021-12-21 17:03:25 +02:00
Roman Khavronenko
34fdc8881b
vmagent: add error log for skipped data block when rejected by receiv… (#1956)
* vmagent: add error log for skipped data block when rejected by receiving side

Previously, rejected data blocks were silently dropped - only metrics were update.
From operational perspective, having an additional logging for such cases is preferable.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmagent: throttle log messages about skipped blocks

The new type of logger was added to logger pacakge.
This new type supposed to control number of logged messages
by time.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/logger: make LogThrottler public, so its methods can be inspected by external packages

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-21 16:36:09 +02:00
Aliaksandr Valialkin
b9363d9726
lib/promscrape: take into account the original job_name when creating an unique key per each scrape target
This should handle the case when the original job_name has been changed in -promscrape.config ,
while the resulting job label remains the same because it is overriden via relabeling.
2021-12-20 18:38:05 +02:00
Aliaksandr Valialkin
afafeb379a
all: typo fix: unexected -> unexpected 2021-12-20 17:39:52 +02:00
Aliaksandr Valialkin
5a36e241f4
lib/persistentqueue: check that readerOffset doesnt exceed writerOffset after each readerOffset increase
This should help detecting the source of the panic from https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1981
2021-12-20 17:25:11 +02:00
Aliaksandr Valialkin
8a7f08ded3
lib/storage: properly update per-part min_dedup_interval file contents after merge
Previously 0s was always written even if -dedup.minScrapeInterval was set to non-zero value

This is a follow-up for 4ff647137a
2021-12-17 20:13:24 +02:00
Aliaksandr Valialkin
a3adf24527
lib/promscrape: allow up to 5 redirects when scraping a target by default
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
2021-12-16 00:14:14 +02:00
Aliaksandr Valialkin
4ff647137a
lib/storage: deduplicate samples more thoroughly
Previously some duplicate samples may be left on disk for time series with high churn rate.
This may result in higher disk space usage.
2021-12-15 15:59:58 +02:00
Aliaksandr Valialkin
92070cbb67
lib/storage: return dedup interval in milliseconds from GetDedupInterval()
This removes duplicate .Milliseconds() calls after GetDedupInterval() calls.
2021-12-15 13:26:38 +02:00
Aliaksandr Valialkin
1d20a19c7d
lib/storage: explicitly pass dedupInterval to DeduplicateSamples() and deduplicateSamplesDuringMerge()
This improves the code readability and debuggability, since the output of these functions
stops depending on global state.
2021-12-14 20:49:12 +02:00
Aliaksandr Valialkin
e1a715b0f5
lib/storage: convert alternate regexps into Graphite wildcards inside __graphite__ pseudo-label
For example, `{__graphite__=~"foo.(bar|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution.
This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.
2021-12-14 19:51:49 +02:00
Yury Molodov
c1fd93e8a0
vmui: multiple queries (#1916)
* feat: change duration by "enter"

* fix: optimize data processing for chart

* feat: set minimum step to 1ms

* update dependencies

* feat: remove save the last query to local storage

* fix: handle an error in a table with subqueries

* feat: store display type in URL

* Revert "feat: store display type in URL"

This reverts commit ccc242c69a.

* feat: store display type in URL

* refactor: move the time setting to a folder

* refactor: move the query configurator to a folder

* refactor: move the auth settings to a folder

* feat: improve styles

* feat: add multi query

* update package-lock

* feat: add display multiple queries

* feat: add limits for multiple queries

* update dependencies

* feat: add history for multiple queries

* feat: add line type to legend

* feat: change style for switch

* feat: change the logic for axes limits for multiple queries

* update package-lock.json

* update dependencies

* feat: add the filter to legend

* wip

* lib/httpserver: add missing 127.0.0.1 hostname to the logged address for http and pprof server if the address starts with ':'

This allows copy-pasting the url to http server from logs.

* lib/httpserver: add missing 127.0.0.1 hostname to the logged address for http and pprof server if the address starts with ':'

This allows copy-pasting the url to http server from logs.

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-08 16:40:15 +02:00
Aliaksandr Valialkin
45d082bbe2
app/vminsert: add -maxLabelValueLen command-line flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1908
2021-12-06 11:40:34 +02:00
Aliaksandr Valialkin
da402fbdfa
lib/workingsetcache: fix unaligned 64-bit atomic operation panic on 32-bit architectures
The panic has been introduced in 7275ebf91a
2021-12-03 01:21:51 +02:00
Aliaksandr Valialkin
06642d97f5
app: allow specifying http and https urls in the following command-line flags
* -promscrape.config
* -relabelConfig
* -remoteWrite.relabelConfig
* -remoteWrite.urlRelabelConfig
2021-12-03 00:10:02 +02:00
Aliaksandr Valialkin
62b4efb3e7
app/vmauth: follow-up for 13368bed18
* Document the ability to specify http or https urls in `-auth.config` at docs/CHANGELOG.md
* Move the ReadFileOrHTTP to lib/fs, so it can be re-used in other places where a file
  should be read from the given path. For example, in `-promscrape.config` at `vmagent`.
2021-12-02 23:32:05 +02:00
Aliaksandr Valialkin
394a345ae0
lib/httpserver: expose /-/healthy and /-/ready endpoints as Prometheus does
This improves integration with third-party solutions, which rely on these endpoints.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1833
2021-12-02 14:36:58 +02:00
Aliaksandr Valialkin
90c542af12
app: use relative paths instead of absolute paths for the supported http handlers on the main page
This allows hiding VictoriaMetrics components behind proxies, which serve pages at different path prefixes

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1858
2021-12-02 13:52:39 +02:00
Aliaksandr Valialkin
03f5ad3060
lib/protoparser/graphite: allow multiple separators between metric name, value and timestamp 2021-12-02 13:43:49 +02:00
Aliaksandr Valialkin
49a18b8660
lib/protoparser/graphite: properly parse Graphite line with whitespace after the timestamp
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1865
2021-12-02 13:33:26 +02:00
Aliaksandr Valialkin
c0cbf0de2a
app/{vmbackup,vmrestore}: export internal metrics at /metrics http handler 2021-12-02 11:55:58 +02:00
Aliaksandr Valialkin
7275ebf91a
app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches
The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query:

   vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9
2021-12-02 10:30:43 +02:00
Aliaksandr Valialkin
2f63dec2e3
lib/fs: add vm_filestream_read_duration_seconds_total and vm_filestream_write_duration_seconds_total metrics
These metrics help determining persistent disk saturation with `rate(vm_filestream_read_duration_seconds_total) > 0.9`
2021-12-02 10:30:42 +02:00
Aliaksandr Valialkin
2fb5a6ca78
lib/storage: do not take into account -storage.minFreeDiskSpaceBytes during background merges 2021-12-01 11:02:36 +02:00
Nikolay
06eff5a72c
removes FileSize from backup part key (#1872)
* removes FileSize from backup part key
it should fix download restoration for backups

* Update lib/backup/common/part.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-01 11:01:28 +02:00
Aliaksandr Valialkin
d666755159
lib/storage: take into account -storage.minFreeDiskSpaceBytes when performing big merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-11-30 12:56:35 +02:00
guidao
f05cddd2fc
fix #1830 (#1861)
Co-authored-by: wangfeng <wangfeng@zhihu.com>
2021-11-30 01:12:24 +02:00
Aliaksandr Valialkin
ba927d1c77
lib/protoparser/prometheus: follow-up for 8e338632a3
Do not spend CPU time on error message formatting if error logger is disabled
2021-11-30 00:50:11 +02:00
Nikolay
8e338632a3
Changes unmarshallRow logger to noop for getRowsDiff (#1835) 2021-11-30 00:48:13 +02:00
Aliaksandr Valialkin
d44c585ca4
lib/protoparser: do not log connection reset by peer error when reading the data via InfluxDB, Graphite and OpenTSDB protocols over plain TCP connections
This error is expected, so there is no need in spamming the log with this error.
2021-11-29 21:47:56 +02:00
Aliaksandr Valialkin
b688960db0
lib/persistentqueue: add vm_persistentqueue_read_duration_seconds_total and vm_persistentqueue_write_duration_seconds_total metrics for determining disk usage saturation at vmagent 2021-11-17 16:41:35 +02:00
Lan
b72eed1f5e
Add flag of S3ForcePathStyle (#1802) 2021-11-17 01:03:03 +02:00
Aliaksandr Valialkin
e5ac9d8e57
all: consistently return application/json content-type without charset=utf-8
The `application/json` content-type has utf-8 encoding by default.
See https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
2021-11-09 18:04:44 +02:00
Aliaksandr Valialkin
fd596945e7
lib/promscrape: improve logging for scrape_config_files parse errors
Log the actual file path, which led to the parse error.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1789
2021-11-08 13:34:12 +02:00
Aliaksandr Valialkin
cbfc7b7c92
app/{vminsert,vmagent}: hide passwords and auth tokens by default at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764
2021-11-05 14:41:16 +02:00
Aliaksandr Valialkin
e73a82f7a5
lib/promauth: do not show empty values in oauth2 config section at /config page 2021-11-05 12:53:39 +02:00
Aliaksandr Valialkin
aa534c2582
lib/promscrape: add -promscrape.maxResponseHeadersSize command-line flag for tuning the maximum http response headers size from Prometheus scrape targets 2021-11-03 22:26:56 +02:00
Aliaksandr Valialkin
d1eb87c831
app/{vmagent,vminsert}: add ability to restrict access to /config page with authKey query arg
The authKey can be configured via `-configAuthKey` command-line flag.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764
2021-11-01 16:44:54 +02:00
Aliaksandr Valialkin
bb87949d5c
lib/protoparser/influx: automatically detect timestamp precision depending on the number of decimal digits in the timestamp 2021-10-28 12:47:22 +03:00
Aliaksandr Valialkin
d0e7c0535e
lib/logger: show only explicitly set command-line flags in logs
This reduces initial verbosity in logs
2021-10-28 11:00:52 +03:00
Aliaksandr Valialkin
74b8af9891
lib/promscrape: add collapse and expand buttons per each group of targets from the same scrape job 2021-10-27 20:03:24 +03:00
Aliaksandr Valialkin
6608705652
app/{vmalert,vmagent}: improve the distribution of scrape offsets among targets / rules
Previously only the lower part of 64-bit hash was used for calculating the offset.
This may give uneven distribution in some cases. So let's use all the available 64 bits from the hash
for calculating the offset.
2021-10-27 19:59:16 +03:00
Aliaksandr Valialkin
e3a91b186a
lib/protoparser/prometheus: optimize GetRowsDiff() function
This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1745 ,
since the provided profile shows that the majority of CPU and memory is spent in this function
during `streamParse` when `-promscrape.noStaleMarkers` wasn't set.
2021-10-27 18:54:45 +03:00
Aliaksandr Valialkin
95d44157fc
lib/protoparser/prometheus: add a benchmark for GetRowsDiff 2021-10-27 18:53:54 +03:00
Aliaksandr Valialkin
1952ab99aa
all: fix build issues and tests for Apple M1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1653
2021-10-27 15:06:34 +03:00
Aliaksandr Valialkin
4821adfd95
lib/promscrape: properly show proxy_url option value at /config page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1755
2021-10-26 21:23:54 +03:00
Aliaksandr Valialkin
7fa15f7f86
lib/promscrape: do not populate response body to memory in stream parsing mode if -promscrape.noStaleMarkers is set
The response body isn't used if -promscrape.noStaleMarkers is set after the commit 2876137c92 ,
so there is no sense in pupulating it in memory. This should reduce memory usage when scraping big responses.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1728#issuecomment-949630694
2021-10-22 16:44:44 +03:00
Aliaksandr Valialkin
6106d4069d
lib/promscrape: do not sort original labels and do not intern label string for the original labels before the sharding code is executed
This should reduce CPU and memory usage in shard mode when service discovery finds big number of scrape targets with many long labels.
See https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets

This is a follow-up after 9882cda8b9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1728
2021-10-22 13:54:30 +03:00
Aliaksandr Valialkin
2876137c92
lib/promscrape: reduce memory usage if -promscrape.noStaleMarkers command-line flag is passed
Do not store in memory the response from the last scrape per each target if -promscrape.noStaleMarkers option is enabled.
This should reduce memory usage when the scraped targets return large responses.
2021-10-22 13:10:29 +03:00
Nikolay
a3684fe3de
adds tab as second separator for graphite text protocol (#1733)
* adds tab as second separator for graphite text protocol

* changes indexFunc for indexAny

* Update lib/protoparser/graphite/parser_test.go

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-10-22 12:23:45 +03:00
Aliaksandr Valialkin
8991c8b589
lib/flagutil: do not expose sensitive info (passwords, keys and urls) at /flags page 2021-10-20 00:51:26 +03:00
Aliaksandr Valialkin
8ad95f0db7
lib/httpserver: expose command-line flags at /flags page
This should simplify debugging.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1695
2021-10-20 00:45:09 +03:00
Aliaksandr Valialkin
676ad70d9f
lib/envflag: use flag.Set for setting the flags from env vars
This should make visible the set flags at flag.Visit(), which is used later for logging
and exporting the `is_set` label for these flags at /metrics page
2021-10-20 00:41:08 +03:00
Aliaksandr Valialkin
53bb58ed2a
lib/storage: log a warning when the -storageDataPath has less than -storage.minFreeDiskSpaceBytes
This should improve the debuggability of the readonly feature.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1727
2021-10-19 23:59:13 +03:00
Aliaksandr Valialkin
3408a05d12
lib/promscrape/discovery/kubernetes: log a warning if role: endpoints discovers more than 1000 targets per a single endpoint
In this case `role: endpointslice` must be used instead.

See the following references:

* https://kubernetes.io/docs/reference/labels-annotations-taints/#endpoints-kubernetes-io-over-capacity
* https://github.com/kubernetes/kubernetes/pull/99975
* https://github.com/prometheus/prometheus/issues/7572#issuecomment-934779398
2021-10-19 13:20:40 +03:00
Nikolay
cbcc622786
changes job source for /target api (#1723)
use jobNameOriginal instead of relabeled as prometheus does

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1707
2021-10-19 08:49:36 +03:00
Aliaksandr Valialkin
c37f285466
lib/promscrape: set honor_timestamps: true by default if this option isnt set explicitly in scrape configs
This aligns the behavior to Prometheus - see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
2021-10-16 20:49:08 +03:00
Aliaksandr Valialkin
c055bc478c
lib/promscrape: expose promscrape_series_limit_max_series and promscrape_series_limit_current_series metrics per each scrape target with the enabled unique series limiter 2021-10-16 18:47:13 +03:00
Aliaksandr Valialkin
06b0982d6b
lib/promscrape: always initialize http client for stream parsing mode
Stream parsing mode can be automatically enabled when scraping targets with big response bodies
exceeding the -promscrape.minResponseSizeForStreamParse , so it must be always initialized.
2021-10-16 13:18:23 +03:00
Aliaksandr Valialkin
32793adbd9
lib/promscrape: store the last scraped response in compressed form if its size exceeds -promscrape.minResponseSizeForStreamParse
This should reduce memory usage when scraping targets with big response bodies.
2021-10-16 13:00:30 +03:00
Aliaksandr Valialkin
9866dd95c1
lib/promscrape: store the full response in stream parsing mode in scrapeWork.lastScrape byte slice
This allows sending staleness marks and properly calculate scrape_series_added metric in stream parsing mode
at the cost of the increased memory usage, since now the potentially big response is kept
in the lastScrape byte slice per each scrapeWork.

In practice the memory usage increase shouldn't be big, since the response size
is usually much smaller than the parsed metrics from this response after the relabeling,
which usually adds a big pile of target-specific labels per each metric.
2021-10-15 15:39:23 +03:00
Aliaksandr Valialkin
f6d33596ff
lib/promscrape/discovery/kubernetes: rename endpointslices.go -> endpointslice.go in order to be consistent with EndpointSlice struct name
This is a follow-up for 31b42b30b6
2021-10-15 12:27:12 +03:00
Aliaksandr Valialkin
bbd34fa15e
lib/promscrape: add -promscrape.minResponseSizeForStreamParse command-line option for automatic switching to stream parsing mode when scraping targets with big responses
This should reduce memory usage when vmagent scrapes targets with non-uniform response sizes.
This is common case in Kubernetes monitoring.
2021-10-14 12:29:35 +03:00
Aliaksandr Valialkin
1a7287c408
lib/promscrape: return error if sample_limit or series_limit options are set when stream parsing mode is enabled 2021-10-14 12:11:23 +03:00
Aliaksandr Valialkin
e3c8304deb
lib/promscrape: add ability to show the original labels for discovered targets at /targets page
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1698
2021-10-13 15:59:58 +03:00
Roman Khavronenko
c0a932a55f
lib/promscrape: make errcheck happy (#1703) 2021-10-13 14:57:30 +03:00
Aliaksandr Valialkin
9882cda8b9
lib/promscrape: shard targets among cluster nodes after relabeling is applied
This guarantees that targets with the same set of labels go to the same vmagent node.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1687#issuecomment-940629495
2021-10-12 17:06:00 +03:00
Aliaksandr Valialkin
5a58c041c2
app/vmagent: expose -promscrape.config contents at /config page as Prometheus does
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1695
2021-10-12 16:25:37 +03:00
Aliaksandr Valialkin
873aac584e
lib/promscrape: use Prometheus format for target labels at /targets page
This should simplify copy-pasting the labels to/from PromQL / MetricsQL
2021-10-11 12:41:37 +03:00
Aliaksandr Valialkin
001750c239
lib/storage: fix unaligned access on 32-bit architectures.
The bug has been introduced at a171916ef5
2021-10-08 19:43:03 +03:00
Aliaksandr Valialkin
cf5cbd1c70
app/{vminsert,vmstorage}: follow-up after a171916ef5
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-10-08 14:35:49 +03:00
Nikolay
4290b46e8c
Adds read-only mode for vmstorage node (#1680)
* adds read-only mode for vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269

* changes order a bit

* moves isFreeDiskLimitReached var to storage struct
renames functions to be consistent
change protoparser api - with optional storage limit check for given openned storage

* renames freeSpaceLimit to ReadOnly
2021-10-08 14:35:48 +03:00
Ziqi Zhao
402c995d6d
fix some typos (#1678)
Co-authored-by: 柘远 <zzq237937@alibaba-inc.com>
2021-10-06 14:43:10 +03:00
Aliaksandr Valialkin
6ee66fb6b1
lib/promscrape: reduce memory allocations in mergeLabels() after 48e3e6c8df 2021-09-30 16:56:12 +03:00
Aliaksandr Valialkin
463a5bf76e
lib/protoparser: go fmt 2021-09-29 21:19:00 +03:00
Aliaksandr Valialkin
58964d52a5
lib/protoparser/prometheus: compare invalid Prometheus lines in full 2021-09-29 19:41:28 +03:00
Aliaksandr Valialkin
d80d72efec
app/{vmbackup,vmrestore}: switch from gcs://... to gs://... urls for backups to GCS
The `gs://` urls are commonly used, so prefer them instead of `gcs://` urls,
while leaving support for `gcs://` urls for backwards compatibility.
2021-09-29 12:10:29 +03:00
Nikolay
3d17112a7e
changes auth validation for openstack (#1663)
* changes auth validation for openstack
must fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1655

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-09-29 00:28:49 +03:00
Aliaksandr Valialkin
91b3c601bc
app/{vminsert,vmagent}: add ability to ingest data via DataDog "submit metrics" API
See https://docs.datadoghq.com/api/latest/metrics/#submit-metrics

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206
2021-09-29 00:13:08 +03:00
Aliaksandr Valialkin
718eca33ab
lib/storage: properly handle {__name__=~"prefix(suffix1|suffix2)",other_label="..."} queries
They were broken in the commit 00cbb099b6

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1644
2021-09-23 21:48:51 +03:00
Aliaksandr Valialkin
a0313c046b
lib/promscrape: add vm_promscrape_max_scrape_size_exceeded_errors_total metric for counting of the failed scrapes due to the exceeded response size
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1639
2021-09-23 14:47:54 +03:00
Aliaksandr Valialkin
9ca1cbced1
lib/httpserver: add -enterprise and/or -cluster suffixes to short_version label of vm_app_version metric
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1635
2021-09-21 23:12:42 +03:00
Aliaksandr Valialkin
207c5760ce
lib/promrelabel: fix parsing regex: true in relabeling rules 2021-09-21 23:00:53 +03:00
Nikolay
ad08d9dfc0
changes protoparser apis for accepting reading from io.Reader (#1624)
adds InsertHandlerForReader apis to vmagent
2021-09-20 14:49:28 +03:00
Nikolay
0e09fdb8b0
makes filters optional for ec2 api requests (#1627)
filters can be applied only for DescribeInstances requests, like prometheus does.
related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1626
2021-09-17 18:00:37 +03:00
Aliaksandr Valialkin
8f685d81c6 lib/storage: follow up after 00cbb099b6 2021-09-14 14:16:25 +03:00
faceair
00cbb099b6
lib/storage: optimize convert multiple values regexp filter to composite tag filter (#1610)
* lib/storage: optimize convert multiple values regexp filter to composite tag filter

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-09-14 12:47:07 +03:00
Aliaksandr Valialkin
7f0a8d4bdb docs: consistency renaming: Influx -> InfluxDB 2021-09-13 17:05:16 +03:00
Aliaksandr Valialkin
fb6ed0ce19 lib/promscrape/discovery/docker: support host networking mode
See https://github.com/prometheus/prometheus/issues/9116
2021-09-13 13:30:16 +03:00
Aliaksandr Valialkin
6295861acd lib/promscrape/discovery/kubernetes: properly use https scheme for wildcard TLS certificates in ingress target discovery 2021-09-13 13:03:42 +03:00
Aliaksandr Valialkin
728c4c3841 lib/promscrape: generate scrape_timeout_seconds metric per each scrape target in the same way as Prometheus 2.30 does
See https://github.com/prometheus/prometheus/pull/9247
2021-09-12 15:20:44 +03:00
Aliaksandr Valialkin
0b4eb0fa7d lib/promscrape: make fmt 2021-09-12 13:34:15 +03:00
Aliaksandr Valialkin
48e3e6c8df lib/promscrape: add ability to configure scrape_timeout and scrape_interval via relabeling
See https://github.com/prometheus/prometheus/pull/8911
2021-09-12 13:33:41 +03:00
Aliaksandr Valialkin
f3e89754a9 lib/promscrape: reduce CPU usage for common case when calculating scrape_series_added metric
Also reduce CPU usage when applying `series_limit` to scrape targets with constant set of metrics.

The main idea is to perform the calculations on scrape_series_added and series_limit
only if the set of metrics exposed by the target has been changed.
Scrape targets rarely change the set of exposed metrics,
so this optimization should reduce CPU usage in general case.
2021-09-12 12:53:14 +03:00
Aliaksandr Valialkin
cebcb15ba4 lib/storage: verify that the tsidsFound contain the needed tsids in tests added at f4dead529f 2021-09-11 10:57:13 +03:00
Aliaksandr Valialkin
9286107e82 lib/promscrape: send stale markers for disappeared metrics like Prometheus does 2021-09-11 10:51:04 +03:00
Aliaksandr Valialkin
f4dead529f lib/storage: properly search series by multiple tag filters matching empty labels such as foo{bar=~"baz|",x=~"y|"}
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1601
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395
2021-09-09 21:09:21 +03:00
Aliaksandr Valialkin
4aeb8db83f lib/promscrape: add ability to set series_limit and stream_parse options via relabeling
This allows managing these options on a per-target basis.

Typical use case: to manage these options for pods via Kubernetes annotations.
2021-09-09 18:49:39 +03:00
Aliaksandr Valialkin
468f941f7e lib/promscrape: add the actual job name to the labels of promscrape_series_limit_rows_dropped_total metric 2021-09-09 17:37:37 +03:00
Aliaksandr Valialkin
086b5d0cf1 lib/promscrape: add scrape_ prefix to job and target labels exported by promscrape_series_limit_rows_dropped_total metric
This is needed in order to prevent from possible clash with the corresponding (job, target) labels for the job, which scrapes this metric.
2021-09-09 17:29:21 +03:00
Aliaksandr Valialkin
d6bd956930 lib/promrelabel: add keep_metrics and drop_metrics actions to relabeling rules
These actions simlify metrics filtering. For example,

- action: keep_metrics
  regex: 'foo|bar|baz'

would leave only metrics with `foo`, `bar` and `baz` names, while the rest of metrics will be deleted.

The commit also makes possible to split long regexps into multiple lines. For example, the following config is equivalent to the config above:

- action: keep_metrics
  regex:
  - foo
  - bar
  - baz
2021-09-09 16:18:21 +03:00
Aliaksandr Valialkin
f77dde837a lib/promscrape: add the ability to limit the number of unique series per each scrape target
The number of series per target can be limited with the following options:

* Global limit with `-promscrape.maxSeriesPerTarget` command-line option.
* Per-target limit with `max_series: N` option in `scrape_config` section.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561
2021-09-01 16:03:59 +03:00
Aliaksandr Valialkin
5c63d69454 lib/promscrape/discovery/kubernetes: return back support role: endpointslices, since it is used by VictoriaMetrics operator
This is a follow up commit after 31b42b30b6
2021-08-29 12:37:03 +03:00
Aliaksandr Valialkin
db330232ac lib/protoparser/opentsdb: follow-up after 8ee75ca45a 2021-08-29 11:49:21 +03:00
envzhu
8ee75ca45a
lib/protoparser/opentsdb: accept multiple spaces between fields in a row as a deliminator. (#1575) 2021-08-29 11:38:32 +03:00
Aliaksandr Valialkin
31b42b30b6 lib/promscrape/discovery/kubernetes: rename role: endpointslices to role: endpointslice to be consistent with Prometheus
See 2ec6c7dbb8/discovery/kubernetes/kubernetes.go (L99)
2021-08-29 11:23:08 +03:00
Aliaksandr Valialkin
2e001db4de lib/promscrape/discovery/kubernetes: use v1 API instead of v1beta1 API for role: ingress and role: endpointslices
This should fix service discovery for these roles in Kubernetes v1.22 and newer versions.
See https://kubernetes.io/docs/reference/using-api/deprecation-guide/#ingress-v122

The corresponding change in Prometheus - https://github.com/prometheus/prometheus/pull/9205
2021-08-29 11:16:59 +03:00
Aliaksandr Valialkin
10f960fa0c lib/promscrape: add ability to load scrape configs from multiple files
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1559
2021-08-26 08:51:16 +03:00
Aliaksandr Valialkin
c27ee35c5c lib/promscrape: expose promscrape_discovery_http_errors_total metric for tracking errors per each http_sd config 2021-08-25 13:05:49 +03:00
Aliaksandr Valialkin
ffc0ab1774 lib/{mergeset,storage}: improve the detection of the needed free space for background merge
This should prevent from possible out of disk space crashes during big merges.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1560
2021-08-25 09:35:44 +03:00
Aliaksandr Valialkin
d5622b32e2 lib/promscrape: reduce memory and CPU usage when Prometheus staleness tracking is enabled for metrics from deleted / disappeared scrape targets
Store the scraped response body instead of storing the parsed and relabeld metrics.
This should reduce memory usage, since the response body takes less memory than the parsed and relabeled metrics.
This is especially true for Kubernetes service discovery, which adds many long labels for all the scraped metrics.

This should also reduce CPU usage, since the marshaling of the parsed
and relabeld metrics has been substituted by response body copying.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-21 21:17:26 +03:00
Aliaksandr Valialkin
f46a73dcdd lib/promscrape: use scrapeTimestamp when storing stale markers for failed scrape
This will make timestamps for stale markers more consistent for timestamps for other samples
2021-08-19 14:18:05 +03:00
Aliaksandr Valialkin
c09446a9aa lib/promscrape: send stale markers for the previously scraped metrics on failed scrapes like Prometheus does 2021-08-18 21:59:03 +03:00
Aliaksandr Valialkin
cdc372bb98 app/vmselect: add -search.noStaleMarkers command-line flag for disabling stale markers handling in queries
This option allows reducing CPU usage a bit when VictoriaMetrics is used
for collecting and processing non-Prometheus data. For example, InfluxDB line protocol, Graphite, OpenTSDB, CSV, etc.
2021-08-18 13:59:02 +03:00
Aliaksandr Valialkin
226143f31b lib/promscrape: add ability to disable sending Prometheus staleness markers with -promscrape.disableStaleMarkers command-line flag
This option can be useful when vmagent consumes too much additional memory
for staleness markers functionality and when staleness markers aren't needed.
2021-08-18 13:43:21 +03:00
Aliaksandr Valialkin
03c959f1df lib/promscrape: stop scrapers for the removed targets before starting scrapers for the added targets
This should prevent from possible time series overlap when old target is substituted by new target (for example, during Kubernetes deployments).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
2021-08-17 00:55:51 +03:00
Aliaksandr Valialkin
a0e18f06eb lib/promscrape: restore red highlighting for DOWN targets at /targets page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1461
2021-08-15 16:03:57 +03:00
Aliaksandr Valialkin
4401464c22 all: add support for Prometheus staleness markers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:10:17 +03:00
Aliaksandr Valialkin
d375d9b878 lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:29:33 +03:00
Aliaksandr Valialkin
d826352688 app/vmagent: follow-up after fe445f753b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491
2021-08-05 09:52:32 +03:00
Omar Ghader
46e27d60a6 feature: Add multitenant for vmagent (#1505)
* feature: Add multitenant for vmagent

* Minor fix

* Fix rcs index out of range

* Minor fix

* Fix multi Init

* Fix multi Init

* Fix multi Init

* Add default multi

* Adjust naming

* Add TenantInserted metrics

* Add TenantInserted metrics

* fix: remove unused metrics for vmagent

* fix: remove unused metrics for vmagent

Co-authored-by: mghader <marc.ghader@ubisoft.com>
Co-authored-by: Sebastian YEPES <syepes@gmail.com>
2021-08-05 09:52:31 +03:00
Aliaksandr Valialkin
50663ba41f lib/promscrape/discovery/gce: add __meta_gce_interface_ipv4_<name> labels as in Prometheus 2.29
See https://github.com/prometheus/prometheus/pull/8978
2021-08-03 16:11:49 +03:00
Aliaksandr Valialkin
3cad8b4564 lib/promscrape/discovery/ec2: add __meta_ec2_availability_zone_id label as Prometheus 2.29 does 2021-08-03 16:11:49 +03:00
Aliaksandr Valialkin
d05cac6c98 li/storage: re-use the per-day inverted index search code for searching in global index
This allows removing a big pile of outdated code for global index search.

This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486
2021-07-30 10:31:37 +03:00
Aliaksandr Valialkin
8ee8660ac4 app/vmselect: follow-up for 626073bca8
* Rename -search.maxMetricsPointSearch to -search.maxSamplesPerQuery, so it is more consistent with the existing -search.maxSamplesPerSeries
* Move the -search.maxSamplesPerQuery from vmstorage to vmselect, so it could effectively limit the number of raw samples obtained from all the vmstorage nodes
* Document the -search.maxSamplesPerQuery in docs/CHANGELOG.md
2021-07-28 18:00:23 +03:00
Nikolay
9d45b46f4c
adds check for region with custom s3 endpoint (#1465) 2021-07-27 12:35:38 +03:00
Aliaksandr Valialkin
c2deee9911 lib/storage: yet another attempt to properly determine disk space shortage, which prevents from optimal merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-27 12:04:50 +03:00
Aliaksandr Valialkin
bb31117555 lib/promrelabel: add tests for verifying that regex works as expected in single quotes and double quotes 2021-07-27 10:50:55 +03:00
Aliaksandr Valialkin
8b7917cd81 all: add go:build lines for Go1.17
See https://tip.golang.org/doc/go1.17#gofmt for more details
2021-07-26 15:48:21 +03:00
Aliaksandr Valialkin
1318736ad1 lib/promscrape: add missing whitespace at /targets page before up word 2021-07-26 12:22:59 +03:00
Aliaksandr Valialkin
4ba3fd9e6d lib/workingsetcache: switch from split cache to full cache after the cache size exceeds 95% of split capacity
Previously the switch occurred when the cache size becomes 100% of its capacity. The cache size could never reach 100% capacity.
This could prevent from switching from the split cache to full cache, thus reducing the cache effectiveness.
2021-07-15 16:12:04 +03:00
Aliaksandr Valialkin
d472b03e34 lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesnt change samples 2021-07-15 12:17:45 +03:00
Aliaksandr Valialkin
682662b2ae lib/storage: remove cache directory if it contains reset_cache_on_startup file
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447
2021-07-13 17:58:51 +03:00
Aliaksandr Valialkin
2df66dad7b lib/httpserver: add is_set label to flag metrics
This label allows determining the set flags with the query `flag{is_set="true"}`
2021-07-13 15:10:13 +03:00
Aliaksandr Valialkin
f9de546139 lib/storage: reset perKeyMisses stats less frequently
This should reduce CPU usage for queries executed with intervals higher than 30 seconds
2021-07-12 14:33:42 +03:00
Aliaksandr Valialkin
4f80b2f230 lib/storage: properly limit the size of storage/date_metricID cache 2021-07-12 14:25:44 +03:00
Aliaksandr Valialkin
8ca2799478 lib/storage: properly determine when the deduplication is needed in needsDedup
Previously needsDedup() could return true if the de-duplication wasn't needed for the following case:

         d < interval
           /     \
   |        v | v        |
     interval   interval

Now it properly returns false for this case
2021-07-12 10:53:30 +03:00
Aliaksandr Valialkin
6e0553c92e lib/mergeset: cache indexBlock items only on the second request
This should reduce the indexdb/indexBlocks cache size, since it won't contain one-time-wonders items.
2021-07-07 15:23:06 +03:00
Aliaksandr Valialkin
766edbc421 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:09:40 +03:00
Aliaksandr Valialkin
e843bd7bd7 lib/storage: do not cache inmemoryBlock entries requested only once (aka one-time-wonder items)
This should reduce the cache size and memory usage for the indexdb/dataBlocks cache
2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin
8b262d4ba7 lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate 2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin
a7694092b8 Revert "lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method"
This reverts commit 7c6d3981bf.

Reason for revert: high contention at bucket16Pool on systems with big number of CPU cores.
This slows down query processing significantly.
2021-07-06 18:21:35 +03:00
Aliaksandr Valialkin
8aa9bba9bd lib/{mergeset,storage}: switch from sync.Pool to chan-based pool for inmemoryPart objects
This should reduce memory usage on systems with big number of CPU cores,
since every inmemoryPart object occupies at least 64KB of memory and sync.Pool maintains
a separate pool inmemoryPart objects per each CPU core.

Though the new scheme for the pool worsens per-cpu cache locality, this should be amortized
by big sizes of inmemoryPart objects.
2021-07-06 16:28:41 +03:00
Aliaksandr Valialkin
7c6d3981bf lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method
This reduces the load on memory allocator in Go runtime in production workload.
2021-07-06 15:35:03 +03:00
Aliaksandr Valialkin
78c9174682 lib/mergeset: increase pool capacity for inmemoryBlock according to collected profiles from production workload
CPU and memory profiles show that the pool capacity for inmemoryBlock objects is too small.
This results in the increased load on memory allocation code in Go runtime.
Increase the pool capacity in order to reduce the load on Go runtime.
2021-07-06 13:41:34 +03:00
Aliaksandr Valialkin
f71e4d1853 lib/mergeset: limit the frequency for flushCallback calls to once per 10 seconds
This should improve hit ratio for tagFiltersCache when big number of new time series are constantly registered
(aka high churn rate). This, in turn, should reduce CPU usage for queries over such time series.
2021-07-06 12:17:17 +03:00
Aliaksandr Valialkin
f3acf065c9 lib/storage: consistency renaming: tagCache -> tagFiltersCache
This improves code readability
2021-07-06 11:03:51 +03:00
Aliaksandr Valialkin
0020b9f904 lib/workingsetcache: properly update stats for requests and cache misses
Previously the stats for cache misses could be improperly counted, because it had inflated cache misses
if the entry was missing in the curr cache, but was existing in the prev cache.

The same applies to cache requests - they were inflated if the entry was missing in the curr cache.
2021-07-06 10:53:32 +03:00
Aliaksandr Valialkin
4cf47163c1 lib/workingsetcache: fix cache capacity calculations after 4f0003f182 2021-07-05 17:11:57 +03:00
Aliaksandr Valialkin
4f0003f182 lib/workingsetcache: typo fixes after d0c830039d 2021-07-05 15:35:37 +03:00
Aliaksandr Valialkin
d0c830039d lib/storage: tune cache sizes according to production workload 2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
8f973e34fb lib/workingsetcache: properly switch to whole mode
Previously the switch from `split` to `whole` mode had been performed too early,
e.g. when the current cache size became bigger than 1/4 of the allowed cache size.

Now it is performed when the current cache size becomes bigger than 1/2 of the allowed cache size.

This change can reduce memory usage for data ingestion path when big number of active time series are ingested.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
43103be011 lib/{storage,mergeset}: increase cache timeout for data and index blocks from a minute to two minutes
One minute cache timeout result in slower queries in some production workloads where the interval
between query execution is in the range 1 minute - 2 minutes.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
54b9e1d3cb lib/cgroup: set GOGC to 50 by default if it isn't set
This should reduce memory usage for typical VictoriaMetrics workloads by up to 50%
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
9a83e9018d lib/storage: properly detect free disk space shortage during data merge
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-02 17:40:54 +03:00
Aliaksandr Valialkin
7088f17494 lib/promscrape/discovery/consul: use case-insensitive comparison for service names
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1424
2021-07-02 14:49:27 +03:00
Aliaksandr Valialkin
6e406083f2 lib/promauth: cache the client TLS certificate for up to a second
This should reduce CPU usage when TLS connections are established at a high rate.
2021-07-02 13:21:51 +03:00
Aliaksandr Valialkin
158c50c0ee lib/promauth: reload TLS certificates from disk on every mTLS connection as Prometheus does
This allows updating client certificates without the need to restart vmagent and/or single-node VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2021-07-01 15:27:47 +03:00
Aliaksandr Valialkin
c25b839078 lib/workingsetcache: reset the cache mode when the cache is reset
This should reduce memory usage if the working set is reduced after the cache reset.
2021-07-01 11:50:11 +03:00
Nikolay
ae485c2bfd
fixes /targets button style (#1423)
* fixes /targets button style
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1422

* updates boostrap version
2021-07-01 11:48:07 +03:00
Aliaksandr Valialkin
c93cee8de8 lib/{mergeset,storage}: reduce the maximum lifetime for cached indexdb and data blocks from 2 minutes to a minute
This should reduce memory usage on a system with high number of active time series and a high churn rate.
One minute is enough for caching the blocks needed for repeated queries (e.g. alerting rules, recording rules and dashboard refreshes).
2021-06-29 19:57:07 +03:00
Aliaksandr Valialkin
fc12484734 lib/mergeset: switch from sync.Pool to a channel for a pool for inmemoryBlock structs
This should reduce memory usage for the pool on systems with big number of CPU cores.

The sync.Pool maintains per-CPU pools, so the total number of objects in the pool
is proportional to the number of available CPU cores. The channel limits the number
of pooled objects by its own capacity. This means smaller number of pooled objects on average.
2021-06-29 19:56:59 +03:00
Aliaksandr Valialkin
9ce211a514 lib/promscrape/discovery/docker: fix golint warning: struct field Id should be ID 2021-06-29 13:12:28 +03:00
Aliaksandr Valialkin
5506cff76e lib/storage: put indexDBName into the key for dateTagFilter cache and for uselessTagFilters cache
This should prevent from stats overwriting when the previous indexdb is queried.
2021-06-29 12:40:05 +03:00
Aliaksandr Valialkin
1b0501a09e lib/promscrape: typo fix in /targets output
The typo has been introduced in fb72a2133f

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1408
2021-06-28 21:26:37 +03:00
Aliaksandr Valialkin
cb5453953f lib/promscrape: split docker and dockerswarm service discovery code bases, since they have very little in common
This is a follow up after c85a5b7fcb
2021-06-25 13:20:20 +03:00
Aliaksandr Valialkin
a69045e440 lib/promscrape: consistently sort service discovery routines
This should simplify further maintenance of the code
2021-06-25 12:10:46 +03:00
Lu Jiajing
c85a5b7fcb
Support Docker ServiceDiscovery (#1402)
* add docker discovery

* add test

* add labels test and add scrape work

* remove TODO

* refactor to merge apiConfig and sdConfig

* apply suggestion
2021-06-25 11:42:47 +03:00
Nikolay
434e33da9b
adds missing MustStop call to do and http sd (#1404) 2021-06-25 11:39:18 +03:00
Aliaksandr Valialkin
c22114c6f0 lib/storage: tune tag filters search logic
Tune the logic according to the logs provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-864293624

The previous logic had a race when multiple concurrent queries execute the same tag filter without prior stats.
This could result in incorrectly stored stats for such tag filter, which then could result in non-optimal sorting of tag filters
for further queries.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-23 13:29:39 +03:00
Aliaksandr Valialkin
e8a5bb92b7 lib/promscrape/discovery/consul: properly pass namespace to Consul watcher
Follow-up for 58a2989fe7
2021-06-22 17:42:41 +03:00
Aliaksandr Valialkin
ac54f34f9e lib/promscrape/discovery/http: follow up after e307bbb29a 2021-06-22 13:40:33 +03:00
Aliaksandr Valialkin
755040a171 lib/promscrape/discovery: support generic auth configs in Consul service discovery in the same way as Prometheus 2.28 does 2021-06-22 13:34:02 +03:00
Nikolay
e307bbb29a
adds http_sd (#1399)
* adds http_sd

* adds X-Prometheus-Refresh-Interval-Seconds header

* Update lib/promscrape/discovery/http/api.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 13:33:37 +03:00
Nikolay
58a2989fe7
adds consul enterprise namespace support (#1400)
* adds consul enterprise namespace support

* Update lib/promscrape/discovery/consul/consul.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 12:49:44 +03:00
Aliaksandr Valialkin
fb72a2133f lib/promscrape: show jobs with empty scrape targets on /targets page 2021-06-18 10:53:52 +03:00
Nikolay
6c434b260e
fixes DO service discovery labels (#1389)
adds test for digitalocean sd
2021-06-17 15:12:20 +03:00
Aliaksandr Valialkin
dcbc22552f lib/storage: fix infinite loop introduced in aa9b56a046 2021-06-17 14:28:10 +03:00
Aliaksandr Valialkin
aa9b56a046 lib/{mergeset,storage}: reduce the number of fsync calls on data ingestion path on systems with many cpu cores
VictoriaMetrics maintains a buffer per CPU core for the ingested data. These buffers are flushed to disk every second.
These buffers are flushed to disk in parallel starting from the commit 56b6b893ce .
This resulted in increased write disk IO usage on systems with many cpu cores
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-863046999 .

This commit merges the per-CPU buffers into bigger in-memory buffers before flushing them to disk.
This should reduce the rate of fsync syscalls and, consequently, the write disk IO on systems with many CPU cores.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
2021-06-17 13:52:08 +03:00
Aliaksandr Valialkin
84fb59b0ba lib/storage: move deletedMetricIDs set from indexDB to Storage
This makes consitent the list of deleted metricIDs when it is used from both the current indexDB and the previous indexDB (aka extDB).
This should fix the issue, which could lead to storing new samples under deleted metricIDs after indexDB rotation.
See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 .

Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

This commit resolves the issue in more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

The downside of the commit is the deletedMetricIDs set isn't cleaned from the metricIDs outside the retention. It needs app restart.
This should be OK in most cases.
2021-06-15 15:04:30 +03:00
Aliaksandr Valialkin
e028ad241a lib/protoparser: stop reading the input stream as soon as the callback provided by the caller returns error
This is a follow-up for af90c3c43b
2021-06-14 15:18:49 +03:00
faceair
af90c3c43b
lib/protoparser: stop read when callback error (#1380) 2021-06-14 15:10:58 +03:00
Aliaksandr Valialkin
36d55bff66 lib/promscrape: show the number of samples collected during the last scrape at /targets and /api/v1/targets pages
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1377
2021-06-14 14:04:00 +03:00
Nikolay
729c4eeb9c
adds digital ocean sd (#1376)
* adds digital ocean sd config

* adds digital ocean sd
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367

* typo fix
2021-06-14 13:15:04 +03:00
Aliaksandr Valialkin
06b8e7d148 lib/promscrape: increase the duration for reading the full response in stream parsing mode
Increase the duration from 10x to 30x of the configured `scrape_interval'.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:28:09 +03:00
Aliaksandr Valialkin
48210130ac lib/protoparser: measure the duration for reading the whole block of data instead of a single read operation
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:25:52 +03:00
Aliaksandr Valialkin
3c4366806c lib/protoparser/common: log the duration for reading a block of data in ReadLinesBlockExt on error
This may help debugging issues like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:22:04 +03:00
Aliaksandr Valialkin
ed83558646 app/vmauth: properly handle http.ErrAbortHandler panic
This panic can be raised by the reverseProxy on aborted request to the backend.
So handle it (e.g. suppress) at reverseProxy.ServeHTTP call.

Do not suppress the panic at lib/httpserver generic HTTP handler,
since it may result in an inconsistent state left after the panicking handler.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
2021-06-11 12:50:25 +03:00
Aliaksandr Valialkin
c4f3fbfa5d lib/storage: reset cache on disk during series deletion and during indexdb rotation
This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin
69b1482bdb lib/storage: consistency renaming: getMaxRawRowsPerPartition -> getMaxRawRowsPerShard 2021-06-11 10:57:23 +03:00
Aliaksandr Valialkin
044ab46824 lib/storage: reduce the amounts of memory which can be occupied by rawRow items during data ingestion on a system with many CPU cores 2021-06-11 10:57:23 +03:00
Nikolay
6b29b955c0
disables panic for net/httpAbortHandler (#1355) 2021-06-09 12:08:58 +03:00
Aliaksandr Valialkin
96b691a0ab lib/storage: properly account the number of loops spent when matching for or suffixes
This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-08 13:06:12 +03:00
Aliaksandr Valialkin
661d2668f8 lib/promrelabel: add tests for labelsToString() function 2021-06-04 20:42:46 +03:00
Aliaksandr Valialkin
78f83dc5ad app/{vmagent,vminsert}: follow-up after 2fe045e2a4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1343
2021-06-04 20:27:58 +03:00
jelmd
2fe045e2a4
new feature: debug relabeling (#1344)
* new feature: relabel logging

Use scrape_configs[x].relabel_debug = true to log metric names inkl.
labels before and after relabeling. After relabeling related metrics
get dropped, i.e. not submitted to servers.

* vminsert wants relabel logging, too.
2021-06-04 17:50:23 +03:00
Hason Chan
6f19bb23a1
fix eureka_sd_configs HTTPClientConfig incorrect parsing (#1350) 2021-06-04 11:47:17 +03:00
Aliaksandr Valialkin
2d8bd41f8a lib/storage: reduce memory allocations when syncing dateMetricIDCache 2021-06-03 16:20:42 +03:00
Nikolay
ddc8022702
fixes solaris build (#1345) 2021-05-31 09:21:23 +03:00
Aliaksandr Valialkin
a52a20659a lib/promscrape: fix tests after f0c21b6300 2021-05-28 01:32:50 +03:00
Aliaksandr Valialkin
d088923aef Revert "lib/mergeset: remove a pool for inmemoryBlock structs"
This reverts commit 793fe39921.

Reason to revert: production testing revealed possible slowdown when registering big number of new time series
2021-05-28 01:09:32 +03:00
Aliaksandr Valialkin
793fe39921 lib/mergeset: remove a pool for inmemoryBlock structs
The pool for inmemoryBlock struct doesn't give any performance gains in production workloads,
while it may result in excess memory usage for inmemoryBlock structs inside the pool during
background merge of indexdb.
2021-05-27 21:57:33 +03:00
Aliaksandr Valialkin
60341722d5 docs: document f0c21b6300 2021-05-27 15:03:30 +03:00
faceair
f0c21b6300
lib/promscrape: apply body size & sample limit to stream parse (#1331)
* lib/promscrape: apply body size limit to stream parse

Signed-off-by: faceair <git@faceair.me>

* lib/promscrape: apply sample limit to stream parse

Signed-off-by: faceair <git@faceair.me>
2021-05-27 14:52:44 +03:00
Aliaksandr Valialkin
2233d6ed8a lib/uint64set: store pointers to bucket16 instead of bucket16 objects in bucket32
This speeds up bucket32.addBucketAtPos() when bucket32.buckets contains big number of items,
since the copying of bucket16 pointers is much faster than the copying of bucket16 objects.

This is a cpu profile for copying bucket16 objects:

      10ms     13.43s (flat, cum) 32.01% of Total
      10ms      120ms    650:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    651:	b.b16his[pos] = hi
         .     13.31s    652:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .          .    653:	b16 := &b.buckets[pos]
         .          .    654:	*b16 = bucket16{}
         .          .    655:	return b16
         .          .    656:}

This is a cpu profile for copying pointers to bucket16:

      10ms      1.14s (flat, cum)  2.19% of Total
         .      100ms    647:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    648:	b.b16his[pos] = hi
      10ms      700ms    649:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .      330ms    650:	b16 := &bucket16{}
         .          .    651:	b.buckets[pos] = b16
         .          .    652:	return b16
         .          .    653:}
2021-05-25 14:20:52 +03:00
Aliaksandr Valialkin
39ef1e7a51 lib/storage: do not stop data ingestion on the first error in Storage.AddRows
Continue data ingestion for the rest of blocks.
2021-05-24 15:32:47 +03:00
Aliaksandr Valialkin
4b01c9fb2e lib/storage: limit the number of rows per each block in Storage.AddRows()
This should reduce memory usage when ingesting big blocks or rows.
2021-05-24 15:24:07 +03:00
Aliaksandr Valialkin
a4ff4b8e65 lib/storage: allow filling all the rows up to their capacity in rawRowsShard.addRows
This should reduce memory usage a bit on data ingestion path
2021-05-24 15:22:59 +03:00
Aliaksandr Valialkin
a46551245c lib/bloomfilter: fix TestLimiterConcurrent 2021-05-24 05:17:36 +03:00
Aliaksandr Valialkin
93d81b486d lib/fs: do not pass done callback to tryRemoveAll() func
This improves code readability a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-24 04:51:57 +03:00
Aliaksandr Valialkin
f54133b200 lib/storage: do not populate MetricID->MetricName cache during data ingestion
This cache isn't needed during data ingestion, so there is no need in spending RAM on it.

This reduces RAM usage on data ingestion path by 30%
2021-05-24 03:02:46 +03:00
Aliaksandr Valialkin
ec79abc382 lib/{mergeset,storage}: reduce the number of IFNO log messages like merged ... items across ... blocks in ... seconds
Log these messages if the merge takes more than 30 seconds instead of 10 seconds.
2021-05-23 14:03:21 +03:00
Aliaksandr Valialkin
78dddfb98f lib/promauth: follow-up after 5b8176c68e 2021-05-22 18:01:11 +03:00
Nikolay
5b8176c68e
basic OAuth2 support for remoteWrite and scrape targets (#1316)
* adds OAuth2 support for remoteWrite and scrapping

* adds tests
changes init
2021-05-22 16:20:18 +03:00
Aliaksandr Valialkin
e05dd475f0 lib/fs: concurrently remove up to 1024 blocked NFS directories
Previously the blocked directories were removed sequentially by a single goroutine.
This can be not enough for highly loaded VictoriaMetrics that accepts millions of sample per second,
when big number of LSM parts are created and removed at high rate.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:57:46 +03:00
Aliaksandr Valialkin
8e2985b53d lib/fs: wait for a while before giving up on NFS file removal if the removal queue is full
This should reduce the probability of the panic on a highly loaded VictoriaMetrics
accepting millions of samples per second.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:21:00 +03:00
Aliaksandr Valialkin
c54bb73867 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:34:06 +03:00
Aliaksandr Valialkin
4c7bb75fa2 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:27:10 +03:00
Aliaksandr Valialkin
e394ff6466 app/vmagent/remotewrite: expose metrics with the current number of active series per day and per hour
These numbers are exposed via the following metrics:

- vmagent_hourly_series_limit_current_series
- vmagent_daily_series_limit_current_series

Expose also the limits via the following metrics:

- vmagent_hourly_series_limit_max_series
- vmagent_daily_series_limit_max_series
2021-05-20 15:28:09 +03:00
Aliaksandr Valialkin
ad73f226ff app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin
7e526effaa app/vmagent: add ability to limit series cardinality on a per-hour and per-day basis 2021-05-20 13:13:40 +03:00
Aliaksandr Valialkin
22585531ad lib/promscrape/discovery/kubernetes: make golangci-lint happy by removing empty branches 2021-05-20 12:00:29 +03:00
Aliaksandr Valialkin
009e136d88 lib/storage: remove possible data race when logging dropped labels 2021-05-20 02:47:22 +03:00
Aliaksandr Valialkin
eb8093ca6b lib/promscrape/discovery/kubernetes: reload objects on object parse error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 23:25:48 +03:00
Aliaksandr Valialkin
f4719889da lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:26:16 +03:00
Aliaksandr Valialkin
0cf1fe19e6 lib/promscrape/discovery/kubernetes: simplify the reload logic for urlWatcher.objectsByKey 2021-05-18 15:40:46 +03:00
Aliaksandr Valialkin
bfb61de606 lib/promscrape/discovery/kubernetes: properly update vm_promscrape_discovery_kubernetes_scrape_works metric
Previously it wasn't descreased during config update.
2021-05-18 15:33:45 +03:00
Aliaksandr Valialkin
3166994244 lib/promscrape/discovery/kubernetes: log errors and stop service discovery when unexpected updates are received from Kubernetes API server
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 15:10:48 +03:00
Aliaksandr Valialkin
6c944b86d8 docs: dealay -> delay
Thanks to @jelmd . See 0b7e3510c8 (r50884991)
2021-05-18 01:07:52 +03:00
Aliaksandr Valialkin
ec6a284978 lib/promrelabel: add tests for conditional removal of label on another label match
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1294
2021-05-18 00:23:03 +03:00
Aliaksandr Valialkin
2eed410466 lib/promscrape/discovery/kubernetes: key ScrapeWork objects by urlWatcher instead of namespace
This makes the code less fragile if urlWatcher would depend on additional to namepsace properties.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-17 23:47:08 +03:00
Aliaksandr Valialkin
733706e6c6 lib/promscrape: reload auth tokens from files every second
Previously auth tokens were loaded at startup and couldn't be updated without vmagent restart.
Now there is no need in vmagent restart.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1297
2021-05-14 20:00:08 +03:00
Aliaksandr Valialkin
10a47af631 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:12:24 +03:00
Aliaksandr Valialkin
fc3519fa26 lib/promscrape: limit scrape_timeout by scrape_interval like Prometheus does 2021-05-13 16:09:45 +03:00
匠心零度
b4f5be8bd8
fix vagent imbalance problem (#1292)
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2021-05-13 11:14:51 +03:00
Aliaksandr Valialkin
e6fda03e8f vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.14 to v1.0.15
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:44:14 +03:00
Aliaksandr Valialkin
f6a641de62 lib/promscrape: exponentially increase retry interval on unsuccesful requests to scrape targets or to service discovery services
This should reduce CPU load at vmagent and at remote side when the remote side doesn't accept HTTP requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:38:50 +03:00
Aliaksandr Valialkin
c0ec541559 lib/cgroup: document the ability to detect cgroup v2 memory and cpu limits. This is follow-up for b50024812e 2021-05-13 09:26:20 +03:00
Aliaksandr Valialkin
d7be2753c0 lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss 2021-05-13 09:02:33 +03:00
Nikolay
b50024812e
adds cgroupsv2 support (#1283)
* adds cgroupv2 limits support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1269

* small fix

* changes Atoi to ParseUint
2021-05-13 09:02:13 +03:00
Aliaksandr Valialkin
a22a17dc66 lib/storage: merge getTSDBStatusForDate with getTSDBStatusWithFiltersForDate
These functions are non-trivial, while their code has minimal differences.
It is better from maintainability PoV to merge these functions into a single function.
2021-05-12 17:56:53 +03:00
Aliaksandr Valialkin
832651c6c2 app/vmselect: follow up after 8a0678678b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168
2021-05-12 17:18:30 +03:00
Nikolay
8a0678678b
Adds tsdb match filters (#1282)
* init work on filters

* init propose for status filters

* fixes tsdb status
adds test

* fix bug

* removes checks from test
2021-05-12 15:18:45 +03:00