Commit graph

584 commits

Author SHA1 Message Date
Aliaksandr Valialkin
d8ffbf55a2
docs/CHANGELOG.md: document ea153e5f90 2022-02-12 00:48:06 +02:00
Roman Khavronenko
cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.

The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.

To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.

The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
Roman Khavronenko
e3adcbec6e
lib/promscrape: support prometheus-like duration in scrape configs (#2169)
* lib/promscrape: support prometheus-like duration in scrape configs

The change allows to specify duration values like `1d`, `1w`
for fields `scrape_interval`, `scrape_timeout`, etc.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817#issuecomment-1033384766
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/blockcache: make linter happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/promscrape: support prometheus-like duration in scrape configs

* add support for extra fields `scrape_align_interval` and `scrape_offset`;
* support Prometheus duration parsing for `__scrape_interval__`
and `__scrape_duration__` labels;

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

* docs/CHANGELOG.md: document the feature

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 16:17:00 +02:00
Aliaksandr Valialkin
3cb72ccc2a
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpointslice_{label,annotation}* labels to be consistent with other role values for Kubernetes service discovery 2022-02-11 14:54:47 +02:00
Nikolay
4e7f7f3302
fixes service discovery for kubernetes (#2173)
* fixes service discovery for kubernetes
now it must take in account all pods that belong to the discovered endpoint and endpointslice
adds simple test for endpoints
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2134

* wip

* docs/CHANGELOG.md: document the change

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-11 13:34:22 +02:00
Aliaksandr Valialkin
8f2d03fdc7
docs/CHANGELOG.md: document 4e722c459b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2167
2022-02-10 12:20:12 +02:00
Aliaksandr Valialkin
0028b2c6d1
docs/CHANGELOG.md: add instructions on how to build VictoriaMetrics components from source code in order to test tip changes 2022-02-08 16:44:08 +02:00
Nikolay
a8acad7453
adds CGO build for arm64 (#2102)
* adds CGO build for arm64
it must improve performance for arm64 based deployments of vmstorage and
vmsingle for 15-20%

it depends on gozstd package update for correct musl gozstd vendoring

* typo fixes

* docs/CHANGELOG.md: document the change

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-08 16:25:59 +02:00
Aliaksandr Valialkin
9bb60ab00f
lib/promscrape: set -promscrape.config.strictParse to true by default
This allows detecting long-living silent errors in -promscrape.config
2022-02-08 15:41:43 +02:00
Aliaksandr Valialkin
8b36044c93
docs/CHANGELOG.md: add links to issues, which could benefit from improved re-routing algorithm 2022-02-07 16:55:28 +02:00
Aliaksandr Valialkin
865f09ecbb
docs: sync with cluster branch 2022-02-07 14:54:16 +02:00
Aliaksandr Valialkin
b5b3c585b3
lib/promscrape: show the total number of scrapes and the total number of scrape errors per target at /targets page
This information may be useful when debugging unreliable scrape targets
2022-02-03 20:22:41 +02:00
Aliaksandr Valialkin
2968779f16
lib/promscrape: provide the ability to fetch target responses on behalf of vmagent or single-node VictoriaMetrics
This feature may be useful when debugging metrics for the given target located in isolated environment
2022-02-03 19:00:55 +02:00
Aliaksandr Valialkin
4b850c2a59
app/vmselect/promql: do not push down filters, which enumerate more than 10k unique values
Such filters may slow down time series search, so just skip them.

This is a follow-up for e7f1ceeb84

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1827
2022-02-02 23:40:02 +02:00
Aliaksandr Valialkin
4ef32df4fa
docs/CHANGELOG.md: document 55e3bbd4cc
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1567
2022-02-02 23:32:35 +02:00
Aliaksandr Valialkin
6530bcedec
docs: updates after 5da71eb685
* Mention about the ability to configure vmalert notifiers via files in docs/CHANGELOG.md
* Mention about the ability to use Consul service discovery for vmalert notifiers in docs/CHANGELOG.md
* Run `make docs-sync` in order to sync app/vmalert/README.md to docs/vmalert.md
2022-02-02 23:26:18 +02:00
Aliaksandr Valialkin
ead66155ef
lib/cgroup: expose process_cpu_cores_available metric
This metric shows the number of CPU cores available to the process.
This allows creating alerting rules on CPU saturation with the following query:

    rate(process_cpu_seconds_total[5m]) / process_cpu_cores_available > 0.9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2107
2022-01-31 20:24:41 +02:00
Aliaksandr Valialkin
e7f1ceeb84
app/vmselect/promql: optimize queries, which join on _info metrics.
Automatically add common filters from one side of binary operation
to the other side before sending the query to storage subsystem.

See https://grafana.com/blog/2021/08/04/how-to-use-promql-joins-for-more-effective-queries-of-prometheus-metrics-at-scale/
and https://www.robustperception.io/exposing-the-software-version-to-prometheus
2022-01-31 19:32:36 +02:00
Aliaksandr Valialkin
3d8a4bf023
docs/CHANGELOG.md: document 6a519896db 2022-01-31 12:41:57 +02:00
Aliaksandr Valialkin
e02e0508da
vendor: update github.com/VictoriaMetrics/metricsql from v0.37.0 to v0.38.0
This adds more optimization cases for https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusLabelNonOptimization

For example:

* Multi-level transform functions. For example, abs(round(foo{a="b"})) + bar{x="y"}
  is now optimized to abs(round(foo{a="b",x="y"})) + bar{a="b",x="y"}
* Binary operations with `on()`, `without()`, `group_left()` and `group_right()` modifiers.
  For example, foo{a="b"} on (a) + bar is now optimized to foo{a="b"} on (a) + bar{a="b"}
* Multi-level binary operations. For example, foo{a="b"} + bar{x="y"} + baz{z="q"}
  is now optimized to foo{a="b",x="y",z="q"} + bar{a="b",x="y",z="q"} + baz{a="b",x="y",z="q"}
* Aggregate functions. For example, sum(foo{a="b"}) by (c) + bar{c="d"}
  is now optimized to sum(foo{a="b",c="d"}) by (c) + bar{c="d"}
2022-01-27 19:03:54 +02:00
Aliaksandr Valialkin
746ee191e8
lib/logger/throttler.go: show the original location of the error and warning message
Previously the location inside LogThrottler implementation was shown. This could complicate debugging.
2022-01-23 13:55:00 +02:00
Yury Molodov
ad5059f2d3
vmui: fixed display type switching (#2088)
* fix: correct switch display type

* docs/CHANGELOG.md: document the bugfix

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-01-21 16:56:22 +02:00
Yury Molodov
adbb821eac
vmui: fix time range selector (#2085)
* fix: add date validate for time range

* app/vmselect/vmui: `make vmui-update`

* docs/CHANGELOG.md: document the bugfix

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-01-21 12:02:38 +02:00
Aliaksandr Valialkin
145337792d
lib/{mergeset,storage}: properly limit cache sizes for indexdb
Previously these caches could exceed limits set via `-memory.allowedPercent` and/or `-memory.allowedBytes`,
since limits were set independently per each data part. If the number of data parts was big, then limits could be exceeded,
which could result to out of memory errors.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007
2022-01-20 18:37:17 +02:00
Aliaksandr Valialkin
84f6b3014c
docs/CHANGELOG.md: document the bugfix for highestMax() function is Graphite render API 2022-01-20 12:16:18 +02:00
Aliaksandr Valialkin
109363de49
docs/CHANGELOG.md: add missing parens in example for @ modifier 2022-01-19 13:04:51 +02:00
Aliaksandr Valialkin
98edeac7b7
docs/CHANGELOG.md: fix incorrect link to the issue
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911
2022-01-19 00:06:44 +02:00
Aliaksandr Valialkin
8f5902dfcf
docs/CHANGELOG.md: cut v1.72.0 2022-01-18 22:43:18 +02:00
Yury Molodov
8bdc45ba00
fix: remove buffer period (#2078)
* fix: remove buffer period

* app/vmselect/vmui: `make vmui-update`

* docs/CHANGELOG.md: document the implemented feature

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2064

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-01-18 21:42:56 +02:00
Yury Molodov
70737ea4ac
vmui: correct url encoding (#2067)
* fix: correct encode multi-line queries

* fix: change autocomplete for correct arrows work

* app/vmselect/vmui: `make vmui-update`

* docs/CHANGELOG.md: document the bugfix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2039

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-01-18 21:31:46 +02:00
Aliaksandr Valialkin
dcadec65b6
docs/CHANGELOG.md: document 8e3f9c1fbb 2022-01-18 21:24:49 +02:00
Aliaksandr Valialkin
e933e3150d
docs/CHANGELOG.md: document fcd33fc409 2022-01-18 12:46:33 +02:00
Aliaksandr Valialkin
c2a3911bb5
docs/CHANGELOG.md: document that vmgateway now supports extra_filters option
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1863
2022-01-18 12:39:25 +02:00
Aliaksandr Valialkin
1bdc71d917
app/vmselect/promql: implement keep_metric_names modifier for transform and rollup functions
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/949
2022-01-14 04:14:59 +02:00
Aliaksandr Valialkin
f41846d002
app/vmselect/promql: add stale_samples_over_time() function 2022-01-14 01:48:04 +02:00
Aliaksandr Valialkin
96707223db
docs/CHANGELOG.md: yet another attempt to fix formatting for yaml snippet 2022-01-14 01:14:55 +02:00
Aliaksandr Valialkin
d7d83d6d93
docs/CHANGELOG.md: fix formatting for scrape_configs example 2022-01-14 01:02:32 +02:00
Aliaksandr Valialkin
1d05444b33
lib/promscrape: expose promscrape_stale_samples_created_total metric for monitoring the number of created stale samples 2022-01-14 01:00:46 +02:00
Aliaksandr Valialkin
831b93a755
app/vmalert: add parseDuration function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/8817
2022-01-13 23:30:41 +02:00
Aliaksandr Valialkin
80f03177c4
lib/promscrape/discovery/kubernetes: add __meta_kubernetes_node_provider_id label for discovered Kubernetes nodes in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/9603
2022-01-13 23:16:02 +02:00
Aliaksandr Valialkin
80f966b80c
app/vmalert: add stripPort template function in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/10002
2022-01-13 22:53:42 +02:00
Aliaksandr Valialkin
355a63733d
lib/promscrape/discovery/kubernetes: add the ability to limit service discovery to the current namespace
See https://github.com/prometheus/prometheus/issues/9782 and https://github.com/prometheus/prometheus/pull/9881
2022-01-13 22:44:35 +02:00
Aliaksandr Valialkin
c883c15878
app/vmselect/promql: add support for @ modifier
Add support for `@` modifier in MetricsQL according to https://prometheus.io/docs/prometheus/latest/querying/basics/#modifier

Extend the support with the following features:
* Allow using `@` modifier everywhere in the query. For example, `sum(foo) @ end()`
* Allow using arbitrary expression as `@` modifier. For example, `foo @ (end() - 1h)`
  returns `foo` value at `end - 1 hour` timestamp on the selected time range `[start ... end]`

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1348
2022-01-13 22:12:06 +02:00
Aliaksandr Valialkin
17eb86a689
lib/promscrape/discovery/dockerswarm: follow up after 68a117a25a
- Document the bugfix at docs/CHANGELOG.md
- Set __address__ field after copying commonLabels to the resulting map of discovered labels.
  This makes sure that the correct __address__ label is used.
2022-01-11 09:20:10 +02:00
Aliaksandr Valialkin
7a2c46d951
docs/CHANGELOG.md: document 77bfa8181d 2022-01-11 08:53:49 +02:00
Aliaksandr Valialkin
bdba50432b
docs/CHANGELOG.md: add release dates for every release 2022-01-07 12:27:32 +02:00
Aliaksandr Valialkin
e4e36383e2
lib/promscrape: do not send staleness markers on graceful shutdown
This follows Prometheus behavior.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2013#issuecomment-1006994079
2022-01-07 01:17:57 +02:00
Denys Holius
ae89b4e818
Old links replaced for newest (#2033)
* replaced old links to the website

* fixed deletion main README.md file

* fix: added docs files after docs-sync
2022-01-05 16:30:13 +02:00
Aliaksandr Valialkin
dbe592597f
docs/CHANGELOG.md: clarify the issue description, which is fixed by 38bf5fc136 2022-01-05 16:19:06 +02:00
Aliaksandr Valialkin
178dd87e26
lib/storage: follow-up for 38bf5fc136 2022-01-05 16:00:11 +02:00