Commit graph

1629 commits

Author SHA1 Message Date
Nikolay
1f91f22b5f
app/vmselect: reduce lock contention for heavy aggregation requests (#5119)
reduce lock contention for heavy aggregation requests
previously lock contetion may happen on machine with big number of CPU due to enabled string interning. sync.Map was a choke point for all aggregation requests.
Now instead of interning, new string is created. It may increase CPU and memory usage for some cases.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5087
2023-10-10 13:45:20 +02:00
Haleygo
2aa0f5fc41
vmalert: add evalAlignment for rule group and fix evalutaion timstamp (#5066)
* vmalert: add `query_time_alignment` for rule group

1. add `eval_alignment` attribute for group which by default is true. So group rule query stamp will be aligned with interval and propagated to ALERT metrics and the messages for alertmanager;
2. deprecate `datasource.queryTimeAlignment` flag.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049
2023-10-10 12:41:19 +02:00
Dmytro Kozlov
244c887825
app/vmalert: hide sensetive info in the vmalert (#5059)
Strip sensitive information such as auth headers or passwords from datasource, remote-read, 
remote-write or notifier URLs in log messages or UI. This behavior is by default and is controlled via 
`-datasource.showURL`, `-remoteRead.showURL`, `remoteWrite.showURL` or `-notifier.showURL` cmd-line flags. 

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5044
2023-10-10 11:40:27 +02:00
Yury Molodov
c5044cdba9
vmui: enhancement of autocomplete feature (#5051)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4993
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3006
2023-10-10 10:38:08 +02:00
Dmytro Kozlov
f60c08a7bd
app/(vminsert|vmagent): add support for new relic infrastructure agent (#4712)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-10-05 14:39:51 +02:00
Aliaksandr Valialkin
75dd7b30ba
lib/filestream: add -filestream.disableFadvise syscall for unconditional disabling of fadvise syscall
This may be needed in rare cases when performing backups on systems with big number of CPU cores
and big value passed to -concurrency command-line flag.

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5120
2023-10-04 16:19:46 +02:00
hagen1778
de651165bd
alerting: account for vmauth component for alerts ServiceDown and TooManyRestarts
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-03 16:45:33 +02:00
Aliaksandr Valialkin
f13a96f42c
docs/CHANGELOG.md: cut v1.94.0 2023-10-02 22:33:35 +02:00
Yury Molodov
f39045eca6
vmui: add storage for query history (#5022)
* vmui: add storage for query history

* docs/vmui: add storage for query history
2023-10-02 21:41:03 +02:00
Roman Khavronenko
a4bd73ec7e
lib/promscrape: make concurrency control optional (#5073)
* lib/promscrape: make concurrency control optional

Before, `-maxConcurrentInserts` was limiting all calls to `promscrape.Parse`
function: during ingestion and scraping. This behavior is incorrect.
Cmd-line flag `-maxConcurrentInserts` should have effect onl on ingestion.

Since both pipelines use the same `promscrape.Parse` function, we extend it
to make concurrency limiter optional. So caller can decide whether concurrency
should be limited or not.

This commit makes c53b5788b4
obsolete.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "dashboards: move `Concurrent inserts` panel to Troubleshooting section"

This reverts commit c53b5788b4.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-02 21:32:11 +02:00
Dmytro Kozlov
34961dd4b8
app/vmagent: fix check of the DataDog agent path requests when requests have trailing slashes (#5106)
* app/vmagent: fix check of the DataDog agent path requests when requests have trailing slashes

* app/vmagent: fix CHANGELOG.md description

* wip

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-10-02 21:18:03 +02:00
Aliaksandr Valialkin
859977d591
Revert "lib/promscrape: add metric vm_promscrape_scrapes_skipped_total (#5074)"
This reverts commit 74301cdbf5.

Reason for revert:

vmagent already provides better approach for detecting slow scrape targets via the following query:

    scrape_duration_seconds / scrape_timeout_seconds > 1

This query depends on automatically generated per-target metrics.
See https://docs.victoriametrics.com/vmagent.html#automatically-generated-metrics for more details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5074
2023-10-02 20:59:56 +02:00
Aliaksandr Valialkin
71668637ce
app/vmselect/promql: follow-up for 896c85a4a4
- Clarify the description of the change at docs/CHANGELOG.md
- Make sure that bitmap_*(X, NaN) returns NaN

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5021
2023-10-02 20:08:26 +02:00
Roman Khavronenko
74301cdbf5
lib/promscrape: add metric vm_promscrape_scrapes_skipped_total (#5074)
* lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total`

add metric `vm_promscrape_scrapes_skipped_total`to show whether vmagent skips the scrapes.
This could happen if vmagent is overloaded or target is responding too slow for configured `scrape_interval`.

The follow-up commit should add a corresponding alerting rule and panel to vmagent dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-02 17:12:12 +02:00
Aliaksandr Valialkin
5e49b72126
docs/CHANGELOG.md: follow-up for f0e33700fc
Mention that the statistic inaccuracy is related to cardinality explorer
2023-10-01 21:33:31 +02:00
Aliaksandr Valialkin
859859aa1c
app/vmagent: follow-up for cfef814750
- Properly handle /insert/multitenant/api/put url for opentsdb handler at vmagent
- Document that the bug has been introduced in v1.93.2 at docs/CHANGELOG.md
- Add a link to multitenant url docs in bugfix description

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5061
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4910
2023-10-01 21:09:32 +02:00
Dmytro Kozlov
896c85a4a4
app/vmselect: fix bitmap_*() functions behavior (#5021)
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-29 12:03:01 +02:00
Dmytro Kozlov
f0e33700fc
vmui: update information about tsdb usage in cluster version (#5004)
* vmui: update information about tsdb usage in cluster version

* vmui: cleanup

* vmui: add CHANGELOG.md

* vmui: cleanup

* vmui: update logic, move information to the visible place

* app/vmui: remove values fetch, update documentation for cardinality explorer

* app/vmui: update CHANGELOG.md
2023-09-29 11:47:45 +02:00
hagen1778
c53b5788b4
dashboards: move Concurrent inserts panel to Troubleshooting section
Moved because this panel is related to both: scraped and ingested data.
Before, it could have give a misleading impression that it is related to ingested metrics only.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-26 14:26:40 +02:00
Alexander Marshalov
34a9d1f818
fixed ingestion via multitenant url for opentsdbhttp (#5061) (#5064) 2023-09-26 11:18:34 +02:00
Roman Khavronenko
4d1b572f46
Docker add vmauth (#5057)
* docker-compose: add vmauth to cluster env

vmauth acts as a balancer and used as an example of how to interconnect
VM components via vmauth.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docker-compose: add vmauth to cluster env

vmauth acts as a balancer and used as an example of how to interconnect
VM components via vmauth.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-26 10:50:10 +02:00
Aliaksandr Valialkin
717c53af27
lib/storage: stop exposing vm_merge_need_free_disk_space metric
This metric confuses users and has no any useful information.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128
2023-09-25 16:52:39 +02:00
Aliaksandr Valialkin
3b9605dba5
app/vmselect/promql: do not sort q1 or q2 results
This makes sure that `q2` series are returned after `q1` series in the same way as Prometheus does

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763
2023-09-25 16:14:16 +02:00
Aliaksandr Valialkin
a740159541
app/vmselect/promql: completely substitute median_over_time() WITH template with regular median_over_time() rollup function
This is a follow-up for 34d7a670d0

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034
2023-09-25 15:28:12 +02:00
Zakhar Bessarab
34d7a670d0
app/vmselect/promql: add implementation of median_over_time for rollup functions list (#5042)
`median_over_time` is handled by predefined WITH template in MetricsQL library which translates it to `quantile_over_time(0.5)`
This makes it impossble to use `median_over_time` as a usual rollup function for `aggr_over_time`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-25 14:01:00 +02:00
Roman Khavronenko
ec50375991
docs/changelog: add link to sandbox (#5050)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-25 14:00:41 +02:00
Zakhar Bessarab
8d99c12a7d
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs (#5048)
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs

It is possible that context.Cancelled will appear after k8s watcher was closed due to reload(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850).

Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-22 13:01:33 +02:00
Zakhar Bessarab
760cdcec68
lib/backup: fix issue with inconsistent copying of appliedRetention.txt (#5027)
* lib/backup: fix issue with inconsistent copying of appliedRetention.txt

appliedRetention.txt can be modified in place, so it should be always copied just the same as parts.json

Updates: https://github.com/victoriaMetrics/victoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry for appliedRetention.txt copying fix

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-21 11:25:19 +02:00
Roman Khavronenko
462c918251
app/vmauth: update config reload routine (#5019)
* expose metrics `vmauth_config_last_reload_*` for tracking the state of config reloads, similarly to vmagent/vmalert components.
* do not print logs like `SIGHUP received...` once per configured `-configCheckInterval` cmd-line flag. This log will be printed only if config reload was invoked manually.
*  prevent configuration reloading if there were no changes in config. This improves memory usage when `-configCheckInterval` cmd-line flag is configured and config has extensive list of regexp expressions requiring additional memory on parsing.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-20 15:04:52 +02:00
Aliaksandr Valialkin
28aed4d098
docs/CHANGELOG.md: publish changes for v1.93.5 2023-09-19 10:50:25 +02:00
Aliaksandr Valialkin
582f1f8fda
docs/CHANGELOG.md: clarify the description of bugfixes at f7dda12b4d and b6ad581b45
This is a follow-up for 8b01bc4a5c

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4999
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5009
2023-09-19 00:22:32 +02:00
Aliaksandr Valialkin
76af32d869
lib/promscrape/discovery/kubernetes: follow-up after eeb862f3ff
- Move the bugfix description to the correct place in docs/CHANGELOG.md
- Prevent from logging of 'context canceled' errors after the url watcher is stopped,
  since these errors are expected and may confuse users.
- Remove unused urlWatcher.refCount field.
- Remove unused urlWatcher.close() method.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
2023-09-18 17:06:39 +02:00
Aliaksandr Valialkin
4d01bc6d52
lib/backup: properly copy parts.json files inside indexdb directory additional to data directory
This is a follow-up for 264ffe3fa1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5006
2023-09-18 16:16:50 +02:00
Nikolay
8b01bc4a5c
docs: reflect recent changes at change logs (#5015) 2023-09-18 08:22:10 +02:00
Zakhar Bessarab
eeb862f3ff
lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861)
* lib/promscrape/discovery/kubernetes: fix leaking api watcher

goroutine which was polling k8s API had no execution control. This leaded to leaking goroutines during config reload.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server

This is unnecessary since context will is cancelled and new requests will not be sent. Also, using waitgroup will increase time required to perform reload which might result in missed scrapes.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: clarify comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Apply suggestions from code review

* lib/promscrape/discovery/kubernetes: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-15 19:40:13 +02:00
Zakhar Bessarab
264ffe3fa1
lib/backup: force copying of parts.json (#5006)
* lib/backup: force copying of parts.json

Copying of parts.json is required because `part.key()` comparison can create same key value for files with different contents. This will result in inconsistent backup being created or restored.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/backup: ensure parts.json is only copied once

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-15 19:04:38 +02:00
Zakhar Bessarab
2a362e7397
docs: add changelog entry for downsampling.period and dedup.minScrapeInterval verification (#5000)
* docs: add changelog entry for downsampling.period and dedup.minScrapeInterval verification

- added changelog entry
- documented requirements for dedup.minScrapeInterval and downsampling.period being multiples of each other

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: `make docs-sync`

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-15 15:14:16 +02:00
Dmytro Kozlov
d5f9619984
vmagent: add validation of MetricsQL functions (#4991)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-15 13:15:23 +02:00
Aliaksandr Valialkin
151f363552
docs/CHANGELOG.md: document v1.87.9 2023-09-10 21:41:23 +02:00
Aliaksandr Valialkin
bb8eda0b0f
docs/CHANGELOG.md: document v1.93.4 2023-09-10 19:47:38 +02:00
Aliaksandr Valialkin
0bbc6a5b43
app/vmagent/remotewrite: fix data race when extra labels are added to samples before sending them to multiple remote storage systems
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4972
2023-09-08 23:24:00 +02:00
Aliaksandr Valialkin
a315694dd9
app/vmauth: add ability to specify response status codes for retrying requests during load-balancing
Response status codes for retrying can be specified via retry_status_codes list

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
2023-09-08 23:23:15 +02:00
Roman Khavronenko
6351d07da8
vmalert: correctly add duplicated params to the query (#4955)
Fix the bug when Group's `params` fields with multiple values were
overriding each other instead of adding up.
The bug was introduced in this commit eccecdf177
 starting from v1.91.1 https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.91.1

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4908

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-08 09:32:48 +02:00
Aliaksandr Valialkin
b80d338287
app/vmauth: retry requests at other backends on 5xx response status codes
This should allow implementing high availability scheme described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792#issuecomment-1674338561

See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
2023-09-08 00:46:37 +02:00
Aliaksandr Valialkin
dd10f94951
app/vmselect: return 503 status code when partial responses are denied and some of vmstorage nodes are temporarily unavailable
This should help detecting this case and automatic retrying the query at healthy cluster replica
in another availability zone.

This commit is needed as a preparation for automatic query retry at another backend at vmauth on 5xx errors
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792#issuecomment-1674338561
2023-09-07 16:11:39 +02:00
Aliaksandr Valialkin
9de440c803
lib/logger: increase the maximum log arg size from 200 to 500
The 200 chars limit has been appeared too small for typical log messages emitted by VictoriaMetrics components

This is a follow-up for 87fea7d8ac
2023-09-07 16:11:08 +02:00
Aliaksandr Valialkin
87fea7d8ac
lib/logger: limit the maximum arg length, which can be emitted to log lines
This should prevent from emitting too long lines when too long args are passed to logger.* functions.
For example, too long MetricsQL queries or too long data samples.
2023-09-07 15:22:46 +02:00
Aliaksandr Valialkin
9bccc5aab2
docs/CHANGELOG.md: return back accidentally deleted line at 45c0e4bb31 2023-09-07 12:03:04 +02:00
Aliaksandr Valialkin
2dc33e0ddc
all: update Go builder from Go1.21.0 to Go1.21.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.1+label%3ACherryPickApproved
2023-09-07 11:36:16 +02:00
Aliaksandr Valialkin
5f85dd7f80
docs/CHANGELOG.md: clarify the scope of recent bugfixes 2023-09-07 11:25:11 +02:00