Commit graph

1619 commits

Author SHA1 Message Date
Dmytro Kozlov
34961dd4b8
app/vmagent: fix check of the DataDog agent path requests when requests have trailing slashes (#5106)
* app/vmagent: fix check of the DataDog agent path requests when requests have trailing slashes

* app/vmagent: fix CHANGELOG.md description

* wip

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-10-02 21:18:03 +02:00
Aliaksandr Valialkin
859977d591
Revert "lib/promscrape: add metric vm_promscrape_scrapes_skipped_total (#5074)"
This reverts commit 74301cdbf5.

Reason for revert:

vmagent already provides better approach for detecting slow scrape targets via the following query:

    scrape_duration_seconds / scrape_timeout_seconds > 1

This query depends on automatically generated per-target metrics.
See https://docs.victoriametrics.com/vmagent.html#automatically-generated-metrics for more details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5074
2023-10-02 20:59:56 +02:00
Aliaksandr Valialkin
71668637ce
app/vmselect/promql: follow-up for 896c85a4a4
- Clarify the description of the change at docs/CHANGELOG.md
- Make sure that bitmap_*(X, NaN) returns NaN

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5021
2023-10-02 20:08:26 +02:00
Roman Khavronenko
74301cdbf5
lib/promscrape: add metric vm_promscrape_scrapes_skipped_total (#5074)
* lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total`

add metric `vm_promscrape_scrapes_skipped_total`to show whether vmagent skips the scrapes.
This could happen if vmagent is overloaded or target is responding too slow for configured `scrape_interval`.

The follow-up commit should add a corresponding alerting rule and panel to vmagent dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-02 17:12:12 +02:00
Aliaksandr Valialkin
5e49b72126
docs/CHANGELOG.md: follow-up for f0e33700fc
Mention that the statistic inaccuracy is related to cardinality explorer
2023-10-01 21:33:31 +02:00
Aliaksandr Valialkin
859859aa1c
app/vmagent: follow-up for cfef814750
- Properly handle /insert/multitenant/api/put url for opentsdb handler at vmagent
- Document that the bug has been introduced in v1.93.2 at docs/CHANGELOG.md
- Add a link to multitenant url docs in bugfix description

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5061
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4910
2023-10-01 21:09:32 +02:00
Dmytro Kozlov
896c85a4a4
app/vmselect: fix bitmap_*() functions behavior (#5021)
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-29 12:03:01 +02:00
Dmytro Kozlov
f0e33700fc
vmui: update information about tsdb usage in cluster version (#5004)
* vmui: update information about tsdb usage in cluster version

* vmui: cleanup

* vmui: add CHANGELOG.md

* vmui: cleanup

* vmui: update logic, move information to the visible place

* app/vmui: remove values fetch, update documentation for cardinality explorer

* app/vmui: update CHANGELOG.md
2023-09-29 11:47:45 +02:00
hagen1778
c53b5788b4
dashboards: move Concurrent inserts panel to Troubleshooting section
Moved because this panel is related to both: scraped and ingested data.
Before, it could have give a misleading impression that it is related to ingested metrics only.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-26 14:26:40 +02:00
Alexander Marshalov
34a9d1f818
fixed ingestion via multitenant url for opentsdbhttp (#5061) (#5064) 2023-09-26 11:18:34 +02:00
Roman Khavronenko
4d1b572f46
Docker add vmauth (#5057)
* docker-compose: add vmauth to cluster env

vmauth acts as a balancer and used as an example of how to interconnect
VM components via vmauth.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docker-compose: add vmauth to cluster env

vmauth acts as a balancer and used as an example of how to interconnect
VM components via vmauth.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-26 10:50:10 +02:00
Aliaksandr Valialkin
717c53af27
lib/storage: stop exposing vm_merge_need_free_disk_space metric
This metric confuses users and has no any useful information.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128
2023-09-25 16:52:39 +02:00
Aliaksandr Valialkin
3b9605dba5
app/vmselect/promql: do not sort q1 or q2 results
This makes sure that `q2` series are returned after `q1` series in the same way as Prometheus does

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763
2023-09-25 16:14:16 +02:00
Aliaksandr Valialkin
a740159541
app/vmselect/promql: completely substitute median_over_time() WITH template with regular median_over_time() rollup function
This is a follow-up for 34d7a670d0

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034
2023-09-25 15:28:12 +02:00
Zakhar Bessarab
34d7a670d0
app/vmselect/promql: add implementation of median_over_time for rollup functions list (#5042)
`median_over_time` is handled by predefined WITH template in MetricsQL library which translates it to `quantile_over_time(0.5)`
This makes it impossble to use `median_over_time` as a usual rollup function for `aggr_over_time`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-25 14:01:00 +02:00
Roman Khavronenko
ec50375991
docs/changelog: add link to sandbox (#5050)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-25 14:00:41 +02:00
Zakhar Bessarab
8d99c12a7d
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs (#5048)
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs

It is possible that context.Cancelled will appear after k8s watcher was closed due to reload(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850).

Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-22 13:01:33 +02:00
Zakhar Bessarab
760cdcec68
lib/backup: fix issue with inconsistent copying of appliedRetention.txt (#5027)
* lib/backup: fix issue with inconsistent copying of appliedRetention.txt

appliedRetention.txt can be modified in place, so it should be always copied just the same as parts.json

Updates: https://github.com/victoriaMetrics/victoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry for appliedRetention.txt copying fix

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-21 11:25:19 +02:00
Roman Khavronenko
462c918251
app/vmauth: update config reload routine (#5019)
* expose metrics `vmauth_config_last_reload_*` for tracking the state of config reloads, similarly to vmagent/vmalert components.
* do not print logs like `SIGHUP received...` once per configured `-configCheckInterval` cmd-line flag. This log will be printed only if config reload was invoked manually.
*  prevent configuration reloading if there were no changes in config. This improves memory usage when `-configCheckInterval` cmd-line flag is configured and config has extensive list of regexp expressions requiring additional memory on parsing.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-20 15:04:52 +02:00
Aliaksandr Valialkin
28aed4d098
docs/CHANGELOG.md: publish changes for v1.93.5 2023-09-19 10:50:25 +02:00
Aliaksandr Valialkin
582f1f8fda
docs/CHANGELOG.md: clarify the description of bugfixes at f7dda12b4d and b6ad581b45
This is a follow-up for 8b01bc4a5c

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4999
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5009
2023-09-19 00:22:32 +02:00
Aliaksandr Valialkin
76af32d869
lib/promscrape/discovery/kubernetes: follow-up after eeb862f3ff
- Move the bugfix description to the correct place in docs/CHANGELOG.md
- Prevent from logging of 'context canceled' errors after the url watcher is stopped,
  since these errors are expected and may confuse users.
- Remove unused urlWatcher.refCount field.
- Remove unused urlWatcher.close() method.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
2023-09-18 17:06:39 +02:00
Aliaksandr Valialkin
4d01bc6d52
lib/backup: properly copy parts.json files inside indexdb directory additional to data directory
This is a follow-up for 264ffe3fa1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5006
2023-09-18 16:16:50 +02:00
Nikolay
8b01bc4a5c
docs: reflect recent changes at change logs (#5015) 2023-09-18 08:22:10 +02:00
Zakhar Bessarab
eeb862f3ff
lib/promscrape/discovery/kubernetes: fix leaking api watcher (#4861)
* lib/promscrape/discovery/kubernetes: fix leaking api watcher

goroutine which was polling k8s API had no execution control. This leaded to leaking goroutines during config reload.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: use reference counting for urlWatcher cleanup

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: remove waitgroup sync for goroutines polling API server

This is unnecessary since context will is cancelled and new requests will not be sent. Also, using waitgroup will increase time required to perform reload which might result in missed scrapes.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: clarify comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Apply suggestions from code review

* lib/promscrape/discovery/kubernetes: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-15 19:40:13 +02:00
Zakhar Bessarab
264ffe3fa1
lib/backup: force copying of parts.json (#5006)
* lib/backup: force copying of parts.json

Copying of parts.json is required because `part.key()` comparison can create same key value for files with different contents. This will result in inconsistent backup being created or restored.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/backup: ensure parts.json is only copied once

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-09-15 19:04:38 +02:00
Zakhar Bessarab
2a362e7397
docs: add changelog entry for downsampling.period and dedup.minScrapeInterval verification (#5000)
* docs: add changelog entry for downsampling.period and dedup.minScrapeInterval verification

- added changelog entry
- documented requirements for dedup.minScrapeInterval and downsampling.period being multiples of each other

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: `make docs-sync`

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-15 15:14:16 +02:00
Dmytro Kozlov
d5f9619984
vmagent: add validation of MetricsQL functions (#4991)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-15 13:15:23 +02:00
Aliaksandr Valialkin
151f363552
docs/CHANGELOG.md: document v1.87.9 2023-09-10 21:41:23 +02:00
Aliaksandr Valialkin
bb8eda0b0f
docs/CHANGELOG.md: document v1.93.4 2023-09-10 19:47:38 +02:00
Aliaksandr Valialkin
0bbc6a5b43
app/vmagent/remotewrite: fix data race when extra labels are added to samples before sending them to multiple remote storage systems
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4972
2023-09-08 23:24:00 +02:00
Aliaksandr Valialkin
a315694dd9
app/vmauth: add ability to specify response status codes for retrying requests during load-balancing
Response status codes for retrying can be specified via retry_status_codes list

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
2023-09-08 23:23:15 +02:00
Roman Khavronenko
6351d07da8
vmalert: correctly add duplicated params to the query (#4955)
Fix the bug when Group's `params` fields with multiple values were
overriding each other instead of adding up.
The bug was introduced in this commit eccecdf177
 starting from v1.91.1 https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.91.1

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4908

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-08 09:32:48 +02:00
Aliaksandr Valialkin
b80d338287
app/vmauth: retry requests at other backends on 5xx response status codes
This should allow implementing high availability scheme described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792#issuecomment-1674338561

See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
2023-09-08 00:46:37 +02:00
Aliaksandr Valialkin
dd10f94951
app/vmselect: return 503 status code when partial responses are denied and some of vmstorage nodes are temporarily unavailable
This should help detecting this case and automatic retrying the query at healthy cluster replica
in another availability zone.

This commit is needed as a preparation for automatic query retry at another backend at vmauth on 5xx errors
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4792#issuecomment-1674338561
2023-09-07 16:11:39 +02:00
Aliaksandr Valialkin
9de440c803
lib/logger: increase the maximum log arg size from 200 to 500
The 200 chars limit has been appeared too small for typical log messages emitted by VictoriaMetrics components

This is a follow-up for 87fea7d8ac
2023-09-07 16:11:08 +02:00
Aliaksandr Valialkin
87fea7d8ac
lib/logger: limit the maximum arg length, which can be emitted to log lines
This should prevent from emitting too long lines when too long args are passed to logger.* functions.
For example, too long MetricsQL queries or too long data samples.
2023-09-07 15:22:46 +02:00
Aliaksandr Valialkin
9bccc5aab2
docs/CHANGELOG.md: return back accidentally deleted line at 45c0e4bb31 2023-09-07 12:03:04 +02:00
Aliaksandr Valialkin
2dc33e0ddc
all: update Go builder from Go1.21.0 to Go1.21.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.1+label%3ACherryPickApproved
2023-09-07 11:36:16 +02:00
Aliaksandr Valialkin
5f85dd7f80
docs/CHANGELOG.md: clarify the scope of recent bugfixes 2023-09-07 11:25:11 +02:00
Aliaksandr Valialkin
448baf12a3
deployment/docker: properly build armv5 production builds for GOARCH=arm
Pass GOARM=5 when building GOARCH=arm production builds, since the default value for this env var
has been changed to GOARM=6 since Go1.21.0.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4965
and https://github.com/golang/go/issues/62475
2023-09-07 11:18:53 +02:00
Haleygo
45c0e4bb31
vmalert: add eval_offset for group (#4693)
Adds `eval_offset` attribute for Groups. 
If specified, Group will be evaluated at the exact time offset on the range of [0...evaluationInterval]. 
The setting might be useful for cron-like rules which must be evaluated at specific moments of time. 

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3409

Signed-off-by: Haley Wang <pipilong.25@gmail.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-09-06 16:29:59 +02:00
Aliaksandr Valialkin
138e02da37
docs/CHANGELOG.md: document the bugfix at 7db72dd7e6
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4947
2023-09-06 12:17:21 +02:00
Yury Molodov
7b92f1d038
vmui: fix render heatmap (#4957) 2023-09-06 10:26:45 +02:00
hagen1778
f9e47a9abe
docs: fix broken link in vmctl references
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-04 12:45:46 +02:00
Yury Molodov
d19072a2d9
feat: add the option to see the latest queries (#4718) (#4759)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-04 11:29:11 +02:00
Aliaksandr Valialkin
7a716095dc
docs/CHANGELOG.md: document 1.93.3 release 2023-09-02 10:21:20 +02:00
Aliaksandr Valialkin
82ccae1c02
docs/CHANGELOG.md: document v1.87.8 2023-09-02 01:54:07 +02:00
Nikolay
b9a5ea03fa
lib/vmselectapi: do not send empty label names for labelNames request (#4936)
* lib/vmselectapi: do not send empty label names for labelNames request
it breaks cluster communication, since vmselect incorrectly reads request buffer, leaving unread data on it
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4932

* typo fix

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-09-01 23:26:43 +02:00
Aliaksandr Valialkin
8632683990
docs/CHANGELOG.md: document bugfix at 7c19d01e9a
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4870
2023-09-01 18:00:12 +02:00