Commit graph

7003 commits

Author SHA1 Message Date
Roman Khavronenko
0df0b0f29e
lib/promscrape: add metric vm_promscrape_scrapes_skipped_total (#5074)
* lib/promscrape: add metric `vm_promscrape_scrapes_skipped_total`

add metric `vm_promscrape_scrapes_skipped_total`to show whether vmagent skips the scrapes.
This could happen if vmagent is overloaded or target is responding too slow for configured `scrape_interval`.

The follow-up commit should add a corresponding alerting rule and panel to vmagent dashboard.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* deployment/docker: add `TooManyScrapeSkips` alerting rule for vmagent

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards: add panels `Scrape duration 0.99 quantile` and `Skipped scrapes` to vmagent dashboard

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-02 20:38:23 +02:00
Aliaksandr Valialkin
120f3bc467
lib/logstorage: follow-up for 8a23d08c21
- Compare the actual free disk space to the value provided via -storage.minFreeDiskSpaceBytes
  directly inside the Storage.IsReadOnly(). This should work fast in most cases.
  This simplifies the logic at lib/storage.

- Do not take into account -storage.minFreeDiskSpaceBytes during background merges, since
  it results in uncontrolled growth of small parts when the free disk space approaches -storage.minFreeDiskSpaceBytes.
  The background merge logic uses another mechanism for determining whether there is enough
  disk space for the merge - it reserves the needed disk space before the merge
  and releases it after the merge. This prevents from out of disk space errors during background merge.

- Properly handle corner cases for flushing in-memory data to disk when the storage
  enters read-only mode. This is better than losing the in-memory data.

- Return back Storage.MustAddRows() instead of Storage.AddRows(),
  since the only case when AddRows() can return error is when the storage is in read-only mode.
  This case must be handled by the caller by calling Storage.IsReadOnly()
  before adding rows to the storage.
  This simplifies the code a bit, since the caller of Storage.MustAddRows() shouldn't handle
  errors returned by Storage.AddRows().

- Properly store parsed logs to Storage if parts of the request contain invalid log lines.
  Previously the parsed logs could be lost in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945
2023-10-02 20:38:00 +02:00
Aliaksandr Valialkin
cbbdf9cdf5
lib/logstorage: run up to GOMAXPROCS flushers of old in-memory parts to disk
One flusher isn't enough under high data ingestion rate.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
2023-10-02 20:36:53 +02:00
Github Actions
025a53dcb6
Automatic update operator docs from VictoriaMetrics/operator@44bdc27 (#5104) 2023-10-02 20:36:21 +02:00
Github Actions
835393a59c
Automatic update operator docs from VictoriaMetrics/operator@c7125bd (#5102) 2023-10-02 20:36:21 +02:00
hagen1778
25a006099d
app/vlinsert/loki: make fmt
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-02 20:35:45 +02:00
Aliaksandr Valialkin
78e9cda4b1
lib/logstorage: assist merging in-memory parts at data ingestion path if their number starts exceeding maxInmemoryPartsPerPartition
This is a follow-up for 9310e9f584 , which removed data ingestion pacing.
This can result in uncontrolled growth of in-memory parts under high data ingestion rate,
which, in turn, can result in unbounded RAM usage, OOM crashes and slow query performance.

While at it, consistently reset isInMerge field for parts passed to mergeParts() before returning from this function.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4828
2023-10-02 20:35:20 +02:00
Aliaksandr Valialkin
f55d114785
lib/{mergeset,storage}: consistently reset isInMerge field in parts passed to mergeParts() before returning from the function
While at it consistently check that the isInMerge field is set in all the parts passed to mergeParts()
2023-10-02 20:34:52 +02:00
Aliaksandr Valialkin
c3ece6d38e
docs/VictoriaLogs/CHANGELOG.md: remove duplicate lines about vl_http_request_duration_seconds metric
This is a follow-up after 8a23d08c21

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4945
2023-10-02 20:34:19 +02:00
Aliaksandr Valialkin
8b1d6b995e
lib/{mergeset,storage}: perform at most one assisted merge per each call to addRows/addItems
This should reduce tail latency during data ingestion.

This shouldn't slow down data ingestion in the worst case, since assisted merges are spread among
distinct addRows/addItems calls after this change.
2023-10-02 20:33:51 +02:00
Aliaksandr Valialkin
4c0402f118
docs/Single-server-VictoriaMetrics.md: refer to active queries and top queries pages at VMUI instead of refering to the corresponding HTTP endpoints
"Active queries" and "Top queries" pages at VMUI are user-friendly than the corresponding HTTP endpoints
2023-10-02 20:33:10 +02:00
Aliaksandr Valialkin
5f1492d978
docs/vmalert.md: refer to -evaluationInterval command-line flag instead of evaluation_interval option, which isnt supported by vmalert
This is follow-up for 5c42c1218a
2023-10-02 20:32:02 +02:00
Aliaksandr Valialkin
ac418281da
docs/Troubleshooting.md: describe how to optimize SLI/SLO queries with long lookbehind windows 2023-10-02 20:29:37 +02:00
Aliaksandr Valialkin
b5f9a6a5c6
docs/CHANGELOG.md: follow-up for f0e33700fc
Mention that the statistic inaccuracy is related to cardinality explorer
2023-10-02 20:29:07 +02:00
Aliaksandr Valialkin
3db9db356d
deployment/docker/docker-compose-cluster.yml: follow-up for 4d1b572f46
Grafana and vmalert now depend on vmauth instead of individual vmselect nodes

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5057
2023-10-02 20:28:06 +02:00
Aliaksandr Valialkin
b366a22018
deployment: update VictoriaMetrics version from v1.93.4 to v1.93.5
See https://docs.victoriametrics.com/CHANGELOG.html#v1935
2023-10-02 20:16:54 +02:00
Dmytro Kozlov
90b189dab8
app/vmselect: fix bitmap_*() functions behavior (#5021)
Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4996

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com

Signed-off-by: dmitryk-dk d.kozlov@victoriametrics.com
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-10-02 20:13:27 +02:00
Aliaksandr Valialkin
6e613cb8e8
docs/Cluster-VictoriaMetrics.md: increase the minimum supported version of Go builder from 1.18 to 1.20
See the related commit 3da493ff62
2023-10-02 19:33:41 +02:00
Zakhar Bessarab
876bce5a57
lib/logstorage: prevent from panic during background merge (#4969)
* lib/logstorage: prevent from panic during background merge

Fixes panic during background merge when resulting block would contain more columns than maxColumnsPerBlock.
Buffered data will be flushed and replaced by the next block.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4762
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: clarify field description and comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-10-02 19:29:31 +02:00
Zakhar Bessarab
dfdada055c
lib/logstorage: switch to read-only mode when running out of disk space (#4945)
* lib/logstorage: switch to read-only mode when running out of disk space

Added support of `--storage.minFreeDiskSpaceBytes` command-line flag to allow graceful handling of running out of disk space at `--storageDataPath`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4737
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix error handling logic during merge

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/logstorage: fix log level

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-10-02 17:09:57 +02:00
Zakhar Bessarab
53268ebc66
lib/logstorage/datadb: remove parts merge cond (#4828)
It was added in order to limit number of goroutines performing assisted merges during ingestion.
It turned out that blocking ingestion goroutines lower ingestion performance and limits overall ingestion around 40k items per seconds because of lock contention.
Removing parts merge sync.Cond allows to remove lock contention at write path and significantly improves write performance.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4775

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-10-02 17:09:12 +02:00
Dmytro Kozlov
10371eac60
vmui: update information about tsdb usage in cluster version (#5004)
* vmui: update information about tsdb usage in cluster version

* vmui: cleanup

* vmui: add CHANGELOG.md

* vmui: cleanup

* vmui: update logic, move information to the visible place

* app/vmui: remove values fetch, update documentation for cardinality explorer

* app/vmui: update CHANGELOG.md
2023-10-01 21:30:44 +02:00
Zakhar Bessarab
ff88e53e01
doc: address review feedback
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-10-01 21:27:49 +02:00
Zakhar Bessarab
9f6704d6cc
doc: mention InfluxDB v2 HTTP API support
Address: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5076
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-10-01 21:26:44 +02:00
Github Actions
18cc977956
Automatic update operator docs from VictoriaMetrics/operator@958ce2b (#5070) 2023-10-01 21:25:54 +02:00
hagen1778
d0641d6ea2
dashboards: move Concurrent inserts panel to Troubleshooting section
Moved because this panel is related to both: scraped and ingested data.
Before, it could have give a misleading impression that it is related to ingested metrics only.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-01 21:25:25 +02:00
Roman Khavronenko
d39c8525e2
Docker add vmauth (#5057)
* docker-compose: add vmauth to cluster env

vmauth acts as a balancer and used as an example of how to interconnect
VM components via vmauth.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docker-compose: add vmauth to cluster env

vmauth acts as a balancer and used as an example of how to interconnect
VM components via vmauth.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-10-01 21:24:01 +02:00
Aliaksandr Valialkin
15645c8a94
app/vmagent: follow-up for cfef814750
- Properly handle /insert/multitenant/api/put url for opentsdb handler at vmagent
- Document that the bug has been introduced in v1.93.2 at docs/CHANGELOG.md
- Add a link to multitenant url docs in bugfix description

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5061
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4910
2023-10-01 20:52:29 +02:00
Alexander Marshalov
cfef814750
fixed ingestion via multitenant url for opentsdbhttp (#5061) (#5063) 2023-09-26 10:47:49 +02:00
Aliaksandr Valialkin
e341128096
docs/vmagent.md: make VictoriaMetrics remove_write protocol more visible by mentioning it at the top of the page 2023-09-25 17:42:25 +02:00
Aliaksandr Valialkin
9ae92ff2ee
lib/storage: remove unused atomicSetBool function after 717c53af27 2023-09-25 17:37:45 +02:00
Aliaksandr Valialkin
308134970f
docs: run make docs-sync after 8e722e10ee 2023-09-25 17:35:34 +02:00
Aliaksandr Valialkin
f6b35a715d
docs/CaseStudies.md: add Criteo case study
This is a follow-up for bdbe616408

See https://medium.com/criteo-engineering/victoriametrics-a-prometheus-remote-storage-solution-57081a3d8e61
2023-09-25 17:34:47 +02:00
Aliaksandr Valialkin
60fe63df07
lib/storage: make it clear that the number of big merge workers always equals to 4
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4915#issuecomment-1733922830
2023-09-25 17:17:40 +02:00
Aliaksandr Valialkin
a421db5977
lib/storage: stop exposing vm_merge_need_free_disk_space metric
This metric confuses users and has no any useful information.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686#issuecomment-1733844128
2023-09-25 17:00:14 +02:00
Aliaksandr Valialkin
538dc6058d
app/vmselect/promql: run make fmt after 3b9605dba5 2023-09-25 16:15:58 +02:00
Aliaksandr Valialkin
b43ff80d21
app/vmselect/promql: do not sort q1 or q2 results
This makes sure that `q2` series are returned after `q1` series in the same way as Prometheus does

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4763
2023-09-25 16:15:02 +02:00
Aliaksandr Valialkin
c954019e43
app/vmselect/promql: completely substitute median_over_time() WITH template with regular median_over_time() rollup function
This is a follow-up for 34d7a670d0

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034
2023-09-25 15:31:25 +02:00
Zakhar Bessarab
fd6ca57c14
app/vmselect/promql: add implementation of median_over_time for rollup functions list (#5042)
`median_over_time` is handled by predefined WITH template in MetricsQL library which translates it to `quantile_over_time(0.5)`
This makes it impossble to use `median_over_time` as a usual rollup function for `aggr_over_time`.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5034

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-25 15:31:25 +02:00
Roman Khavronenko
23131f932a
docs/changelog: add link to sandbox (#5050)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-25 15:13:30 +02:00
Github Actions
d9754ca44b
Automatic update operator docs from VictoriaMetrics/operator@587ea54 (#5054) 2023-09-25 15:12:44 +02:00
hagen1778
1ffc23af1b
docs/articles: add link to "How to reduce expenses on monitoring" slides
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-25 15:03:26 +02:00
Aliaksandr Valialkin
36d26b69aa
docs/Cluster-VictoriaMetrics.md: update -help output for enterprise components 2023-09-22 13:51:18 +02:00
Zakhar Bessarab
0be8960875
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs (#5048)
lib/promscrape/discovery/kubernetes: supress context.Cancelled error in logs

It is possible that context.Cancelled will appear after k8s watcher was closed due to reload(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850).

Logging an error misinforms user and looks like vmagent discovery will stop working even though this does not affect discovery.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
(cherry picked from commit 8d99c12a7d)
2023-09-22 13:02:57 +02:00
Zakhar Bessarab
86eaf6906b
docs/vmbackup: update docs for different authentication options, add examples (#5046)
Updates: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5023

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-22 11:42:34 +02:00
Github Actions
391b857eb4
Automatic update operator docs from VictoriaMetrics/operator@9d65e09 (#5040) 2023-09-22 11:41:33 +02:00
Aliaksandr Valialkin
281eb0c377
lib/storage: log fatal error inside searchMetricName() instead of propagating it to the caller
This simplifies the code a bit at searchMetricName() and searchMetricNameWithCache() call sites

This is a result of investigating https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4972
2023-09-22 11:37:55 +02:00
Zakhar Bessarab
e216592378
lib/backup: fix issue with inconsistent copying of appliedRetention.txt (#5027)
* lib/backup: fix issue with inconsistent copying of appliedRetention.txt

appliedRetention.txt can be modified in place, so it should be always copied just the same as parts.json

Updates: https://github.com/victoriaMetrics/victoriaMetrics/issues/5005
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: add changelog entry for appliedRetention.txt copying fix

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-09-21 11:26:13 +02:00
Aliaksandr Valialkin
11ebcf1f9b
app/vmauth: consistently use '%w' for formatting errors in fmt.Errorf() 2023-09-21 11:05:26 +02:00
Roman Khavronenko
c9f121e694
app/vmauth: update config reload routine (#5019)
* expose metrics `vmauth_config_last_reload_*` for tracking the state of config reloads, similarly to vmagent/vmalert components.
* do not print logs like `SIGHUP received...` once per configured `-configCheckInterval` cmd-line flag. This log will be printed only if config reload was invoked manually.
*  prevent configuration reloading if there were no changes in config. This improves memory usage when `-configCheckInterval` cmd-line flag is configured and config has extensive list of regexp expressions requiring additional memory on parsing.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-09-21 11:05:26 +02:00