Commit graph

1958 commits

Author SHA1 Message Date
hagen1778
dc25c30fdc
docs: move recent changes to Tip
These changes were mistakenly put to existing release

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 14:58:15 +01:00
hagen1778
e2dad3a2ac
app/vmalert: consistently sort groups by name and filename on /groups page
This should prevent non-deterministic sorting for groups with identical names.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 13:50:57 +01:00
hagen1778
11b03d9fc8
app/vmalert: follow-up after b60dcbe11f
* support case-insensitive search
* reflect search condition in URL, so link can be sharable
* support filtering on /alerts page
* fix collapseAll/expandAll logic to respect only shown entries
* add changelog

b60dcbe11f
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 13:07:05 +01:00
Aliaksandr Valialkin
0514091948
app/vlselect: follow-up for 451d2abf50
- Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb.
  See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request.
- Document the change at docs/VictoriaLogs/CHANGELOG.md
- Document the `limit` query arg at docs/VictoriaLogs/querying/README.md
- Make the change less intrusive.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778
2024-02-18 23:05:51 +02:00
Dmytro Kozlov
451d2abf50
Enable the limit query param for the /select/logsql/query (#5778)
* app/vlselect: add limit for logs query

* app/vlselect: CHANGELOG.md

* app/vlselect: stop search process if limit is reached, update logic, remove default limit

* app/vlselect: fix tests

* app/vlselect: fix filter tests

* app/vlselect: fix tests
2024-02-18 22:58:47 +02:00
Aliaksandr Valialkin
c42ddce159
lib/promscrape: add support for enable_compression option in the same way as Prometheus does
Updates https://github.com/prometheus/prometheus/pull/13166
Updates https://github.com/prometheus/prometheus/issues/12319

Do not document enable_compression option at docs/sd_configs.md, since vmagent already supports
more clear disable_compression option - see https://docs.victoriametrics.com/vmagent/#scrape_config-enhancements
2024-02-18 19:40:39 +02:00
Aliaksandr Valialkin
5a092e161c
lib/promscrape/discovery/kuma: add support for client_id option
See https://github.com/prometheus/prometheus/pull/13278
2024-02-18 19:19:40 +02:00
Aliaksandr Valialkin
2e30842582
vendor: update github.com/VictoriaMetrics/metricsql from v0.72.1 to v0.73.0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5383
2024-02-18 18:41:41 +02:00
Aliaksandr Valialkin
dfbb6e0826
docs/LTS-releases.md: cosmetic fixes 2024-02-17 18:09:36 +02:00
Aliaksandr Valialkin
d03719e72d
docs/CHANGELOG.md: document f8207e33a2 2024-02-17 17:52:53 +02:00
Aliaksandr Valialkin
cee901cdf4
docs/CHANGELOG.md: add missing for in the description of the TLS configuration features for vmctl
This is a follow-up for 6a07cb1bdb and f973711e56
2024-02-17 17:44:22 +02:00
hagen1778
f973711e56
app/vmctl: follow-up after 0c293a66ec
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 15:22:44 +01:00
Khushi Jain
0c293a66ec
app/vmctl : support TLS config options for remote read mode (#5798) 2024-02-16 15:12:43 +01:00
hagen1778
6a07cb1bdb
app/vmctl: follow-up after 7cd1b7d047
* cleanup code
* update docs

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 15:08:51 +01:00
Khushi Jain
7cd1b7d047
app/vmctl : support TLS config options for InfluxDB datasource (#5783)
* vmctl: TLS flags for influx DB

* added httputils function

* Add changelog and doc

---------

Co-authored-by: Khushi Jain <khushi.jain@nokia.com>
2024-02-16 14:59:18 +01:00
Aliaksandr Valialkin
6b9bedd0f9
app/vmstorage: expose vm_last_partition_parts metrics, which may help identifying performance issues related to the increased number of parts in the last partition 2024-02-15 14:51:19 +02:00
Aliaksandr Valialkin
926854b0f3
docs/CHANGELOG.md: document v1.93.12 LTS release 2024-02-14 20:17:44 +02:00
Aliaksandr Valialkin
39cba7e4fa
docs/CHANGELOG.md: document v1.97.2 LTS release 2024-02-14 18:50:50 +02:00
Aliaksandr Valialkin
baaa88001e
lib/promrelabel: store the original labels before returning them them to promutils.PutLabels()
This should reduce memory allocations.

This is a follow-up for b09bd6c42a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
2024-02-14 16:09:10 +02:00
Aliaksandr Valialkin
e4ad41b5ff
docs/CHANGELOG.md: cut v1.98.0 release 2024-02-14 16:00:50 +02:00
Aliaksandr Valialkin
4954eee187
docs/CHANGELOG.md: various typo cleanups 2024-02-14 02:45:17 +02:00
Yury Molodov
1c9f13d6c7
vmui: improve the context for autocomplete #5736 #5737 #5739 (#5804)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-14 00:22:51 +00:00
Aliaksandr Valialkin
d6e22f2888
app/vmselect: add sum_eq_over_time, sum_gt_over_time and sum_le_over_time functions to MetricsQL
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4641
2024-02-13 23:40:07 +02:00
Nikolay
88329d84ca
app/vmauth: properly release memory during config reload (#5805)
* app/vmauth: properly release memory during config reload
previously metrics package hold a refrence for channels for users concurrent requests.
it case of churn at `name`  field of users configuration, new metric was created. But previous one wasn't deleted.
It prevented full parsed configuration from being garbace collected.

now all config related metrics are bound to corresponding metrics.Set and unregistered during config reload process.

It also must fix an issue with incorrect values for current concurrent user requests

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-13 18:49:17 +00:00
Aliaksandr Valialkin
2f3091460f
vendor: update github.com/VictoriaMetrics/metricsql from v0.70.1 to v0.71.0
This adds an ability to propagate label filters across label_set() and alias() functions.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1827#issuecomment-1654095358
2024-02-13 06:36:59 +02:00
Aliaksandr Valialkin
e963d6c789
app/vmagent/remotewrite: add -remoteWrite.tlsHandshakeTimeout command-line flag for tuning tls handshake timeout to -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
2024-02-13 02:46:33 +02:00
Aliaksandr Valialkin
062cbb1130
app/vmauth: add support for mTLS-based routing of incoming requests to different backends depending on the subject field in the TLS certificate provided by the user
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1547
2024-02-13 01:03:20 +02:00
Aliaksandr Valialkin
95222b2079
all: upgrade Go builder from Go1.21.7 to Go1.22.0
See https://go.dev/doc/go1.22
2024-02-12 21:59:51 +02:00
Roman Khavronenko
8850c7431d
app/vmalert: support filtering for /api/v1/rule like Prometheus does (#5787)
Follow-up after 62e5e2a4c8

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-09 14:35:31 +01:00
Victor Amorim dos Santos
62e5e2a4c8
app/vmalert: support type param for filtering /api/v1/rules response by rule type (#5749)
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2024-02-09 09:02:35 +01:00
Aliaksandr Valialkin
ae8a867924
all: add support for specifying multiple -httpListenAddr options 2024-02-09 03:15:04 +02:00
Aliaksandr Valialkin
b161e889b5
docs/CHANGELOG.md: typo fixes 2024-02-08 21:18:45 +02:00
Aliaksandr Valialkin
d8c1db7953
lib/httpserver: do not close client connections every 2 minutes by default
Closing client connections every 2 minutes doesn't help load balancing -
this just leads to "jumpy" connections between multiple backend servers,
e.g. the load isn't spread evenly among backend servers, and instead jumps
between the servers every 2 minutes.

It is still possible periodically closing client connections by specifying non-zero -http.connTimeout command-line flag.

This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1636997037

This is a follow-up for d387da142e
2024-02-08 21:10:25 +02:00
Aliaksandr Valialkin
b718e555a6
docs/CHANGELOG.md: add a link to docs describing -disableReroutingOnUnavailable command-line flag 2024-02-08 17:03:19 +02:00
hagen1778
e1926f286b
docs: follow-up after 83e55456e2
83e55456e2
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 15:55:57 +01:00
Khushi Jain
83e55456e2
app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738) 2024-02-08 15:52:00 +01:00
Aliaksandr Valialkin
a354924b0d
app/victoria-metrics: properly send staleness markers on victoriametrics shutdown if -selfScrapeInterval > 0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/943
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2024-02-08 15:29:19 +02:00
Aliaksandr Valialkin
61d9df4c36
app/vmselect: add ability to reset rollup result cache on startup by passing -search.resetRollupResultCacheOnStartup command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:40:40 +02:00
Aliaksandr Valialkin
f7b68b466c
docs/CHANGELOG.md: clarify the bugfix description for e1bf8440eb 2024-02-08 13:01:19 +02:00
Aliaksandr Valialkin
e1bf8440eb
lib/mergeset: prevent from possible too big indexBlockSize panic
This panic could occur when samples with too long label values are ingested into VictoriaMetrics.
This could result in too long fistItem and commonPrefix values at blockHeader (up to 64kb each).
This may inflate the maximum index block size by 4 * maxIndexBlockSize.
2024-02-08 12:54:10 +02:00
hagen1778
3380043424
dashboards: follow-up 4369bc1df2
* add more details to changelog
* simplify panels description
* remove capacity planning recommendation, as it proves it incompetent

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 09:51:43 +01:00
Hui Wang
4369bc1df2
deployment/dashboards: fix Storage full ETA panels (#5747)
During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than 
rate(vm_rows_added_to_storage_total) and it could last quite some time,
 which causes negative values of Storage full ETA and confuses users, see playground.

Instead of trying to get more accurate results during downsampling, I think it's ok to ignore 
vm_deduplicated_samples_total at all, it's more reasonable to see Storage full ETA increase after downsampling.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-08 09:43:39 +01:00
Aliaksandr Valialkin
0cf56c1ba5
app/vmselect/promql: properly handle precision errors in rollup functions
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.

The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
2024-02-08 02:30:57 +02:00
Aliaksandr Valialkin
19c1066a25
docs/CHANGELOG.md: properly document the change at b74006e2ca
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774
2024-02-07 22:05:12 +02:00
Nihal
b74006e2ca
[vmsingle/vminsert]: change http success response code to 200 for -/reload request handler (#5776)
* change vmsingle's response code to 200 for reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-02-07 20:00:04 +00:00
Aliaksandr Valialkin
e78e5ccfaa
docs/CHANGELOG.md: support empty command-line flag values in short array notation
For example, -fooDuration=',10s,' is now supported - it sets three command-line flag values:

- the first and the last one are set to the default value for `-fooDuration`
- the second one is set to 10s
2024-02-07 20:53:13 +02:00
Aliaksandr Valialkin
b431ccea5b
all: update Go builder from Go1.21.6 to Go1.21.7
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.7+label%3ACherryPickApproved
2024-02-07 04:00:37 +02:00
Aliaksandr Valialkin
541b644d3d
app/{vmagent,vminsert}: follow-up after a1d1ccd6f2
- Document the change at docs/CHANGELOG.md
- Copy changes from docs/Single-server-VictoriaMetrics.md to README.md
- Add missing handler for processing multitenant requests ( https://docs.victoriametrics.com/vmagent/#multitenancy )
- Substitute github.com/stretchr/testify dependency with 3 lines of code in the added tests
- Comment unclear code at lib/protoparser/datadogsketches/parser.go , so @AndrewChubatiuk could update it
  and add permalinks to the original source code there.
- Various code cleanups

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3091
2024-02-07 01:28:05 +02:00
Yury Molodov
dcbdbc760e
vmui: improve select component functionality (#5755)
* vmui: fix select closing on click outside (#5728)

* vmui: clear entered text in select after selecting a value (#5727)

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:50:04 +00:00
Yury Molodov
a81ccbd749
vmui: fix handling invalid timezone (#5758)
* vmui: fix handling invalid timezone (#5732)

* vmui: switch browser timezone flag to isValid

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:47:30 +00:00
Yury Molodov
65b8002aeb
vmui: fix graph dragging (#5769)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:32:03 +00:00
Aliaksandr Valialkin
61524ad87b
vendor: update github.com/VictoriaMetrics/metricsql from v0.70.0 to v0.70.1
This should help with the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5604
2024-02-06 21:52:35 +02:00
Aliaksandr Valialkin
f4f1caea2a
docs/CHANGELOG.md: add link to mTLS feature request 2024-02-06 19:32:40 +02:00
Aliaksandr Valialkin
7bc3af1224
lib/httpserver: add support for mTLS for requests to -httpListenAddr 2024-02-06 17:46:19 +02:00
Aliaksandr Valialkin
c1e50848c5
docs/CHANGELOG.md: move descriptions for recently added features from v1.97.1 to tip 2024-02-06 16:31:18 +02:00
Aliaksandr Valialkin
209c96fc42
docs/Cluster-VictoriaMetrics.md: document -disableReroutingOnUnavailable command-line flag
This is a follow-up for 88f0d1572e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5713
2024-02-05 15:18:01 +02:00
Aliaksandr Valialkin
5f836c8729
docs/CHANGELOG.md: add a link to all VictoriaMetrics dashboards for Grafana 2024-02-05 11:42:05 +02:00
Aliaksandr Valialkin
1684766152
docs/CHANGELOG.md: add missing links to the corresponding dashboards 2024-02-05 10:55:30 +02:00
hagen1778
487a94565b
dashboards/all: add new panel CPU spent on GC
It should help identifying cases when too much CPU is spent on garbage collection,
 and advice users on how this can be addressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 16:21:21 +01:00
hagen1778
29a9b31584
dashboards: add Targets scraped/s
A new stat panel shows the number of targets scraped by the vmagent per-second.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:48:26 +01:00
Aliaksandr Valialkin
deed8ddfb8
docs/CHANGELOG.md: document v1.93.11 LTS release 2024-02-01 18:21:28 +02:00
Aliaksandr Valialkin
87bf1900e4
docs/CHANGELOG.md: document v1.87.14 LTS release 2024-02-01 17:08:56 +02:00
Aliaksandr Valialkin
31c53adbde
docs: mark v1.97.x as long-term support release 2024-02-01 15:16:39 +02:00
Aliaksandr Valialkin
bdfa4aee0d
docs/CHANGELOG.md: cut v1.97.1 2024-02-01 15:08:40 +02:00
Aliaksandr Valialkin
8aaa828ba3
lib/prompbmarshal: return back custom protobuf marshaler for lib/prompbmarshal.WriteRequest
The easyproto-based marshaler is 2x slower than the previous custom marshaler,
so let's stick with it. This improves the performance for sending data to remote storage at vmagent
and reduces CPU usage to pre-v1.97.0 levels.
2024-02-01 06:33:06 +02:00
Aliaksandr Valialkin
b7fd7ee0b6
lib/promauth: follow-up for fca3b14b7b
- Simplify the code for handling BasicAuthConfig at lib/promauth/config.go
- Move the description of the change into correct place at docs/CHANGELOG.md
- Put tests for username in front of tests for password at lib/promauth/config_test.go

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5720
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511
2024-01-31 19:45:16 +02:00
Nihal
fca3b14b7b
Support for username_file in scrape config (basic_auth) similar to Prometheus for having config compatibility (#5720)
* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-01-31 17:41:16 +00:00
Aliaksandr Valialkin
db4623efc2
app/vmselect/netstorage: properly handle the case when an empty brsPool points to the end of brs.brs
This case is possible after a new brsPool is allocated. The fix is to verify whether len(brsPool) >= len(brs.brs)
before trying to append a new item to brsPool and sharing its contents with brs.brs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5733
2024-01-31 10:27:50 +02:00
hagen1778
02492bc1a4
dashboards/single: fix typo in query for version annotation
The typo falsely produced many version change events.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-31 09:13:46 +01:00
Aliaksandr Valialkin
ec0ca8e7eb
app/vmselect/promql: really keep metric names when keep_metric_names modifier is applied to binary operator
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5556
2024-01-31 02:32:55 +02:00
Aliaksandr Valialkin
fcc8b14f86
deployment/docker: upgrade base Docker image from Alpine 3.19.0 to 3.19.1
See https://www.alpinelinux.org/posts/Alpine-3.19.1-released.html
2024-01-30 22:47:18 +02:00
Aliaksandr Valialkin
26488726a8
docs/CHANGELOG.md: cut v1.97.0 2024-01-30 22:45:04 +02:00
Roman Khavronenko
6939c53e48
app/vmselect: set proper timestamp for cached instant responses (#5723)
* app/vmselect: set proper timestamp for cached instant responses

The change updates `getSumInstantValues` to prefer timestamp
from the most recent results. Before, timestamp from cached series
was used.

The old behavior had negative impact on recording rules as they
were getting responses with shifted timestamps in past.
Subsequent recording or alerting rules fetching results of these
recording rules could get no result due to staleness interval.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5659
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-30 20:03:34 +00:00
Yury Molodov
81b5db04f6
vmui: add the ability to expand all tracing entries (#5677) (#5726) 2024-01-30 19:10:10 +00:00
Aliaksandr Valialkin
f768d5d797
docs/CHANGELOG.md: document the enhancement, which reduces initial memory usage when vmagent scrapes targets with large responses
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5567
2024-01-30 20:51:13 +02:00
Aliaksandr Valialkin
17f8ed8948
docs/CHANGELOG.md: refer to the related pull request for the bugfix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945 2024-01-30 20:21:44 +02:00
Aliaksandr Valialkin
ea2752ce62
docs/CHANGELOG.md: document the bugfix addressed by the commit bc7cf4950b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
2024-01-30 20:16:22 +02:00
Aliaksandr Valialkin
5d66ee88bd
lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests
This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-01-29 16:45:12 +01:00
Roman Khavronenko
aaa526e8ff
lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689)
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:45:23 +01:00
Roman Khavronenko
df59ac7f0e
app/vmalert: fix data race during hot-config reload (#5698)
* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "app/vmalert: fix data race during hot-config reload"

This reverts commit a4bb7e8932.

* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:42:21 +01:00
Yury Molodov
a7b11eff7c
vmui: fix Enter key in query field (#5667) (#5681) 2024-01-26 22:38:32 +01:00
Aliaksandr Valialkin
bb7a419cc3
lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
Roman Khavronenko
b11f4ef5ea
app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680)
* app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0`

 Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`.
 This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE
 time series for alerting rules with `for: 0` as well. Such time series can
 be useful for tracking the moment when alerting rule became active.

 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: support ALERTS_FOR_STATE in `replay` mode

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 15:42:57 +01:00
Alexander Marshalov
806c07ddd5
vmsingle/vmselect returns http status 429 (TooManyRequests) instead of 503 (ServiceUnavailable) when max concurrent requests limit is reached. (#5682) 2024-01-24 17:55:06 +01:00
Aliaksandr Valialkin
ef12598ad4
lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers
Already terminated pods and containers cannot be scraped and will never resurrect,
so there is zero sense in creating scrape targets for them.
2024-01-24 14:57:53 +02:00
Aliaksandr Valialkin
4d961c70f7
app/{vmselect,vmstorage}: return compression of the data passed from vmstorage to vmselect
This reverts cd4f641d32 , since it has been appeared that the disabled compression
for vmstorage->vmselect data increase network bandwidth usage by more than 10x on typical production workloads,
while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%.

The 10x increase in network usage is too high price for 10% improvements on query latency and vmstorage CPU usage.
This may result in network bandwidth bottlenecks, which can reduce the overall performance and stability
of VictoriaMetrics cluster. That's why return back the vmstorage->vmselect data compression by default.

The vmstorage->vmselect compression can be disabled by passing -rpc.disableCompression command-line flag to vmstorage.
The vmselect->vmselect compression in multi-level cluster setup can be disabled by passing -clusternative.disableCompression command-line flag.
2024-01-24 13:39:28 +02:00
Aliaksandr Valialkin
f888a019fe
lib/streamaggr: expand %{ENV} placeholders in stream aggregation configs 2024-01-24 12:31:27 +02:00
Aliaksandr Valialkin
fa566c68a6
lib/mergeset: really limit the number of in-memory parts to 15
It has been appeared that the registration of new time series slows down linearly
with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part
when it searches for TSID by newly ingested metric name.

The number of in-memory parts grows when new time series are registered
at high rate. The number of in-memory parts grows faster on systems with big number
of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries
for the indexdb, and every such entry is transformed eventually into a separate in-memory part.

The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
by @misutoth - to limit the number of in-memory parts with buffered channel.
This solution is implemented in this commit. Additionally, this commit merges per-CPU parts
into a single part before adding it to the list of in-memory parts. This reduces CPU load
when searching for TSID by newly ingested metric name.

The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number
of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency
on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
2024-01-24 03:38:12 +02:00
Roman Khavronenko
89e3c70ccd
lib/promscrape: respect 0 value for series_limit param (#5663)
* lib/promscrape: respect `0` value for `series_limit` param

Respect `0` value for `series_limit` param in `scrape_config`
even if global limit was set via `-promscrape.seriesLimitPerTarget`.
Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`.

This behavior aligns with possibility to override `series_limit` value via
relabeling with `__series_limit__` label.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 13:09:14 +02:00
Aliaksandr Valialkin
114822d585
app/{vmstorage,vmselect}: disable vmstorage->vmselect RPC compression by default in order to improve query performance 2024-01-23 04:24:57 +02:00
Zakhar Bessarab
bf4742526d
lib/storage: print tenant ID in log when discarding or truncating labels (#5658)
Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels of label values.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:24:56 +02:00
Yury Molodov
38231d5994
vmui: query report (#5497)
* vmui: add query analyzer page

* vmui: fix tabs for query analyzer

* vmui: add help to export query

* vmui: add time params to query analyzer

* docs/vmui: add query analyzer

* vmui: fix validation JSON form

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:23:26 +02:00
Yury Molodov
eb6def0695
vmui: add flag for default timezone setting (#5611)
* vmui: add flag for default timezone setting #5375

* vmui: validate timezone before client return

* Update app/vmselect/vmui.go

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:11:19 +02:00
Yury Molodov
633e6b48ad
vmui: fix cache autocomplete (#5591)
* vmui: fix the logic of closing the popper #5470

* vmui: fix the logic of caching autocomplete results #5472

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:06:14 +02:00
Dmytro Kozlov
38b2a5bc44
deployment/docker: add grafana datasource to the docker-compose files (#5363)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3920
https://github.com/VictoriaMetrics/grafana-datasource/issues/113
2024-01-22 15:45:31 +01:00
Aliaksandr Valialkin
d3ee3e0ef5
Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577)"
This reverts commit cfec258803.

Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled.
The scrapeWork.areIdenticalSeries() function always returns true is stale markers are disabled.
This prevents from storing the last response at scrapeWork.processScrapedData().

It looks like the reverted commit could also return back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577
2024-01-22 00:43:48 +02:00
Aliaksandr Valialkin
1c7f990fad
app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery()
This is a follow-up for cf03e11d89

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630
2024-01-21 23:45:31 +02:00
Hui Wang
4e3242b02d
lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557)
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice

Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long.

* remove mislead comment

* docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

* wip

* lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls

groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds.
But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details.

Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher
if the number of in-flight calls is non-zero.

P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets.

* typo fix

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 23:13:15 +02:00
Aliaksandr Valialkin
1f105dde98
all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-21 22:03:38 +02:00
Nikolay
b3598ba2c1
app/vmauth: adds metric_labels and backend_errors counter (#5585)
* app/vmauth: adds metric_labels and backend_errors counter
it must improve observability for user requests with new metric - per user backend errors counter.
it's needed to calculate requests fail rate to the configured backends.
metric_labels configuration allows to perform additional aggregations on top of multiple users from configuration section.
It could be multiple clients or clients with separate read/write tokens
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 04:40:52 +02:00