Commit graph

2459 commits

Author SHA1 Message Date
Aliaksandr Valialkin
6f5bbf096a
app/vmagent/remotewrite: cosmetic updates after f3a51e8b1d
- Compare directory names instead of paths to directory when determining which persistent queues must be deleted
  This is less error-prone solution, since paths to the same directory can differ, which could lead
  to accidental directory removal for the existing -remoteWrite.url

- Log the `removed %d dangling queues` message when at least a single queue has been removed

- Consistently use filepath.Join() for creating paths to persistent queues.
  This is needed for Windows support (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70 )

- Clarify the description of the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-03-27 18:38:53 -07:00
Zakhar Bessarab
6ed6eb0c4c
app/vmagent: add -remoteWrite.removeDanglingQueues flag (#4017)
* app/vmagent: add `-remoteWrite.removeDanglingQueues` flag which allows to automatically remove dangling persistent queue contents

Related issue: #4014

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmagent: address review feedback

- remove persistent queues files by default
- rename `remoteWrite.removeDanglingQueues` to `remoteWrite.keepDanglingQueues`
- update docs to reflect changed behaviour

Related issue: #4014

* Apply suggestions from code review

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 18:38:51 -07:00
Nikolay
b38a145cfd
app/vmselect: properly remove temp files at windows system (#4020)
With non-posix compliant systems it's not possible to remove unclosed files.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-27 18:10:44 -07:00
Aliaksandr Valialkin
54b9537a76
app/vmselect/promql: follow-up for 79e1c6a6fc
- Document the fix at docs/CHANGELOG.md
- Add tests with multiple adjancent zero buckets
- Simplify the fix a bit

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/296
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4021
2023-03-27 18:04:30 -07:00
Ze'ev Klapow
680a661ec0
fix le buckets when adjacent vmrange is empty (#4021)
There is a bug here where if you have a single bucket like:

foo{vmrange="4.084e+02...4.642e+02"} 2 123

The expected output is three le encoded buckets like:

foo{le="4.084e+02"} 0 123
foo{le="4.642e+02"} 2 123
foo{le="+Inf"} 2 123

This correctly encodes the start and end of the vmrange.
If however, the input contains the previous bucket, and that bucket is
empty then you only get the end le and +Inf out currently, i.e:

foo{vmrange="7.743e+05...8.799e+05"} 5 123
foo{vmrange="6.813e+05...7.743e+05"} 0 123

results in:

foo{le="8.799e+05"} 5 123
foo{le="+Inf"} 5 123

This causes issues when you go to compute a quantile because this means
that the assumed lower bound of the buckets is 0 and this we interpolate
between 0->end rather than the vmrange start->end as expected.
2023-03-27 18:04:29 -07:00
Aliaksandr Valialkin
9387793f47
app/vmselect: follow-up for 10ab086366
- Expose stats.seriesFetched at `/api/v1/query_range` responses too
  for the sake of consistency.

- Initialize QueryStats when it is needed and pass it to EvalConfig then.
  This guarantees that the QueryStats is properly collected when the query
  contains some subqueries.
2023-03-27 15:11:42 -07:00
Roman Khavronenko
10ab086366
app/vmselect: export seriesFetched stat for /query responses (#3925)
The change adds a new field `seriesFetched` to EvalConfig object.
Since EvalConfig object can be copied inside `Exec`,
`seriesFetched` is a pointer which can be updated by all copied
objects.

The reason for having stats is that other components, like vmalert,
could benefit from this information.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 08:51:33 -07:00
Yury Molodov
86a98fa131
vmui: heatmap (#3780)
* fix: add stroke and font for all axes

* feat: add util for generate gradient

* feat: add heatmap plugin

* feat: add heatmap legend

* feat: add heatmap graph (#3384)

* vmui: add heatmap graph (#3384)

* feat: add convert Prometheus to VictoriaMetrics histogram

* fix: prevent re-render graph

* feat: reset step for heatmap

* feat: normalize heatmap data

* fix: format heatmap legend

* wip

* app/vmselect/vmui: run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-26 00:31:21 -07:00
Aliaksandr Valialkin
db3bcbe56a
app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with big number of CPU cores
This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419
2023-03-25 16:38:39 -07:00
Aliaksandr Valialkin
a2ecf4fa4a
app/vmselect/netstorage: document why runtime.Gosched() is removed at 28f054bb00
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 16:38:28 -07:00
Zakhar Bessarab
16f3b279a2
vmselect/netstorage: remove direct calls to Gosched to reduce amount of locks for global scope
using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly.

Updates #3966

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-25 16:37:58 -07:00
Aliaksandr Valialkin
3698994953
app/{vmbackup,vmrestore}: publish vmbackup and vmrestore binaries for Windows
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 15:09:41 -07:00
Aliaksandr Valialkin
740fa57fdc
app/vmselect/promql: typo fix after e7f46a0aab 2023-03-24 23:47:11 -07:00
Aliaksandr Valialkin
7aff6f872f
app/vmselect/promql: follow-up for 7205c79c5a
- Allocate and initialize seriesByWorkerID slice in a single go instead
  of initializing every item in the list separately.
  This should reduce CPU usage a bit.
- Properly set anti-false sharing padding at timeseriesWithPadding structure
- Document the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-24 23:39:43 -07:00
Zakhar Bessarab
fec87e3ada
app/vmselect/promql: use lock-less approach to gather results of parallel processing for evalRollup* funcs (#4004)
* vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation

Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge.

New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks

Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-24 23:39:41 -07:00
Aliaksandr Valialkin
5c39b19acd
app/vmbackup: simplify code a bit after 5ba347bd2c
Unconditionally call deleteSnapshot() func just after making the snapshot, either successful or unsuccessful

Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2055
2023-03-24 22:09:19 -07:00
Zakhar Bessarab
3f38ed3171
app/vmbackup: delete created snapshot in case of error during backup (#4008)
Related issue: #2055

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-24 22:09:17 -07:00
Aliaksandr Valialkin
b9632023c4
app/vmselect/vmui: run make vmui-update after dc2c712a29 2023-03-24 18:08:51 -07:00
Aliaksandr Valialkin
c54b8acba2
docs/vmauth.md: follow-up for 36edba9bfb
- Document `-configCheckInterval` command-line flag in `quick start` section
- Clarify the addition of `-configCheckInterval` at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3990
2023-03-24 17:56:59 -07:00
Aliaksandr Valialkin
9d19c2d89b
docs/vmagent.md: clarify that there is no need to specify multiple -remoteWrite.url options when writing data to a single VictoriaMetrics cluster when data replication is needed
Also add a link to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format from `getting started` section,
so users could quickly find how to write data to VictoriaMetrics cluster
2023-03-24 17:56:31 -07:00
Roman Khavronenko
a09dabc78f
vmalert: add anchor char to Group's link (#4006)
This should help users to see that Group's name is clickable
and used for anchoring.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 17:56:04 -07:00
Roman Khavronenko
ec6a20880c
vmalert: mention VMUI example for alert's source (#4005)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 17:55:30 -07:00
Dmytro Kozlov
352dbd7e08
app/vmui: update cardinality page (#3986)
vmui: update cardinality page

---------

Co-authored-by: Yury Moladau <yurymolodov@gmail.com>
2023-03-24 13:34:01 -07:00
Yury Molodov
5f77efa915
vmui: display errors for each query individually (#3987) (#3994) 2023-03-24 13:26:43 -07:00
Alexander Marshalov
b5027cff9c
added configCheckInterval flag for vmauth (#3990) (#3991)
* added configCheckInterval flag for vmauth (#3990)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-24 13:25:07 -07:00
Dmytro Kozlov
4ba237ec14
app/vmctl: follow up after aed59b9029 (#3983) 2023-03-21 09:26:26 -07:00
Aliaksandr Valialkin
8ed9295109
docs/vmagent.md: mention in docs that the target relabel debug page shows target url now 2023-03-20 22:20:13 -07:00
Aliaksandr Valialkin
79d8f0e7c6
app/vmselect/promql: pass workerID to the callback inside doParallel()
This opens the possibility to remove tssLock from evalRollupFuncWithSubquery()
in the follow-up commit from @zekker6 in order to speed up the code
for systems with many CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:57:34 -07:00
Aliaksandr Valialkin
e749a015a9
app/vmselect/promql: fix TestIncrementalAggr test on systems less than 3 CPU cores
This is a follow-up for 4856a4cf5a
2023-03-20 20:37:44 -07:00
Aliaksandr Valialkin
08da383eac
app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker()
Call runtime.Gosched() only when there is a work to steal from other workers.
Simplify the timeseriesWorker() and unpackWroker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork().

This should reduce CPU usage when processing queries on systems with big number of CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:32:56 -07:00
Aliaksandr Valialkin
18af01c387
app/vmselect: optimize incremental aggregates a bit
Substitute sync.Map with an ordinary slice indexed by workerID.
This should reduce the overhead when updating the incremental aggregate state
2023-03-20 15:42:13 -07:00
Aliaksandr Valialkin
7a1e2f49cc
app/vmselect/vmui: make vmui-update after d4525bd2d0 2023-03-20 14:35:17 -07:00
Roman Khavronenko
6a7de761f4
vmalert: support logs suppressing during config reloads (#3973)
* vmalert: support logs suppressing during config reloads

The change is mostly required for ENT version of vmalert,
since it supports object-storage for config files.
Reading data from object storage could be time-consuming,
so vmalert emits logs to track the progress.

However, these logs are mostly needed on start or on
manual config reload. Printing these logs each time
`rule.configCheckInterval` is triggered would too verbose.
So the change allows to control logs emitting during
config reloads.

Now, logs are emitted during start up or when SIGHUP is receieved.
For periodicall config checks logs emitted by config pkg are suppressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: review fixes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-20 14:25:26 -07:00
Dmytro Kozlov
aed59b9029
app/vmctl: automatically check tty (#3938)
app/vmctl: automatically detect if TTY is available
2023-03-20 14:14:43 -07:00
Yury Molodov
95b60c2777
vmui: support for drag'n'drop in the "Trace analyzer" page (#3971)
vmui: add drag-and-drop support for the trace analyzer page
2023-03-20 14:09:45 -07:00
Yury Molodov
b66953d8e1
vmui: improve usability of date/time picker (#3968)
* vmui: allow manually set input date and time
* vmui/docs: improve usability of date/time picker
2023-03-20 13:57:47 -07:00
Aliaksandr Valialkin
fc3d826d7f
all: add Windows build for VictoriaMetrics
This commit changes background merge algorithm, so it becomes compatible with Windows file semantics.

The previous algorithm for background merge:

1. Merge source parts into a destination part inside tmp directory.
2. Create a file in txn directory with instructions on how to atomically
   swap source parts with the destination part.
3. Perform instructions from the file.
4. Delete the file with instructions.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since the remaining files with instructions is replayed on the next restart,
after that the remaining contents of the tmp directory is deleted.

Unfortunately this algorithm doesn't work under Windows because
it disallows removing and moving files, which are in use.

So the new algorithm for background merge has been implemented:

1. Merge source parts into a destination part inside the partition directory itself.
   E.g. now the partition directory may contain both complete and incomplete parts.
2. Atomically update the parts.json file with the new list of parts after the merge,
   e.g. remove the source parts from the list and add the destination part to the list
   before storing it to parts.json file.
3. Remove the source parts from disk when they are no longer used.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since incomplete partitions from step 1 or old source parts from step 3 are removed
on the next startup by inspecting parts.json file.

This algorithm should work under Windows, since it doesn't remove or move files in use.
This algorithm has also the following benefits:

- It should work better for NFS.
- It fits object storage semantics.

The new algorithm changes data storage format, so it is impossible to downgrade
to the previous versions of VictoriaMetrics after upgrading to this algorithm.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-19 23:28:26 -07:00
Aliaksandr Valialkin
aeeab74388
app/vmctl: drop integration tests from cluster branch, since they expect single-node VictoriaMetrics
This is a follow-up for 235477628e
2023-03-17 16:43:58 -07:00
Roman Khavronenko
0ac57ef5b9
Vmalert tests (#3975)
* vmalert: add tests for notifier pkg

* vmalert: add tests for remotewrite pkg

* vmalert: add tests for template functions

* vmalert: add tests for web pages

* vmalert: fix int overflow in tests

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-17 16:16:13 -07:00
oliverpool
8c708ca1e9
app/vmselect/promql: add test to ensure 8-byte alignment (#3948)
See 0af9e2b693
2023-03-16 22:07:13 -07:00
Dmytro Kozlov
235477628e
app/vmctl: integration test for native protocol (#3947)
* app/vmctl: integration test for native protocol

* app/vmctl: implemented two integration tests

* app/vmctl: cleanup

* app/vmctl: split storage init and filling data logic

* app/vmctl: cleanup

* app/vmctl: remove storage from server, used initialization process

* app/vmctl: prepare for parallel run, code cleanup

* app/vmctl: code cleanup

* app/vmctl: remove unused field
2023-03-14 16:08:40 -07:00
Aliaksandr Valialkin
3b4a3583bc
app/vmselect/promql: prevent from cannot unmarshal timeseries from rollupResultCache panic after the upgrade to v1.89.0 2023-03-12 19:09:11 -07:00
Aliaksandr Valialkin
cf7d8811f6
app/vmselect/vmui: make vmui-update after 00a0816ab1 2023-03-12 17:22:28 -07:00
Yury Molodov
aebc441251
vmui: predefined dashboards docs (#3895)
* fix: correct display predefined panels

* docs: update the documentation for predefined dashboards
2023-03-12 17:22:27 -07:00
Aliaksandr Valialkin
a6a4beb89a
app/vmselect: remove data race on updating EvalConfig.IsPartialResponse from concurrently running goroutines
This properly returns `is_partial: true` for partial responses.
2023-03-12 16:53:03 -07:00
Aliaksandr Valialkin
5cd60c54d3
app/vmselect/promql: prevent from SIGBUS crash on architecures, which deny unaligned access to 8-byte words (e.g. ARM)
Thanks to @oliverpool for nailing down the root cause of the issue and for the initial attempt to fix it
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3927
2023-03-12 16:29:18 -07:00
Yury Molodov
2a1bc14984
vmui: remove send step param for instant queries (#3931)
* fix: remove step param for instant queries (#3896)

* vmui: remove send step param for instant queries

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-12 03:14:05 -07:00
Aliaksandr Valialkin
e491fee1f4
app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage
It has been appeared that this interning may lead to increased memory usage and increased CPU usage
when vmselect performs queries, which select big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
2023-03-12 00:44:08 -08:00
Aliaksandr Valialkin
094fb31089
app/vmctl/README.md: remove trailing space from the line added at 4c3bc04efa 2023-03-12 00:28:00 -08:00
Zakhar Bessarab
3b7152b1d8
docs: add a note about cache reset for vmalert backfilling docs (#3940)
docs: add a note about cache reset for vmalert backfilling docs
2023-03-12 00:13:00 -08:00