- Sort MetricName tags only once before the benchmark loop.
- Obtain indexSearch per each benchmark loop in order to give a chance for background merge
for the recently created parts
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).
The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.
The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.
VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:
- If `storage/tsid` cache capacity isn't enough for active time series.
Then just increase available memory for VictoriaMetrics or reduce the number of active time series
ingested into VictoriaMetrics.
- If new time series is ingested into VictoriaMetrics. In this case it cannot find
the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
since it doesn't know that the index has no the corresponding entry too.
This is a typical event under high churn rate, when old time series are constantly substituted
with new time series.
Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.
Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.
This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.
The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.
This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .
At the same time the change fixes the issue, which could result in lost access to time series,
which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698
The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685
This is a follow-up for 1f28b46ae9
The number of parts in the snapshot partition may be zero if concurrent goroutine just
started creating new partition, but didn't put data into it yet when the current
goroutine made a snapshot.
This reverts commit 20b18e9feb.
Reason for revert: running goimports on `make check-all` introduces the following issues:
- It runs only on modified files, which weren't commited yet into git repository.
This means the formatting for the remaining files becomes different comparing to the formatting
for the changed files. This also means that the goimports has no any effect
at github actions and when the changed code is already commited to git repository.
- `gomiports` performs formatting in the same way as gofmt, so `make fmt` becomes unnecessary.
But when `gofmt` is substituted with `goimports`, then it performs unnecessary formatting for *.qtpl.go files.
It is possible to make a hack, which will prepare a list of all the *.go files at lib/ and app/
without the *.qtpl.go files, and then feed this list to `goimports`, but this looks too fragile
for the task of just fixing the ordering of Go imports.
So it is better to leave source code formatting as is with `gofmt`, while manually fixing improper ordering
of Go import from time to time in dedicated commits until better solution arises.
Add a break if gotAlert is nil
This removes the following golangci-lint warning:
app/vmalert/alerting_test.go:868:8: SA5011(related information): this check suggests that the pointer can be nil (staticcheck)
if gotAlert == nil {
^
* app/vmctl: fix panic `--remote-read-filter-time-start` flag not defined
* app/vmctl: update CHANGELOG.md
---------
Co-authored-by: Nikolay <nik@victoriametrics.com>
It could happen for low evaluation intervals and irregular
delays during execution that evaluation time would get
a negative offset. This could result into cumulative
discrepancy between the actual time and evaluation time for rules.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* make: add goimports task
Adds task to fix imports formatting implace.
Formats imports into:
- native library
- external libraries
- local packages based on github.com/VictoriaMetrics/VictoriaMetrics prefix
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* make: add goimports install task
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* make: run goimports only for changed files
Applying goimports to all existing files would create a lot of problems with cherry-picking changes between different branches used for development. To avoid this it was decided to only run goimports on changed files to fix formatting gradually.
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* make: update goimports to run on all changed files
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
- Clarify docs about -replicationFactor command-line flag at vmselect
- Clarify description for -replicationFactor and -search.skipSlowReplicas command-line flags
- Fix the logic for returning responses if -search.skipSlowReplicas command-line flag
is enabled. The logic was broken in the 173ccf4333,
so it could return responses only if some of vmstorage nodes return error,
while it should return when query results are successfully collected from more than
(len(storageNodes) - replicationFactor) vmstorage nodes.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
* vmselect: introduce `search.skipSlowReplicas` cmd-line flag
vmselect has two logical conditions during request processing when
`-replicationFactor` cmd-line flag is set:
1. If at least `len(storageNodes) - replicationFactor` responded, it could skip
waiting for the rest of nodes to respond. This could lead to problems described
here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207.
2. Mark response as partial if less than `len(storageNodes) - replicationFactor` responded
without an error.
The P1 showed itself error-prone and became the main reason why
`-replicationFactor` wasn't recommended to use at vmselect level.
However, this optimization could be still very useful in situations
when there are slow and fast replicas in cluster.
But P2 remains viable and important conditionless.
Hiding P1 behind the feature-flag `search.skipSlowReplicas`
should make `-replicationFactor` flag usable again. And let users
choose whether they want P1 to be respected.
Related issues
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* docs: update changelog
Signed-off-by: hagen1778 <roman@victoriametrics.com>
---------
Signed-off-by: hagen1778 <roman@victoriametrics.com>
* docs: make `httpAuth.*` flags description less ambiguous
Currently, it may confuse users whether `httpAuth.*` flags are used by HTTP client or server configuration(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4586 for example).
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
* docs: fix a typo
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
---------
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
vmalert: allow disabling of `step` param attached to instant queries
This might be useful for using vmalert with datasources that to not support this param,
unlike VictoriaMetrics.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4573
Signed-off-by: hagen1778 <roman@victoriametrics.com>