Commit graph

2759 commits

Author SHA1 Message Date
Aliaksandr Valialkin
e1cf962bad
lib/storage: switch from global to per-day index for MetricName -> TSID mapping
Previously all the newly ingested time series were registered in global `MetricName -> TSID` index.
This index was used during data ingestion for locating the TSID (internal series id)
for the given canonical metric name (the canonical metric name consists of metric name plus all its labels sorted by label names).

The `MetricName -> TSID` index is stored on disk in order to make sure that the data
isn't lost on VictoriaMetrics restart or unclean shutdown.

The lookup in this index is relatively slow, since VictoriaMetrics needs to read the corresponding
data block from disk, unpack it, put the unpacked block into `indexdb/dataBlocks` cache,
and then search for the given `MetricName -> TSID` entry there. So VictoriaMetrics
uses in-memory cache for speeding up the lookup for active time series.
This cache is named `storage/tsid`. If this cache capacity is enough for all the currently ingested
active time series, then VictoriaMetrics works fast, since it doesn't need to read the data from disk.

VictoriaMetrics starts reading data from `MetricName -> TSID` on-disk index in the following cases:

- If `storage/tsid` cache capacity isn't enough for active time series.
  Then just increase available memory for VictoriaMetrics or reduce the number of active time series
  ingested into VictoriaMetrics.

- If new time series is ingested into VictoriaMetrics. In this case it cannot find
  the needed entry in the `storage/tsid` cache, so it needs to consult on-disk `MetricName -> TSID` index,
  since it doesn't know that the index has no the corresponding entry too.
  This is a typical event under high churn rate, when old time series are constantly substituted
  with new time series.

Reading the data from `MetricName -> TSID` index is slow, so inserts, which lead to reading this index,
are counted as slow inserts, and they can be monitored via `vm_slow_row_inserts_total` metric exposed by VictoriaMetrics.

Prior to this commit the `MetricName -> TSID` index was global, e.g. it contained entries sorted by `MetricName`
for all the time series ever ingested into VictoriaMetrics during the configured -retentionPeriod.
This index can become very large under high churn rate and long retention. VictoriaMetrics
caches data from this index in `indexdb/dataBlocks` in-memory cache for speeding up index lookups.
The `indexdb/dataBlocks` cache may occupy significant share of available memory for storing
recently accessed blocks at `MetricName -> TSID` index when searching for newly ingested time series.

This commit switches from global `MetricName -> TSID` index to per-day index. This allows significantly
reducing the amounts of data, which needs to be cached in `indexdb/dataBlocks`, since now VictoriaMetrics
consults only the index for the current day when new time series is ingested into it.

The downside of this change is increased indexdb size on disk for workloads without high churn rate,
e.g. with static time series, which do no change over time, since now VictoriaMetrics needs to store
identical `MetricName -> TSID` entries for static time series for every day.

This change removes an optimization for reducing CPU and disk IO spikes at indexdb rotation,
since it didn't work correctly - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 .

At the same time the change fixes the issue, which could result in lost access to time series,
which stop receving new samples during the first hour after indexdb rotation - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698

The issue with the increased CPU and disk IO usage during indexdb rotation will be addressed
in a separate commit according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401#issuecomment-1553488685

This is a follow-up for 1f28b46ae9
2023-07-13 17:03:50 -07:00
Aliaksandr Valialkin
df67b78f75
docs/CHANGELOG.md: clarify the description of the bugfix at 177a0c1ca9
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4555
2023-07-13 12:19:00 -07:00
Dmytro Kozlov
f31ac064f9
app/vmctl: fix panic --remote-read-filter-time-start flag not defined (#4605)
* app/vmctl: fix panic `--remote-read-filter-time-start` flag not defined

* app/vmctl: update CHANGELOG.md

---------

Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-07-13 12:13:21 -07:00
Dmytro Kozlov
555a0a9d57
app/vmctl: fix issue with adding many seconds (#4617)
* app/vmctl: fix issue with adding many seconds

* app/vmagent: add CHANGELOG.md
2023-07-13 12:09:54 -07:00
Roman Khavronenko
fdccb56620
vmalert: check for negative offset for missed rounds (#4628)
It could happen for low evaluation intervals and irregular
delays during execution that evaluation time would get
a negative offset. This could result into cumulative
discrepancy between the actual time and evaluation time for rules.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-13 12:05:52 -07:00
Alexander Marshalov
09cc17a6b1
fixed typo (#4622)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-13 11:29:57 -07:00
Alexander Marshalov
7a5e5f6a89
add info about using stream aggregation as statsd alternative (#4600) (#4621)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-13 11:29:57 -07:00
Aliaksandr Valialkin
2636c35cec
docs/VictoriaLogs/FAQ.md: small fixes 2023-07-12 01:10:40 -07:00
Aliaksandr Valialkin
b07a1c85b9
all: update Go builder from 1.20.5 to 1.20.6
See https://github.com/golang/go/issues?q=milestone%3AGo1.20.6+label%3ACherryPickApproved
2023-07-12 01:00:24 -07:00
Aliaksandr Valialkin
bcea7a0a92
docs/keyConcepts.md: cosmetic fixes after b67bd156d5 2023-07-12 00:30:33 -07:00
Aliaksandr Valialkin
b99fcb7d7a
docs/VictoriaLogs: add FAQ 2023-07-12 00:30:33 -07:00
Aliaksandr Valialkin
f8299ee9d8
docs/VictoriaLogs/README.md: make it clear that VictoriaLogs is open source 2023-07-12 00:30:33 -07:00
Alexander Marshalov
8277b64ec3
follow up for #4612 and #4584 (#4614)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-12 00:30:06 -07:00
Zakhar Bessarab
a667bdaad5
doc: fix image src after b67bd156 (#4612)
Followup for b67bd156d5

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-12 00:29:36 -07:00
Alexander Marshalov
9e8c02520c
added info about search.latencyOffset to key concepts (#4567) (#4584)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-12 00:25:32 -07:00
Aliaksandr Valialkin
a5f55259f6
docs/VictoriaLogs: make more prominent the information about returned log fields in query responses
Thanks to @candlerb for suggestions on how to improve VictoriaLogs docs
at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4609#issuecomment-1629758426
2023-07-10 15:01:54 -07:00
Dmytro Kozlov
5c4ca4aea8
app/vmctl: remove undefined flag from the documentation. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4552. (#4606) 2023-07-10 15:01:54 -07:00
Aliaksandr Valialkin
81fe089546
docs/Single-server-VictoriaMetrics.md: mention how to use Prometheus config file with unsupported options
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4274
2023-07-09 12:37:44 -07:00
Aliaksandr Valialkin
371036eb35
docs/VictoriaLogs: small clarifications 2023-07-09 12:36:38 -07:00
Zakhar Bessarab
ddd918b93c
docs: make httpAuth.* flags description less ambiguous (#4588)
* docs: make `httpAuth.*` flags description less ambiguous

Currently, it may confuse users whether `httpAuth.*` flags are used by HTTP client or server configuration(see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4586 for example).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs: fix a typo

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-07-09 12:36:14 -07:00
Haleygo
ef8e3eb9b3
vmselect: fix result in Prometheus query when time is small (#4578)
vmselect: fix result in Prometheus query when time is small

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-07-09 12:33:29 -07:00
Aliaksandr Valialkin
e1a2404db5
app/vmselect/netstorage: follow-up after 173ccf4333
- Clarify docs about -replicationFactor command-line flag at vmselect
- Clarify description for -replicationFactor and -search.skipSlowReplicas command-line flags
- Fix the logic for returning responses if -search.skipSlowReplicas command-line flag
  is enabled. The logic was broken in the 173ccf4333,
  so it could return responses only if some of vmstorage nodes return error,
  while it should return when query results are successfully collected from more than
  (len(storageNodes) - replicationFactor) vmstorage nodes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
2023-07-09 11:58:22 -07:00
Haleygo
3c2308fd52
vmalert:fix query request using rfc3339 format (#4577)
vmalert: consistently use time.RFC3339 format for time in queries

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-07-09 11:03:10 -07:00
Roman Khavronenko
173ccf4333
vmselect: introduce search.skipSlowReplicas cmd-line flag (#4538)
* vmselect: introduce `search.skipSlowReplicas` cmd-line flag

vmselect has two logical conditions during request processing when
`-replicationFactor` cmd-line flag is set:
1. If at least `len(storageNodes) - replicationFactor` responded, it could skip
waiting for the rest of nodes to respond. This could lead to problems described
here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207.
2. Mark response as partial if less than `len(storageNodes) - replicationFactor` responded
without an error.

The P1 showed itself error-prone and became the main reason why
`-replicationFactor` wasn't recommended to use at vmselect level.
However, this optimization could be still very useful in situations
when there are slow and fast replicas in cluster.

But P2 remains viable and important conditionless.
Hiding P1 behind the feature-flag `search.skipSlowReplicas`
should make `-replicationFactor` flag usable again. And let users
choose whether they want P1 to be respected.

Related issues
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* docs: update changelog

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-07 11:50:26 +02:00
Artem Navoiev
bf49efc11a
update logo width in cluster doc to 300
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-07-06 23:16:42 -07:00
Roman Khavronenko
109e55f865
vmalert: allow disabling of step param attached to instant queries (#4574)
vmalert: allow disabling of `step` param attached to instant queries

This might be useful for using vmalert with datasources that to not support this param,
unlike VictoriaMetrics.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4573

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 23:13:56 -07:00
Aliaksandr Valialkin
0107d78639
docs/vmgateway.md: update -help output 2023-07-06 23:07:47 -07:00
Aliaksandr Valialkin
72dd0b9fac
docs/Cluster-VictoriaMetrics.md: update -help output 2023-07-06 23:06:11 -07:00
Aliaksandr Valialkin
ee4280d132
docs/vmbackupmanager.md: update -help output 2023-07-06 22:57:31 -07:00
Aliaksandr Valialkin
921d8b36b5
docs/vmrestore.md: update -help output 2023-07-06 22:55:26 -07:00
Aliaksandr Valialkin
e1993dadc2
docs/vmbackup.md: update -help output 2023-07-06 22:54:15 -07:00
Aliaksandr Valialkin
e35abdd2e4
docs/vmauth.md: update -help output 2023-07-06 22:52:48 -07:00
Aliaksandr Valialkin
316abe550d
docs/vmalert.md: update -help output 2023-07-06 22:50:47 -07:00
Aliaksandr Valialkin
b9790515e4
docs/vmagent.md: update -help output 2023-07-06 22:48:23 -07:00
Aliaksandr Valialkin
65d7194588
docs/Single-server-VictoriaMetrics.md: update -help output 2023-07-06 22:45:58 -07:00
Aliaksandr Valialkin
eea088d87f
docs/CHANGELOG.md: clarify description for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4336 bugfix
This is a follow-up for 5eb5df96e2
2023-07-06 22:42:02 -07:00
Aliaksandr Valialkin
eeb53660b8
docs/CHANGELOG.md: use the proper link to the issue related to the commit 7a92263459
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4402
2023-07-06 22:41:43 -07:00
Aliaksandr Valialkin
67a8992798
docs/CHANGELOG.md: remove redundant info from the url to consulagent_sd_configs docs
This is a follow-up for 40d12be607
2023-07-06 22:41:23 -07:00
Aliaksandr Valialkin
40f1ccba67
docs/CHANGELOG.md: clarify the description of the bugfix at ce7141383d 2023-07-06 22:41:03 -07:00
Aliaksandr Valialkin
dc89e1f644
app/vmselect/graphite: follow-up after c7884f8686
- Consistently use -search.maxGraphiteTagValues for limiting tag values from auto-complete API
- Use -search.maxGraphiteSeries for limiting paths (aka series), which can be returned from Graphite series API
- Clarify the change in docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4339
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2841
2023-07-06 22:33:30 -07:00
Alexander Marshalov
eb611c3dc3
fix removing storage data dir before restoring from backup (#598)
* fix removing storage data dir before restoring from backup

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* fix review comment

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* fix review comment

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* fixes after merge with `enterprise-single-node` branch

Signed-off-by: Alexander Marshalov <_@marshalov.org>

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-07-06 22:32:12 -07:00
Aliaksandr Valialkin
2f19ba0f75
app/vmselect/netstorage: follow-up after 11ac551d52
- Clarify the scope of the fix at docs/CHANGELOG.md
- Handle the case when -search.maxSamplesPerSeries limit is exceeded
  in the same way as the -search.maxSamplesPerQuery limit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4472
2023-07-06 22:26:47 -07:00
Roman Khavronenko
690f58c016
docs: explicitly mention errors processing for import APIs (#4583)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:26:04 -07:00
Denys Holius
97a5bdf4f0
docs: adds curl commands to clear the query cache (#4468)
adds curl commands to clear query cache on vmselect/VM Single
2023-07-06 22:25:21 -07:00
Aliaksandr Valialkin
85c134feec
docs/VictoriaLogs/LogsQL.md: various fixes according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4579 2023-07-06 22:24:41 -07:00
Aliaksandr Valialkin
7cf5efc5b8
README.md: add a link to VictoriaLogs 2023-07-06 22:23:54 -07:00
Aliaksandr Valialkin
5b94246d92
docs: add Roblox case study 2023-07-06 22:23:13 -07:00
Aliaksandr Valialkin
ff96e9cfc7
docs/Single-server-VictoriaMetrics.md: fix link to Storage section after the ab2d184e42 2023-07-06 22:22:42 -07:00
Roman Khavronenko
bd5abb74fd
vmctl: interrupt explore procedure in influx mode if no numeric fields were found (#4576)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:21:18 -07:00
Roman Khavronenko
9cfd8d6b86
Docs retention (#4568)
* docs: mention parts and partitions in Retention section

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-07-06 22:19:43 -07:00