Commit graph

159 commits

Author SHA1 Message Date
Aliaksandr Valialkin
374beb350e
app/vmselect: optimize /api/v1/labels and /api/v1/label/.../values handlers when match[] query arg is passed to them 2022-06-12 04:32:13 +03:00
Aliaksandr Valialkin
2bcb960f17
all: improve query tracing coverage for indexdb search
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403
2022-06-09 20:07:07 +03:00
Aliaksandr Valialkin
12ac255dae
lib/querytracer: make it easier to use by passing trace context message to New and NewChild
The context message can be extended by calling Donef.
If there is no need to extend the message, then just call Done.
2022-06-08 21:06:52 +03:00
Aliaksandr Valialkin
ea06d2fd3c
lib/storage: stop background merge when storage enters read-only mode
This should prevent from `no space left on device` errors when VictoriaMetrics
under-estimates the additional disk space needed for background merge.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2603
2022-06-01 14:36:45 +03:00
Roman Khavronenko
642eb1c534
lib/storage: make indexdb/tagFilters cache size configurable (#2667)
The default size of `indexdb/tagFilters` now can be overridden via
`storage.cacheSizeIndexDBTagFilters` flag.
Please, be careful with changing default size since it may
lead to inefficient work of the vmstorage or OOM exceptions.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2663
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Co-authored-by: Nikolay <nik@victoriametrics.com>
2022-06-01 10:07:53 +02:00
Aliaksandr Valialkin
41958ed5dd
all: add initial support for query tracing
See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#query-tracing

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1403
2022-06-01 02:29:23 +03:00
Aliaksandr Valialkin
f6d11a49aa
lib/storage: add ability to change the indexdb rotation time offset with -retentionTimezoneOffset command-line flag
This is a follow-up for 0fbf59199a

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2574
2022-05-25 16:05:29 +03:00
阳明
0fbf59199a
lib/storage: Remove the effect of time zone on next retention period (#2568) (#2574) 2022-05-25 15:08:24 +03:00
Dmytro Kozlov
7dd9f3b98e
{vmbackup, vmbackup/snapshot}: fixed problem with snapshot backup in another snapshot folder (#2535)
* {vmbackup, vmbackup/snapshot}: validate snapshot name

* vmbackup/snapshot: added another checks

* backup/actions: added check that we ignore backup_complete.ignore file

* vmbackup: moved snapshot to lib directory

* lib/snapshot: added functions description

* lib/snapshot: fixed typo

* vmbackup: code cleanup

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-05-04 22:12:03 +03:00
Artem Navoiev
37cf509c3a
lib/{storage,flagutil} - Add option for snapshot autoremoval (#2487)
* lib/{storage,flagutil} - Add option for snapshot autoremoval

- add prometheus-like duration as command flag
- add option to delete stale snapshots
- update duration.go flag to re-use own code

* wip

* lib/flagutil: re-use Duration.Set() call in NewDuration

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-05-02 11:00:15 +03:00
Aliaksandr Valialkin
50cf74ce4b
lib/storage: reuse sync.WaitGroup objects
This reduces GC load by up to 10% according to memory profiling
2022-04-06 13:34:04 +03:00
Aliaksandr Valialkin
6e364e19ef
app/vmselect: add fine-grained limits for the number of returned/scanned time series for various APIs 2022-03-26 11:29:49 +02:00
Aliaksandr Valialkin
2ae3a9a8a3
lib/storage: reduce the interval for checking for free disk space from 30 seconds to 1 second
This should reduce the probability of out of disk space panics when -storage.minFreeDiskSpaceBytes is set to low values.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2305
2022-03-18 16:52:27 +02:00
Aliaksandr Valialkin
3eef1ddc7d
lib/storage: trashing -> thrashing typo in docs
This is a follow-up for 918ed5cb32
2022-03-16 13:05:26 +02:00
Aliaksandr Valialkin
62b46007c5
lib/workingsetcache: reduce the default cache rotation period from hour to 20 minutes
This should reduce memory usage under high time series churn rate
2022-02-23 13:41:45 +02:00
Roman Khavronenko
b6ed9afd6d
lib: allow to configure cache size by type (#2206)
* lib: allow to configure cache size by type

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1940
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-21 13:50:34 +02:00
Aliaksandr Valialkin
6ff71474a6
lib/storage: document why tsid cache is reset before saving it to disk
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2205
2022-02-16 18:37:56 +02:00
Aliaksandr Valialkin
96dce63dbd
lib/storage: tune the logic for pre-populating of the per-day inverted index for the next day
- Postpone the pre-poulation to the last hour of the current day. This should reduce the number
  of useless entries in the next per-day index, which shouldn't be created there,
  when the corresponding time series are stopped to be pushed during the current day.

- Make the pre-population more smooth in time by using the hash of MetricID instead of MetricID itself
  when calculating the need for for the given MetricID pre-population.

- Sync the logic for pre-population of the next day inverted index with the logic of pre-populating tsid cache
  after indexdb rotation. This should improve code maintainability.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
2022-02-12 16:33:16 +02:00
Roman Khavronenko
cf1a8bce6b
lib/index: reduce read/write load after indexDB rotation (#2177)
* lib/index: reduce read/write load after indexDB rotation

IndexDB in VM is responsible for storing TSID - ID's used for identifying
time series. The index is stored on disk and used by both ingestion and read path.

IndexDB is stored separately to data parts and is global for all stored data.
It can't be deleted partially as VM deletes data parts. Instead, indexDB is
rotated once in `retention` interval.

The rotation procedure means that `current` indexDB becomes `previous`,
and new freshly created indexDB struct becomes `current`. So in any time,
VM holds indexDB for current and previous retention periods.
When time series is ingested or queried, VM checks if its TSID is present
in `current` indexDB. If it is missing, it checks the `previous` indexDB.
If TSID was found, it gets copied to the `current` indexDB. In this way
`current` indexDB stores only series which were active during the retention
period.

To improve indexDB lookups, VM uses a cache layer called `tsidCache`. Both
write and read path consult `tsidCache` and on miss the relad lookup happens.

When rotation happens, VM resets the `tsidCache`. This is needed for ingestion
path to trigger `current` indexDB re-population. Since index re-population
requires additional resources, every index rotation event may cause some extra
load on CPU and disk. While it may be unnoticeable for most of the cases,
for systems with very high number of unique series each rotation may lead
to performance degradation for some period of time.

This PR makes an attempt to smooth out resource usage after the rotation.
The changes are following:
1. `tsidCache` is no longer reset after the rotation;
2. Instead, each entry in `tsidCache` gains a notion of indexDB to which
they belong;
3. On ingestion path after the rotation we check if requested TSID was
found in `tsidCache`. Then we have 3 branches:
3.1 Fast path. It was found, and belongs to the `current` indexDB. Return TSID.
3.2 Slow path. It wasn't found, so we generate it from scratch,
add to `current` indexDB, add it to `tsidCache`.
3.3 Smooth path. It was found but does not belong to the `current` indexDB.
In this case, we add it to the `current` indexDB with some probability.
The probability is based on time passed since the last rotation with some threshold.
The more time has passed since rotation the higher is chance to re-populate `current` indexDB.
The default re-population interval in this PR is set to `1h`, during which entries from
`previous` index supposed to slowly re-populate `current` index.

The new metric `vm_timeseries_repopulated_total` was added to identify how many TSIDs
were moved from `previous` indexDB to the `current` indexDB. This metric supposed to
grow only during the first `1h` after the last rotation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-02-12 00:30:08 +02:00
Aliaksandr Valialkin
5f84b17ed6
lib/storage: properly limit cardinality when ingesting multiple samples for the same time series in a single request 2022-01-21 12:38:09 +02:00
Nikolay
8ff7da7202
adds restore.lock (#1988)
* adds restore.lock
it must prevent from running storage after incomplete restore process
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

* return back flock file deletion

* Apply suggestions from code review

* wip

* docs/CHANGELOG.md: document https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2021-12-22 13:10:15 +02:00
Aliaksandr Valialkin
ce333f28d8
all: use logger.WithThrottler() where appropriate 2021-12-21 17:03:25 +02:00
Aliaksandr Valialkin
e1a715b0f5
lib/storage: convert alternate regexps into Graphite wildcards inside __graphite__ pseudo-label
For example, `{__graphite__=~"foo.(bar|baz)"}` is automatically converted to `{__graphite__=~"foo.{bar,baz}"}` before execution.
This allows using multi-value Grafana template variables such as `{__graphite__=~"foo.($app)"}`.
2021-12-14 19:51:49 +02:00
Aliaksandr Valialkin
7275ebf91a
app/vmstorage: export vm_cache_size_max_bytes metrics for determining capacity of various caches
The vm_cache_size_max_bytes metric can be used for determining caches which reach their capacity via the following query:

   vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9
2021-12-02 10:30:43 +02:00
Aliaksandr Valialkin
53bb58ed2a
lib/storage: log a warning when the -storageDataPath has less than -storage.minFreeDiskSpaceBytes
This should improve the debuggability of the readonly feature.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1727
2021-10-19 23:59:13 +03:00
Aliaksandr Valialkin
001750c239
lib/storage: fix unaligned access on 32-bit architectures.
The bug has been introduced at a171916ef5
2021-10-08 19:43:03 +03:00
Aliaksandr Valialkin
cf5cbd1c70
app/{vminsert,vmstorage}: follow-up after a171916ef5
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269
2021-10-08 14:35:49 +03:00
Nikolay
4290b46e8c
Adds read-only mode for vmstorage node (#1680)
* adds read-only mode for vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269

* changes order a bit

* moves isFreeDiskLimitReached var to storage struct
renames functions to be consistent
change protoparser api - with optional storage limit check for given openned storage

* renames freeSpaceLimit to ReadOnly
2021-10-08 14:35:48 +03:00
Aliaksandr Valialkin
f77dde837a lib/promscrape: add the ability to limit the number of unique series per each scrape target
The number of series per target can be limited with the following options:

* Global limit with `-promscrape.maxSeriesPerTarget` command-line option.
* Per-target limit with `max_series: N` option in `scrape_config` section.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561
2021-09-01 16:03:59 +03:00
Aliaksandr Valialkin
4401464c22 all: add support for Prometheus staleness markers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:10:17 +03:00
Aliaksandr Valialkin
682662b2ae lib/storage: remove cache directory if it contains reset_cache_on_startup file
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447
2021-07-13 17:58:51 +03:00
Aliaksandr Valialkin
4f80b2f230 lib/storage: properly limit the size of storage/date_metricID cache 2021-07-12 14:25:44 +03:00
Aliaksandr Valialkin
8b262d4ba7 lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate 2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin
d0c830039d lib/storage: tune cache sizes according to production workload 2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
84fb59b0ba lib/storage: move deletedMetricIDs set from indexDB to Storage
This makes consitent the list of deleted metricIDs when it is used from both the current indexDB and the previous indexDB (aka extDB).
This should fix the issue, which could lead to storing new samples under deleted metricIDs after indexDB rotation.
See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 .

Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

This commit resolves the issue in more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

The downside of the commit is the deletedMetricIDs set isn't cleaned from the metricIDs outside the retention. It needs app restart.
This should be OK in most cases.
2021-06-15 15:04:30 +03:00
Aliaksandr Valialkin
c4f3fbfa5d lib/storage: reset cache on disk during series deletion and during indexdb rotation
This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin
2d8bd41f8a lib/storage: reduce memory allocations when syncing dateMetricIDCache 2021-06-03 16:20:42 +03:00
Aliaksandr Valialkin
39ef1e7a51 lib/storage: do not stop data ingestion on the first error in Storage.AddRows
Continue data ingestion for the rest of blocks.
2021-05-24 15:32:47 +03:00
Aliaksandr Valialkin
4b01c9fb2e lib/storage: limit the number of rows per each block in Storage.AddRows()
This should reduce memory usage when ingesting big blocks or rows.
2021-05-24 15:24:07 +03:00
Aliaksandr Valialkin
f54133b200 lib/storage: do not populate MetricID->MetricName cache during data ingestion
This cache isn't needed during data ingestion, so there is no need in spending RAM on it.

This reduces RAM usage on data ingestion path by 30%
2021-05-24 03:02:46 +03:00
Aliaksandr Valialkin
ad73f226ff app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin
d7be2753c0 lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss 2021-05-13 09:02:33 +03:00
Aliaksandr Valialkin
832651c6c2 app/vmselect: follow up after 8a0678678b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168
2021-05-12 17:18:30 +03:00
Nikolay
8a0678678b
Adds tsdb match filters (#1282)
* init work on filters

* init propose for status filters

* fixes tsdb status
adds test

* fix bug

* removes checks from test
2021-05-12 15:18:45 +03:00
Aliaksandr Valialkin
12d733dd5d app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:52:57 +03:00
Aliaksandr Valialkin
dc9eafcd02 app/{vminsert,vmagent}: add -sortLabels command-line option for sorting time series labels before ingesting them in the storage
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
2021-03-31 23:27:58 +03:00
Aliaksandr Valialkin
e1f699bb6c lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels 2021-03-31 21:24:46 +03:00
Aliaksandr Valialkin
aa81039b42 app/vmselect: log the metric which trigger rollup result cache reset
This should help finding the source of stale metrics
2021-03-25 21:31:39 +02:00
Aliaksandr Valialkin
3cfb3a3683 lib/storage: respect the deadline passed to Storage.SearchMetricNames 2021-03-22 23:03:17 +02:00
Aliaksandr Valialkin
8e2afdf568 lib/storage: improve Search.NextMetricBlock performance by using MetricID->MetricName cache 2021-03-22 22:49:18 +02:00