Commit graph

76 commits

Author SHA1 Message Date
Aliaksandr Valialkin
582c74cd93 lib/storage: mention tag filters used in the query that led to error message
This should improve detecting invalid or heavy queries that lead to errors.
2020-08-10 13:36:49 +03:00
Aliaksandr Valialkin
f3d33e23c9 app/vmstorage: improve error logging when the request times out 2020-08-10 13:23:26 +03:00
Aliaksandr Valialkin
84fd8af6d3 lib/storage: slow down concurrent searches when the number of concurrent inserts reaches the limit
This should improve data ingestion performance when heavy searches are executed

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/618
2020-08-07 08:49:40 +03:00
Aliaksandr Valialkin
9043a509a3 lib/storage: properly check timeouts and pace limits
Previously they were checked on every iteration for small number of iterations
2020-08-07 08:40:37 +03:00
Aliaksandr Valialkin
ad730d8a17 lib/storage: optimize prefetching metric names for the given metricIDs 2020-08-06 16:53:10 +03:00
Aliaksandr Valialkin
8f16388428 lib/storage: limit the number of concurrent calls to storage.searchTSIDs to GOMAXPROCS*2
This should limit the maximum memory usage and reduce CPU trashing on vmstorage
when multiple heavy queries are executed.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
2020-08-05 18:30:07 +03:00
Aliaksandr Valialkin
922d9aadf2 lib/storage: properly update vm_slow_row_inserts_total metric when importing multiple data points per time series at once
Previously the `vm_slow_row_inserts_total` metric may be incremented multiple times for different data points per a single time series,
while only a single increment is needed when inserting the first data point for this time series.
2020-07-30 16:17:24 +03:00
Aliaksandr Valialkin
039c9d2441 lib/storage: respect -search.maxQueryDuration when searching for time series in inverted index
Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`.
This commit stops searching in inverted index on query timeout.
2020-07-23 21:21:42 +03:00
Aliaksandr Valialkin
2a45871823 lib/storage: add more fine-grained pace limiting for search 2020-07-23 19:26:08 +03:00
Aliaksandr Valialkin
6f05c4d351 lib/storage: improve prioritizing of data ingestion over querying
Prioritize also small merges over big merges.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
2020-07-23 13:23:36 +03:00
Aliaksandr Valialkin
e4303d3d21 lib/storage: prevent possible race condition when all the goroutines exit Storage.AddRows, before goroutines other goroutines are blocked on searchTSIDsCond inside Storage.searchTSIDs
This condition may occur after the following sequence of events:

1) A goroutine enters the loop body when len(addRowsConcurrencyCh) == cap(addRowsConcurrencyCh) inside Storage.searchTSIDs.
2) All the goroutines return from Storage.AddRows.
3) The goroutine from step 1 blocks on searchTSIDsCond.Wait() inside the loop body.

The goroutine remains blocked until the next call to Storage.AddRows, which calls searchTSIDsCond.Signal().
This may take indefinite time.
2020-07-22 21:52:34 +03:00
Aliaksandr Valialkin
d3442b40b2 lib/uint64set: optimize adding items to the set via Set.AddMulti 2020-07-21 20:56:59 +03:00
Aliaksandr Valialkin
e1107fec10 lib/storage: reset MetricName->TSID cache after marking metricIDs as deleted
This is a follow-up commit after 12b16077c4 ,
which didn't reset the `tsidCache` in all the required places.
This could result in indefinite errors like:

    missing metricName by metricID ...; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time

Fix this by resetting the cache inside deleteMetricIDs function.
2020-07-14 14:06:32 +03:00
Aliaksandr Valialkin
cb92113632 lib/storage: limit the maximum concurrency for data ingestion to GOMAXPROCS
Previously the concurrency has been limited to GOMAXPROCS*2. This had little sense,
since every call to Storage.AddRows is bound to CPU, so the maximum ingestion bandwidth
is achieved when the number of concurrent calls to Storage.AddRows is limited to the number of CPUs,
i.e. to GOMAXPROCS.
2020-07-08 17:32:18 +03:00
Aliaksandr Valialkin
32b9fb58b8 lib/storage: clarify out of retention period error message by mentioning -retentionPeriod command-line flag 2020-07-08 13:54:26 +03:00
Aliaksandr Valialkin
12b16077c4 lib/storage: reset MetricName->TSID cache after deleting time series
This should prevent from adding new data points to deleted time series
without the need to check for the deleted time series.

This improves ingestion performance a bit when the `deleted time series ids` aka `dmis` set
contains big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/596

Based on the idea from @n4mine at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/604
2020-07-06 22:01:08 +03:00
Aliaksandr Valialkin
6daa5f7500 lib/storage: prioritize data ingestion over heavy queries
Heavy queries could result in the lack of CPU resources for processing the current data ingestion stream.
Prevent this by delaying queries' execution until free resources are available for data ingestion.

Expose `vm_search_delays_total` metric, which may be used in for alerting when there is no enough CPU resources
for data ingestion and/or for executing heavy queries.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2020-07-05 19:42:05 +03:00
Aliaksandr Valialkin
d5dddb0953 all: use %w instead of %s for wrapping errors in fmt.Errorf
This will simplify examining the returned errors such as httpserver.ErrorWithStatusCode .
See https://blog.golang.org/go1.13-errors for details.
2020-06-30 23:05:11 +03:00
Aliaksandr Valialkin
b19ca3eb5f lib/storage: do not increment vm_slow_metric_name_loads_total counter for metric_ids which shouldnt be prefetched, since this may mislead users 2020-05-16 10:21:17 +03:00
Aliaksandr Valialkin
82ffbcb9a6 app/vmstorage: add vm_slow_metric_name_loads_total metric, which could be used as an indicator when more RAM is needed for improving query performance 2020-05-15 14:11:45 +03:00
Aliaksandr Valialkin
82ccdfaa91 app/vmstorage: add vm_slow_row_inserts_total and vm_slow_per_day_index_inserts_total metrics for determining whether VictoriaMetrics required more RAM for the current number of active time series 2020-05-15 13:44:32 +03:00
Aliaksandr Valialkin
4fc33163c4 lib/storage: optimize ingestion pefrormance for new time series 2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
8b32e7c3a0 lib/storage: reduce indentation in Storage.add 2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
1573ececb2 lib/storage: return the first error instead of the last error, since the first error usually points to the root cause 2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
0afd48d2ee lib: extract common code for returning fast unix timestamp into lib/fasttime 2020-05-14 23:02:07 +03:00
Aliaksandr Valialkin
dbd0c552d5 lib/storage: gradually pre-populate per-day inverted index for the next day
This should prevent from CPU usage spikes at 00:00 UTC every day when
inverted index for new day must be quickly created for all the active time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
2020-05-12 12:13:05 +03:00
Aliaksandr Valialkin
364db13c9c app/vmselect: add /api/v1/status/tsdb page with useful stats for locating root cause for high cardinality issues
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:43 +03:00
Aliaksandr Valialkin
f3e0c55ea1 lib/storage: serialize snapshot creation process with mutex
This guarantees that the snapshot contains all the recently added data
from inmemory buffers when multiple concurrent calls to Storage.CreateSnapshot are performed.
2020-03-24 22:27:05 +02:00
Aliaksandr Valialkin
18af31a4c2 all: properly split vm_deduplicated_samples_total among cluster components
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/345
2020-02-27 23:48:07 +02:00
Aliaksandr Valialkin
ce15cecae4 lib/storage: typo fix 2020-02-16 15:53:44 +02:00
Aliaksandr Valialkin
32e153e834 lib/storage: prevent from clobbering nin-nil lastError in Storage.add 2020-02-16 15:51:26 +02:00
Aliaksandr Valialkin
eceaf13e5e lib/{storage,mergeset}: use time.Ticker instead of time.Timer where appropriate
It has been appeared that time.Timer was used in places where time.Ticker must be used instead.
This could result in blocked goroutines as in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/316 .
2020-02-13 13:10:07 +02:00
Aliaksandr Valialkin
2152f6f0cd lib/storage: re-use indexSearch inside Storage.prefetchMetricNames 2020-01-31 01:16:53 +02:00
Aliaksandr Valialkin
d68546aa4a lib/storage: pre-fetch metricNames for the found metricIDs in Search.Init
This should speed up Search.NextMetricBlock loop for big number of found time series.
2020-01-30 15:08:51 +02:00
Aliaksandr Valialkin
680080887d all: consistently log durations in seconds with millisecond precision
This should improve logs readability
2020-01-22 18:28:27 +02:00
Aliaksandr Valialkin
605d588ba6 lib/uint64set: reduce memory usage in Union, Intersect and Subtract methods
Iterate items with newly added Set.ForEach method instead of allocating `[]uint64`
slice for all the items before the iteration.
2020-01-15 12:12:49 +02:00
Aliaksandr Valialkin
97f70ccda7 lib/storage: optimize bulk import performance when multiple data points are inserted for the same time series
This should speed up `/api/v1/import` and make it more scalable on multi-core systems.
2019-12-19 18:18:29 +02:00
Aliaksandr Valialkin
62a915f2b2 lib/storage: protect from time drift during indexdb rotation
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/248
2019-12-02 14:44:42 +02:00
Aliaksandr Valialkin
7a4635f853 all: remove the remaining mentions of cluster version 2019-11-21 23:18:22 +02:00
Aliaksandr Valialkin
119dfd01bb lib/storage: add vm_cache_size_bytes{type="storage/hour_metric_ids"} metric 2019-11-13 20:24:21 +02:00
Aliaksandr Valialkin
86a1cd700b lib/storage: remove inmemory index for recent hour, since it uses too much memory
Production workload shows that the index requires ~4Kb of RAM per active time series.
This is too much for high number of active time series, so let's delete this index.

Now the queries should fall back to the index for the current day instead of the index
for the recent hour. The query performance for the current day index should be good enough
given the 100M rows/sec scan speed per CPU core.
2019-11-13 17:58:07 +02:00
Aliaksandr Valialkin
c57eb0ff83 lib/storage: add -disableRecentHourIndex flag for disabling inmemory index for recent hour
This may be useful for saving RAM on high number of time series aka high cardinality
2019-11-13 15:02:51 +02:00
Aliaksandr Valialkin
ca259864e2 lib/storage: return back inmemory inverted index for recent hour
Issues fixed:
- Slow startup times. Now the index is loaded from cache during start.
- High memory usage related to superflouos index copies every 10 seconds.
2019-11-13 13:11:04 +02:00
Aliaksandr Valialkin
01bb3c06c7 lib/storage: remove inmemory inverted index for recent hours
Production load with >10M active time series showed it could
slow down VictoriaMetrics startup times and could eat
all the memory leading to OOM.

Remove inmemory inverted index for recent hours until thorough
testing on production data shows it works OK.
2019-11-13 10:45:53 +02:00
Oleg Kovalov
b4f44befa3 fix misspelled words (#229) 2019-11-12 00:16:42 +02:00
Aliaksandr Valialkin
8e8f98f712 lib/storage: add tests for dateMetricIDCache 2019-11-11 13:21:57 +02:00
Aliaksandr Valialkin
c342f5e37e lib/storage: eliminate data race when updating lastSyncTime in dateMetricIDCache.Has 2019-11-10 22:04:01 +02:00
Aliaksandr Valialkin
ee7765b10d lib/storage: implement per-day inverted index 2019-11-10 00:02:46 +02:00
Aliaksandr Valialkin
5810ba57c2 lib/storage: use specialized cache for (date, metricID) entries
This improves ingestion performance.
2019-11-09 23:06:11 +02:00
Aliaksandr Valialkin
9ea549ed24 lib/storage: sync with cluster changes 2019-11-08 21:21:07 +02:00