Commit graph

541 commits

Author SHA1 Message Date
Aliaksandr Valialkin
ad730d8a17 lib/storage: optimize prefetching metric names for the given metricIDs 2020-08-06 16:53:10 +03:00
Aliaksandr Valialkin
639b26b40c lib/fs: export vm_nfs_pending_dirs_to_remove metric for monitoring the number of pending directories that couldn't be removed due to NFS lock 2020-08-06 15:31:34 +03:00
Aliaksandr Valialkin
8f16388428 lib/storage: limit the number of concurrent calls to storage.searchTSIDs to GOMAXPROCS*2
This should limit the maximum memory usage and reduce CPU trashing on vmstorage
when multiple heavy queries are executed.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
2020-08-05 18:30:07 +03:00
Aliaksandr Valialkin
aaa497ff0b Perform conversion from string to []byte according to rule #6 at https://golang.org/pkg/unsafe/#Pointer 2020-08-05 11:55:58 +03:00
Aliaksandr Valialkin
a034f02fb2 lib/backup: allow using ~/.aws/config without region
Use us-west-2 for determining bucket region.
2020-08-04 13:07:59 +03:00
Aliaksandr Valialkin
509d12643b app/vmselect: show X-Forwarded-For contents on /api/v1/status/active_queries page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/659
2020-07-31 20:05:18 +03:00
Aliaksandr Valialkin
5e71fab8a6 lib/storage: reduce the maximum number of concurrent merge workers to GOMAXPROCS/2
Previously the limit has been raised to GOMAXPROCS, but it has been appeared that this
increases query latencies since more CPUs are busy with merges.

While at it, substitute `*MergeConcurrencyLimitCh` channels with simple integer limits.
2020-07-31 17:46:56 +03:00
Aliaksandr Valialkin
e7959094f6 lib/storage: remove prioritizing of merging small parts over merging big parts, since it doesn't work as expected
The prioritizing could lead to big merge starvation, which could end up in too big number of parts that must be merged into big parts.

Multiple big merges may be initiated after the migration from v1.39.0 or v1.39.1. It is OK - these merges should be finished soon,
which should return CPU and disk IO usage to normal levels.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/618
2020-07-30 19:57:27 +03:00
Aliaksandr Valialkin
922d9aadf2 lib/storage: properly update vm_slow_row_inserts_total metric when importing multiple data points per time series at once
Previously the `vm_slow_row_inserts_total` metric may be incremented multiple times for different data points per a single time series,
while only a single increment is needed when inserting the first data point for this time series.
2020-07-30 16:17:24 +03:00
Aliaksandr Valialkin
67a64c142d lib/httpserver: emit X-Forwarded-For additionally to remoteAddr in error logs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/659
2020-07-29 13:12:42 +03:00
Sasasu
5b81bdde39
lib/storage: metaindexRow use memroy more efficiently (#655)
due to memory align the metaindexRow structure use 64-byte pre object.
this commit changes the order of field, make metaindexRow use 56-byte pre
object.

Signed-off-by: Sasasu <su@sasasu.me>
2020-07-27 19:02:53 +03:00
Aliaksandr Valialkin
865610a7c8 lib/protoparser/prometheus: add a test for cassandra-exporter
Thanks to Seva
2020-07-27 18:37:11 +03:00
Aliaksandr Valialkin
54ef2d8112 lib/storage: slightly reduce code difference between single-node and cluster versions 2020-07-24 00:31:16 +03:00
Aliaksandr Valialkin
039c9d2441 lib/storage: respect -search.maxQueryDuration when searching for time series in inverted index
Previously the time spent on inverted index search could exceed the configured `-search.maxQueryDuration`.
This commit stops searching in inverted index on query timeout.
2020-07-23 21:21:42 +03:00
Aliaksandr Valialkin
2a45871823 lib/storage: add more fine-grained pace limiting for search 2020-07-23 19:26:08 +03:00
Aliaksandr Valialkin
6f05c4d351 lib/storage: improve prioritizing of data ingestion over querying
Prioritize also small merges over big merges.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
2020-07-23 13:23:36 +03:00
Aliaksandr Valialkin
61c611f5ad lib/storage: properly calculate global metrics in UpdateStats() 2020-07-23 00:35:15 +03:00
Aliaksandr Valialkin
9224ede54f lib/mergeset: properly calculate global metrics in UpdateStats()
Previously these metrics could be calculated multiple times for multiple mergeset.Table instances.
2020-07-23 00:35:13 +03:00
Aliaksandr Valialkin
228d137936 lib/storage: reorder mergeBlockStreams() args in order to make them more consistent 2020-07-22 21:58:10 +03:00
Aliaksandr Valialkin
e4303d3d21 lib/storage: prevent possible race condition when all the goroutines exit Storage.AddRows, before goroutines other goroutines are blocked on searchTSIDsCond inside Storage.searchTSIDs
This condition may occur after the following sequence of events:

1) A goroutine enters the loop body when len(addRowsConcurrencyCh) == cap(addRowsConcurrencyCh) inside Storage.searchTSIDs.
2) All the goroutines return from Storage.AddRows.
3) The goroutine from step 1 blocks on searchTSIDsCond.Wait() inside the loop body.

The goroutine remains blocked until the next call to Storage.AddRows, which calls searchTSIDsCond.Signal().
This may take indefinite time.
2020-07-22 21:52:34 +03:00
Aliaksandr Valialkin
31ae5911a8 app/vmagent: add -remoteWrite.decimalPlaces command-line flag, which may be used for reducing disk space usage on the remote storage 2020-07-21 21:55:32 +03:00
Aliaksandr Valialkin
d3442b40b2 lib/uint64set: optimize adding items to the set via Set.AddMulti 2020-07-21 20:56:59 +03:00
Aliaksandr Valialkin
b35cb293f5 lib/httpserver: log remote address in error message from httpserver.Errorf
This should improve detection of the root cause of errors.
Thanks to Anant for the idea.
2020-07-20 14:11:22 +03:00
Aliaksandr Valialkin
e1107fec10 lib/storage: reset MetricName->TSID cache after marking metricIDs as deleted
This is a follow-up commit after 12b16077c4 ,
which didn't reset the `tsidCache` in all the required places.
This could result in indefinite errors like:

    missing metricName by metricID ...; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time

Fix this by resetting the cache inside deleteMetricIDs function.
2020-07-14 14:06:32 +03:00
Aliaksandr Valialkin
cde18d1f43 lib/protoparser: properly update vm_protoparser_rows_read_total{type="promscrape"} metric 2020-07-14 12:16:35 +03:00
Seva Poliakov
457e61900d add vm_protoparser_rows_read_total metrics to promscrape (#624)
* add vm_protoparser_rows_read_total metrics to promscrape

move vm_protoparser_rows_read_total for promscrape to better place

move vm_protoparser_rows_read_total for promscrape to better place

* remove possibility of infinity loop at prometheus parser
2020-07-14 12:16:34 +03:00
Roman Khavronenko
7e347972c4
lib/flagutil: specify additional description for all Array type flags (#620)
Array type flag is now defined as `value` type in flag description when printed.
This change adds additional description to every Array type flag so it would be
clear what exact type is used:
```
  -remoteWrite.urlRelabelConfig array
        Optional path to relabel config for the corresponding -remoteWrite.url
        Supports array of values separated by comma or specified via multiple flags.
```
2020-07-13 21:56:37 +03:00
Roman Khavronenko
19dd121968
lib/persistentqueue: add vm_persistentqueue_bytes_pending metric (#619)
Metric `vm_persistentqueue_bytes_pending` is a gauge that shows current amount
of bytes in persistentqueue flushed on disk as a difference between write and read
offsets. This metric is very similar to `vmagent_remotewrite_pending_data_bytes`
except of accounting for bytes in-memory.
2020-07-13 21:54:09 +03:00
Roman Khavronenko
829ec4f9cf
Extend metric vm_promscrape_targets with status label (#615)
The change to `vm_promscrape_targets` metric suppose to improve observability
for `vmagent` so it will be possible to track how many targets are up or down
for every specific scrape group:
```
vm_promscrape_targets{type="static_configs", status="down"} 1
vm_promscrape_targets{type="static_configs", status="up"} 2
```
2020-07-13 21:52:03 +03:00
Aliaksandr Valialkin
cba820e390 app/{vminsert,vmagent}: add ability to import data in Prometheus exposition format via /api/v1/import/prometheus 2020-07-10 12:14:07 +03:00
Aliaksandr Valialkin
6fe3c48a6e properly calculate readCalls 2020-07-10 12:00:58 +03:00
Aliaksandr Valialkin
3e1d7d8489 lib/promscrape: send Accept header similar to Prometheus when scraping targets
This should fix scraping Spring Boot servers, which return incorrect response
unless `Accept: text/plain` request header is set.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/608
2020-07-08 19:48:22 +03:00
Aliaksandr Valialkin
cb92113632 lib/storage: limit the maximum concurrency for data ingestion to GOMAXPROCS
Previously the concurrency has been limited to GOMAXPROCS*2. This had little sense,
since every call to Storage.AddRows is bound to CPU, so the maximum ingestion bandwidth
is achieved when the number of concurrent calls to Storage.AddRows is limited to the number of CPUs,
i.e. to GOMAXPROCS.
2020-07-08 17:32:18 +03:00
Roman Khavronenko
e7557e0252
lib/protoparser: fix metric name of unmarshal errors in promremotewrite (#607)
The change fixes the typo in metric name `vm_protoparser_unmarshal_errors` to
respect the naming standard.
2020-07-08 14:18:41 +03:00
Aliaksandr Valialkin
e59b9916aa lib/protoparser/graphite: go fmt 2020-07-08 14:12:10 +03:00
Aliaksandr Valialkin
d0b694c5c8 lib/protoparser/graphite: add more tests after eb45185eef
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/610
2020-07-08 14:10:35 +03:00
Seva Poliakov
eb45185eef
Fix graphite minus one timestamp (#609)
* fix graphite -1 timestamp

* format the graphite fix -1 timestamp
2020-07-08 13:59:19 +03:00
Aliaksandr Valialkin
32b9fb58b8 lib/storage: clarify out of retention period error message by mentioning -retentionPeriod command-line flag 2020-07-08 13:54:26 +03:00
Aliaksandr Valialkin
12b16077c4 lib/storage: reset MetricName->TSID cache after deleting time series
This should prevent from adding new data points to deleted time series
without the need to check for the deleted time series.

This improves ingestion performance a bit when the `deleted time series ids` aka `dmis` set
contains big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/596

Based on the idea from @n4mine at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/604
2020-07-06 22:01:08 +03:00
Aliaksandr Valialkin
a23806f486 lib/fs: clarify description for -fs.disableMmap command-line flag 2020-07-06 14:28:34 +03:00
Aliaksandr Valialkin
6daa5f7500 lib/storage: prioritize data ingestion over heavy queries
Heavy queries could result in the lack of CPU resources for processing the current data ingestion stream.
Prevent this by delaying queries' execution until free resources are available for data ingestion.

Expose `vm_search_delays_total` metric, which may be used in for alerting when there is no enough CPU resources
for data ingestion and/or for executing heavy queries.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2020-07-05 19:42:05 +03:00
Aliaksandr Valialkin
ff6a0955eb lib/promscrape: use HostClient.DoDeadline instead of HostClient.Do in order to guarantee strict deadline across multiple scrape attempts 2020-07-03 21:33:22 +03:00
Aliaksandr Valialkin
8b133e40d5 lib/promscrape: prevent from too big deadline misses on scrape retries
The maximum deadline miss duration is reduced to 2x scrape_interval in the worst case.
By default it is limited to scrape_interval configured for the given scrape target.
2020-07-03 20:41:36 +03:00
Aliaksandr Valialkin
44a54b8b3d lib/promscrape: check for nil error before checking for the returned status code when scraping targets 2020-07-03 18:37:14 +03:00
Aliaksandr Valialkin
0b2086b7a5 app/vminsert: prevent from adding and/or selecting labels with empty values
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/600
2020-07-02 23:14:11 +03:00
Aliaksandr Valialkin
91b3482894 app/vminsert: add ability to apply relabeling to all the incoming metrics if -relabelConfig command-line arg points to a file with a list of relabel_config entries
See https://victoriametrics.github.io/#relabeling
2020-07-02 20:39:28 +03:00
Aliaksandr Valialkin
4dd3de9286 lib/promscrape: add ability to set disable_compression and disable_keepalive options in scrape_config section of the config passed to -promscrape.config
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/580
2020-07-02 14:19:14 +03:00
Aliaksandr Valialkin
8da3f773ae lib/promscrape: add -promscrape.disableKeepAlive command-line flag for disabling http keep-alive connections when scraping targets
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/580
2020-07-01 02:20:20 +03:00
Aliaksandr Valialkin
b277ba8121 lib/httpserver: add Unwrap method to ErrorWithStatusCode, so As and Is functions in standard errors package may properly unwrap the error inside ErrorWithStatusCode 2020-07-01 00:54:01 +03:00
Aliaksandr Valialkin
56ccfa5218 all: use errors.As instead of type assertion for detecting net.Error 2020-07-01 00:15:34 +03:00