Aliaksandr Valialkin
27f7cca7ff
lib/backup: donload only the remaining parts for partially downloaded files after vmrestore
restart
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/487
2020-05-15 17:03:33 +03:00
Aliaksandr Valialkin
82ffbcb9a6
app/vmstorage: add vm_slow_metric_name_loads_total
metric, which could be used as an indicator when more RAM is needed for improving query performance
2020-05-15 14:11:45 +03:00
Aliaksandr Valialkin
82ccdfaa91
app/vmstorage: add vm_slow_row_inserts_total
and vm_slow_per_day_index_inserts_total
metrics for determining whether VictoriaMetrics required more RAM for the current number of active time series
2020-05-15 13:44:32 +03:00
Aliaksandr Valialkin
0eacea1de1
lib/{storage,mergeset}: further tuning of compression levels depending on block size
...
This should improve performance for querying newly added data, since it can be unpacked faster.
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
737d641920
lib/storage: wait for all the goroutines to finish in TestSearch in order to prevent racy behavior on test finish
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
4fc33163c4
lib/storage: optimize ingestion pefrormance for new time series
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
f9f3afb6af
lib/mergeset: tune compression levels in order to improve ingestion performance a bit
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
8b32e7c3a0
lib/storage: reduce indentation in Storage.add
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
1573ececb2
lib/storage: return the first error instead of the last error, since the first error usually points to the root cause
2020-05-15 13:24:37 +03:00
Aliaksandr Valialkin
0afd48d2ee
lib: extract common code for returning fast unix timestamp into lib/fasttime
2020-05-14 23:02:07 +03:00
Aliaksandr Valialkin
42866fa754
lib/{storage,mergeset}: return dst on error from unmarshalBlockHeaders, so it could be reused
2020-05-14 15:32:07 +03:00
Aliaksandr Valialkin
827a3a7866
lib/storage: document that getnerateUniqueMetricID should return dense ids
2020-05-14 14:08:45 +03:00
Aliaksandr Valialkin
606585f7be
lib/{storage,mergeset}: cleanup: remove unused partSearch.indexBlockReuse
2020-05-14 14:03:03 +03:00
Aliaksandr Valialkin
4fe67504f9
lib/storage: optimize label matching for regexp ending with literal suffix
...
For example, `{label=~"foo.*bar.+baz"}` contains literal suffix `baz`,
so it should work faster now.
2020-05-13 11:47:07 +03:00
Aliaksandr Valialkin
96e001d254
app/vmagent: fix a bug with improper relabeling when multiple -remoteWrite.urlRelableConfig
args are set
...
This bug could result in incorrect relabeling and metrics' drop.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/467
2020-05-12 22:02:58 +03:00
Aliaksandr Valialkin
a6f16dcc11
lib/fs: do not use mmap for 32-bit arches by default, since they cannot map files bigger than 4GB in RAM
2020-05-12 20:22:09 +03:00
Aliaksandr Valialkin
0a134ace63
app/vmagent: fix scraping mTLS targets, which has been broken in v1.35.1
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2020-05-12 17:23:03 +03:00
Aliaksandr Valialkin
8300cc17af
app/vmagent,lib/promscrape: do not set HostClient.DialDualStack, since it isnt used if HostClient.Dial is set
2020-05-12 15:24:18 +03:00
Aliaksandr Valialkin
3232605524
lib/storage: properly initialize part struct before trying to close it on error
...
This should prevent from nil pointer dereference bug at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/468 .
2020-05-12 14:54:31 +03:00
Aliaksandr Valialkin
dbd0c552d5
lib/storage: gradually pre-populate per-day inverted index for the next day
...
This should prevent from CPU usage spikes at 00:00 UTC every day when
inverted index for new day must be quickly created for all the active time series.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
2020-05-12 12:13:05 +03:00
Aliaksandr Valialkin
cc00a2c453
lib/storage: typo fixes in error messages: or -> of
2020-05-12 12:12:42 +03:00
Aliaksandr Valialkin
ce2107bc52
lib/storage: speed up matching for common regexps in label filters
...
The following regexps have been optimized:
* 'foo.+bar'
* 'foo.+bar.+baz'
This should improve performance for matching Graphite-like metrics.
2020-05-11 22:40:55 +03:00
Aliaksandr Valialkin
12a1a71cc1
lib/storage: add a benchmark for Graphite-like regexps for metric names
2020-05-11 22:37:32 +03:00
Aliaksandr Valialkin
099e44005b
lib/httpserver: add -http.shutdownDelay
flag for a grace period before http server shutdown
...
The http server returns 503 non-OK error at `/health` page during grace period,
so load balancers in front of the http server could re-route incoming requests
to other servers.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/463
2020-05-07 15:30:35 +03:00
Aliaksandr Valialkin
787fcfba0c
lib/httpserver: reduce typical duration for http server graceful shutdown
...
Previously the duration for graceful shutdown for http server could take more than a minute
because of imporperly set timeouts in setNetworkTimeout.
Now typical duration for graceful shutdown should be reduced to less than 5 seconds.
2020-05-07 14:12:39 +03:00
Aliaksandr Valialkin
91a49eecea
lib/flagutil: make errcheck happy by explicitly ignoring Array.Set result in tests
2020-05-06 22:37:39 +03:00
Aliaksandr Valialkin
c4c447507d
lib/flagutil: properly parse quoted flag values for flagutil.Array
2020-05-06 22:27:21 +03:00
Aliaksandr Valialkin
8a00807f60
app/vmagent: allow setting independent auth configs per each configured -remoteWrite.url
2020-05-06 16:51:41 +03:00
Aliaksandr Valialkin
68928bf3df
lib/promscrape/discovery/gce: discover per-zone instances for gce_sd_config
in parallel. This should reduce discovery latency
2020-05-06 15:00:09 +03:00
Aliaksandr Valialkin
3f52a97f9b
lib/promscrape: add Prometheus-compatible DNS-based service discovery aka dns_sd_configs
2020-05-06 00:01:58 +03:00
Aliaksandr Valialkin
364789c24c
lib/promscrape: properly connect to TCP6 addresses if -enableTCP6
is set
2020-05-06 00:01:57 +03:00
Aliaksandr Valialkin
7b5ef63384
lib/procutil: add NewSighupChan function, which returns a channel, which is triggered on every SIGHUP
2020-05-05 10:54:09 +03:00
Aliaksandr Valialkin
4fa817be10
lib/promscrape: allow explicitly setting empty token via token: ""
in consul_sd_config
2020-05-05 07:50:15 +03:00
Aliaksandr Valialkin
40c3ffb359
lib/promscrape: add Prometheus-compatible service discovery for Consul aka consul_sd_configs
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/330
2020-05-04 20:51:17 +03:00
Aliaksandr Valialkin
83f0e35b7b
lib/promauth: properly set up client certificate in tls.Config
...
Previously the client certificate has been mistakenly set up as a server certificate
2020-05-04 20:51:08 +03:00
Aliaksandr Valialkin
218e566647
lib/promscrape: move common code for discovery api config map handling into discoveryutils
2020-05-04 20:51:01 +03:00
Aliaksandr Valialkin
6310b20e72
lib/promscrape/discovery/kubernetes/: unify apiConfig creation
2020-05-04 20:50:49 +03:00
Aliaksandr Valialkin
66b0ae79a5
lib/promscrape: remove debug line left after the commit e4aac6ea40
2020-05-03 17:15:32 +03:00
Aliaksandr Valialkin
69004a5f67
lib/promscrape: fix tests after the commit 658a8742ac
...
The original commit copies `__address__` label to `instance` label when generating per-target labels as Prometheus does.
See https://www.robustperception.io/life-of-a-label for details.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/453
2020-05-03 16:56:15 +03:00
DexterZhang
658a8742ac
fix(vmagent): different behavior as how prometheus deal with labels. [Issue#453] ( #454 )
2020-05-03 16:51:03 +03:00
Aliaksandr Valialkin
e4aac6ea40
lib/promscrape: make consistent scrape time offsets across reloads for the same ScrapeURL and Labels
...
This should make consistent intervals between data points for scrape targets across reloads.
Previously these intervals were random.
2020-05-03 14:30:21 +03:00
Aliaksandr Valialkin
d9f1b4d6a3
lib/promscrape: fix TestGetFileSDScrapeWorkSuccess after 3b234d82e5
2020-05-03 14:28:31 +03:00
Aliaksandr Valialkin
3b234d82e5
lib/promscrape: reload only modified scrapers on config changes
...
This should improve scrape stability when big number of targets are scraped and these targets are frequently changed.
Thanks to @xbsura for the idea and initial implementation attempts at the following pull requests:
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/449
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/458
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/459
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/460
2020-05-03 12:45:40 +03:00
Aliaksandr Valialkin
ee810e5f3a
lib/httpserver: rename http.externalURL
to http.pathPrefix
and improve help message for this flag
...
The `http.externalURL` flag name was slightly misleading, so it has been renamed to `http.pathPrefix`.
2020-05-02 13:07:34 +03:00
DexterZhang
34743974d5
feat(httpserver): add http.externalUrl
config to http server, it adds prefix to http path automatically ( #452 )
2020-05-02 12:42:53 +03:00
Aliaksandr Valialkin
432187ac3b
app/vminsert: add /-/reload
handler in the same way as for vmagent
2020-04-30 02:15:39 +03:00
Aliaksandr Valialkin
825a2dd554
lib/procutil: prevent from app termination on SIGHUP signal, since this signal is frequently used for config reload
2020-04-30 02:09:27 +03:00
Aliaksandr Valialkin
01c17092e1
lib/httpserver: mention that -http.maxGracefulShutdownDuration
command-line flag value can be increased on shutdown timeout
2020-04-30 01:38:06 +03:00
Aliaksandr Valialkin
5ec036439d
lib/promscrape: set 30 seconds timeout for discovery api requests
...
Previously such requests could hang for long time. This could make debugging harder.
2020-04-29 17:33:34 +03:00
Aliaksandr Valialkin
43c39dc36c
vendor: use github.com/VictoriaMetrics/fasthttp instead of github.com/fasthttp/fasthttp
...
The upstream fasthttp may contain issues like 996610f021
,
plus a code that isn't used by VictoriaMetrics. So let's use a private copy under our control instead.
2020-04-29 17:33:34 +03:00
Aliaksandr Valialkin
4e4f57b121
lib/metricsql: move it to a separate repository - github.com/VictoriaMetrics/metrics
2020-04-28 15:28:22 +03:00
Aliaksandr Valialkin
83aca79137
lib/storage: recover when metricID->metricName entry is missing in the inverted index after unclean shutdown
...
Newly added index entries can be missing after unclean shutdown, since they didn't flush to persistent storage yet.
Log about this and delete the corresponding metricID, so it could be re-created next time.
2020-04-28 12:00:33 +03:00
Aliaksandr Valialkin
521df0e2fc
lib/promscrape: handle connection reset when targets responds with http redirect
2020-04-28 02:13:02 +03:00
肖贝贝
2b16c188e8
fix: vmagent not follow 301/302 redirect bug ( #445 )
...
Co-authored-by: xiaobeibei <xiaobeibei@bigo.sg>
2020-04-28 01:29:37 +03:00
Aliaksandr Valialkin
303905cd84
lib/{encoding,decimal}: typo fixes in tests: epxecting->expecting
2020-04-28 00:01:55 +03:00
Aliaksandr Valialkin
36fa3078c2
lib/encoding: reduce possibility of failure in TestMarshalInt64ArraySize
2020-04-28 00:01:54 +03:00
Aliaksandr Valialkin
95942f1ac6
lib/promscrape/discovery/gce: make golangci-lint happy
2020-04-27 19:28:10 +03:00
Aliaksandr Valialkin
b768bc9a6a
lib/promscrape: add initial support for Prometheus-compatible service discovery for Amazon EC2 aka ec2_sd_configs
2020-04-27 19:25:53 +03:00
Aliaksandr Valialkin
de59703a16
lib/promscrape/discovery/gce: properly set filter
query arg in api url
2020-04-27 16:01:17 +03:00
Aliaksandr Valialkin
b4afe562c1
lib/storage: postpone reading data from blocks during search
...
This eliminates the need for storing block data into temporary files on a single-node VictoriaMetrics
during heavy queries, which touch big number of time series over long time ranges.
This improves single-node VM performance on heavy queries by up to 2x.
2020-04-27 11:45:24 +03:00
Aliaksandr Valialkin
0224071ebe
lib/promscrape/discovery/gce: allow empty project and zone for gce_sd_config
2020-04-27 11:45:02 +03:00
Aliaksandr Valialkin
6954d0edb7
lib/promscrape/discovery/gce: allow empty zone
arg in gce_sd_config
- in this case zones for the given project are automatically discovered
2020-04-26 14:34:11 +03:00
kreedom
fb967ae6c8
happy fmt
2020-04-26 14:16:32 +03:00
Aliaksandr Valialkin
d7c1ff8b0c
lib/storage: improve deduplication algorithm
...
Now it leaves only the first data point on each `-dedup.minScrapeInterval` interval.
Previously it may leave two data points on the interval. This could lead to unexpected results
for `histogram_quantile(phi, sum(rate(buckets)) by (le))` query.
2020-04-26 13:10:02 +03:00
Aliaksandr Valialkin
491b31b369
lib/storage: postpone label filters matching too many time series instead of giving up with error
...
This should reduce the frequency of the following errors:
cannot find tag filter matching less than N time series; either increase -search.maxUniqueTimeseries or use more specific tag filters
more than N time series found on the time range [...]; either increase -search.maxUniqueTimeseries or shrink the time range
2020-04-24 21:13:50 +03:00
Aliaksandr Valialkin
7b8008e0bd
lib/promscrape/discovery/gce: make golint happy by ignoring resp.Body.Close() result
2020-04-24 18:13:09 +03:00
Aliaksandr Valialkin
9ef5935552
lib/promscrape: initial implementation for gce_sd_configs
aga Prometheus-compatible service discovery for Google Compute Engine
2020-04-24 17:51:22 +03:00
Aliaksandr Valialkin
24461153bf
lib/promscrape: query /api/v1/namespaces/*
for the configured namespaces in kubernetes_sd_config
...
This should fix authroization issues described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/432
2020-04-24 14:33:50 +03:00
Aliaksandr Valialkin
00e897119f
lib/promscrape: add -promscrape.configCheckInterval
command-line flag for automating config checking
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/431
2020-04-23 23:41:08 +03:00
Aliaksandr Valialkin
a9a7a7175e
lib/promscrape: access Config entries by reference, so they can be compared by addresses
2020-04-23 14:38:20 +03:00
Aliaksandr Valialkin
1c5d14a2eb
lib/promscrape: move KubernetesSDConfig to lib/promscrape/discovery/kubernetes
2020-04-23 11:34:22 +03:00
Aliaksandr Valialkin
a714568374
lib/promscrape/discovery/kubernetes: hide role switch logic behind GetLabels function
2020-04-22 22:16:11 +03:00
Aliaksandr Valialkin
364db13c9c
app/vmselect: add /api/v1/status/tsdb
page with useful stats for locating root cause for high cardinality issues
...
See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/268
2020-04-22 22:03:43 +03:00
Aliaksandr Valialkin
2dc5593b75
lib/writeconcurrencylimiter: improve docs for -maxConcurrentInserts command-line flag
2020-04-20 21:03:00 +03:00
Aliaksandr Valialkin
5454b518a6
lib/promscrape/discovery/kubernetes: reuse a client for empty api_server
inside different jobs
2020-04-20 17:07:11 +03:00
Aliaksandr Valialkin
43375df923
lib/promscrape/discovery/kubernetes: update stale comments
2020-04-17 14:06:20 +03:00
Aliaksandr Valialkin
5d1537a395
lib/promscrape: suppress scrape errors if -promscrape.suppressScrapeErrors
flag is set
2020-04-16 23:41:30 +03:00
Aliaksandr Valialkin
600490131f
lib/promscrape: print all the labels for the target on error message for failed scrape
...
This should improve debuggability.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/420
2020-04-16 23:35:05 +03:00
Aliaksandr Valialkin
bd4c6d21dd
lib/promscrape: retry target scraping when the target closes previously established keep-alive connection to it
...
This should fix the following error:
the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection
2020-04-16 23:25:29 +03:00
Aliaksandr Valialkin
2fd2dec5eb
lib/logger: typo fix
2020-04-16 00:19:10 +03:00
Aliaksandr Valialkin
071fdf5518
lib/logger: add WARN level for logging expected errors such as invalid user queries
2020-04-15 20:50:26 +03:00
Aliaksandr Valialkin
6f7f64f757
app/vmselect: handle timestamp(metric offset X)
the same way as Prometheus does
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/415
2020-04-15 12:01:00 +03:00
Aliaksandr Valialkin
426a0567c4
lib/promscrape: code cleanup in runScraper func
2020-04-15 11:36:24 +03:00
Aliaksandr Valialkin
c1de3f67b4
lib/storage: skip metricID if the corresponding metricID->metricName is missing in inverted index during search
...
This case is possible when the corresponding metricID->metricName entry didn't propagate to inverted index yet.
This should fix the following error:
error when searching tsids for tfss [...]: cannot find metricName by metricID 1582417212213420669: EOF
2020-04-15 00:06:43 +03:00
Aliaksandr Valialkin
067c7afebc
lib/promscrape: show information on improperly configured scrape targets at the bottom of /targets
page
...
This is a common error whith improperly configured target autodiscovery and/or relabeling.
This error leads to duplicate scraping of the same targets with the same set of labels, which leads
to duplicate samples in time series.
2020-04-14 14:55:05 +03:00
Aliaksandr Valialkin
ac35635b71
lib/promscrape/discovery/kubernetes: remove only unused client for API server during cleaning
2020-04-14 14:19:21 +03:00
Aliaksandr Valialkin
78863d7066
lib/promscrape: add promrelabel.GetLabelValueByName helper function
2020-04-14 14:12:01 +03:00
Aliaksandr Valialkin
c64f003cfb
lib/promscrape: mention job name in error messages when target cannot be scraped
...
This should improve debuggability
2020-04-14 13:33:13 +03:00
Aliaksandr Valialkin
4718a5d951
lib/promscrape: reset ScrapeWork.ID in tests
2020-04-14 13:31:31 +03:00
Aliaksandr Valialkin
257521a634
lib/promscrape: properly expose statuses for targets with duplicate scrape urls at /targets
page
...
Previously targets with duplicate scrape urls were merged into a single line on the page.
Now each target with duplicate scrape url is displayed on a separate line.
2020-04-14 13:10:01 +03:00
Aliaksandr Valialkin
6a75c95194
lib/promscrape: remove labels starting with __meta_
after applying relabel_configs
as Prometheus does
...
This should reduce CPU load during scraping when target discovery generates
big number of `__meta_*` labels (for instance, k8s discovery).
See https://www.robustperception.io/life-of-a-label for details.
2020-04-14 12:23:22 +03:00
Aliaksandr Valialkin
01d7d799dc
lib/promscrape: rename 'scrape_config->scrape_limit' to 'scrape_config->sample_limit'
...
`scrape_config` block from Prometheus config contains `sample_limit` field,
while in `vmagent` this field was mistakenly named as `scrape_limit`.
2020-04-14 11:59:57 +03:00
Aliaksandr Valialkin
2e4e202c2b
lib/promscrape: add initial support for kubernetes_sd_config
...
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/334
2020-04-13 21:03:28 +03:00
Aliaksandr Valialkin
2814b1490f
lib/promscrape: add -promscrape.config.strictParse
flag for detecting errors in -promscrape.config
file
2020-04-13 13:15:44 +03:00
Aliaksandr Valialkin
90b4a6dd12
lib/promscrape: extract common auth code to lib/promauth
2020-04-13 12:59:10 +03:00
Aliaksandr Valialkin
4de6c6bbf0
lib/storage: disable deduplication after dedup tests are complete
...
The rest of tests expect that the de-duplication is disabled.
2020-04-10 17:28:31 +03:00
Aliaksandr Valialkin
ded0c0d3c7
lib/storage: correctly handle -dedup.minScrapeInterval
values smaller than 8ms
...
Such small values may be used for removing samples with duplicate timestamps.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/409 for details.
2020-04-10 16:36:41 +03:00
Aliaksandr Valialkin
7d73623c69
lib/{storage,mergeset}: make sure that requests
and misses
cache counters never go down
2020-04-10 14:45:01 +03:00
Aliaksandr Valialkin
e62afc7366
lib/protoparser: add -*TrimTimstamp
command-line flags for Influx, Graphite, OpenTSDB and CSV data
...
These flags can be used for reducing disk space usage for timestamps data ingested over the given protocols
2020-04-10 12:44:39 +03:00
Aliaksandr Valialkin
0681b4c27a
lib/workingsetcache: accumulate stat counters on cache rotation
...
This should prevent from cache stats counters going down after cache rotation,
which may corrupt `cache hit ratio` graph on the official Grafan dasbhoards
when using the following query:
1 - (sum(rate(vm_cache_misses_total[5m])) by (type) / sum(rate(vm_cache_requests_total[5m])) by (type))
2020-04-10 11:51:40 +03:00