Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files

Aliaksandr Valialkin 2021-03-15 22:44:24 +02:00
commit 7d44cdd8ce
74 changed files with 3952 additions and 2314 deletions

259
README.md

@ -170,6 +170,7 @@ Alphabetically sorted links to case studies:
* [Font used](#font-used)
* [Color Palette](#color-palette)
* [We kindly ask](#we-kindly-ask)
* [List of command-line flags](#list-of-command-line-flags)
## How to start VictoriaMetrics
@ -182,7 +183,7 @@ The following command-line flags are used the most:
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values. Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup)
and how to [handle alerts](#alerting).
@ -413,6 +414,10 @@ while VictoriaMetrics stores them with *milliseconds* precision.
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
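The `extra_label` query arg described above can be sketched in Python as follows. This is a minimal illustration, not part of VictoriaMetrics: the helper name `build_write_url` and the address `http://localhost:8428` are assumptions, and passing several `extra_label` args in one request is assumed from the plural "query args" in the text. Only the `/write` path and the `extra_label=name=value` form come from the docs.

```python
from urllib.parse import urlencode


def build_write_url(base_url, extra_labels):
    """Return the /write URL with one extra_label=name=value arg per label.

    build_write_url is a hypothetical helper, not a VictoriaMetrics API.
    """
    base = base_url.rstrip("/")
    # Each label becomes its own extra_label query arg.
    pairs = [("extra_label", f"{name}={value}") for name, value in extra_labels.items()]
    if not pairs:
        return f"{base}/write"
    # safe="=" keeps the inner '=' readable; the percent-encoded form is equivalent.
    return f"{base}/write?{urlencode(pairs, safe='=')}"


# Assumed local single-node address; adjust to your setup.
print(build_write_url("http://localhost:8428", {"foo": "bar"}))
```

Label names or values containing characters such as `&` or spaces are percent-encoded by `urlencode`, so the helper stays safe for arbitrary label values.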
Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plugin-influxdb), [Juniper/open-nti](https://github.com/Juniper/open-nti)
or [Juniper/jtimon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
A comma-separated list of expected databases can be passed to VictoriaMetrics via the `-influx.databaseNames` command-line flag.
## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)
Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,
@ -562,14 +567,17 @@ in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if y
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args additionally to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
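A request with `round_digits` can be assembled as in the following Python sketch. The helper name `build_query_url` and the address `http://localhost:8428` are assumptions made for illustration; the `/api/v1/query` path, the `query` arg and the `round_digits` arg are taken from the paragraph above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, query, round_digits=None, **extra_args):
    """Build an /api/v1/query URL; build_query_url is a hypothetical helper."""
    args = {"query": query, **extra_args}
    if round_digits is not None:
        # Ask the server to round response values to this many decimal digits.
        args["round_digits"] = str(round_digits)
    # urlencode percent-encodes brackets and parentheses in the query expression.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(args)}"


url = build_query_url(
    "http://localhost:8428",  # assumed local single-node address
    "avg_over_time(temperature[1h])",
    round_digits=2,
)
# Decode the query string again to show what the server would receive.
print(parse_qs(urlsplit(url).query))
```

Extra keyword arguments are passed through as query args, so a relative time such as `time="-30m"` can be added the same way.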
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details:
* Any number of [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
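The additional args for `/api/v1/labels` can be combined as in this Python sketch. The helper name `build_labels_url`, the example selector and the address `http://localhost:8428` are assumptions for illustration; the `match[]`, `start` and `end` args are the ones listed above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_labels_url(base_url, selectors=(), start=None, end=None):
    """Build an /api/v1/labels URL; build_labels_url is a hypothetical helper."""
    # Every selector is sent as a separate match[] arg.
    pairs = [("match[]", selector) for selector in selectors]
    if start is not None:
        pairs.append(("start", str(start)))
    if end is not None:
        pairs.append(("end", str(end)))
    url = f"{base_url.rstrip('/')}/api/v1/labels"
    return f"{url}?{urlencode(pairs)}" if pairs else url


labels_url = build_labels_url(
    "http://localhost:8428",        # assumed local single-node address
    selectors=['up{job="node"}'],   # example selector, not from the docs
    start="-1h",                    # relative time, as described earlier
)
print(parse_qs(urlsplit(labels_url).query))
```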
Additionally VictoriaMetrics provides the following handlers:
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
@ -1367,6 +1375,8 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default `date` equals to the current date,
while `topN` equals to 10.
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
* VictoriaMetrics limits the number of labels per each metric with `-maxLabelsPerTimeseries` command-line flag.
This prevents ingestion of metrics with too many labels. It is recommended to [monitor](#monitoring) the `vm_metrics_with_dropped_labels_total`
metric in order to determine whether `-maxLabelsPerTimeseries` must be adjusted for your workload.
@ -1538,3 +1548,248 @@ Files included in each folder:
* There should be sufficient clear space around the logo.
* Do not change spacing, alignment, or relative locations of the design elements.
* Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
## List of command-line flags
Pass `-help` to VictoriaMetrics in order to see the list of supported command-line flags with their description:
```
-bigMergeConcurrency int
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-dedup.minScrapeInterval duration
Remove superfluous samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
-denyQueriesOutsideRetention
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
-dryRun
Whether to check only -promscrape.config and then exit. Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-finalMergeDelay duration
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
-forceFlushAuthKey string
authKey, which must be passed in query string to /internal/force_flush pages
-forceMergeAuthKey string
authKey, which must be passed in query string to /internal/force_merge pages
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this delay the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if Influx line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-maxLabelsPerTimeseries int
The maximum number of labels accepted per time series. Superfluous labels are dropped (default 30)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
-opentsdbListenAddr string
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-precisionBits int
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
-promscrape.cluster.memberNum int
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
-promscrape.config.strictParse
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://victoriametrics.github.io/#relabeling for details
-retentionPeriod value
Data with timestamps outside the retentionPeriod is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
-search.cacheTimestampOffset duration
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
-search.disableCache
Whether to disable response caching. This may be useful during data backfilling
-search.latencyOffset duration
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
-search.maxExportDuration duration
The maximum duration for /api/v1/export call (default 720h0m0s)
-search.maxLookback duration
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values significantly exceeding horizontal resolution of the graph (default 30000)
-search.maxQueryDuration duration
The maximum duration for query execution (default 30s)
-search.maxQueryLen size
The maximum search query length in bytes
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
-search.maxQueueDuration duration
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
-search.maxStepForPointsAdjustment duration
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
-search.maxTagKeys int
The maximum number of tag keys returned from /api/v1/labels (default 100000)
-search.maxTagValueSuffixesPerSearch int
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
-search.maxTagValues int
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
-search.maxUniqueTimeseries int
The maximum number of unique time series each search can scan (default 300000)
-search.minStalenessInterval duration
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
-search.queryStats.lastQueriesCount int
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
-search.queryStats.minQueryDuration int
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
-search.resetCacheAuthKey string
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
-search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selfScrapeInstance string
Value for 'instance' label, which is added to self-scraped metrics (default "self")
-selfScrapeInterval duration
Interval for self-scraping own metrics at /metrics page
-selfScrapeJob string
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
-smallMergeConcurrency int
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
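The suffix rules from the `-retentionPeriod` description in the list above can be illustrated with a short Python sketch. This is not the parser VictoriaMetrics uses: the function name `parse_retention` is hypothetical, and accepting fractional amounts is an assumption. Only the suffixes `h`, `d`, `w`, `y` and the "no suffix means months" rule come from the help text.

```python
# Suffixes listed in the -retentionPeriod help text.
RETENTION_UNITS = {"h": "hour", "d": "day", "w": "week", "y": "year"}


def parse_retention(value):
    """Split a -retentionPeriod value into (amount, unit).

    parse_retention is a hypothetical helper for illustration only.
    """
    value = value.strip()
    if not value:
        raise ValueError("empty retention value")
    suffix = value[-1]
    if suffix in RETENTION_UNITS:
        return float(value[:-1]), RETENTION_UNITS[suffix]
    # No suffix: the help text says the duration is counted in months.
    return float(value), "month"


for example in ("1", "2w", "90d"):
    print(example, "->", parse_retention(example))
```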


@ -255,6 +255,41 @@ If each target is scraped by multiple `vmagent` instances, then data deduplicati
See [these docs](https://victoriametrics.github.io/#deduplication) for details.
## Scraping targets via a proxy
`vmagent` supports scraping targets via http and https proxies. Proxy address must be specified in `proxy_url` option. For example, the following scrape config instructs
target scraping via https proxy at `https://proxy-addr:1234`:
```yml
scrape_configs:
- job_name: foo
proxy_url: https://proxy-addr:1234
```
Proxy can be configured with the following optional settings:
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
For example:
```yml
scrape_configs:
- job_name: foo
proxy_url: https://proxy-addr:1234
proxy_basic_auth:
username: foobar
password: secret
proxy_tls_config:
insecure_skip_verify: true
cert_file: /path/to/cert
key_file: /path/to/key
ca_file: /path/to/ca
server_name: real-server-name
```
## Monitoring ## Monitoring
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page `vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
@ -477,13 +512,16 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string -httpListenAddr string
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429") TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
-import.maxLineLen max_rows_per_line -import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with max_rows_per_line query arg passed to /api/v1/export The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 104857600) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.maxLineSize value -influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 262144) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr http://<vmagent>:8429/write -influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
-influxMeasurementFieldSeparator string -influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_") Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
@ -511,12 +549,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int -maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16) The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize value -maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string -metricsAuthKey string
@ -527,9 +565,9 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration -opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s) Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize value -opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration -opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms) Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string -pprofAuthKey string
@ -538,6 +576,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int -promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string -promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun -promscrape.config.dryRun
@ -546,45 +586,45 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration -promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval consul_sd_configs -promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s) Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression -promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive disable_keepalive: true -promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set disable_keepalive: true individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int -promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100) The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration -promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s) The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval dns_sd_configs -promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s) Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval dockerswarm_sd_configs -promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s) Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels -promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval ec2_sd_configs -promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s) Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval eureka_sd_configs -promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s) Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration -promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s) Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval gce_sd_configs -promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s) Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration -promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kubernetes API server (default 10m0s) How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval kubernetes_sd_configs -promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s) Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets droppedTargets -promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000) The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize value -promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 16777216) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval openstack_sd_configs -promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s) Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse stream_parse: true -promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set stream_parse: true individually per each `scrape_config` section in `-promscrape.config` for fine grained control Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors duplicate scrape target -promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress duplicate scrape target errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors -promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-remoteWrite.basicAuth.password array -remoteWrite.basicAuth.password array
@ -601,12 +641,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-remoteWrite.label array -remoteWrite.label array
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple flags to metrics before sending them to remote storage Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple flags to metrics before sending them to remote storage
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.maxBlockSize value -remoteWrite.maxBlockSize size
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 8388608) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDiskUsagePerURL value -remoteWrite.maxDiskUsagePerURL size
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0 The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-remoteWrite.proxyURL array -remoteWrite.proxyURL array
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234 Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.
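The size flags in this listing accept optional `KB`, `MB`, `GB`, `KiB`, `MiB` and `GiB` suffixes. The Go sketch below shows one way such values could be parsed. It assumes the conventional meaning of the suffixes (decimal multipliers for `KB`/`MB`/`GB`, binary multipliers for `KiB`/`MiB`/`GiB`) and case-sensitive matching; `parseSize` is a hypothetical helper, not the parser used by the flags themselves.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts values such as "8MiB" or "500MB" into a number of bytes.
// Values without a suffix are treated as plain byte counts.
func parseSize(s string) (int64, error) {
	suffixes := []struct {
		suffix     string
		multiplier int64
	}{
		{"KiB", 1 << 10},
		{"MiB", 1 << 20},
		{"GiB", 1 << 30},
		{"KB", 1000},
		{"MB", 1000 * 1000},
		{"GB", 1000 * 1000 * 1000},
	}
	for _, x := range suffixes {
		if !strings.HasSuffix(s, x.suffix) {
			continue
		}
		// Allow fractional values such as "1.5GiB".
		n, err := strconv.ParseFloat(strings.TrimSuffix(s, x.suffix), 64)
		if err != nil {
			return 0, fmt.Errorf("cannot parse size %q: %w", s, err)
		}
		return int64(n * float64(x.multiplier)), nil
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	for _, s := range []string{"8MiB", "500MB", "262144"} {
		n, err := parseSize(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %d bytes\n", s, n)
	}
}
```

Under these assumptions `8MiB` equals 8388608 bytes, which matches the default shown for `-remoteWrite.maxBlockSize`, and `500MB` equals the 500000000 bytes mentioned for `-remoteWrite.maxDiskUsagePerURL`.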

@ -23,6 +23,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag" "github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver" "github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite" graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx" influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb" opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
@ -40,7 +41,7 @@ var (
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+ "Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''") "Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+ influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to `http://<vmagent>:8429/write`") "This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty") graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+ opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+ "Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
@ -204,10 +205,8 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(http.StatusNoContent) w.WriteHeader(http.StatusNoContent)
return true return true
case "/query": case "/query":
// Emulate fake response for influx query.
// This is required for TSBS benchmark.
influxQueryRequests.Inc() influxQueryRequests.Inc()
fmt.Fprintf(w, `{"results":[{"series":[{"values":[]}]}]}`) influxutils.WriteDatabaseNames(w)
return true return true
case "/targets": case "/targets":
promscrapeTargetsRequests.Inc() promscrapeTargetsRequests.Inc()
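The body of `influxutils.WriteDatabaseNames` is not part of this hunk. The Go sketch below only illustrates what a handler returning a configured list of database names might look like. The JSON layout follows the InfluxDB 1.x `SHOW DATABASES` response shape as an assumption, and `databaseNamesJSON` and `writeDatabaseNames` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// The types below model an InfluxDB-style query response (assumed layout).
type series struct {
	Name    string     `json:"name"`
	Columns []string   `json:"columns"`
	Values  [][]string `json:"values"`
}

type result struct {
	Series []series `json:"series"`
}

type response struct {
	Results []result `json:"results"`
}

// databaseNamesJSON encodes the given database names, one row per name.
func databaseNamesJSON(names []string) ([]byte, error) {
	values := make([][]string, 0, len(names))
	for _, name := range names {
		values = append(values, []string{name})
	}
	return json.Marshal(response{
		Results: []result{{
			Series: []series{{
				Name:    "databases",
				Columns: []string{"name"},
				Values:  values,
			}},
		}},
	})
}

// writeDatabaseNames sends the encoded list to the client.
func writeDatabaseNames(w http.ResponseWriter, names []string) {
	data, err := databaseNamesJSON(names)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(data)
}

func main() {
	data, err := databaseNamesJSON([]string{"db1", "db2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

Encoding through `encoding/json` rather than string formatting keeps database names containing quotes or backslashes correctly escaped.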

@ -16,7 +16,7 @@ rules against configured address.
* Lightweight without extra dependencies. * Lightweight without extra dependencies.
### Limitations: ### Limitations:
* `vmalert` executes queries against a remote datasource, which carries reliability risks because of the network.
It is recommended to configure alerts thresholds and rules expressions with understanding that network request It is recommended to configure alerts thresholds and rules expressions with understanding that network request
may fail; may fail;
* by default, rules execution is sequential within one group, but persisting of execution results to remote * by default, rules execution is sequential within one group, but persisting of execution results to remote
@ -37,7 +37,7 @@ The build binary will be placed to `VictoriaMetrics/bin` folder.
To start using `vmalert` you will need the following things: To start using `vmalert` you will need the following things:
* list of rules - PromQL/MetricsQL expressions to execute; * list of rules - PromQL/MetricsQL expressions to execute;
* datasource address - reachable VictoriaMetrics instance for rules execution; * datasource address - reachable VictoriaMetrics instance for rules execution;
* notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing, * notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
aggregating alerts and sending notifications. aggregating alerts and sending notifications.
* remote write address - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations) * remote write address - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage address for storing recording rules results and alerts state in the form of time series. This is optional. compatible storage address for storing recording rules results and alerts state in the form of time series. This is optional.
@ -56,11 +56,11 @@ Then configure `vmalert` accordingly:
``` ```
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts. to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
similar to Prometheus rules and configured using YAML. Configuration examples may be found similar to Prometheus rules and configured using YAML. Configuration examples may be found
in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder. in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder.
Every `rule` belongs to `group` and every configuration file may contain arbitrary number of groups: Every `rule` belongs to `group` and every configuration file may contain arbitrary number of groups:
```yaml ```yaml
@ -79,7 +79,7 @@ name: <string>
[ interval: <duration> | default = global.evaluation_interval ] [ interval: <duration> | default = global.evaluation_interval ]
# How many rules to execute at once. Increasing concurrency may speed # How many rules to execute at once. Increasing concurrency may speed
# up round execution. # up round execution.
[ concurrency: <integer> | default = 1 ] [ concurrency: <integer> | default = 1 ]
# Optional type for expressions inside the rules. Supported values: "graphite" and "prometheus". # Optional type for expressions inside the rules. Supported values: "graphite" and "prometheus".
@ -93,15 +93,15 @@ rules:
#### Rules #### Rules
There are two types of Rules: There are two types of Rules:
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) - * [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules allow defining alert conditions via [MetricsQL](https://victoriametrics.github.io/MetricsQL.html) Alerting rules allow defining alert conditions via [MetricsQL](https://victoriametrics.github.io/MetricsQL.html)
and to send notifications about firing alerts to [Alertmanager](https://github.com/prometheus/alertmanager). and to send notifications about firing alerts to [Alertmanager](https://github.com/prometheus/alertmanager).
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) - * [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules allow you to precompute frequently needed or computationally expensive expressions Recording rules allow you to precompute frequently needed or computationally expensive expressions
and save their result as a new set of time series. and save their result as a new set of time series.
`vmalert` forbids defining duplicates - rules with the same combination of name, expression and labels `vmalert` forbids defining duplicates - rules with the same combination of name, expression and labels
within one group. within one group.
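
The duplicate check described above can be pictured with a minimal Go sketch. It is not `vmalert`'s actual implementation: the `Rule` type, the `key` method and the `firstDuplicate` helper are hypothetical names introduced only for illustration, and the sketch assumes that two rules are duplicates when name, expression and the full label set all match.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Rule is a hypothetical, simplified stand-in for a vmalert rule.
type Rule struct {
	Name   string
	Expr   string
	Labels map[string]string
}

// key builds a string that is identical for rules sharing
// the same name, expression and label set.
func (r Rule) key() string {
	names := make([]string, 0, len(r.Labels))
	for k := range r.Labels {
		names = append(names, k)
	}
	// Map iteration order is random, so sort label names for a stable key.
	sort.Strings(names)
	var sb strings.Builder
	fmt.Fprintf(&sb, "%q|%q", r.Name, r.Expr)
	for _, k := range names {
		fmt.Fprintf(&sb, "|%q=%q", k, r.Labels[k])
	}
	return sb.String()
}

// firstDuplicate returns the index of the first rule that repeats
// an earlier rule of the same group, or -1 if there are no duplicates.
func firstDuplicate(group []Rule) int {
	seen := make(map[string]struct{}, len(group))
	for i, r := range group {
		k := r.key()
		if _, ok := seen[k]; ok {
			return i
		}
		seen[k] = struct{}{}
	}
	return -1
}

func main() {
	expr := "rate(errors_total[5m]) > 1"
	group := []Rule{
		{Name: "HighErrorRate", Expr: expr, Labels: map[string]string{"severity": "page"}},
		{Name: "HighErrorRate", Expr: expr, Labels: map[string]string{"severity": "warn"}},
		{Name: "HighErrorRate", Expr: expr, Labels: map[string]string{"severity": "page"}},
	}
	fmt.Println(firstDuplicate(group)) // prints 2
}
```

In this sketch the second rule is accepted because its labels differ, while the third one repeats the first rule and is reported.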
##### Alerting rules ##### Alerting rules
@ -130,7 +130,7 @@ labels:
# Annotations to add to each alert. # Annotations to add to each alert.
annotations: annotations:
[ <labelname>: <tmpl_string> ] [ <labelname>: <tmpl_string> ]
``` ```
##### Recording rules ##### Recording rules
@ -158,17 +158,17 @@ For recording rules to work `-remoteWrite.url` must specified.
#### Alerts state on restarts #### Alerts state on restarts
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after reloading of `vmalert` `vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after reloading of `vmalert`
the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags: the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags:
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or VMInsert (Cluster). `vmalert` will persist alerts state * `-remoteWrite.url` - URL to VictoriaMetrics (Single) or VMInsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol. into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
These are regular time series and may be queried from VM just as any other time series. These are regular time series and may be queried from VM just as any other time series.
The state is stored to the configured address on every rule evaluation. The state is stored to the configured address on every rule evaluation.
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or VMSelect (Cluster). `vmalert` will try to restore alerts state * `-remoteRead.url` - URL to VictoriaMetrics (Single) or VMSelect (Cluster). `vmalert` will try to restore alerts state
from configured address by querying time series with name `ALERTS_FOR_STATE`. from configured address by querying time series with name `ALERTS_FOR_STATE`.
Both flags are required for the proper state restoring. Restore process may fail if time series are missing Both flags are required for the proper state restoring. Restore process may fail if time series are missing
in configured `-remoteRead.url`, weren't updated in the last `1h` or received state doesn't match current `vmalert` in configured `-remoteRead.url`, weren't updated in the last `1h` or received state doesn't match current `vmalert`
rules configuration. rules configuration.
@ -232,7 +232,7 @@ The shortlist of configuration flags is the following:
How often to evaluate the rules (default 1m0s) How often to evaluate the rules (default 1m0s)
-external.alert.source string -external.alert.source string
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service. External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
-external.label array -external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets. Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.
@ -272,9 +272,9 @@ The shortlist of configuration flags is the following:
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string -metricsAuthKey string
@ -348,11 +348,11 @@ The shortlist of configuration flags is the following:
-remoteWrite.url string -remoteWrite.url string
Optional URL to Victoria Metrics or VMInsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428 Optional URL to Victoria Metrics or VMInsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
-rule array -rule array
Path to the file with alert rules. Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times. Supports patterns. Flag can be specified multiple times.
Examples: Examples:
-rule="/path/to/file". Path to a single file with alerting rules -rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder, -rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root. absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars. Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.
@ -370,7 +370,7 @@ The shortlist of configuration flags is the following:
Show VictoriaMetrics version Show VictoriaMetrics version
``` ```
Pass `-help` to `vmalert` in order to see the full list of supported Pass `-help` to `vmalert` in order to see the full list of supported
command-line flags with their descriptions. command-line flags with their descriptions.
To reload configuration without `vmalert` restart send SIGHUP signal To reload configuration without `vmalert` restart send SIGHUP signal
@ -379,13 +379,13 @@ or send GET request to `/-/reload` endpoint.
### Contributing ### Contributing
`vmalert` is mostly designed and built by the VictoriaMetrics community. `vmalert` is mostly designed and built by the VictoriaMetrics community.
Feel free to share your experience and ideas for improving this Feel free to share your experience and ideas for improving this
software. Please keep simplicity as the main priority. software. Please keep simplicity as the main priority.
### How to build from sources ### How to build from sources
It is recommended to use It is recommended to use
[binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
- `vmalert` is located in `vmutils-*` archives there. - `vmalert` is located in `vmutils-*` archives there.

View file

@ -26,11 +26,11 @@ import (
) )
var ( var (
rulePath = flagutil.NewArray("rule", `Path to the file with alert rules. rulePath = flagutil.NewArray("rule", `Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times. Supports patterns. Flag can be specified multiple times.
Examples: Examples:
-rule="/path/to/file". Path to a single file with alerting rules -rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder, -rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root. absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`) Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
@ -41,7 +41,7 @@ Rule files may contain %{ENV_VAR} placeholders, which are substituted by the cor
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine") validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier") externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service. externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`) eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`)
externalLabels = flagutil.NewArray("external.label", "Optional label in the form 'name=value' to add to all generated recording rules and alerts. "+ externalLabels = flagutil.NewArray("external.label", "Optional label in the form 'name=value' to add to all generated recording rules and alerts. "+
"Pass multiple -label flags in order to add multiple label sets.") "Pass multiple -label flags in order to add multiple label sets.")

View file

@ -208,9 +208,9 @@ See the docs at https://victoriametrics.github.io/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string -metricsAuthKey string

View file

@ -205,12 +205,12 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value -maxBytesPerSecond size
The maximum upload speed. There is no limit if it is set to 0 The maximum upload speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-origin string -origin string

View file

@ -19,6 +19,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel" "github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/vmimport" "github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver" "github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite" graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx" influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb" opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
@ -34,7 +35,7 @@ import (
var ( var (
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty") graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+ influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to `http://<victoriametrics>:8428/write`") "This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+ opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+ "Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
"Usually :4242 must be set. Doesn't work if empty") "Usually :4242 must be set. Doesn't work if empty")
@ -147,10 +148,8 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(http.StatusNoContent) w.WriteHeader(http.StatusNoContent)
return true return true
case "/influx/query", "/query": case "/influx/query", "/query":
// Emulate fake response for influx query.
// This is required for TSBS benchmark.
influxQueryRequests.Inc() influxQueryRequests.Inc()
fmt.Fprintf(w, `{"results":[{"series":[{"values":[]}]}]}`) influxutils.WriteDatabaseNames(w)
return true return true
case "/prometheus/targets", "/targets": case "/prometheus/targets", "/targets":
promscrapeTargetsRequests.Inc() promscrapeTargetsRequests.Inc()

View file

@ -105,12 +105,12 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value -maxBytesPerSecond size
The maximum download speed. There is no limit if it is set to 0 The maximum download speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-skipBackupCompleteCheck -skipBackupCompleteCheck

View file

@ -968,6 +968,11 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
start -= offset start -= offset
end := start end := start
start = end - window start = end - window
// Do not include data point with a timestamp matching the lower boundary of the window as Prometheus does.
start++
if end < start {
end = start
}
if err := exportHandler(w, []string{childQuery}, etf, start, end, "promapi", 0, false, deadline); err != nil { if err := exportHandler(w, []string{childQuery}, etf, start, end, "promapi", 0, false, deadline); err != nil {
return fmt.Errorf("error when exporting data for query=%q on the time range (start=%d, end=%d): %w", childQuery, start, end, err) return fmt.Errorf("error when exporting data for query=%q on the time range (start=%d, end=%d): %w", childQuery, start, end, err)
} }
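
The boundary adjustment in the hunk above can be isolated into a small Go sketch. The `exportRange` helper is a hypothetical name, and the sketch assumes that all the values are integer timestamps or durations in the same unit (likely milliseconds). It reproduces only the arithmetic: the lower boundary is moved forward by one unit so that a sample sitting exactly at `end - window` is excluded, and `end` is clamped so the range never becomes inverted.

```go
package main

import "fmt"

// exportRange is a hypothetical helper that mirrors the arithmetic
// from the hunk above for a `selector[window] offset ...` query at time ts.
func exportRange(ts, offset, window int64) (start, end int64) {
	start = ts - offset
	end = start
	start = end - window
	// Exclude the sample whose timestamp equals the lower boundary.
	start++
	// A zero window would otherwise produce start > end.
	if end < start {
		end = start
	}
	return start, end
}

func main() {
	fmt.Println(exportRange(10000, 0, 5000))
	fmt.Println(exportRange(10000, 2000, 0))
}
```

With a window of 5000 and no offset the range becomes `[5001, 10000]` instead of `[5000, 10000]`; with a zero window the clamp keeps `start` and `end` equal.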
@ -1017,6 +1022,7 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r), QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline, Deadline: deadline,
LookbackDelta: lookbackDelta, LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilters: etf, EnforcedTagFilters: etf,
} }
result, err := promql.Exec(&ec, query, true) result, err := promql.Exec(&ec, query, true)
@ -1121,6 +1127,7 @@ func queryRangeHandler(startTime time.Time, w http.ResponseWriter, query string,
Deadline: deadline, Deadline: deadline,
MayCache: mayCache, MayCache: mayCache,
LookbackDelta: lookbackDelta, LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilters: etf, EnforcedTagFilters: etf,
} }
result, err := promql.Exec(&ec, query, false) result, err := promql.Exec(&ec, query, false)
@ -1297,6 +1304,18 @@ func getMatchesFromRequest(r *http.Request) []string {
return matches return matches
} }
func getRoundDigits(r *http.Request) int {
s := r.FormValue("round_digits")
if len(s) == 0 {
return 100
}
n, err := strconv.Atoi(s)
if err != nil {
return 100
}
return n
}
func getLatencyOffsetMilliseconds() int64 { func getLatencyOffsetMilliseconds() int64 {
d := latencyOffset.Milliseconds() d := latencyOffset.Milliseconds()
if d <= 1000 { if d <= 1000 {

View file

@ -98,11 +98,14 @@ type EvalConfig struct {
// LookbackDelta is analog to `-query.lookback-delta` from Prometheus. // LookbackDelta is analog to `-query.lookback-delta` from Prometheus.
LookbackDelta int64 LookbackDelta int64
timestamps []int64 // How many decimal digits after the point to leave in response.
timestampsOnce sync.Once RoundDigits int
// EnforcedTagFilters used for apply additional label filters to query. // EnforcedTagFilters used for apply additional label filters to query.
EnforcedTagFilters []storage.TagFilter EnforcedTagFilters []storage.TagFilter
timestamps []int64
timestampsOnce sync.Once
} }
// newEvalConfig returns new EvalConfig copy from src. // newEvalConfig returns new EvalConfig copy from src.
@ -114,6 +117,7 @@ func newEvalConfig(src *EvalConfig) *EvalConfig {
ec.Deadline = src.Deadline ec.Deadline = src.Deadline
ec.MayCache = src.MayCache ec.MayCache = src.MayCache
ec.LookbackDelta = src.LookbackDelta ec.LookbackDelta = src.LookbackDelta
ec.RoundDigits = src.RoundDigits
ec.EnforcedTagFilters = src.EnforcedTagFilters ec.EnforcedTagFilters = src.EnforcedTagFilters
// do not copy src.timestamps - they must be generated again. // do not copy src.timestamps - they must be generated again.

View file

@ -12,6 +12,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage" "github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/querystats" "github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/querystats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger" "github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/metrics" "github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/metricsql" "github.com/VictoriaMetrics/metricsql"
@ -72,6 +73,14 @@ func Exec(ec *EvalConfig, q string, isFirstPointOnly bool) ([]netstorage.Result,
if err != nil { if err != nil {
return nil, err return nil, err
} }
if n := ec.RoundDigits; n < 100 {
for i := range result {
values := result[i].Values
for j, v := range values {
values[j] = decimal.RoundToDecimalDigits(v, n)
}
}
}
return result, err return result, err
} }
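
The rounding step added to `Exec` can be sketched in isolation as follows. `roundToDecimalDigits` is a hypothetical stand-in for `decimal.RoundToDecimalDigits`, whose real implementation is not shown in this diff and may treat edge cases differently; `roundValues` is likewise an illustrative name for the in-place loop.

```go
package main

import (
	"fmt"
	"math"
)

// roundToDecimalDigits is a hypothetical stand-in for
// decimal.RoundToDecimalDigits: it keeps `digits` decimal digits
// after the point by scaling, rounding and scaling back.
func roundToDecimalDigits(v float64, digits int) float64 {
	m := math.Pow10(digits)
	return math.Round(v*m) / m
}

// roundValues mirrors the loop added to Exec: values are rounded in place
// unless digits >= 100, which is treated as "no rounding requested".
func roundValues(values []float64, digits int) {
	if digits >= 100 {
		return
	}
	for i, v := range values {
		values[i] = roundToDecimalDigits(v, digits)
	}
}

func main() {
	values := []float64{1.23456, 2.5, 100}
	roundValues(values, 2)
	fmt.Println(values)
}
```

Rounding in place avoids allocating a second slice per result, which is presumably why the change mutates `result[i].Values` directly.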

View file

@ -57,10 +57,11 @@ func TestExecSuccess(t *testing.T) {
f := func(q string, resultExpected []netstorage.Result) { f := func(q string, resultExpected []netstorage.Result) {
t.Helper() t.Helper()
ec := &EvalConfig{ ec := &EvalConfig{
Start: start, Start: start,
End: end, End: end,
Step: step, Step: step,
Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""), Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""),
RoundDigits: 100,
} }
for i := 0; i < 5; i++ { for i := 0; i < 5; i++ {
result, err := Exec(ec, q, false) result, err := Exec(ec, q, false)
@ -3653,6 +3654,210 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r1, r2, r3, r4} resultExpected := []netstorage.Result{r1, r2, r3, r4}
f(q, resultExpected) f(q, resultExpected)
}) })
t.Run(`prometheus_buckets(overlapped ranges)`, func(t *testing.T) {
t.Parallel()
q := `sort(prometheus_buckets((
alias(label_set(90, "foo", "bar", "vmrange", "0...0"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.2"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0.2...0.25"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.26"), "xxx"),
alias(label_set(time()/100, "foo", "bar", "vmrange", "0.2...40"), "xxx"),
alias(label_set(time()/10, "foo", "bar", "vmrange", "40...Inf"), "xxx"),
)))`
r1 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{90, 90, 90, 90, 90, 90},
Timestamps: timestampsExpected,
}
r1.MetricName.MetricGroup = []byte("xxx")
r1.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0"),
},
}
r2 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{140, 150, 160, 170, 180, 190},
Timestamps: timestampsExpected,
}
r2.MetricName.MetricGroup = []byte("xxx")
r2.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.2"),
},
}
r3 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{190, 210, 230, 250, 270, 290},
Timestamps: timestampsExpected,
}
r3.MetricName.MetricGroup = []byte("xxx")
r3.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.25"),
},
}
r4 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{240, 270, 300, 330, 360, 390},
Timestamps: timestampsExpected,
}
r4.MetricName.MetricGroup = []byte("xxx")
r4.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.26"),
},
}
r5 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{250, 282, 314, 346, 378, 410},
Timestamps: timestampsExpected,
}
r5.MetricName.MetricGroup = []byte("xxx")
r5.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("40"),
},
}
r6 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{350, 402, 454, 506, 558, 610},
Timestamps: timestampsExpected,
}
r6.MetricName.MetricGroup = []byte("xxx")
r6.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("Inf"),
},
}
resultExpected := []netstorage.Result{r1, r2, r3, r4, r5, r6}
f(q, resultExpected)
})
t.Run(`prometheus_buckets(overlapped ranges at the end)`, func(t *testing.T) {
t.Parallel()
q := `sort(prometheus_buckets((
alias(label_set(90, "foo", "bar", "vmrange", "0...0"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.2"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0.2...0.25"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.25"), "xxx"),
alias(label_set(time()/100, "foo", "bar", "vmrange", "0.2...40"), "xxx"),
alias(label_set(time()/10, "foo", "bar", "vmrange", "40...Inf"), "xxx"),
)))`
r1 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{90, 90, 90, 90, 90, 90},
Timestamps: timestampsExpected,
}
r1.MetricName.MetricGroup = []byte("xxx")
r1.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0"),
},
}
r2 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{140, 150, 160, 170, 180, 190},
Timestamps: timestampsExpected,
}
r2.MetricName.MetricGroup = []byte("xxx")
r2.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.2"),
},
}
r3 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{190, 210, 230, 250, 270, 290},
Timestamps: timestampsExpected,
}
r3.MetricName.MetricGroup = []byte("xxx")
r3.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.25"),
},
}
r4 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{200, 222, 244, 266, 288, 310},
Timestamps: timestampsExpected,
}
r4.MetricName.MetricGroup = []byte("xxx")
r4.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("40"),
},
}
r5 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{300, 342, 384, 426, 468, 510},
Timestamps: timestampsExpected,
}
r5.MetricName.MetricGroup = []byte("xxx")
r5.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("Inf"),
},
}
resultExpected := []netstorage.Result{r1, r2, r3, r4, r5}
f(q, resultExpected)
})
t.Run(`median_over_time()`, func(t *testing.T) { t.Run(`median_over_time()`, func(t *testing.T) {
t.Parallel() t.Parallel()
q := `median_over_time({})` q := `median_over_time({})`
@ -6371,10 +6576,11 @@ func TestExecError(t *testing.T) {
f := func(q string) { f := func(q string) {
t.Helper() t.Helper()
ec := &EvalConfig{ ec := &EvalConfig{
Start: 1000, Start: 1000,
End: 2000, End: 2000,
Step: 100, Step: 100,
Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""), Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""),
RoundDigits: 100,
} }
for i := 0; i < 4; i++ { for i := 0; i < 4; i++ {
rv, err := Exec(ec, q, false) rv, err := Exec(ec, q, false)

View file

@ -538,6 +538,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
// Do not drop trailing data points for queries, which return 2 or 1 point (aka instant queries). // Do not drop trailing data points for queries, which return 2 or 1 point (aka instant queries).
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845 // See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
canDropLastSample := rc.CanDropLastSample && len(rc.Timestamps) > 2 canDropLastSample := rc.CanDropLastSample && len(rc.Timestamps) > 2
f := rc.Func
for _, tEnd := range rc.Timestamps { for _, tEnd := range rc.Timestamps {
tStart := tEnd - window tStart := tEnd - window
ni = seekFirstTimestampIdxAfter(timestamps[i:], tStart, ni) ni = seekFirstTimestampIdxAfter(timestamps[i:], tStart, ni)
@ -577,7 +578,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
rfa.realNextValue = nan rfa.realNextValue = nan
} }
rfa.currTimestamp = tEnd rfa.currTimestamp = tEnd
value := rc.Func(rfa) value := f(rfa)
rfa.idx++ rfa.idx++
dstValues = append(dstValues, value) dstValues = append(dstValues, value)
} }
@ -643,12 +644,12 @@ func getScrapeInterval(timestamps []int64) int64 {
return int64(maxSilenceInterval) return int64(maxSilenceInterval)
} }
// Estimate scrape interval as 0.6 quantile for the first 100 intervals. // Estimate scrape interval as 0.6 quantile for the first 20 intervals.
h := histogram.GetFast() h := histogram.GetFast()
tsPrev := timestamps[0] tsPrev := timestamps[0]
timestamps = timestamps[1:] timestamps = timestamps[1:]
if len(timestamps) > 100 { if len(timestamps) > 20 {
timestamps = timestamps[:100] timestamps = timestamps[:20]
} }
for _, ts := range timestamps { for _, ts := range timestamps {
h.Update(float64(ts - tsPrev)) h.Update(float64(ts - tsPrev))

View file

@ -518,6 +518,7 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
sort.Slice(xss, func(i, j int) bool { return xss[i].end < xss[j].end }) sort.Slice(xss, func(i, j int) bool { return xss[i].end < xss[j].end })
xssNew := make([]x, 0, len(xss)+2) xssNew := make([]x, 0, len(xss)+2)
var xsPrev x var xsPrev x
uniqTs := make(map[string]*timeseries, len(xss))
for _, xs := range xss { for _, xs := range xss {
ts := xs.ts ts := xs.ts
if isZeroTS(ts) { if isZeroTS(ts) {
@ -525,7 +526,8 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
xsPrev = xs xsPrev = xs
continue continue
} }
if xs.start != xsPrev.end { if xs.start != xsPrev.end && uniqTs[xs.startStr] == nil {
uniqTs[xs.startStr] = xs.ts
xssNew = append(xssNew, x{ xssNew = append(xssNew, x{
endStr: xs.startStr, endStr: xs.startStr,
end: xs.start, end: xs.start,
@ -533,7 +535,14 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
}) })
} }
ts.MetricName.AddTag("le", xs.endStr) ts.MetricName.AddTag("le", xs.endStr)
xssNew = append(xssNew, xs) prevTs := uniqTs[xs.endStr]
if prevTs != nil {
// the end of the current bucket is not unique, need to merge it with the existing bucket.
mergeNonOverlappingTimeseries(prevTs, xs.ts)
} else {
xssNew = append(xssNew, xs)
uniqTs[xs.endStr] = xs.ts
}
xsPrev = xs xsPrev = xs
} }
if !math.IsInf(xsPrev.end, 1) { if !math.IsInf(xsPrev.end, 1) {

View file

@ -12,9 +12,9 @@ import (
) )
var ( var (
lastQueriesCount = flag.Int("search.queryStats.lastQueriesCount", 20000, "Query stats for `/api/v1/status/top_queries` is tracked on this number of last queries. "+ lastQueriesCount = flag.Int("search.queryStats.lastQueriesCount", 20000, "Query stats for /api/v1/status/top_queries is tracked on this number of last queries. "+
"Zero value disables query stats tracking") "Zero value disables query stats tracking")
minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", 0, "The minimum duration for queries to track in query stats at `/api/v1/status/top_queries`. "+ minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", 0, "The minimum duration for queries to track in query stats at /api/v1/status/top_queries. "+
"Queries with lower duration are ignored in query stats") "Queries with lower duration are ignored in query stats")
) )

View file

@ -37,6 +37,8 @@ var (
bigMergeConcurrency = flag.Int("bigMergeConcurrency", 0, "The maximum number of CPU cores to use for big merges. Default value is used if set to 0") bigMergeConcurrency = flag.Int("bigMergeConcurrency", 0, "The maximum number of CPU cores to use for big merges. Default value is used if set to 0")
smallMergeConcurrency = flag.Int("smallMergeConcurrency", 0, "The maximum number of CPU cores to use for small merges. Default value is used if set to 0") smallMergeConcurrency = flag.Int("smallMergeConcurrency", 0, "The maximum number of CPU cores to use for small merges. Default value is used if set to 0")
logNewSeries = flag.Bool("logNewSeries", false, "Whether to log new series. This option is for debug purposes only. It can lead to performance issues "+
"when big number of new series are ingested into VictoriaMetrics")
denyQueriesOutsideRetention = flag.Bool("denyQueriesOutsideRetention", false, "Whether to deny queries outside of the configured -retentionPeriod. "+ denyQueriesOutsideRetention = flag.Bool("denyQueriesOutsideRetention", false, "Whether to deny queries outside of the configured -retentionPeriod. "+
"When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. "+ "When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. "+
"This may be useful when multiple data sources with distinct retentions are hidden behind query-tee") "This may be useful when multiple data sources with distinct retentions are hidden behind query-tee")
@ -72,6 +74,7 @@ func InitWithoutMetrics(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
} }
resetResponseCacheIfNeeded = resetCacheIfNeeded resetResponseCacheIfNeeded = resetCacheIfNeeded
storage.SetLogNewSeries(*logNewSeries)
storage.SetFinalMergeDelay(*finalMergeDelay) storage.SetFinalMergeDelay(*finalMergeDelay)
storage.SetBigMergeWorkersCount(*bigMergeConcurrency) storage.SetBigMergeWorkersCount(*bigMergeConcurrency)
storage.SetSmallMergeWorkersCount(*smallMergeConcurrency) storage.SetSmallMergeWorkersCount(*smallMergeConcurrency)

File diff suppressed because it is too large

View file

@ -4,7 +4,7 @@ DOCKER_NAMESPACE := victoriametrics
ROOT_IMAGE ?= alpine:3.13.2 ROOT_IMAGE ?= alpine:3.13.2
CERTS_IMAGE := alpine:3.13.2 CERTS_IMAGE := alpine:3.13.2
GO_BUILDER_IMAGE := golang:1.16.0 GO_BUILDER_IMAGE := golang:1.16.2
BUILDER_IMAGE := local/builder:2.0.0-$(shell echo $(GO_BUILDER_IMAGE) | tr : _) BUILDER_IMAGE := local/builder:2.0.0-$(shell echo $(GO_BUILDER_IMAGE) | tr : _)
BASE_IMAGE := local/base:1.1.3-$(shell echo $(ROOT_IMAGE) | tr : _)-$(shell echo $(CERTS_IMAGE) | tr : _) BASE_IMAGE := local/base:1.1.3-$(shell echo $(ROOT_IMAGE) | tr : _)-$(shell echo $(CERTS_IMAGE) | tr : _)

View file

@ -122,6 +122,16 @@ groups:
description: "High rate of slow inserts on \"{{ $labels.instance }}\" may be a sign of resource exhaustion description: "High rate of slow inserts on \"{{ $labels.instance }}\" may be a sign of resource exhaustion
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series." for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series."
- alert: ProcessNearFDLimits
expr: process_open_fds / process_max_fds > 0.8
for: 10m
labels:
severity: critical
annotations:
summary: "Number of free file descriptors is less than 20% for \"{{ $labels.job }}\"(\"{{ $labels.instance }}\") for the last 10m"
description: "Exhausting OS file descriptors limit can cause severe degradation of the process.
Consider increasing the limit as soon as possible."
# Alerts group for vmagent assumes that Grafana dashboard # Alerts group for vmagent assumes that Grafana dashboard
# https://grafana.com/grafana/dashboards/12683 is installed. # https://grafana.com/grafana/dashboards/12683 is installed.
# Pls update the `dashboard` annotation according to your setup. # Pls update the `dashboard` annotation according to your setup.

View file

@ -70,8 +70,7 @@ services:
- '--rule=/etc/alerts/*.yml' - '--rule=/etc/alerts/*.yml'
# display source of alerts in grafana # display source of alerts in grafana
- '-external.url=http://127.0.0.1:3000' #grafana outside container - '-external.url=http://127.0.0.1:3000' #grafana outside container
- '--external.alert.source=explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":"{{$$expr|quotesEscape|pathEscape}}"},{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' ## when copypaste the line be aware of '$$' for escaping in '$expr' - '--external.alert.source=explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":"{{$$expr|quotesEscape|crlfEscape|queryEscape}}"},{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' ## when copypaste the line be aware of '$$' for escaping in '$expr' networks:
networks:
- vm_net - vm_net
restart: always restart: always
alertmanager: alertmanager:

View file

@ -27,6 +27,7 @@
* [Observability, Availability & DORAs Research Program](https://medium.com/alteos-tech-blog/observability-availability-and-doras-research-program-85deb6680e78) * [Observability, Availability & DORAs Research Program](https://medium.com/alteos-tech-blog/observability-availability-and-doras-research-program-85deb6680e78)
* [Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator](https://www.percona.com/blog/2021/02/12/tame-kubernetes-costs-with-percona-monitoring-and-management-and-prometheus-operator/) * [Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator](https://www.percona.com/blog/2021/02/12/tame-kubernetes-costs-with-percona-monitoring-and-management-and-prometheus-operator/)
* [Prometheus Victoria Metrics On AWS ECS](https://dalefro.medium.com/prometheus-victoria-metrics-on-aws-ecs-62448e266090) * [Prometheus Victoria Metrics On AWS ECS](https://dalefro.medium.com/prometheus-victoria-metrics-on-aws-ecs-62448e266090)
* [Monitoring with Prometheus, Grafana, AlertManager and VictoriaMetrics](https://www.sensedia.com/post/monitoring-with-prometheus-alertmanager)
## Our articles ## Our articles

62
docs/BestPractices.md Normal file
View file

@ -0,0 +1,62 @@
# VM best practices
VictoriaMetrics is a fast, cost-effective and scalable monitoring solution and time series database. It can be used as a long-term, remote storage for Prometheus which allows it to gather metrics from different systems and store them in a single location or separate them for different purposes (short-, long-term, responsibility zones etc).
## Install Recommendation
There is no need to tune VictoriaMetrics because it uses reasonable defaults for command-line flags. These flags are automatically adjusted for the available CPU and RAM resources. There is no need for Operating System tuning because VictoriaMetrics is optimized for default OS settings. The only recommended change is to increase the limit on the [number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a), so that Prometheus instances can establish more connections to VictoriaMetrics (65535 is a standard production value).
## Filesystem Considerations
The recommended filesystem is ext4. If you plan to store more than 1TB of data on ext4 partition or plan to extend it to more than 16TB, then the following options are recommended to pass to mkfs.ext4:
mkfs.ext4 ... -O 64bit,huge_file,extent -T huge
## Operating System
When configuring VictoriaMetrics, the best practice is to use the latest Ubuntu OS version.
## VictoriaMetrics Versions
Always update all VictoriaMetrics instances in the environment to avoid version and build mismatches that result in differences in performance and operational features. It is strongly recommended that you keep VictoriaMetrics in the environment up-to-date and install all VictoriaMetrics updates as soon as they are available. The most recent releases are listed on [the releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
## Upgrade
It is safe to upgrade VictoriaMetrics to new versions unless the [release notes](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) say otherwise. It is safe to skip multiple versions during the upgrade unless release notes say otherwise. It is recommended to perform regular upgrades to the latest version, since it may contain important bug fixes, performance optimizations or new features.
It is also safe to downgrade to the previous version unless release notes say otherwise.
The following steps must be performed during the upgrade / downgrade process:
* Send SIGINT signal to VictoriaMetrics process so that it is stopped gracefully.
* Wait until the process stops. This can take a few seconds.
* Start the upgraded VictoriaMetrics.
Prometheus doesn't drop data during the VictoriaMetrics restart. See [this article](https://grafana.com/blog/2019/03/25/whats-new-in-prometheus-2.8-wal-based-remote-write/) for details.
## Security
Do not forget to protect sensitive endpoints in VictoriaMetrics when exposing them to untrusted networks such as the internet. Please consider setting the following command-line flags:
* -tls, -tlsCertFile and -tlsKeyFile for switching from HTTP to HTTPS.
* -httpAuth.username and -httpAuth.password for protecting all the HTTP endpoints with [HTTP Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication).
* -deleteAuthKey for protecting /api/v1/admin/tsdb/delete_series endpoint. See how to [delete time series](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-delete-time-series).
* -snapshotAuthKey for protecting /snapshot* endpoints. See [how to work with snapshots](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots).
* -forceMergeAuthKey for protecting /internal/force_merge endpoint. See [force merge docs](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#forced-merge).
* -search.resetCacheAuthKey for protecting /internal/resetRollupResultCache endpoint. See [backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#backfilling) for more details.
Explicitly bind the TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats to an internal network interface. For example, substitute -graphiteListenAddr=:2003 with -graphiteListenAddr=<internal_iface_ip>:2003.
It is preferable to authorize all incoming requests from untrusted networks with [vmauth](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmauth/README.md) or a similar auth proxy.
## Backup Recommendations
VictoriaMetrics supports backups via [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md) and [vmrestore](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmrestore/README.md) tools. We also provide the vmbackuper tool for our paid, enterprise subscribers - see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for additional details.
## Networking
Network usage: outbound traffic is negligible. Ingress traffic is ~100 bytes per ingested data point via [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). The actual ingress bandwidth usage depends on the average number of labels per ingested metric and the average size of label values. A higher number of per-metric labels and longer label values result in higher ingress bandwidth.
## Storage Considerations
Storage space: VictoriaMetrics needs less than a byte per data point on average. So, ~260GB is required to store a month-long insert stream of 100K data points per second. The actual storage size depends largely on data randomness (entropy). Higher randomness means higher storage size requirements. Read [this article](https://medium.com/faun/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932) for details.
## RAM
RAM size: VictoriaMetrics needs less than 1KB per active time series. Therefore, ~1GB of RAM is required for 1M active time series. Time series are considered active if new data points have been added recently or if they have been recently queried. The number of active time series may be obtained from vm_cache_entries{type="storage/hour_metric_ids"} metric exported on the /metrics page. VictoriaMetrics stores various caches in RAM. Memory size for these caches may be limited with -memory.allowedPercent or -memory.allowedBytes flags.
## CPU
CPU cores: VictoriaMetrics needs one CPU core per 300K inserted data points per second. So, ~4 CPU cores are required for processing the insert stream of 1M data points per second. The ingestion rate may be lower for high cardinality data or for time series with a high number of labels. See [this article](https://valyala.medium.com/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) for details. If you see lower numbers per CPU core, it is likely that the active time series info doesn't fit in your caches and you will need more RAM to lower CPU usage.
## Technical Support and Services
If you have questions about installing or using this software, please check this and other documents first. Answers to the most frequently asked questions can be found on the Technical Papers webpage or in VictoriaMetrics community channels. If you need further assistance with VictoriaMetrics, please contact us at info@victoriametrics.com - we'll be happy to help.
Following VictoriaMetrics best practices allows for the optimal configuration of our fast and scalable monitoring solution and time series database while minimizing or avoiding downtime or performance issues during installation and software usage. Our best practices also allow you to quickly troubleshoot any issues that might arise.

View file

@ -6,13 +6,24 @@
- `histogram_avg(buckets)` - returns the average value for the given buckets. - `histogram_avg(buckets)` - returns the average value for the given buckets.
- `histogram_stdvar(buckets)` - returns standard variance for the given buckets. - `histogram_stdvar(buckets)` - returns standard variance for the given buckets.
- `histogram_stddev(buckets)` - returns standard deviation for the given buckets. - `histogram_stddev(buckets)` - returns standard deviation for the given buckets.
* FEATURE: reduce median query duration by up to 2x. See https://github.com/VictoriaMetrics/VictoriaMetrics/commit/18fe0ff14bc78860c5569e2b70de1db78fac61be
* FEATURE: export `vm_available_memory_bytes` and `vm_available_cpu_cores` metrics, which show the amount of available RAM and the number of available CPU cores for VictoriaMetrics apps.
* FEATURE: vmagent: add ability to replicate scrape targets among `vmagent` instances in the cluster with `-promscrape.cluster.replicationFactor` command-line flag. See [these docs](https://victoriametrics.github.io/vmagent.html#scraping-big-number-of-targets). * FEATURE: vmagent: add ability to replicate scrape targets among `vmagent` instances in the cluster with `-promscrape.cluster.replicationFactor` command-line flag. See [these docs](https://victoriametrics.github.io/vmagent.html#scraping-big-number-of-targets).
* FATURE: vmagent: accept `scrape_offset` option at `scrape_config`. This option may be useful when scrapes must start at the specified offset of every scrape interval. See [these docs](https://victoriametrics.github.io/vmagent.html#troubleshooting) for details. * FEATURE: vmagent: accept `scrape_offset` option at `scrape_config`. This option may be useful when scrapes must start at the specified offset of every scrape interval. See [these docs](https://victoriametrics.github.io/vmagent.html#troubleshooting) for details.
* FEATURE: vmagent: support `proxy_tls_config`, `proxy_basic_auth`, `proxy_bearer_token` and `proxy_bearer_token_file` options at `scrape_config` section for configuring proxies specified via `proxy_url`. See [these docs](https://victoriametrics.github.io/vmagent.html#scraping-targets-via-a-proxy).
* FEATURE: vmauth: allow using regexp paths in `url_map`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1112) for details. * FEATURE: vmauth: allow using regexp paths in `url_map`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1112) for details.
* FEATURE: accept `round_digits` query arg at `/api/v1/query` and `/api/v1/query_range` handlers. This option can be set at Prometheus datasource in Grafana for limiting the number of digits after the decimal point in response values.
* FEATURE: add `-influx.databaseNames` command-line flag, which can be used for accepting data from some Telegraf plugins such as [fluentd plugin](https://github.com/fangli/fluent-plugin-influxdb). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124).
* FEATURE: add `-logNewSeries` command-line flag, which can be used for debugging the source of time series churn rate.
* BUGFIX: vmagent: prevent from high CPU usage bug during failing scrapes with small `scrape_timeout` (less than a few seconds). * BUGFIX: vmagent: prevent from high CPU usage bug during failing scrapes with small `scrape_timeout` (less than a few seconds).
* BUGFIX: vmagent: reduce memory usage when Kubernetes service discovery is used in big number of distinct jobs by sharing the cache. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113 * BUGFIX: vmagent: reduce memory usage when Kubernetes service discovery is used in big number of distinct scrape config jobs by sharing Kubernetes object cache. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
* BUGFIX: vmagent: apply `sample_limit` only after `metric_relabel_configs` are applied as Prometheus does. Previously the `sample_limit` was applied before metrics relabeling.
* BUGFIX: vmagent: properly apply `tls_config`, `basic_auth` and `bearer_token` to proxy connections if `proxy_url` option is set. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
* BUGFIX: vmagent: properly scrape targets via https proxy specified in `proxy_url` if `insecure_skip_verify` flag isn't set in `tls_config` section. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
* BUGFIX: avoid `duplicate time series` error if `prometheus_buckets()` covers a time range with distinct set of buckets.
* BUGFIX: prevent exponent overflow when processing extremely small values close to zero such as `2.964393875E-314`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1114 * BUGFIX: prevent exponent overflow when processing extremely small values close to zero such as `2.964393875E-314`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1114
* BUGFIX: do not include datapoints with a timestamp `t-d` when returning results from `/api/v1/query?query=m[d]&time=t` as Prometheus does.
# [v1.55.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.55.1) # [v1.55.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.55.1)

View file

@ -338,7 +338,7 @@ Please see [Monitoring K8S with VictoriaMetrics](https://docs.google.com/present
Numbers: Numbers:
- Active time series: ~2500 Million - Active time series: ~25 Million
- Datapoints: ~20 Trillion - Datapoints: ~20 Trillion
- Ingestion rate: ~1800k/s - Ingestion rate: ~1800k/s
- Disk usage: ~20 TB - Disk usage: ~20 TB

View file

@ -373,7 +373,7 @@ for protecting from user errors such as accidental data deletion.
The following steps must be performed for each `vmstorage` node for creating a backup: The following steps must be performed for each `vmstorage` node for creating a backup:
1. Create an instant snapshot by navigating to `/snapshot/create` HTTP handler. It will create snapshot and return its name. 1. Create an instant snapshot by navigating to `/snapshot/create` HTTP handler. It will create snapshot and return its name.
2. Archive the created snapshot from `<-storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://victoriametrics.github.io/vbackup.html). 2. Archive the created snapshot from `<-storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://victoriametrics.github.io/vmbackup.html).
The archival process doesn't interfere with `vmstorage` work, so it may be performed at any suitable time. The archival process doesn't interfere with `vmstorage` work, so it may be performed at any suitable time.
3. Delete unused snapshots via `/snapshot/delete?snapshot=<snapshot_name>` or `/snapshot/delete_all` in order to free up occupied storage space. 3. Delete unused snapshots via `/snapshot/delete?snapshot=<snapshot_name>` or `/snapshot/delete_all` in order to free up occupied storage space.

View file

@ -1,3 +0,0 @@
# MetricsQL
The page has been moved to [MetricsQL](https://victoriametrics.github.io/MetricsQL.html).

View file

@ -170,6 +170,7 @@ Alphabetically sorted links to case studies:
* [Font used](#font-used) * [Font used](#font-used)
* [Color Palette](#color-palette) * [Color Palette](#color-palette)
* [We kindly ask](#we-kindly-ask) * [We kindly ask](#we-kindly-ask)
* [List of command-line flags](#list-of-command-line-flags)
## How to start VictoriaMetrics ## How to start VictoriaMetrics
@ -182,7 +183,7 @@ The following command-line flags are used the most:
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory. * `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details. * `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values. Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup) See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup)
and how to [handle alerts](#alerting). and how to [handle alerts](#alerting).
@ -413,6 +414,10 @@ while VictoriaMetrics stores them with *milliseconds* precision.
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args. Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics. For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plugin-influxdb), [Juniper/open-nti](https://github.com/Juniper/open-nti)
or [Juniper/jtimon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag.
## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)
Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,
@ -562,14 +567,17 @@ in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if y
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args in addition to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
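The relative `start` value and the `round_digits` arg can be combined in one request. The sketch below only constructs the URL. The host `http://localhost:8428` and the helper name `build_query_range_url` are assumptions for illustration, while the handler path, `start=-30m` and `round_digits=2` are taken from the text.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_query_range_url(base_url, query, start, round_digits=None):
    """Return a /api/v1/query_range URL with optional response rounding."""
    args = {"query": query, "start": start}
    if round_digits is not None:
        # Ask the server to round response values to this many decimal digits.
        args["round_digits"] = round_digits
    return f"{base_url}/api/v1/query_range?{urlencode(args)}"


# Hypothetical local instance; the query is the one used in the text.
range_url = build_query_range_url(
    "http://localhost:8428",
    "avg_over_time(temperature[1h])",
    start="-30m",
    round_digits=2,
)
range_args = parse_qs(urlparse(range_url).query)
print(range_url)
```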
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details:
* Any number of [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
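A request to `/api/v1/labels` using these additional args could be assembled as in the sketch below. The host, the helper name `build_labels_url` and the sample selectors are hypothetical; `match[]`, `start` and `end` are the args listed above.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_labels_url(base_url, selectors, start=None, end=None):
    """Return a /api/v1/labels URL with repeated match[] args."""
    # match[] may be repeated, so a list of pairs is used instead of a dict.
    args = [("match[]", selector) for selector in selectors]
    if start is not None:
        args.append(("start", start))
    if end is not None:
        args.append(("end", end))
    return f"{base_url}/api/v1/labels?{urlencode(args)}"


# Hypothetical selectors limited to the last hour via a relative start time.
labels_url = build_labels_url(
    "http://localhost:8428",
    ['up{job="node"}', "process_cpu_seconds_total"],
    start="-1h",
)
labels_args = parse_qs(urlparse(labels_url).query)
print(labels_url)
```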
Additionally VictoriaMetrics provides the following handlers:
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
@ -1367,6 +1375,8 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default `date` equals the current date,
while `topN` equals 10.
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
* VictoriaMetrics limits the number of labels per each metric with `-maxLabelsPerTimeseries` command-line flag.
This prevents ingestion of metrics with too many labels. It is recommended to [monitor](#monitoring) the `vm_metrics_with_dropped_labels_total`
metric in order to determine whether `-maxLabelsPerTimeseries` must be adjusted for your workload.
@ -1538,3 +1548,248 @@ Files included in each folder:
* There should be sufficient clear space around the logo.
* Do not change spacing, alignment, or relative locations of the design elements.
* Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
## List of command-line flags
Pass `-help` to VictoriaMetrics in order to see the list of supported command-line flags with their description:
```
-bigMergeConcurrency int
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-dedup.minScrapeInterval duration
Remove superfluous samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
-denyQueriesOutsideRetention
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
-dryRun
Whether to check only -promscrape.config and then exit. Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-finalMergeDelay duration
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
-forceFlushAuthKey string
authKey, which must be passed in query string to /internal/force_flush pages
-forceMergeAuthKey string
authKey, which must be passed in query string to /internal/force_merge pages
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this delay the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if Influx line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-maxLabelsPerTimeseries int
The maximum number of labels accepted per time series. Superfluous labels are dropped (default 30)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
-opentsdbListenAddr string
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-precisionBits int
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
-promscrape.cluster.memberNum int
The index of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
-promscrape.config.strictParse
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://victoriametrics.github.io/#relabeling for details
-retentionPeriod value
Data with timestamps outside the retentionPeriod is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
-search.cacheTimestampOffset duration
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
-search.disableCache
Whether to disable response caching. This may be useful during data backfilling
-search.latencyOffset duration
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
-search.maxExportDuration duration
The maximum duration for /api/v1/export call (default 720h0m0s)
-search.maxLookback duration
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values significantly exceeding horizontal resolution of the graph (default 30000)
-search.maxQueryDuration duration
The maximum duration for query execution (default 30s)
-search.maxQueryLen size
The maximum search query length in bytes
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
-search.maxQueueDuration duration
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
-search.maxStepForPointsAdjustment duration
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
-search.maxTagKeys int
The maximum number of tag keys returned from /api/v1/labels (default 100000)
-search.maxTagValueSuffixesPerSearch int
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
-search.maxTagValues int
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
-search.maxUniqueTimeseries int
The maximum number of unique time series each search can scan (default 300000)
-search.minStalenessInterval duration
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
-search.queryStats.lastQueriesCount int
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
-search.queryStats.minQueryDuration int
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
-search.resetCacheAuthKey string
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
-search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selfScrapeInstance string
Value for 'instance' label, which is added to self-scraped metrics (default "self")
-selfScrapeInterval duration
Interval for self-scraping own metrics at /metrics page
-selfScrapeJob string
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
-smallMergeConcurrency int
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
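Several flags in the list above take a `size` value with an optional suffix. The sketch below shows how such values could map to bytes. It assumes that `KB`, `MB` and `GB` are decimal multiples (powers of 1000) and `KiB`, `MiB` and `GiB` are binary multiples (powers of 1024); the help output does not spell this out, and `parse_size` is a hypothetical helper, not part of VictoriaMetrics.

```python
import re

# Assumed multipliers: decimal for KB/MB/GB, binary for KiB/MiB/GiB.
SIZE_SUFFIXES = {
    "KB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
}


def parse_size(value):
    """Convert a size string such as '100MiB' or '262144' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(KB|MB|GB|KiB|MiB|GiB)?", value.strip())
    if match is None:
        raise ValueError(f"cannot parse size value: {value!r}")
    number, suffix = match.groups()
    # A missing suffix means the value is already expressed in bytes.
    return int(float(number) * SIZE_SUFFIXES.get(suffix, 1))


for text in ("100MiB", "32MiB", "256KiB", "16384"):
    print(text, "->", parse_size(text))
```

Under these assumptions the default of `-import.maxLineLen` (104857600) corresponds to `100MiB`, and the default of `-influx.maxLineSize` (262144) corresponds to `256KiB`.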


@ -255,6 +255,41 @@ If each target is scraped by multiple `vmagent` instances, then data deduplicati
See [these docs](https://victoriametrics.github.io/#deduplication) for details.
## Scraping targets via a proxy
`vmagent` supports scraping targets via http and https proxies. The proxy address must be specified in the `proxy_url` option. For example, the following scrape config instructs
`vmagent` to scrape targets via the https proxy at `https://proxy-addr:1234`:
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
```
Proxy can be configured with the following optional settings:
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
For example:
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
  proxy_basic_auth:
    username: foobar
    password: secret
  proxy_tls_config:
    insecure_skip_verify: true
    cert_file: /path/to/cert
    key_file: /path/to/key
    ca_file: /path/to/ca
    server_name: real-server-name
```
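For orientation, the `proxy_basic_auth` credentials in the config above map onto the standard HTTP Basic scheme. The sketch below shows the header a client would typically send to a proxy under that scheme. It illustrates the general mechanism only; how `vmagent` builds the request internally is not described here, and `basic_proxy_auth_header` is a hypothetical helper.

```python
import base64


def basic_proxy_auth_header(username, password):
    """Return the Proxy-Authorization header for HTTP Basic credentials."""
    # Basic auth is base64("username:password") prefixed with the scheme name.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Proxy-Authorization": f"Basic {token}"}


# Credentials taken from the example config above.
headers = basic_proxy_auth_header("foobar", "secret")
print(headers)
```

Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who can observe the connection to the proxy.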
## Monitoring
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
@ -477,13 +512,16 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
-import.maxLineLen max_rows_per_line -import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with max_rows_per_line query arg passed to /api/v1/export The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 104857600) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.maxLineSize value -influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 262144) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr http://<vmagent>:8429/write -influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
-influxMeasurementFieldSeparator string -influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_") Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
@ -511,12 +549,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int -maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16) The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize value -maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string -metricsAuthKey string
@ -527,9 +565,9 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration -opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s) Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize value -opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration -opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms) Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string -pprofAuthKey string
@ -538,6 +576,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
The number of number in the cluster of scrapers. It must be an unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster The number of number in the cluster of scrapers. It must be an unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int -promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 2, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string -promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun -promscrape.config.dryRun
@ -546,45 +586,45 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration -promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval consul_sd_configs -promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s) Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression -promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive disable_keepalive: true -promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set disable_keepalive: true individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int -promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100) The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration -promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s) The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval dns_sd_configs -promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s) Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval dockerswarm_sd_configs -promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s) Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels -promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval ec2_sd_configs -promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s) Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval eureka_sd_configs -promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s) Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration -promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s) Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval gce_sd_configs -promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s) Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration -promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kuberntes API server (default 10m0s) How frequently to reload the full state from Kuberntes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval kubernetes_sd_configs -promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s) Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets droppedTargets -promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000) The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize value -promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 16777216) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval openstack_sd_configs -promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s) Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse stream_parse: true -promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set stream_parse: true individually per each `scrape_config` section in `-promscrape.config` for fine grained control Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors duplicate scrape target -promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress duplicate scrape target errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors -promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-remoteWrite.basicAuth.password array -remoteWrite.basicAuth.password array
@ -601,12 +641,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-remoteWrite.label array -remoteWrite.label array
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple flags to metrics before sending them to remote storage Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple flags to metrics before sending them to remote storage
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.maxBlockSize value -remoteWrite.maxBlockSize size
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 8388608) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDiskUsagePerURL value -remoteWrite.maxDiskUsagePerURL size
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0 The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-remoteWrite.proxyURL array -remoteWrite.proxyURL array
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234 Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.


@ -16,7 +16,7 @@ rules against configured address.
* Lightweight without extra dependencies. * Lightweight without extra dependencies.
### Limitations: ### Limitations:
* `vmalert` executes queries against a remote datasource, which carries reliability risks because of the network. * `vmalert` executes queries against a remote datasource, which carries reliability risks because of the network.
It is recommended to configure alert thresholds and rule expressions with the understanding that network requests It is recommended to configure alert thresholds and rule expressions with the understanding that network requests
may fail; may fail;
* by default, rules execution is sequential within one group, but persisting of execution results to remote * by default, rules execution is sequential within one group, but persisting of execution results to remote
@ -37,7 +37,7 @@ The build binary will be placed to `VictoriaMetrics/bin` folder.
To start using `vmalert` you will need the following things: To start using `vmalert` you will need the following things:
* list of rules - PromQL/MetricsQL expressions to execute; * list of rules - PromQL/MetricsQL expressions to execute;
* datasource address - reachable VictoriaMetrics instance for rules execution; * datasource address - reachable VictoriaMetrics instance for rules execution;
* notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing, * notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
aggregating alerts and sending notifications. aggregating alerts and sending notifications.
* remote write address - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations) * remote write address - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage address for storing recording rule results and alert state in the form of time series. This is optional. compatible storage address for storing recording rule results and alert state in the form of time series. This is optional.
@ -56,11 +56,11 @@ Then configure `vmalert` accordingly:
``` ```
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts. to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
similar to Prometheus rules and configured using YAML. Configuration examples may be found similar to Prometheus rules and configured using YAML. Configuration examples may be found
in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder. in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder.
Every `rule` belongs to `group` and every configuration file may contain arbitrary number of groups: Every `rule` belongs to `group` and every configuration file may contain arbitrary number of groups:
```yaml ```yaml
@ -79,7 +79,7 @@ name: <string>
[ interval: <duration> | default = global.evaluation_interval ] [ interval: <duration> | default = global.evaluation_interval ]
# How many rules execute at once. Increasing concurrency may speed # How many rules execute at once. Increasing concurrency may speed
# up round execution speed. # up round execution speed.
[ concurrency: <integer> | default = 1 ] [ concurrency: <integer> | default = 1 ]
# Optional type for expressions inside the rules. Supported values: "graphite" and "prometheus". # Optional type for expressions inside the rules. Supported values: "graphite" and "prometheus".
@ -93,15 +93,15 @@ rules:
#### Rules #### Rules
There are two types of Rules: There are two types of Rules:
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) - * [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules allow you to define alert conditions via [MetricsQL](https://victoriametrics.github.io/MetricsQL.html) Alerting rules allow you to define alert conditions via [MetricsQL](https://victoriametrics.github.io/MetricsQL.html)
and to send notifications about firing alerts to [Alertmanager](https://github.com/prometheus/alertmanager). and to send notifications about firing alerts to [Alertmanager](https://github.com/prometheus/alertmanager).
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) - * [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules allow you to precompute frequently needed or computationally expensive expressions Recording rules allow you to precompute frequently needed or computationally expensive expressions
and save their result as a new set of time series. and save their result as a new set of time series.
`vmalert` forbids defining duplicates - rules with the same combination of name, expression and labels `vmalert` forbids defining duplicates - rules with the same combination of name, expression and labels
within one group. within one group.
##### Alerting rules ##### Alerting rules
@ -130,7 +130,7 @@ labels:
# Annotations to add to each alert. # Annotations to add to each alert.
annotations: annotations:
[ <labelname>: <tmpl_string> ] [ <labelname>: <tmpl_string> ]
``` ```
##### Recording rules ##### Recording rules
@ -158,17 +158,17 @@ For recording rules to work `-remoteWrite.url` must specified.
#### Alerts state on restarts #### Alerts state on restarts
`vmalert` has no local storage, so alert state is kept in process memory. Hence, after the `vmalert` process is restarted, `vmalert` has no local storage, so alert state is kept in process memory. Hence, after the `vmalert` process is restarted,
alert state is lost. To avoid this, `vmalert` should be configured with the following flags: alert state is lost. To avoid this, `vmalert` should be configured with the following flags:
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or VMInsert (Cluster). `vmalert` will persist alerts state * `-remoteWrite.url` - URL to VictoriaMetrics (Single) or VMInsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol. into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
These are regular time series and may be queried from VM just as any other time series. These are regular time series and may be queried from VM just as any other time series.
The state is stored to the configured address on every rule evaluation. The state is stored to the configured address on every rule evaluation.
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or VMSelect (Cluster). `vmalert` will try to restore alerts state * `-remoteRead.url` - URL to VictoriaMetrics (Single) or VMSelect (Cluster). `vmalert` will try to restore alerts state
from configured address by querying time series with name `ALERTS_FOR_STATE`. from configured address by querying time series with name `ALERTS_FOR_STATE`.
Both flags are required for proper state restoration. The restore process may fail if time series are missing Both flags are required for proper state restoration. The restore process may fail if time series are missing
in the configured `-remoteRead.url`, weren't updated in the last `1h`, or the received state doesn't match the current `vmalert` in the configured `-remoteRead.url`, weren't updated in the last `1h`, or the received state doesn't match the current `vmalert`
rules configuration. rules configuration.
@ -232,7 +232,7 @@ The shortlist of configuration flags is the following:
How often to evaluate the rules (default 1m0s) How often to evaluate the rules (default 1m0s)
-external.alert.source string -external.alert.source string
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service. External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
-external.label array -external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets. Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.
@ -272,9 +272,9 @@ The shortlist of configuration flags is the following:
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string -metricsAuthKey string
@ -348,11 +348,11 @@ The shortlist of configuration flags is the following:
-remoteWrite.url string -remoteWrite.url string
Optional URL to Victoria Metrics or VMInsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428 Optional URL to Victoria Metrics or VMInsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
-rule array -rule array
Path to the file with alert rules. Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times. Supports patterns. Flag can be specified multiple times.
Examples: Examples:
-rule="/path/to/file". Path to a single file with alerting rules -rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder, -rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root. absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars. Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Supports array of values separated by comma or specified via multiple flags. Supports array of values separated by comma or specified via multiple flags.
@ -370,7 +370,7 @@ The shortlist of configuration flags is the following:
Show VictoriaMetrics version Show VictoriaMetrics version
``` ```
Pass `-help` to `vmalert` in order to see the full list of supported Pass `-help` to `vmalert` in order to see the full list of supported
command-line flags with their descriptions. command-line flags with their descriptions.
To reload configuration without `vmalert` restart send SIGHUP signal To reload configuration without `vmalert` restart send SIGHUP signal
@ -379,13 +379,13 @@ or send GET request to `/-/reload` endpoint.
### Contributing ### Contributing
`vmalert` is mostly designed and built by VictoriaMetrics community. `vmalert` is mostly designed and built by VictoriaMetrics community.
Feel free to share your experience and ideas for improving this Feel free to share your experience and ideas for improving this
software. Please keep simplicity as the main priority. software. Please keep simplicity as the main priority.
### How to build from sources ### How to build from sources
It is recommended using It is recommended using
[binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
- `vmalert` is located in `vmutils-*` archives there. - `vmalert` is located in `vmutils-*` archives there.


@ -208,9 +208,9 @@ See the docs at https://victoriametrics.github.io/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string -metricsAuthKey string


@ -205,12 +205,12 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value -maxBytesPerSecond size
The maximum upload speed. There is no limit if it is set to 0 The maximum upload speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-origin string -origin string


@ -105,12 +105,12 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC") Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int -loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value -maxBytesPerSecond size
The maximum download speed. There is no limit if it is set to 0 The maximum download speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value -memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0) Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float -memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60) Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-skipBackupCompleteCheck -skipBackupCompleteCheck

go.mod

@ -7,7 +7,7 @@ require (
// Do not use the original github.com/valyala/fasthttp because of issues // Do not use the original github.com/valyala/fasthttp because of issues
// like https://github.com/valyala/fasthttp/commit/996610f021ff45fdc98c2ce7884d5fa4e7f9199b // like https://github.com/valyala/fasthttp/commit/996610f021ff45fdc98c2ce7884d5fa4e7f9199b
github.com/VictoriaMetrics/fasthttp v1.0.13 github.com/VictoriaMetrics/fasthttp v1.0.14
github.com/VictoriaMetrics/metrics v1.15.2 github.com/VictoriaMetrics/metrics v1.15.2
github.com/VictoriaMetrics/metricsql v0.14.0 github.com/VictoriaMetrics/metricsql v0.14.0
github.com/aws/aws-sdk-go v1.37.26 github.com/aws/aws-sdk-go v1.37.26

go.sum

@ -82,8 +82,8 @@ github.com/Shopify/sarama v1.19.0/go.mod h1:FVkBWblsNy7DGZRfXLU0O9RCGt5g3g3yEuWX
github.com/Shopify/toxiproxy v2.1.4+incompatible/go.mod h1:OXgGpZ6Cli1/URJOF1DMxUHB2q5Ap20/P/eIdh4G0pI= github.com/Shopify/toxiproxy v2.1.4+incompatible/go.mod h1:OXgGpZ6Cli1/URJOF1DMxUHB2q5Ap20/P/eIdh4G0pI=
github.com/VictoriaMetrics/fastcache v1.5.8 h1:XW+YVx9lEXITBVv35ugK9OyotdNJVcbza69o3jmqWuI= github.com/VictoriaMetrics/fastcache v1.5.8 h1:XW+YVx9lEXITBVv35ugK9OyotdNJVcbza69o3jmqWuI=
github.com/VictoriaMetrics/fastcache v1.5.8/go.mod h1:SiMZNgwEPJ9qWLshu9tyuE6bKc9ZWYhcNV/L7jurprQ= github.com/VictoriaMetrics/fastcache v1.5.8/go.mod h1:SiMZNgwEPJ9qWLshu9tyuE6bKc9ZWYhcNV/L7jurprQ=
github.com/VictoriaMetrics/fasthttp v1.0.13 h1:5JNS4vSPdN4QyfcpAg3Y1Wznf0uXEuSOFpeIlFw3MgM= github.com/VictoriaMetrics/fasthttp v1.0.14 h1:iWCdHg7JQ1SO0xvPAgw3QFpFT3he+Ugdshg+1clN6CQ=
github.com/VictoriaMetrics/fasthttp v1.0.13/go.mod h1:3SeUL4zwB/p/a9aEeRc6gdlbrtNHXBJR6N376EgiSHU= github.com/VictoriaMetrics/fasthttp v1.0.14/go.mod h1:eDVgYyGts3xXpYpVGDxQ3ZlQKW5TSvOqfc9FryjH1JA=
github.com/VictoriaMetrics/metrics v1.12.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE= github.com/VictoriaMetrics/metrics v1.12.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE=
github.com/VictoriaMetrics/metrics v1.15.2 h1:w/GD8L9tm+gvx1oZvAofRRXwammiicdI0jgLghA2Gdo= github.com/VictoriaMetrics/metrics v1.15.2 h1:w/GD8L9tm+gvx1oZvAofRRXwammiicdI0jgLghA2Gdo=
github.com/VictoriaMetrics/metrics v1.15.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE= github.com/VictoriaMetrics/metrics v1.15.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE=
@ -507,7 +507,6 @@ github.com/klauspost/compress v1.4.0/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0
github.com/klauspost/compress v1.9.5/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A= github.com/klauspost/compress v1.9.5/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
github.com/klauspost/compress v1.10.7/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs= github.com/klauspost/compress v1.10.7/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.11.0/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs= github.com/klauspost/compress v1.11.0/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.11.3/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.11.12 h1:famVnQVu7QwryBN4jNseQdUKES71ZAOnB6UQQJPZvqk= github.com/klauspost/compress v1.11.12 h1:famVnQVu7QwryBN4jNseQdUKES71ZAOnB6UQQJPZvqk=
github.com/klauspost/compress v1.11.12/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs= github.com/klauspost/compress v1.11.12/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/cpuid v0.0.0-20170728055534-ae7887de9fa5/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek= github.com/klauspost/cpuid v0.0.0-20170728055534-ae7887de9fa5/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=


@ -9,7 +9,7 @@ import (
// NewBytes returns new `bytes` flag with the given name, defaultValue and description. // NewBytes returns new `bytes` flag with the given name, defaultValue and description.
func NewBytes(name string, defaultValue int, description string) *Bytes { func NewBytes(name string, defaultValue int, description string) *Bytes {
description += "\nSupports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB" description += "\nSupports the following optional suffixes for `size` values: KB, MB, GB, KiB, MiB, GiB"
b := Bytes{ b := Bytes{
N: defaultValue, N: defaultValue,
valueString: fmt.Sprintf("%d", defaultValue), valueString: fmt.Sprintf("%d", defaultValue),
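
Note on the `size` suffixes mentioned in the help text above: the following is a minimal sketch of how such values could be parsed. It is not the `flagutil.Bytes` implementation. `parseSize` is a hypothetical helper, and it assumes that KB/MB/GB are powers of 1000, that KiB/MiB/GiB are powers of 1024, and that suffixes are case-sensitive.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize is a hypothetical helper, not part of lib/flagutil.
// Assumption: KB/MB/GB are decimal multipliers, KiB/MiB/GiB are binary ones.
func parseSize(s string) (int64, error) {
	suffixes := []struct {
		name string
		mult int64
	}{
		{"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30},
		{"KB", 1000}, {"MB", 1000 * 1000}, {"GB", 1000 * 1000 * 1000},
	}
	s = strings.TrimSpace(s)
	for _, sf := range suffixes {
		if !strings.HasSuffix(s, sf.name) {
			continue
		}
		// Allow fractional values such as "1.5GB" in front of the suffix.
		n, err := strconv.ParseFloat(strings.TrimSuffix(s, sf.name), 64)
		if err != nil {
			return 0, fmt.Errorf("cannot parse size %q: %w", s, err)
		}
		return int64(n * float64(sf.mult)), nil
	}
	// No suffix: treat the value as a plain number of bytes.
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse size %q: %w", s, err)
	}
	return n, nil
}

func main() {
	for _, v := range []string{"0", "10KB", "2MiB", "1.5GB"} {
		n, err := parseSize(v)
		fmt.Println(v, n, err)
	}
}
```

With a helper like this, `-memory.allowedBytes=2MiB` and `-memory.allowedBytes=2097152` would describe the same limit.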


@ -12,6 +12,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo" "github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory" "github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/metrics" "github.com/VictoriaMetrics/metrics"
@ -48,6 +49,8 @@ func writePrometheusMetrics(w io.Writer) {
fmt.Fprintf(w, "vm_app_version{version=%q, short_version=%q} 1\n", buildinfo.Version, fmt.Fprintf(w, "vm_app_version{version=%q, short_version=%q} 1\n", buildinfo.Version,
versionRe.FindString(buildinfo.Version)) versionRe.FindString(buildinfo.Version))
fmt.Fprintf(w, "vm_allowed_memory_bytes %d\n", memory.Allowed()) fmt.Fprintf(w, "vm_allowed_memory_bytes %d\n", memory.Allowed())
fmt.Fprintf(w, "vm_available_memory_bytes %d\n", memory.Allowed()+memory.Remaining())
fmt.Fprintf(w, "vm_available_cpu_cores %d\n", cgroup.AvailableCPUs())
// Export start time and uptime in seconds // Export start time and uptime in seconds
fmt.Fprintf(w, "vm_app_start_timestamp %d\n", startTime.Unix()) fmt.Fprintf(w, "vm_app_start_timestamp %d\n", startTime.Unix())


@ -0,0 +1,29 @@
package influxutils
import (
"fmt"
"net/http"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
)
var influxDatabaseNames = flagutil.NewArray("influx.databaseNames", "Comma-separated list of database names to return from /query and /influx/query API. "+
"This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb")
// WriteDatabaseNames writes influxDatabaseNames to w.
func WriteDatabaseNames(w http.ResponseWriter) {
// Emulate fake response for influx query.
// This is required for TSBS benchmark and some Telegraf plugins.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
w.Header().Set("Content-Type", "application/json; charset=utf-8")
dbNames := *influxDatabaseNames
if len(dbNames) == 0 {
dbNames = []string{"_internal"}
}
dbs := make([]string, len(dbNames))
for i := range dbNames {
dbs[i] = fmt.Sprintf(`[%q]`, dbNames[i])
}
fmt.Fprintf(w, `{"results":[{"statement_id":0,"series":[{"name":"databases","columns":["name"],"values":[%s]}]}]}`, strings.Join(dbs, ","))
}
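
To see what the emulated response looks like, the sketch below copies the body of `WriteDatabaseNames` into a variant that takes the names as an argument instead of reading the `-influx.databaseNames` flag. The `writeDatabaseNames` and `render` names are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// writeDatabaseNames mirrors WriteDatabaseNames from the new file, but takes
// dbNames as a parameter (an assumption made to keep the sketch flag-free).
func writeDatabaseNames(w http.ResponseWriter, dbNames []string) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	if len(dbNames) == 0 {
		dbNames = []string{"_internal"}
	}
	dbs := make([]string, len(dbNames))
	for i := range dbNames {
		// Every database name becomes a single-element JSON array.
		dbs[i] = fmt.Sprintf(`[%q]`, dbNames[i])
	}
	fmt.Fprintf(w, `{"results":[{"statement_id":0,"series":[{"name":"databases","columns":["name"],"values":[%s]}]}]}`, strings.Join(dbs, ","))
}

// render is a hypothetical helper that captures the response body.
func render(dbNames []string) string {
	rec := httptest.NewRecorder()
	writeDatabaseNames(rec, dbNames)
	return rec.Body.String()
}

func main() {
	fmt.Println(render(nil))
	fmt.Println(render([]string{"telegraf", "tsbs"}))
}
```

The first call falls back to the `_internal` database, while the second one returns one `values` row per configured name.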


@ -15,7 +15,9 @@ import (
// ... // ...
// pools[n] is for capacities from 2^(n+2)+1 to 2^(n+3) // pools[n] is for capacities from 2^(n+2)+1 to 2^(n+3)
// //
var pools [30]sync.Pool // Limit the maximum capacity to 2^18, since there are no performance benefits
// in caching byte slices with bigger capacities.
var pools [17]sync.Pool
// Get returns byte buffer with the given capacity. // Get returns byte buffer with the given capacity.
func Get(capacity int) *bytesutil.ByteBuffer { func Get(capacity int) *bytesutil.ByteBuffer {
@ -37,9 +39,11 @@ func Get(capacity int) *bytesutil.ByteBuffer {
// Put returns bb to the pool. // Put returns bb to the pool.
func Put(bb *bytesutil.ByteBuffer) { func Put(bb *bytesutil.ByteBuffer) {
capacity := cap(bb.B) capacity := cap(bb.B)
id, _ := getPoolIDAndCapacity(capacity) id, poolCapacity := getPoolIDAndCapacity(capacity)
bb.Reset() if capacity <= poolCapacity {
pools[id].Put(bb) bb.Reset()
pools[id].Put(bb)
}
} }
func getPoolIDAndCapacity(size int) (int, int) { func getPoolIDAndCapacity(size int) (int, int) {
@ -49,7 +53,7 @@ func getPoolIDAndCapacity(size int) (int, int) {
} }
size >>= 3 size >>= 3
id := bits.Len(uint(size)) id := bits.Len(uint(size))
if id > len(pools) { if id >= len(pools) {
id = len(pools) - 1 id = len(pools) - 1
} }
return id, (1 << (id + 3)) return id, (1 << (id + 3))
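
The effect of the `id >= len(pools)` fix and of the new `capacity <= poolCapacity` check in `Put` can be traced with a standalone sketch. The first statements of `getPoolIDAndCapacity` are hidden by the diff context, so the `size--` and clamping steps below are an assumption. `numPools` is a hypothetical constant standing in for `len(pools)`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// numPools matches the new `var pools [17]sync.Pool` size from the diff.
const numPools = 17

// getPoolIDAndCapacity returns the pool index for the given size and the
// capacity of buffers stored in that pool.
// Assumption: the decrement and the clamp to zero are not visible in the diff.
func getPoolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		size = 0
	}
	size >>= 3
	id := bits.Len(uint(size))
	if id >= numPools {
		// The old check `id > len(pools)` let id == len(pools) through,
		// which would index past the end of the array.
		id = numPools - 1
	}
	return id, 1 << (id + 3)
}

func main() {
	for _, size := range []int{0, 8, 9, 16, 17, 1 << 19, 1 << 20} {
		id, poolCap := getPoolIDAndCapacity(size)
		// pooled mirrors the new condition in Put.
		fmt.Printf("size=%d id=%d poolCapacity=%d pooled=%v\n", size, id, poolCap, size <= poolCap)
	}
}
```

Under these assumptions a buffer with capacity `1 << 20` maps to the last pool, whose capacity is smaller than the buffer, so `Put` would drop it instead of caching it.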


@ -65,13 +65,16 @@ func (ac *Config) tlsCertificateString() string {
// NewTLSConfig returns new TLS config for the given ac. // NewTLSConfig returns new TLS config for the given ac.
func (ac *Config) NewTLSConfig() *tls.Config { func (ac *Config) NewTLSConfig() *tls.Config {
tlsCfg := &tls.Config{ tlsCfg := &tls.Config{
RootCAs: ac.TLSRootCA,
ClientSessionCache: tls.NewLRUClientSessionCache(0), ClientSessionCache: tls.NewLRUClientSessionCache(0),
} }
if ac == nil {
return tlsCfg
}
if ac.TLSCertificate != nil { if ac.TLSCertificate != nil {
// Do not set tlsCfg.GetClientCertificate, since tlsCfg.Certificates should work OK. // Do not set tlsCfg.GetClientCertificate, since tlsCfg.Certificates should work OK.
tlsCfg.Certificates = []tls.Certificate{*ac.TLSCertificate} tlsCfg.Certificates = []tls.Certificate{*ac.TLSCertificate}
} }
tlsCfg.RootCAs = ac.TLSRootCA
tlsCfg.ServerName = ac.TLSServerName tlsCfg.ServerName = ac.TLSServerName
tlsCfg.InsecureSkipVerify = ac.TLSInsecureSkipVerify tlsCfg.InsecureSkipVerify = ac.TLSInsecureSkipVerify
return tlsCfg return tlsCfg
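
The reordering above matters because Go allows calling a method on a nil pointer receiver, and the call panics only when a field is dereferenced. A minimal sketch of the same pattern follows. `authConfig` is a hypothetical, trimmed-down stand-in for `promauth.Config`, not the real type.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// authConfig is a hypothetical stand-in for promauth.Config.
type authConfig struct {
	serverName         string
	insecureSkipVerify bool
}

// newTLSConfig follows the shape of the patched NewTLSConfig:
// fields of ac are read only after the nil check.
func (ac *authConfig) newTLSConfig() *tls.Config {
	tlsCfg := &tls.Config{
		ClientSessionCache: tls.NewLRUClientSessionCache(0),
	}
	if ac == nil {
		return tlsCfg
	}
	tlsCfg.ServerName = ac.serverName
	tlsCfg.InsecureSkipVerify = ac.insecureSkipVerify
	return tlsCfg
}

func main() {
	var missing *authConfig // nil, e.g. when no auth section is configured
	fmt.Println(missing.newTLSConfig().ServerName == "")
	fmt.Println((&authConfig{serverName: "foobar"}).newTLSConfig().ServerName)
}
```

Had `RootCAs: ac.TLSRootCA` stayed inside the struct literal, a nil `ac` would panic before the nil check is reached.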


@ -27,11 +27,11 @@ var (
"It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control") "It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control")
disableKeepAlive = flag.Bool("promscrape.disableKeepAlive", false, "Whether to disable HTTP keep-alive connections when scraping all the targets. "+ disableKeepAlive = flag.Bool("promscrape.disableKeepAlive", false, "Whether to disable HTTP keep-alive connections when scraping all the targets. "+
"This may be useful when targets has no support for HTTP keep-alive connection. "+ "This may be useful when targets has no support for HTTP keep-alive connection. "+
"It is possible to set `disable_keepalive: true` individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. "+ "It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. "+
"Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets") "Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets")
streamParse = flag.Bool("promscrape.streamParse", false, "Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful "+ streamParse = flag.Bool("promscrape.streamParse", false, "Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful "+
"for reducing memory usage when millions of metrics are exposed per each scrape target. "+ "for reducing memory usage when millions of metrics are exposed per each scrape target. "+
"It is posible to set `stream_parse: true` individually per each `scrape_config` section in `-promscrape.config` for fine grained control") "It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control")
) )
type client struct { type client struct {
@ -67,7 +67,7 @@ func newClient(sw *ScrapeWork) *client {
host += ":443" host += ":443"
} }
} }
dialFunc, err := newStatDialFunc(sw.ProxyURL, tlsCfg) dialFunc, err := newStatDialFunc(sw.ProxyURL, sw.ProxyAuthConfig)
if err != nil { if err != nil {
logger.Fatalf("cannot create dial func: %s", err) logger.Fatalf("cannot create dial func: %s", err)
} }


@ -110,11 +110,15 @@ type ScrapeConfig struct {
SampleLimit int `yaml:"sample_limit,omitempty"` SampleLimit int `yaml:"sample_limit,omitempty"`
// These options are supported only by lib/promscrape. // These options are supported only by lib/promscrape.
DisableCompression bool `yaml:"disable_compression,omitempty"` DisableCompression bool `yaml:"disable_compression,omitempty"`
DisableKeepAlive bool `yaml:"disable_keepalive,omitempty"` DisableKeepAlive bool `yaml:"disable_keepalive,omitempty"`
StreamParse bool `yaml:"stream_parse,omitempty"` StreamParse bool `yaml:"stream_parse,omitempty"`
ScrapeAlignInterval time.Duration `yaml:"scrape_align_interval,omitempty"` ScrapeAlignInterval time.Duration `yaml:"scrape_align_interval,omitempty"`
ScrapeOffset time.Duration `yaml:"scrape_offset,omitempty"` ScrapeOffset time.Duration `yaml:"scrape_offset,omitempty"`
ProxyTLSConfig *promauth.TLSConfig `yaml:"proxy_tls_config,omitempty"`
ProxyBasicAuth *promauth.BasicAuthConfig `yaml:"proxy_basic_auth,omitempty"`
ProxyBearerToken string `yaml:"proxy_bearer_token,omitempty"`
ProxyBearerTokenFile string `yaml:"proxy_bearer_token_file,omitempty"`
// This is set in loadConfig // This is set in loadConfig
swc *scrapeWorkConfig swc *scrapeWorkConfig
@ -247,7 +251,7 @@ func (cfg *Config) getKubernetesSDScrapeWork(prev []*ScrapeWork) []*ScrapeWork {
target := metaLabels["__address__"] target := metaLabels["__address__"]
sw, err := sc.swc.getScrapeWork(target, nil, metaLabels) sw, err := sc.swc.getScrapeWork(target, nil, metaLabels)
if err != nil { if err != nil {
logger.Errorf("cannot create kubernetes_sd_config target target %q for job_name %q: %s", target, sc.swc.jobName, err) logger.Errorf("cannot create kubernetes_sd_config target %q for job_name %q: %s", target, sc.swc.jobName, err)
return nil return nil
} }
return sw return sw
@ -543,6 +547,10 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse auth config for `job_name` %q: %w", jobName, err) return nil, fmt.Errorf("cannot parse auth config for `job_name` %q: %w", jobName, err)
} }
proxyAC, err := promauth.NewConfig(baseDir, sc.ProxyBasicAuth, sc.ProxyBearerToken, sc.ProxyBearerTokenFile, sc.ProxyTLSConfig)
if err != nil {
return nil, fmt.Errorf("cannot parse proxy auth config for `job_name` %q: %w", jobName, err)
}
relabelConfigs, err := promrelabel.ParseRelabelConfigs(sc.RelabelConfigs) relabelConfigs, err := promrelabel.ParseRelabelConfigs(sc.RelabelConfigs)
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse `relabel_configs` for `job_name` %q: %w", jobName, err) return nil, fmt.Errorf("cannot parse `relabel_configs` for `job_name` %q: %w", jobName, err)
@ -559,6 +567,7 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
scheme: scheme, scheme: scheme,
params: params, params: params,
proxyURL: sc.ProxyURL, proxyURL: sc.ProxyURL,
proxyAuthConfig: proxyAC,
authConfig: ac, authConfig: ac,
honorLabels: honorLabels, honorLabels: honorLabels,
honorTimestamps: honorTimestamps, honorTimestamps: honorTimestamps,
@ -583,6 +592,7 @@ type scrapeWorkConfig struct {
scheme string scheme string
params map[string][]string params map[string][]string
proxyURL proxy.URL proxyURL proxy.URL
proxyAuthConfig *promauth.Config
authConfig *promauth.Config authConfig *promauth.Config
honorLabels bool honorLabels bool
honorTimestamps bool honorTimestamps bool
@ -849,6 +859,7 @@ func (swc *scrapeWorkConfig) getScrapeWork(target string, extraLabels, metaLabel
OriginalLabels: originalLabels, OriginalLabels: originalLabels,
Labels: labels, Labels: labels,
ProxyURL: swc.proxyURL, ProxyURL: swc.proxyURL,
ProxyAuthConfig: swc.proxyAuthConfig,
AuthConfig: swc.authConfig, AuthConfig: swc.authConfig,
MetricRelabelConfigs: swc.metricRelabelConfigs, MetricRelabelConfigs: swc.metricRelabelConfigs,
SampleLimit: swc.sampleLimit, SampleLimit: swc.sampleLimit,


@ -10,6 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/proxy"
) )
func TestNeedSkipScrapeWork(t *testing.T) { func TestNeedSkipScrapeWork(t *testing.T) {
@ -154,6 +155,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "blackbox", jobNameOriginal: "blackbox",
}} }}
if !reflect.DeepEqual(sws, swsExpected) { if !reflect.DeepEqual(sws, swsExpected) {
@ -548,6 +550,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
{ {
@ -587,6 +590,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
{ {
@ -626,6 +630,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -679,6 +684,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -729,6 +735,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -748,6 +755,10 @@ scrape_configs:
p: ["x&y", "="] p: ["x&y", "="]
xaa: xaa:
bearer_token: xyz bearer_token: xyz
proxy_url: http://foo.bar
proxy_basic_auth:
username: foo
password: bar
static_configs: static_configs:
- targets: ["foo.bar", "aaa"] - targets: ["foo.bar", "aaa"]
labels: labels:
@ -801,6 +812,10 @@ scrape_configs:
AuthConfig: &promauth.Config{ AuthConfig: &promauth.Config{
Authorization: "Bearer xyz", Authorization: "Bearer xyz",
}, },
ProxyAuthConfig: &promauth.Config{
Authorization: "Basic Zm9vOmJhcg==",
},
ProxyURL: proxy.MustNewURL("http://foo.bar"),
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
{ {
@ -842,6 +857,10 @@ scrape_configs:
AuthConfig: &promauth.Config{ AuthConfig: &promauth.Config{
Authorization: "Bearer xyz", Authorization: "Bearer xyz",
}, },
ProxyAuthConfig: &promauth.Config{
Authorization: "Basic Zm9vOmJhcg==",
},
ProxyURL: proxy.MustNewURL("http://foo.bar"),
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
{ {
@ -877,6 +896,7 @@ scrape_configs:
TLSServerName: "foobar", TLSServerName: "foobar",
TLSInsecureSkipVerify: true, TLSInsecureSkipVerify: true,
}, },
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "qwer", jobNameOriginal: "qwer",
}, },
}) })
@ -955,6 +975,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -1017,6 +1038,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -1060,6 +1082,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -1099,7 +1122,8 @@ scrape_configs:
Value: "foo", Value: "foo",
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
MetricRelabelConfigs: mustParseRelabelConfigs(` MetricRelabelConfigs: mustParseRelabelConfigs(`
- source_labels: [foo] - source_labels: [foo]
target_label: abc target_label: abc
@ -1145,6 +1169,7 @@ scrape_configs:
AuthConfig: &promauth.Config{ AuthConfig: &promauth.Config{
Authorization: "Basic eHl6OnNlY3JldC1wYXNz", Authorization: "Basic eHl6OnNlY3JldC1wYXNz",
}, },
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -1184,6 +1209,7 @@ scrape_configs:
AuthConfig: &promauth.Config{ AuthConfig: &promauth.Config{
Authorization: "Bearer secret-pass", Authorization: "Bearer secret-pass",
}, },
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -1229,6 +1255,7 @@ scrape_configs:
AuthConfig: &promauth.Config{ AuthConfig: &promauth.Config{
TLSCertificate: &snakeoilCert, TLSCertificate: &snakeoilCert,
}, },
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo", jobNameOriginal: "foo",
}, },
}) })
@ -1291,6 +1318,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "aaa", jobNameOriginal: "aaa",
}, },
}) })
@ -1352,6 +1380,7 @@ scrape_configs:
}, },
}, },
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
SampleLimit: 100, SampleLimit: 100,
DisableKeepAlive: true, DisableKeepAlive: true,
DisableCompression: true, DisableCompression: true,
@ -1398,6 +1427,7 @@ scrape_configs:
}, },
jobNameOriginal: "path wo slash", jobNameOriginal: "path wo slash",
AuthConfig: &promauth.Config{}, AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
}, },
}) })
} }
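
The expected `ProxyAuthConfig.Authorization` value in the test cases above, `Basic Zm9vOmJhcg==`, is the standard HTTP Basic encoding of the `proxy_basic_auth` credentials `foo` / `bar` from the YAML fixture. A minimal standalone sketch of that derivation follows. `basicAuthHeader` is a hypothetical helper written for illustration only; it is not a function from `lib/promauth`.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// basicAuthHeader is a hypothetical helper (not part of lib/promauth).
// It builds an HTTP Basic credential string: "Basic " + base64("username:password").
func basicAuthHeader(username, password string) string {
	token := base64.StdEncoding.EncodeToString([]byte(username + ":" + password))
	return "Basic " + token
}

func main() {
	// Credentials taken from the proxy_basic_auth section of the test config.
	fmt.Println(basicAuthHeader("foo", "bar")) // Basic Zm9vOmJhcg==
}
```

Decoding the other fixture value in this file, `Basic eHl6OnNlY3JldC1wYXNz`, the same way yields `xyz:secret-pass`.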


@ -15,7 +15,7 @@ import (
// SDCheckInterval is check interval for Consul service discovery. // SDCheckInterval is check interval for Consul service discovery.
var SDCheckInterval = flag.Duration("promscrape.consulSDCheckInterval", 30*time.Second, "Interval for checking for changes in Consul. "+ var SDCheckInterval = flag.Duration("promscrape.consulSDCheckInterval", 30*time.Second, "Interval for checking for changes in Consul. "+
"This works only if `consul_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if consul_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details")
// consulWatcher is a watcher for consul api, updates services map in background with long-polling. // consulWatcher is a watcher for consul api, updates services map in background with long-polling.
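
The hunk above only removes the backquotes around `consul_sd_configs` in the flag description. The commit does not state the reason, but one plausible motivation is that Go's standard `flag` package treats a back-quoted word in a usage string as the placeholder name for the flag's value. The sketch below shows that behaviour with `flag.UnquoteUsage`. The flag name `checkInterval` and the helper `placeholderAndUsage` are made up for the example.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// placeholderAndUsage registers a duration flag with the given usage text on a
// throwaway FlagSet and returns what flag.UnquoteUsage reports for it:
// the value placeholder name and the usage text with backquotes stripped.
func placeholderAndUsage(usage string) (string, string) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.Duration("checkInterval", 30*time.Second, usage)
	return flag.UnquoteUsage(fs.Lookup("checkInterval"))
}

func main() {
	// With backquotes, the quoted word becomes the value placeholder.
	name, _ := placeholderAndUsage("Works only if `consul_sd_configs` is configured")
	fmt.Println(name) // consul_sd_configs

	// Without backquotes, the placeholder falls back to the flag's type name.
	name, _ = placeholderAndUsage("Works only if consul_sd_configs is configured")
	fmt.Println(name) // duration
}
```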


@ -1,21 +1,15 @@
package kubernetes package kubernetes
import ( import (
"flag"
"fmt" "fmt"
"net" "net"
"net/http"
"net/url"
"os" "os"
"strings" "strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
) )
var apiServerTimeout = flag.Duration("promscrape.kubernetes.apiServerTimeout", 30*time.Minute, "How frequently to reload the full state from Kubernetes API server")
// apiConfig contains config for API server // apiConfig contains config for API server
type apiConfig struct { type apiConfig struct {
aw *apiWatcher aw *apiWatcher
@ -36,6 +30,11 @@ func getAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFu
} }
func newAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFunc) (*apiConfig, error) { func newAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFunc) (*apiConfig, error) {
switch sdc.Role {
case "node", "pod", "service", "endpoints", "endpointslices", "ingress":
default:
return nil, fmt.Errorf("unexpected `role`: %q; must be one of `node`, `pod`, `service`, `endpoints`, `endpointslices` or `ingress`", sdc.Role)
}
ac, err := promauth.NewConfig(baseDir, sdc.BasicAuth, sdc.BearerToken, sdc.BearerTokenFile, sdc.TLSConfig) ac, err := promauth.NewConfig(baseDir, sdc.BasicAuth, sdc.BearerToken, sdc.BearerTokenFile, sdc.TLSConfig)
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse auth config: %w", err) return nil, fmt.Errorf("cannot parse auth config: %w", err)
@ -75,20 +74,7 @@ func newAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFu
for strings.HasSuffix(apiServer, "/") { for strings.HasSuffix(apiServer, "/") {
apiServer = apiServer[:len(apiServer)-1] apiServer = apiServer[:len(apiServer)-1]
} }
var proxy func(*http.Request) (*url.URL, error) aw := newAPIWatcher(apiServer, ac, sdc, swcFunc)
if proxyURL := sdc.ProxyURL.URL(); proxyURL != nil {
proxy = http.ProxyURL(proxyURL)
}
client := &http.Client{
Transport: &http.Transport{
TLSClientConfig: ac.NewTLSConfig(),
Proxy: proxy,
TLSHandshakeTimeout: 10 * time.Second,
IdleConnTimeout: *apiServerTimeout,
},
Timeout: *apiServerTimeout,
}
aw := newAPIWatcher(client, apiServer, ac.Authorization, sdc.Namespaces.Names, sdc.Selectors, swcFunc)
cfg := &apiConfig{ cfg := &apiConfig{
aw: aw, aw: aw,
} }
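
The hunks above drop the per-config `http.Client` and hand the whole `SDConfig` to `newAPIWatcher`. In the next file, `getGroupWatcher` builds the client once per distinct combination of API server, namespaces, selectors, proxy and auth config, and shares it through a mutex-protected map. A simplified sketch of that lookup-or-create pattern is shown below. The `sharedWatcher` type and `getSharedWatcher` function are illustrative stand-ins that use a reduced key, not the actual types from this commit.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedWatcher is an illustrative stand-in for a watcher shared by many configs.
type sharedWatcher struct {
	key string
}

var (
	sharedWatchersLock sync.Mutex
	sharedWatchers     = make(map[string]*sharedWatcher)
)

// getSharedWatcher returns the watcher registered for the given settings,
// creating it on first use. Configs with identical settings get the same instance.
func getSharedWatcher(apiServer string, namespaces []string) *sharedWatcher {
	// The key must include every setting that affects the watcher's behaviour.
	key := fmt.Sprintf("apiServer=%s, namespaces=%s", apiServer, namespaces)
	sharedWatchersLock.Lock()
	defer sharedWatchersLock.Unlock()
	w := sharedWatchers[key]
	if w == nil {
		w = &sharedWatcher{key: key}
		sharedWatchers[key] = w
	}
	return w
}

func main() {
	a := getSharedWatcher("http://api-server", []string{"default"})
	b := getSharedWatcher("http://api-server", []string{"default"})
	c := getSharedWatcher("http://api-server", []string{"kube-system"})
	fmt.Println(a == b, a == c) // true false
}
```

Any setting left out of the key would make configs that differ only in that setting share a watcher, which is presumably why the real key in the commit also covers selectors, proxy URL and auth config.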


@ -1,9 +1,9 @@
package kubernetes package kubernetes
import ( import (
"context"
"encoding/json" "encoding/json"
"errors" "errors"
"flag"
"fmt" "fmt"
"io" "io"
"io/ioutil" "io/ioutil"
@ -16,9 +16,12 @@ import (
"time" "time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger" "github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/metrics" "github.com/VictoriaMetrics/metrics"
) )
var apiServerTimeout = flag.Duration("promscrape.kubernetes.apiServerTimeout", 30*time.Minute, "How frequently to reload the full state from Kubernetes API server")
// WatchEvent is a watch event returned from API server endpoints if `watch=1` query arg is set. // WatchEvent is a watch event returned from API server endpoints if `watch=1` query arg is set.
// //
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes // See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
@ -30,282 +33,75 @@ type WatchEvent struct {
// object is any Kubernetes object. // object is any Kubernetes object.
type object interface { type object interface {
key() string key() string
getTargetLabels(aw *apiWatcher) []map[string]string getTargetLabels(gw *groupWatcher) []map[string]string
} }
// parseObjectFunc must parse object from the given data. // parseObjectFunc must parse object from the given data.
type parseObjectFunc func(data []byte) (object, error) type parseObjectFunc func(data []byte) (object, error)
// parseObjectListFunc must parse objectList from the given data. // parseObjectListFunc must parse objectList from the given r.
type parseObjectListFunc func(data []byte) (map[string]object, ListMeta, error) type parseObjectListFunc func(r io.Reader) (map[string]object, ListMeta, error)
// apiWatcher is used for watching for Kubernetes object changes and caching their latest states. // apiWatcher is used for watching for Kubernetes object changes and caching their latest states.
type apiWatcher struct { type apiWatcher struct {
// The client used for watching for object changes role string
client *http.Client
// Kubernetes API server address in the form http://api-server // Constructor for creating ScrapeWork objects from labels
apiServer string
// The contents for `Authorization` HTTP request header
authorization string
// Namespaces to watch
namespaces []string
// Selectors to apply during watch
selectors []Selector
// Constructor for creating ScrapeWork objects from labels.
swcFunc ScrapeWorkConstructorFunc swcFunc ScrapeWorkConstructorFunc
// mu protects watchersByURL gw *groupWatcher
mu sync.Mutex
// a map of watchers keyed by request urls // swos contains a map of ScrapeWork objects for the given apiWatcher
watchersByURL map[string]*urlWatcher swosByKey map[string][]interface{}
swosByKeyLock sync.Mutex
stopFunc func() swosCount *metrics.Counter
stopCtx context.Context }
wg sync.WaitGroup
func newAPIWatcher(apiServer string, ac *promauth.Config, sdc *SDConfig, swcFunc ScrapeWorkConstructorFunc) *apiWatcher {
namespaces := sdc.Namespaces.Names
selectors := sdc.Selectors
proxyURL := sdc.ProxyURL.URL()
gw := getGroupWatcher(apiServer, ac, namespaces, selectors, proxyURL)
return &apiWatcher{
role: sdc.Role,
swcFunc: swcFunc,
gw: gw,
swosByKey: make(map[string][]interface{}),
swosCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_scrape_works{role=%q}`, sdc.Role)),
}
} }
func (aw *apiWatcher) mustStop() { func (aw *apiWatcher) mustStop() {
aw.stopFunc() aw.gw.unsubscribeAPIWatcher(aw)
aw.wg.Wait() aw.reloadScrapeWorks(make(map[string][]interface{}))
} }
func newAPIWatcher(client *http.Client, apiServer, authorization string, namespaces []string, selectors []Selector, swcFunc ScrapeWorkConstructorFunc) *apiWatcher { func (aw *apiWatcher) reloadScrapeWorks(swosByKey map[string][]interface{}) {
stopCtx, stopFunc := context.WithCancel(context.Background()) aw.swosByKeyLock.Lock()
return &apiWatcher{ aw.swosCount.Add(len(swosByKey) - len(aw.swosByKey))
apiServer: apiServer, aw.swosByKey = swosByKey
authorization: authorization, aw.swosByKeyLock.Unlock()
client: client,
namespaces: namespaces,
selectors: selectors,
swcFunc: swcFunc,
watchersByURL: make(map[string]*urlWatcher),
stopFunc: stopFunc,
stopCtx: stopCtx,
}
} }
// getScrapeWorkObjectsForRole returns all the ScrapeWork objects for the given role. func (aw *apiWatcher) setScrapeWorks(key string, labels []map[string]string) {
func (aw *apiWatcher) getScrapeWorkObjectsForRole(role string) []interface{} { swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
aw.startWatchersForRole(role) aw.swosByKeyLock.Lock()
var swos []interface{} if len(swos) > 0 {
aw.mu.Lock() aw.swosCount.Add(len(swos) - len(aw.swosByKey[key]))
for _, uw := range aw.watchersByURL { aw.swosByKey[key] = swos
if uw.role != role { } else {
continue aw.swosCount.Add(-len(aw.swosByKey[key]))
} delete(aw.swosByKey, key)
uw.mu.Lock()
for _, swosLocal := range uw.swosByKey {
swos = append(swos, swosLocal...)
}
uw.mu.Unlock()
} }
aw.mu.Unlock() aw.swosByKeyLock.Unlock()
return swos
} }
// getObjectByRole returns an object with the given (namespace, name) key and the given role. func (aw *apiWatcher) removeScrapeWorks(key string) {
func (aw *apiWatcher) getObjectByRole(role, namespace, name string) object { aw.swosByKeyLock.Lock()
if aw == nil { aw.swosCount.Add(-len(aw.swosByKey[key]))
return nil delete(aw.swosByKey, key)
} aw.swosByKeyLock.Unlock()
key := namespace + "/" + name
aw.startWatchersForRole(role)
var o object
aw.mu.Lock()
for _, uw := range aw.watchersByURL {
if uw.role != role {
continue
}
o = uw.objectsByKey.get(key)
if o != nil {
break
}
}
aw.mu.Unlock()
return o
}
func (aw *apiWatcher) startWatchersForRole(role string) {
parseObject, parseObjectList := getObjectParsersForRole(role)
paths := getAPIPaths(role, aw.namespaces, aw.selectors)
for _, path := range paths {
apiURL := aw.apiServer + path
aw.startWatcherForURL(role, apiURL, parseObject, parseObjectList)
}
}
func (aw *apiWatcher) startWatcherForURL(role, apiURL string, parseObject parseObjectFunc, parseObjectList parseObjectListFunc) {
aw.mu.Lock()
if aw.watchersByURL[apiURL] != nil {
// Watcher for the given path already exists.
aw.mu.Unlock()
return
}
uw := aw.newURLWatcher(role, apiURL, parseObject, parseObjectList)
aw.watchersByURL[apiURL] = uw
aw.mu.Unlock()
uw.watchersCount.Inc()
uw.watchersCreated.Inc()
uw.reloadObjects()
aw.wg.Add(1)
go func() {
defer aw.wg.Done()
logger.Infof("started watcher for %q", apiURL)
uw.watchForUpdates()
logger.Infof("stopped watcher for %q", apiURL)
uw.objectsByKey.decRef()
aw.mu.Lock()
delete(aw.watchersByURL, apiURL)
aw.mu.Unlock()
uw.watchersCount.Dec()
uw.watchersStopped.Inc()
}()
}
// needStop returns true if aw must be stopped.
func (aw *apiWatcher) needStop() bool {
select {
case <-aw.stopCtx.Done():
return true
default:
return false
}
}
// doRequest performs http request to the given requestURL.
func (aw *apiWatcher) doRequest(requestURL string) (*http.Response, error) {
req, err := http.NewRequestWithContext(aw.stopCtx, "GET", requestURL, nil)
if err != nil {
logger.Fatalf("cannot create a request for %q: %s", requestURL, err)
}
if aw.authorization != "" {
req.Header.Set("Authorization", aw.authorization)
}
return aw.client.Do(req)
}
// urlWatcher watches for an apiURL and updates object states in objectsByKey.
type urlWatcher struct {
role string
apiURL string
parseObject parseObjectFunc
parseObjectList parseObjectListFunc
// objectsByKey contains the latest state for objects obtained from apiURL
objectsByKey *objectsMap
// mu protects swosByKey and resourceVersion
mu sync.Mutex
swosByKey map[string][]interface{}
resourceVersion string
// the parent apiWatcher
aw *apiWatcher
watchersCount *metrics.Counter
watchersCreated *metrics.Counter
watchersStopped *metrics.Counter
}
func (aw *apiWatcher) newURLWatcher(role, apiURL string, parseObject parseObjectFunc, parseObjectList parseObjectListFunc) *urlWatcher {
return &urlWatcher{
role: role,
apiURL: apiURL,
parseObject: parseObject,
parseObjectList: parseObjectList,
objectsByKey: sharedObjectsGlobal.getByAPIURL(role, apiURL),
swosByKey: make(map[string][]interface{}),
aw: aw,
watchersCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers{role=%q}`, role)),
watchersCreated: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers_created_total{role=%q}`, role)),
watchersStopped: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers_stopped_total{role=%q}`, role)),
}
}
// Limit the concurrency for per-role objects reloading to 1.
//
// This should reduce memory usage when big number of watchers simultaneously receive an update for objects of the same role.
var reloadObjectsLocksByRole = map[string]*sync.Mutex{
"node": {},
"pod": {},
"service": {},
"endpoints": {},
"endpointslices": {},
"ingress": {},
}
func (uw *urlWatcher) resetResourceVersion() {
uw.mu.Lock()
uw.resourceVersion = ""
uw.mu.Unlock()
}
// reloadObjects reloads objects to the latest state and returns resourceVersion for the latest state.
func (uw *urlWatcher) reloadObjects() string {
lock := reloadObjectsLocksByRole[uw.role]
lock.Lock()
defer lock.Unlock()
uw.mu.Lock()
resourceVersion := uw.resourceVersion
uw.mu.Unlock()
if resourceVersion != "" {
// Fast path - objects have been already reloaded by concurrent goroutines.
return resourceVersion
}
aw := uw.aw
requestURL := uw.apiURL
resp, err := aw.doRequest(requestURL)
if err != nil {
if !aw.needStop() {
logger.Errorf("error when performing a request to %q: %s", requestURL, err)
}
return ""
}
body, _ := ioutil.ReadAll(resp.Body)
_ = resp.Body.Close()
if resp.StatusCode != http.StatusOK {
logger.Errorf("unexpected status code for request to %q: %d; want %d; response: %q", requestURL, resp.StatusCode, http.StatusOK, body)
return ""
}
objectsByKey, metadata, err := uw.parseObjectList(body)
if err != nil {
if !aw.needStop() {
logger.Errorf("cannot parse response from %q: %s", requestURL, err)
}
return ""
}
uw.objectsByKey.reload(objectsByKey)
swosByKey := make(map[string][]interface{})
for k, o := range objectsByKey {
labels := o.getTargetLabels(aw)
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
if len(swos) > 0 {
swosByKey[k] = swos
}
}
uw.mu.Lock()
uw.swosByKey = swosByKey
uw.resourceVersion = metadata.ResourceVersion
uw.mu.Unlock()
return metadata.ResourceVersion
} }
func getScrapeWorkObjectsForLabels(swcFunc ScrapeWorkConstructorFunc, labelss []map[string]string) []interface{} { func getScrapeWorkObjectsForLabels(swcFunc ScrapeWorkConstructorFunc, labelss []map[string]string) []interface{} {
@ -320,11 +116,362 @@ func getScrapeWorkObjectsForLabels(swcFunc ScrapeWorkConstructorFunc, labelss []
return swos return swos
} }
// getScrapeWorkObjects returns all the ScrapeWork objects for the given aw.
func (aw *apiWatcher) getScrapeWorkObjects() []interface{} {
aw.gw.startWatchersForRole(aw.role, aw)
aw.swosByKeyLock.Lock()
defer aw.swosByKeyLock.Unlock()
size := 0
for _, swosLocal := range aw.swosByKey {
size += len(swosLocal)
}
swos := make([]interface{}, 0, size)
for _, swosLocal := range aw.swosByKey {
swos = append(swos, swosLocal...)
}
return swos
}
// groupWatcher watches for Kubernetes objects on the given apiServer with the given namespaces,
// selectors and authorization using the given client.
type groupWatcher struct {
apiServer string
namespaces []string
selectors []Selector
authorization string
client *http.Client
mu sync.Mutex
m map[string]*urlWatcher
}
func newGroupWatcher(apiServer string, ac *promauth.Config, namespaces []string, selectors []Selector, proxyURL *url.URL) *groupWatcher {
var proxy func(*http.Request) (*url.URL, error)
if proxyURL != nil {
proxy = http.ProxyURL(proxyURL)
}
client := &http.Client{
Transport: &http.Transport{
TLSClientConfig: ac.NewTLSConfig(),
Proxy: proxy,
TLSHandshakeTimeout: 10 * time.Second,
IdleConnTimeout: *apiServerTimeout,
},
Timeout: *apiServerTimeout,
}
return &groupWatcher{
apiServer: apiServer,
authorization: ac.Authorization,
namespaces: namespaces,
selectors: selectors,
client: client,
m: make(map[string]*urlWatcher),
}
}
func getGroupWatcher(apiServer string, ac *promauth.Config, namespaces []string, selectors []Selector, proxyURL *url.URL) *groupWatcher {
key := fmt.Sprintf("apiServer=%s, namespaces=%s, selectors=%s, proxyURL=%v, authConfig=%s",
apiServer, namespaces, selectorsKey(selectors), proxyURL, ac.String())
groupWatchersLock.Lock()
gw := groupWatchers[key]
if gw == nil {
gw = newGroupWatcher(apiServer, ac, namespaces, selectors, proxyURL)
groupWatchers[key] = gw
}
groupWatchersLock.Unlock()
return gw
}
func selectorsKey(selectors []Selector) string {
var sb strings.Builder
for _, s := range selectors {
fmt.Fprintf(&sb, "{role=%q, label=%q, field=%q}", s.Role, s.Label, s.Field)
}
return sb.String()
}
var (
groupWatchersLock sync.Mutex
groupWatchers = make(map[string]*groupWatcher)
_ = metrics.NewGauge(`vm_promscrape_discovery_kubernetes_group_watchers`, func() float64 {
groupWatchersLock.Lock()
n := len(groupWatchers)
groupWatchersLock.Unlock()
return float64(n)
})
)
// getObjectByRole returns an object with the given (namespace, name) key and the given role.
func (gw *groupWatcher) getObjectByRole(role, namespace, name string) object {
if gw == nil {
// this is needed for testing
return nil
}
key := namespace + "/" + name
gw.startWatchersForRole(role, nil)
gw.mu.Lock()
defer gw.mu.Unlock()
for _, uw := range gw.m {
if uw.role != role {
continue
}
uw.mu.Lock()
o := uw.objectsByKey[key]
uw.mu.Unlock()
if o != nil {
return o
}
}
return nil
}
func (gw *groupWatcher) startWatchersForRole(role string, aw *apiWatcher) {
paths := getAPIPaths(role, gw.namespaces, gw.selectors)
for _, path := range paths {
apiURL := gw.apiServer + path
gw.mu.Lock()
uw := gw.m[apiURL]
if uw == nil {
uw = newURLWatcher(role, apiURL, gw)
gw.m[apiURL] = uw
}
gw.mu.Unlock()
uw.subscribeAPIWatcher(aw)
}
}
func (gw *groupWatcher) reloadScrapeWorksForAPIWatchers(aws []*apiWatcher, objectsByKey map[string]object) {
if len(aws) == 0 {
return
}
swosByKey := make([]map[string][]interface{}, len(aws))
for i := range aws {
swosByKey[i] = make(map[string][]interface{})
}
for key, o := range objectsByKey {
labels := o.getTargetLabels(gw)
for i, aw := range aws {
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
if len(swos) > 0 {
swosByKey[i][key] = swos
}
}
}
for i, aw := range aws {
aw.reloadScrapeWorks(swosByKey[i])
}
}
// doRequest performs http request to the given requestURL.
func (gw *groupWatcher) doRequest(requestURL string) (*http.Response, error) {
req, err := http.NewRequest("GET", requestURL, nil)
if err != nil {
logger.Fatalf("cannot create a request for %q: %s", requestURL, err)
}
if gw.authorization != "" {
req.Header.Set("Authorization", gw.authorization)
}
return gw.client.Do(req)
}
func (gw *groupWatcher) unsubscribeAPIWatcher(aw *apiWatcher) {
gw.mu.Lock()
for _, uw := range gw.m {
uw.unsubscribeAPIWatcher(aw)
}
gw.mu.Unlock()
}
// urlWatcher watches for an apiURL and updates object states in objectsByKey.
type urlWatcher struct {
role string
apiURL string
gw *groupWatcher
parseObject parseObjectFunc
parseObjectList parseObjectListFunc
// mu protects aws, awsPending, objectsByKey and resourceVersion
mu sync.Mutex
// aws contains registered apiWatcher objects
aws map[*apiWatcher]struct{}
// awsPending contains pending apiWatcher objects, which must be moved to aws in a batch
awsPending map[*apiWatcher]struct{}
// objectsByKey contains the latest state for objects obtained from apiURL
objectsByKey map[string]object
resourceVersion string
objectsCount *metrics.Counter
objectsAdded *metrics.Counter
objectsRemoved *metrics.Counter
objectsUpdated *metrics.Counter
}
func newURLWatcher(role, apiURL string, gw *groupWatcher) *urlWatcher {
parseObject, parseObjectList := getObjectParsersForRole(role)
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers{role=%q}`, role)).Inc()
uw := &urlWatcher{
role: role,
apiURL: apiURL,
gw: gw,
parseObject: parseObject,
parseObjectList: parseObjectList,
aws: make(map[*apiWatcher]struct{}),
awsPending: make(map[*apiWatcher]struct{}),
objectsByKey: make(map[string]object),
objectsCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects{role=%q}`, role)),
objectsAdded: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_added_total{role=%q}`, role)),
objectsRemoved: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_removed_total{role=%q}`, role)),
objectsUpdated: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_updated_total{role=%q}`, role)),
}
logger.Infof("started %s watcher for %q", uw.role, uw.apiURL)
go uw.watchForUpdates()
go uw.processPendingSubscribers()
return uw
}
func (uw *urlWatcher) subscribeAPIWatcher(aw *apiWatcher) {
if aw == nil {
return
}
uw.mu.Lock()
if _, ok := uw.aws[aw]; !ok {
if _, ok := uw.awsPending[aw]; !ok {
uw.awsPending[aw] = struct{}{}
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Inc()
}
}
uw.mu.Unlock()
}
func (uw *urlWatcher) unsubscribeAPIWatcher(aw *apiWatcher) {
uw.mu.Lock()
if _, ok := uw.aws[aw]; ok {
delete(uw.aws, aw)
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="permanent"}`, uw.role)).Dec()
} else if _, ok := uw.awsPending[aw]; ok {
delete(uw.awsPending, aw)
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Dec()
}
uw.mu.Unlock()
}
func (uw *urlWatcher) processPendingSubscribers() {
t := time.NewTicker(time.Second)
for range t.C {
var awsPending []*apiWatcher
var objectsByKey map[string]object
uw.mu.Lock()
if len(uw.awsPending) > 0 {
awsPending = getAPIWatchers(uw.awsPending)
for _, aw := range awsPending {
if _, ok := uw.aws[aw]; ok {
logger.Panicf("BUG: aw=%p already exists in uw.aws", aw)
}
uw.aws[aw] = struct{}{}
delete(uw.awsPending, aw)
}
objectsByKey = make(map[string]object, len(uw.objectsByKey))
for key, o := range uw.objectsByKey {
objectsByKey[key] = o
}
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Add(-len(awsPending))
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="permanent"}`, uw.role)).Add(len(awsPending))
uw.mu.Unlock()
uw.gw.reloadScrapeWorksForAPIWatchers(awsPending, objectsByKey)
}
}
func (uw *urlWatcher) setResourceVersion(resourceVersion string) {
uw.mu.Lock()
uw.resourceVersion = resourceVersion
uw.mu.Unlock()
}
// reloadObjects reloads objects to the latest state and returns resourceVersion for the latest state.
func (uw *urlWatcher) reloadObjects() string {
uw.mu.Lock()
resourceVersion := uw.resourceVersion
uw.mu.Unlock()
if resourceVersion != "" {
// Fast path - there is no need to reload the objects.
return resourceVersion
}
requestURL := uw.apiURL
resp, err := uw.gw.doRequest(requestURL)
if err != nil {
logger.Errorf("cannot perform request to %q: %s", requestURL, err)
return ""
}
if resp.StatusCode != http.StatusOK {
body, _ := ioutil.ReadAll(resp.Body)
_ = resp.Body.Close()
logger.Errorf("unexpected status code for request to %q: %d; want %d; response: %q", requestURL, resp.StatusCode, http.StatusOK, body)
return ""
}
objectsByKey, metadata, err := uw.parseObjectList(resp.Body)
_ = resp.Body.Close()
if err != nil {
logger.Errorf("cannot parse objects from %q: %s", requestURL, err)
return ""
}
uw.mu.Lock()
var updated, removed, added int
for key := range uw.objectsByKey {
if o, ok := objectsByKey[key]; ok {
uw.objectsByKey[key] = o
updated++
} else {
delete(uw.objectsByKey, key)
removed++
}
}
for key, o := range objectsByKey {
if _, ok := uw.objectsByKey[key]; !ok {
uw.objectsByKey[key] = o
added++
}
}
uw.objectsUpdated.Add(updated)
uw.objectsRemoved.Add(removed)
uw.objectsAdded.Add(added)
uw.objectsCount.Add(added - removed)
uw.resourceVersion = metadata.ResourceVersion
aws := getAPIWatchers(uw.aws)
uw.mu.Unlock()
uw.gw.reloadScrapeWorksForAPIWatchers(aws, objectsByKey)
logger.Infof("reloaded %d objects from %q", len(objectsByKey), requestURL)
return metadata.ResourceVersion
}
func getAPIWatchers(awsMap map[*apiWatcher]struct{}) []*apiWatcher {
aws := make([]*apiWatcher, 0, len(awsMap))
for aw := range awsMap {
aws = append(aws, aw)
}
return aws
}
// watchForUpdates watches for object updates starting from uw.resourceVersion and updates the corresponding objects to the latest state. // watchForUpdates watches for object updates starting from uw.resourceVersion and updates the corresponding objects to the latest state.
// //
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes // See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
func (uw *urlWatcher) watchForUpdates() { func (uw *urlWatcher) watchForUpdates() {
aw := uw.aw
backoffDelay := time.Second backoffDelay := time.Second
maxBackoffDelay := 30 * time.Second maxBackoffDelay := 30 * time.Second
backoffSleep := func() { backoffSleep := func() {
@ -339,25 +486,19 @@ func (uw *urlWatcher) watchForUpdates() {
if strings.Contains(apiURL, "?") { if strings.Contains(apiURL, "?") {
delimiter = "&" delimiter = "&"
} }
timeoutSeconds := time.Duration(0.9 * float64(aw.client.Timeout)).Seconds() timeoutSeconds := time.Duration(0.9 * float64(uw.gw.client.Timeout)).Seconds()
apiURL += delimiter + "watch=1&timeoutSeconds=" + strconv.Itoa(int(timeoutSeconds)) apiURL += delimiter + "watch=1&allowWatchBookmarks=true&timeoutSeconds=" + strconv.Itoa(int(timeoutSeconds))
for { for {
if aw.needStop() {
return
}
resourceVersion := uw.reloadObjects() resourceVersion := uw.reloadObjects()
requestURL := apiURL if resourceVersion == "" {
if resourceVersion != "" { backoffSleep()
requestURL += "&resourceVersion=" + url.QueryEscape(resourceVersion) continue
} }
resp, err := aw.doRequest(requestURL) requestURL := apiURL + "&resourceVersion=" + url.QueryEscape(resourceVersion)
if err != nil { resp, err := uw.gw.doRequest(requestURL)
if aw.needStop() { if err != nil {
return logger.Errorf("cannot perform request to %q: %s", requestURL, err)
}
logger.Errorf("error when performing a request to %q: %s", requestURL, err)
backoffSleep() backoffSleep()
uw.resetResourceVersion()
continue continue
} }
if resp.StatusCode != http.StatusOK { if resp.StatusCode != http.StatusOK {
@ -367,24 +508,20 @@ func (uw *urlWatcher) watchForUpdates() {
if resp.StatusCode == 410 { if resp.StatusCode == 410 {
// There is no need for sleep on 410 error. See https://kubernetes.io/docs/reference/using-api/api-concepts/#410-gone-responses // There is no need for sleep on 410 error. See https://kubernetes.io/docs/reference/using-api/api-concepts/#410-gone-responses
backoffDelay = time.Second backoffDelay = time.Second
uw.setResourceVersion("")
} else { } else {
backoffSleep() backoffSleep()
} }
uw.resetResourceVersion()
continue continue
} }
backoffDelay = time.Second backoffDelay = time.Second
err = uw.readObjectUpdateStream(resp.Body) err = uw.readObjectUpdateStream(resp.Body)
_ = resp.Body.Close() _ = resp.Body.Close()
if err != nil { if err != nil {
if aw.needStop() {
return
}
if !errors.Is(err, io.EOF) { if !errors.Is(err, io.EOF) {
logger.Errorf("error when reading WatchEvent stream from %q: %s", requestURL, err) logger.Errorf("error when reading WatchEvent stream from %q: %s", requestURL, err)
} }
backoffSleep() backoffSleep()
uw.resetResourceVersion()
continue continue
} }
} }
@ -392,41 +529,79 @@ func (uw *urlWatcher) watchForUpdates() {
// readObjectUpdateStream reads Kubernetes watch events from r and updates locally cached objects according to the received events. // readObjectUpdateStream reads Kubernetes watch events from r and updates locally cached objects according to the received events.
func (uw *urlWatcher) readObjectUpdateStream(r io.Reader) error { func (uw *urlWatcher) readObjectUpdateStream(r io.Reader) error {
aw := uw.aw
d := json.NewDecoder(r) d := json.NewDecoder(r)
var we WatchEvent var we WatchEvent
for { for {
if err := d.Decode(&we); err != nil { if err := d.Decode(&we); err != nil {
return err return err
} }
o, err := uw.parseObject(we.Object)
if err != nil {
return err
}
key := o.key()
switch we.Type { switch we.Type {
case "ADDED", "MODIFIED": case "ADDED", "MODIFIED":
uw.objectsByKey.update(key, o) o, err := uw.parseObject(we.Object)
labels := o.getTargetLabels(aw) if err != nil {
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels) return err
uw.mu.Lock()
if len(swos) > 0 {
uw.swosByKey[key] = swos
} else {
delete(uw.swosByKey, key)
} }
uw.mu.Unlock() key := o.key()
case "DELETED":
uw.objectsByKey.remove(key)
uw.mu.Lock() uw.mu.Lock()
delete(uw.swosByKey, key) if _, ok := uw.objectsByKey[key]; !ok {
uw.objectsCount.Inc()
uw.objectsAdded.Inc()
} else {
uw.objectsUpdated.Inc()
}
uw.objectsByKey[key] = o
aws := getAPIWatchers(uw.aws)
uw.mu.Unlock() uw.mu.Unlock()
labels := o.getTargetLabels(uw.gw)
for _, aw := range aws {
aw.setScrapeWorks(key, labels)
}
case "DELETED":
o, err := uw.parseObject(we.Object)
if err != nil {
return err
}
key := o.key()
uw.mu.Lock()
if _, ok := uw.objectsByKey[key]; ok {
uw.objectsCount.Dec()
uw.objectsRemoved.Inc()
delete(uw.objectsByKey, key)
}
aws := getAPIWatchers(uw.aws)
uw.mu.Unlock()
for _, aw := range aws {
aw.removeScrapeWorks(key)
}
case "BOOKMARK":
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
bm, err := parseBookmark(we.Object)
if err != nil {
return fmt.Errorf("cannot parse bookmark from %q: %w", we.Object, err)
}
uw.setResourceVersion(bm.Metadata.ResourceVersion)
default: default:
return fmt.Errorf("unexpected WatchEvent type %q for role %q", we.Type, uw.role) return fmt.Errorf("unexpected WatchEvent type %q for role %q", we.Type, uw.role)
} }
} }
} }
// Bookmark is a bookmark from Kubernetes Watch API.
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
type Bookmark struct {
Metadata struct {
ResourceVersion string
}
}
func parseBookmark(data []byte) (*Bookmark, error) {
var bm Bookmark
if err := json.Unmarshal(data, &bm); err != nil {
return nil, err
}
return &bm, nil
}
func getAPIPaths(role string, namespaces []string, selectors []Selector) []string { func getAPIPaths(role string, namespaces []string, selectors []Selector) []string {
objectName := getObjectNameByRole(role) objectName := getObjectNameByRole(role)
if objectName == "nodes" || len(namespaces) == 0 { if objectName == "nodes" || len(namespaces) == 0 {
@ -521,105 +696,3 @@ func getObjectParsersForRole(role string) (parseObjectFunc, parseObjectListFunc)
return nil, nil return nil, nil
} }
} }
type objectsMap struct {
mu sync.Mutex
refCount int
m map[string]object
objectsAdded *metrics.Counter
objectsRemoved *metrics.Counter
objectsCount *metrics.Counter
}
func (om *objectsMap) incRef() {
om.mu.Lock()
om.refCount++
om.mu.Unlock()
}
func (om *objectsMap) decRef() {
om.mu.Lock()
om.refCount--
if om.refCount < 0 {
logger.Panicf("BUG: refCount cannot be smaller than 0; got %d", om.refCount)
}
if om.refCount == 0 {
// Free up memory occupied by om.m
om.objectsRemoved.Add(len(om.m))
om.objectsCount.Add(-len(om.m))
om.m = make(map[string]object)
}
om.mu.Unlock()
}
func (om *objectsMap) reload(m map[string]object) {
om.mu.Lock()
om.objectsAdded.Add(len(m))
om.objectsRemoved.Add(len(om.m))
om.objectsCount.Add(len(m) - len(om.m))
for k := range om.m {
delete(om.m, k)
}
for k, o := range m {
om.m[k] = o
}
om.mu.Unlock()
}
func (om *objectsMap) update(key string, o object) {
om.mu.Lock()
if om.m[key] == nil {
om.objectsAdded.Inc()
om.objectsCount.Inc()
}
om.m[key] = o
om.mu.Unlock()
}
func (om *objectsMap) remove(key string) {
om.mu.Lock()
if om.m[key] != nil {
om.objectsRemoved.Inc()
om.objectsCount.Dec()
delete(om.m, key)
}
om.mu.Unlock()
}
func (om *objectsMap) get(key string) object {
om.mu.Lock()
o, ok := om.m[key]
om.mu.Unlock()
if !ok {
return nil
}
return o
}
type sharedObjects struct {
mu sync.Mutex
oms map[string]*objectsMap
}
func (so *sharedObjects) getByAPIURL(role, apiURL string) *objectsMap {
so.mu.Lock()
om := so.oms[apiURL]
if om == nil {
om = &objectsMap{
m: make(map[string]object),
objectsCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects{role=%q}`, role)),
objectsAdded: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_added_total{role=%q}`, role)),
objectsRemoved: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_removed_total{role=%q}`, role)),
}
so.oms[apiURL] = om
}
so.mu.Unlock()
om.incRef()
return om
}
var sharedObjectsGlobal = &sharedObjects{
oms: make(map[string]*objectsMap),
}


@ -160,3 +160,15 @@ func TestGetAPIPaths(t *testing.T) {
"/apis/networking.k8s.io/v1beta1/namespaces/y/ingresses?labelSelector=cde%2Cbaaa&fieldSelector=abc", "/apis/networking.k8s.io/v1beta1/namespaces/y/ingresses?labelSelector=cde%2Cbaaa&fieldSelector=abc",
}) })
} }
func TestParseBookmark(t *testing.T) {
data := `{"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"} }`
bm, err := parseBookmark([]byte(data))
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
expectedResourceVersion := "12746"
if bm.Metadata.ResourceVersion != expectedResourceVersion {
t.Fatalf("unexpected resourceVersion; got %q; want %q", bm.Metadata.ResourceVersion, expectedResourceVersion)
}
}


@ -3,6 +3,7 @@ package kubernetes
import ( import (
"encoding/json" "encoding/json"
"fmt" "fmt"
"io"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
) )
@ -11,10 +12,11 @@ func (eps *Endpoints) key() string {
return eps.Metadata.key() return eps.Metadata.key()
} }
func parseEndpointsList(data []byte) (map[string]object, ListMeta, error) { func parseEndpointsList(r io.Reader) (map[string]object, ListMeta, error) {
var epsl EndpointsList var epsl EndpointsList
if err := json.Unmarshal(data, &epsl); err != nil { d := json.NewDecoder(r)
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointsList from %q: %w", data, err) if err := d.Decode(&epsl); err != nil {
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointsList: %w", err)
} }
objectsByKey := make(map[string]object) objectsByKey := make(map[string]object)
for _, eps := range epsl.Items { for _, eps := range epsl.Items {
@ -88,17 +90,17 @@ type EndpointPort struct {
// getTargetLabels returns labels for each endpoint in eps. // getTargetLabels returns labels for each endpoint in eps.
// //
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints // See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints
func (eps *Endpoints) getTargetLabels(aw *apiWatcher) []map[string]string { func (eps *Endpoints) getTargetLabels(gw *groupWatcher) []map[string]string {
var svc *Service var svc *Service
if o := aw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil { if o := gw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
svc = o.(*Service) svc = o.(*Service)
} }
podPortsSeen := make(map[*Pod][]int) podPortsSeen := make(map[*Pod][]int)
var ms []map[string]string var ms []map[string]string
for _, ess := range eps.Subsets { for _, ess := range eps.Subsets {
for _, epp := range ess.Ports { for _, epp := range ess.Ports {
ms = appendEndpointLabelsForAddresses(ms, aw, podPortsSeen, eps, ess.Addresses, epp, svc, "true") ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.Addresses, epp, svc, "true")
ms = appendEndpointLabelsForAddresses(ms, aw, podPortsSeen, eps, ess.NotReadyAddresses, epp, svc, "false") ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.NotReadyAddresses, epp, svc, "false")
} }
} }
@ -133,11 +135,11 @@ func (eps *Endpoints) getTargetLabels(aw *apiWatcher) []map[string]string {
return ms return ms
} }
func appendEndpointLabelsForAddresses(ms []map[string]string, aw *apiWatcher, podPortsSeen map[*Pod][]int, eps *Endpoints, func appendEndpointLabelsForAddresses(ms []map[string]string, gw *groupWatcher, podPortsSeen map[*Pod][]int, eps *Endpoints,
eas []EndpointAddress, epp EndpointPort, svc *Service, ready string) []map[string]string { eas []EndpointAddress, epp EndpointPort, svc *Service, ready string) []map[string]string {
for _, ea := range eas { for _, ea := range eas {
var p *Pod var p *Pod
if o := aw.getObjectByRole("pod", ea.TargetRef.Namespace, ea.TargetRef.Name); o != nil { if o := gw.getObjectByRole("pod", ea.TargetRef.Namespace, ea.TargetRef.Name); o != nil {
p = o.(*Pod) p = o.(*Pod)
} }
m := getEndpointLabelsForAddressAndPort(podPortsSeen, eps, ea, epp, p, svc, ready) m := getEndpointLabelsForAddressAndPort(podPortsSeen, eps, ea, epp, p, svc, ready)


@ -1,6 +1,7 @@
package kubernetes package kubernetes
import ( import (
"bytes"
"testing" "testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParseEndpointsListFailure(t *testing.T) { func TestParseEndpointsListFailure(t *testing.T) {
f := func(s string) { f := func(s string) {
t.Helper() t.Helper()
objectsByKey, _, err := parseEndpointsList([]byte(s)) r := bytes.NewBufferString(s)
objectsByKey, _, err := parseEndpointsList(r)
if err == nil { if err == nil {
t.Fatalf("expecting non-nil error") t.Fatalf("expecting non-nil error")
} }
@ -78,7 +80,8 @@ func TestParseEndpointsListSuccess(t *testing.T) {
] ]
} }
` `
objectsByKey, meta, err := parseEndpointsList([]byte(data)) r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseEndpointsList(r)
if err != nil { if err != nil {
t.Fatalf("unexpected error: %s", err) t.Fatalf("unexpected error: %s", err)
} }


@ -3,6 +3,7 @@ package kubernetes
import ( import (
"encoding/json" "encoding/json"
"fmt" "fmt"
"io"
"strconv" "strconv"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
@ -12,10 +13,11 @@ func (eps *EndpointSlice) key() string {
return eps.Metadata.key() return eps.Metadata.key()
} }
func parseEndpointSliceList(data []byte) (map[string]object, ListMeta, error) { func parseEndpointSliceList(r io.Reader) (map[string]object, ListMeta, error) {
var epsl EndpointSliceList var epsl EndpointSliceList
if err := json.Unmarshal(data, &epsl); err != nil { d := json.NewDecoder(r)
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointSliceList from %q: %w", data, err) if err := d.Decode(&epsl); err != nil {
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointSliceList: %w", err)
} }
objectsByKey := make(map[string]object) objectsByKey := make(map[string]object)
for _, eps := range epsl.Items { for _, eps := range epsl.Items {
@ -35,16 +37,16 @@ func parseEndpointSlice(data []byte) (object, error) {
// getTargetLabels returns labels for eps. // getTargetLabels returns labels for eps.
// //
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpointslices // See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpointslices
func (eps *EndpointSlice) getTargetLabels(aw *apiWatcher) []map[string]string { func (eps *EndpointSlice) getTargetLabels(gw *groupWatcher) []map[string]string {
var svc *Service var svc *Service
if o := aw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil { if o := gw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
svc = o.(*Service) svc = o.(*Service)
} }
podPortsSeen := make(map[*Pod][]int) podPortsSeen := make(map[*Pod][]int)
var ms []map[string]string var ms []map[string]string
for _, ess := range eps.Endpoints { for _, ess := range eps.Endpoints {
var p *Pod var p *Pod
if o := aw.getObjectByRole("pod", ess.TargetRef.Namespace, ess.TargetRef.Name); o != nil { if o := gw.getObjectByRole("pod", ess.TargetRef.Namespace, ess.TargetRef.Name); o != nil {
p = o.(*Pod) p = o.(*Pod)
} }
for _, epp := range eps.Ports { for _, epp := range eps.Ports {


@ -1,6 +1,7 @@
package kubernetes package kubernetes
import ( import (
"bytes"
"testing" "testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -9,7 +10,8 @@ import (
func TestParseEndpointSliceListFail(t *testing.T) { func TestParseEndpointSliceListFail(t *testing.T) {
f := func(data string) { f := func(data string) {
objectsByKey, _, err := parseEndpointSliceList([]byte(data)) r := bytes.NewBufferString(data)
objectsByKey, _, err := parseEndpointSliceList(r)
if err == nil { if err == nil {
t.Errorf("unexpected result, test must fail! data: %s", data) t.Errorf("unexpected result, test must fail! data: %s", data)
} }
@ -175,7 +177,8 @@ func TestParseEndpointSliceListSuccess(t *testing.T) {
} }
] ]
}` }`
objectsByKey, meta, err := parseEndpointSliceList([]byte(data)) r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseEndpointSliceList(r)
if err != nil { if err != nil {
t.Errorf("cannot parse data for EndpointSliceList: %v", err) t.Errorf("cannot parse data for EndpointSliceList: %v", err)
return return


@ -3,16 +3,18 @@ package kubernetes
import ( import (
"encoding/json" "encoding/json"
"fmt" "fmt"
"io"
) )
func (ig *Ingress) key() string { func (ig *Ingress) key() string {
return ig.Metadata.key() return ig.Metadata.key()
} }
func parseIngressList(data []byte) (map[string]object, ListMeta, error) { func parseIngressList(r io.Reader) (map[string]object, ListMeta, error) {
var igl IngressList var igl IngressList
if err := json.Unmarshal(data, &igl); err != nil { d := json.NewDecoder(r)
return nil, igl.Metadata, fmt.Errorf("cannot unmarshal IngressList from %q: %w", data, err) if err := d.Decode(&igl); err != nil {
return nil, igl.Metadata, fmt.Errorf("cannot unmarshal IngressList: %w", err)
} }
objectsByKey := make(map[string]object) objectsByKey := make(map[string]object)
for _, ig := range igl.Items { for _, ig := range igl.Items {
@ -85,7 +87,7 @@ type HTTPIngressPath struct {
// getTargetLabels returns labels for ig. // getTargetLabels returns labels for ig.
// //
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ingress // See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ingress
func (ig *Ingress) getTargetLabels(aw *apiWatcher) []map[string]string { func (ig *Ingress) getTargetLabels(gw *groupWatcher) []map[string]string {
tlsHosts := make(map[string]bool) tlsHosts := make(map[string]bool)
for _, tls := range ig.Spec.TLS { for _, tls := range ig.Spec.TLS {
for _, host := range tls.Hosts { for _, host := range tls.Hosts {


@ -1,6 +1,7 @@
package kubernetes package kubernetes
import ( import (
"bytes"
"testing" "testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParseIngressListFailure(t *testing.T) { func TestParseIngressListFailure(t *testing.T) {
f := func(s string) { f := func(s string) {
t.Helper() t.Helper()
objectsByKey, _, err := parseIngressList([]byte(s)) r := bytes.NewBufferString(s)
objectsByKey, _, err := parseIngressList(r)
if err == nil { if err == nil {
t.Fatalf("expecting non-nil error") t.Fatalf("expecting non-nil error")
} }
@ -70,7 +72,8 @@ func TestParseIngressListSuccess(t *testing.T) {
} }
] ]
}` }`
objectsByKey, meta, err := parseIngressList([]byte(data)) r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseIngressList(r)
if err != nil { if err != nil {
t.Fatalf("unexpected error: %s", err) t.Fatalf("unexpected error: %s", err)
} }


@ -48,12 +48,7 @@ func (sdc *SDConfig) GetScrapeWorkObjects(baseDir string, swcFunc ScrapeWorkCons
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot create API config: %w", err) return nil, fmt.Errorf("cannot create API config: %w", err)
} }
switch sdc.Role { return cfg.aw.getScrapeWorkObjects(), nil
case "node", "pod", "service", "endpoints", "endpointslices", "ingress":
return cfg.aw.getScrapeWorkObjectsForRole(sdc.Role), nil
default:
return nil, fmt.Errorf("unexpected `role`: %q; must be one of `node`, `pod`, `service`, `endpoints`, `endpointslices` or `ingress`; skipping it", sdc.Role)
}
} }
// MustStop stops further usage for sdc. // MustStop stops further usage for sdc.


@ -3,6 +3,7 @@ package kubernetes
import ( import (
"encoding/json" "encoding/json"
"fmt" "fmt"
"io"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
) )
@ -12,10 +13,11 @@ func (n *Node) key() string {
return n.Metadata.key() return n.Metadata.key()
} }
func parseNodeList(data []byte) (map[string]object, ListMeta, error) { func parseNodeList(r io.Reader) (map[string]object, ListMeta, error) {
var nl NodeList var nl NodeList
if err := json.Unmarshal(data, &nl); err != nil { d := json.NewDecoder(r)
return nil, nl.Metadata, fmt.Errorf("cannot unmarshal NodeList from %q: %w", data, err) if err := d.Decode(&nl); err != nil {
return nil, nl.Metadata, fmt.Errorf("cannot unmarshal NodeList: %w", err)
} }
objectsByKey := make(map[string]object) objectsByKey := make(map[string]object)
for _, n := range nl.Items { for _, n := range nl.Items {
@ -74,7 +76,7 @@ type NodeDaemonEndpoints struct {
// getTargetLabels returns labels for the given n. // getTargetLabels returns labels for the given n.
// //
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#node // See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#node
func (n *Node) getTargetLabels(aw *apiWatcher) []map[string]string { func (n *Node) getTargetLabels(gw *groupWatcher) []map[string]string {
addr := getNodeAddr(n.Status.Addresses) addr := getNodeAddr(n.Status.Addresses)
if len(addr) == 0 { if len(addr) == 0 {
// Skip node without address // Skip node without address


@ -1,6 +1,7 @@
package kubernetes package kubernetes
import ( import (
"bytes"
"reflect" "reflect"
"sort" "sort"
"strconv" "strconv"
@ -13,7 +14,8 @@ import (
func TestParseNodeListFailure(t *testing.T) { func TestParseNodeListFailure(t *testing.T) {
f := func(s string) { f := func(s string) {
t.Helper() t.Helper()
objectsByKey, _, err := parseNodeList([]byte(s)) r := bytes.NewBufferString(s)
objectsByKey, _, err := parseNodeList(r)
if err == nil { if err == nil {
t.Fatalf("expecting non-nil error") t.Fatalf("expecting non-nil error")
} }
@ -229,7 +231,8 @@ func TestParseNodeListSuccess(t *testing.T) {
] ]
} }
` `
objectsByKey, meta, err := parseNodeList([]byte(data)) r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseNodeList(r)
if err != nil { if err != nil {
t.Fatalf("unexpected error: %s", err) t.Fatalf("unexpected error: %s", err)
} }


@ -3,6 +3,7 @@ package kubernetes
import ( import (
"encoding/json" "encoding/json"
"fmt" "fmt"
"io"
"strconv" "strconv"
"strings" "strings"
@ -13,10 +14,11 @@ func (p *Pod) key() string {
return p.Metadata.key() return p.Metadata.key()
} }
func parsePodList(data []byte) (map[string]object, ListMeta, error) { func parsePodList(r io.Reader) (map[string]object, ListMeta, error) {
var pl PodList var pl PodList
if err := json.Unmarshal(data, &pl); err != nil { d := json.NewDecoder(r)
return nil, pl.Metadata, fmt.Errorf("cannot unmarshal PodList from %q: %w", data, err) if err := d.Decode(&pl); err != nil {
return nil, pl.Metadata, fmt.Errorf("cannot unmarshal PodList: %w", err)
} }
objectsByKey := make(map[string]object) objectsByKey := make(map[string]object)
for _, p := range pl.Items { for _, p := range pl.Items {
@ -95,7 +97,7 @@ type PodCondition struct {
// getTargetLabels returns labels for each port of the given p. // getTargetLabels returns labels for each port of the given p.
// //
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod // See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod
func (p *Pod) getTargetLabels(aw *apiWatcher) []map[string]string { func (p *Pod) getTargetLabels(gw *groupWatcher) []map[string]string {
if len(p.Status.PodIP) == 0 { if len(p.Status.PodIP) == 0 {
// Skip pod without IP // Skip pod without IP
return nil return nil


@ -1,6 +1,7 @@
package kubernetes package kubernetes
import ( import (
"bytes"
"testing" "testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParsePodListFailure(t *testing.T) { func TestParsePodListFailure(t *testing.T) {
f := func(s string) { f := func(s string) {
t.Helper() t.Helper()
objectsByKey, _, err := parsePodList([]byte(s)) r := bytes.NewBufferString(s)
objectsByKey, _, err := parsePodList(r)
if err == nil { if err == nil {
t.Fatalf("expecting non-nil error") t.Fatalf("expecting non-nil error")
} }
@ -227,7 +229,8 @@ func TestParsePodListSuccess(t *testing.T) {
] ]
} }
` `
objectsByKey, meta, err := parsePodList([]byte(data)) r := bytes.NewBufferString(data)
objectsByKey, meta, err := parsePodList(r)
if err != nil { if err != nil {
t.Fatalf("unexpected error: %s", err) t.Fatalf("unexpected error: %s", err)
} }


@ -3,6 +3,7 @@ package kubernetes
import ( import (
"encoding/json" "encoding/json"
"fmt" "fmt"
"io"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
) )
@ -11,10 +12,11 @@ func (s *Service) key() string {
return s.Metadata.key() return s.Metadata.key()
} }
func parseServiceList(data []byte) (map[string]object, ListMeta, error) { func parseServiceList(r io.Reader) (map[string]object, ListMeta, error) {
var sl ServiceList var sl ServiceList
if err := json.Unmarshal(data, &sl); err != nil { d := json.NewDecoder(r)
return nil, sl.Metadata, fmt.Errorf("cannot unmarshal ServiceList from %q: %w", data, err) if err := d.Decode(&sl); err != nil {
return nil, sl.Metadata, fmt.Errorf("cannot unmarshal ServiceList: %w", err)
} }
objectsByKey := make(map[string]object) objectsByKey := make(map[string]object)
for _, s := range sl.Items { for _, s := range sl.Items {
@ -69,7 +71,7 @@ type ServicePort struct {
// getTargetLabels returns labels for each port of the given s. // getTargetLabels returns labels for each port of the given s.
// //
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#service // See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#service
func (s *Service) getTargetLabels(aw *apiWatcher) []map[string]string { func (s *Service) getTargetLabels(gw *groupWatcher) []map[string]string {
host := fmt.Sprintf("%s.%s.svc", s.Metadata.Name, s.Metadata.Namespace) host := fmt.Sprintf("%s.%s.svc", s.Metadata.Name, s.Metadata.Namespace)
var ms []map[string]string var ms []map[string]string
for _, sp := range s.Spec.Ports { for _, sp := range s.Spec.Ports {


@ -1,6 +1,7 @@
package kubernetes package kubernetes
import ( import (
"bytes"
"testing" "testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParseServiceListFailure(t *testing.T) { func TestParseServiceListFailure(t *testing.T) {
f := func(s string) { f := func(s string) {
t.Helper() t.Helper()
objectsByKey, _, err := parseServiceList([]byte(s)) r := bytes.NewBufferString(s)
objectsByKey, _, err := parseServiceList(r)
if err == nil { if err == nil {
t.Fatalf("expecting non-nil error") t.Fatalf("expecting non-nil error")
} }
@ -88,7 +90,8 @@ func TestParseServiceListSuccess(t *testing.T) {
] ]
} }
` `
objectsByKey, meta, err := parseServiceList([]byte(data)) r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseServiceList(r)
if err != nil { if err != nil {
t.Fatalf("unexpected error: %s", err) t.Fatalf("unexpected error: %s", err)
} }


@ -66,7 +66,7 @@ func NewClient(apiServer string, ac *promauth.Config, proxyURL proxy.URL) (*Clie
hostPort := string(u.Host()) hostPort := string(u.Host())
isTLS := string(u.Scheme()) == "https" isTLS := string(u.Scheme()) == "https"
if isTLS && ac != nil { if isTLS {
tlsCfg = ac.NewTLSConfig() tlsCfg = ac.NewTLSConfig()
} }
if !strings.Contains(hostPort, ":") { if !strings.Contains(hostPort, ":") {
@ -77,7 +77,7 @@ func NewClient(apiServer string, ac *promauth.Config, proxyURL proxy.URL) (*Clie
hostPort = net.JoinHostPort(hostPort, port) hostPort = net.JoinHostPort(hostPort, port)
} }
if dialFunc == nil { if dialFunc == nil {
dialFunc, err = proxyURL.NewDialFunc(tlsCfg) dialFunc, err = proxyURL.NewDialFunc(ac)
if err != nil { if err != nil {
return nil, err return nil, err
} }


@ -21,29 +21,29 @@ var (
fileSDCheckInterval = flag.Duration("promscrape.fileSDCheckInterval", 30*time.Second, "Interval for checking for changes in 'file_sd_config'. "+ fileSDCheckInterval = flag.Duration("promscrape.fileSDCheckInterval", 30*time.Second, "Interval for checking for changes in 'file_sd_config'. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details")
kubernetesSDCheckInterval = flag.Duration("promscrape.kubernetesSDCheckInterval", 30*time.Second, "Interval for checking for changes in Kubernetes API server. "+ kubernetesSDCheckInterval = flag.Duration("promscrape.kubernetesSDCheckInterval", 30*time.Second, "Interval for checking for changes in Kubernetes API server. "+
"This works only if `kubernetes_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details")
openstackSDCheckInterval = flag.Duration("promscrape.openstackSDCheckInterval", 30*time.Second, "Interval for checking for changes in openstack API server. "+ openstackSDCheckInterval = flag.Duration("promscrape.openstackSDCheckInterval", 30*time.Second, "Interval for checking for changes in openstack API server. "+
"This works only if `openstack_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if openstack_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details")
eurekaSDCheckInterval = flag.Duration("promscrape.eurekaSDCheckInterval", 30*time.Second, "Interval for checking for changes in eureka. "+ eurekaSDCheckInterval = flag.Duration("promscrape.eurekaSDCheckInterval", 30*time.Second, "Interval for checking for changes in eureka. "+
"This works only if `eureka_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if eureka_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details")
dnsSDCheckInterval = flag.Duration("promscrape.dnsSDCheckInterval", 30*time.Second, "Interval for checking for changes in dns. "+ dnsSDCheckInterval = flag.Duration("promscrape.dnsSDCheckInterval", 30*time.Second, "Interval for checking for changes in dns. "+
"This works only if `dns_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if dns_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details")
ec2SDCheckInterval = flag.Duration("promscrape.ec2SDCheckInterval", time.Minute, "Interval for checking for changes in ec2. "+ ec2SDCheckInterval = flag.Duration("promscrape.ec2SDCheckInterval", time.Minute, "Interval for checking for changes in ec2. "+
"This works only if `ec2_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if ec2_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details")
gceSDCheckInterval = flag.Duration("promscrape.gceSDCheckInterval", time.Minute, "Interval for checking for changes in gce. "+ gceSDCheckInterval = flag.Duration("promscrape.gceSDCheckInterval", time.Minute, "Interval for checking for changes in gce. "+
"This works only if `gce_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if gce_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details")
dockerswarmSDCheckInterval = flag.Duration("promscrape.dockerswarmSDCheckInterval", 30*time.Second, "Interval for checking for changes in dockerswarm. "+ dockerswarmSDCheckInterval = flag.Duration("promscrape.dockerswarmSDCheckInterval", 30*time.Second, "Interval for checking for changes in dockerswarm. "+
"This works only if `dockerswarm_sd_configs` is configured in '-promscrape.config' file. "+ "This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details") "See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details")
promscrapeConfigFile = flag.String("promscrape.config", "", "Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. "+ promscrapeConfigFile = flag.String("promscrape.config", "", "Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. "+
"See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details") "See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details")
suppressDuplicateScrapeTargetErrors = flag.Bool("promscrape.suppressDuplicateScrapeTargetErrors", false, "Whether to suppress `duplicate scrape target` errors; "+ suppressDuplicateScrapeTargetErrors = flag.Bool("promscrape.suppressDuplicateScrapeTargetErrors", false, "Whether to suppress 'duplicate scrape target' errors; "+
"see https://victoriametrics.github.io/vmagent.html#troubleshooting for details") "see https://victoriametrics.github.io/vmagent.html#troubleshooting for details")
) )
@ -231,11 +231,17 @@ func (scfg *scrapeConfig) run() {
cfg := <-scfg.cfgCh cfg := <-scfg.cfgCh
var swsPrev []*ScrapeWork var swsPrev []*ScrapeWork
updateScrapeWork := func(cfg *Config) { updateScrapeWork := func(cfg *Config) {
startTime := time.Now() for {
sws := scfg.getScrapeWork(cfg, swsPrev) startTime := time.Now()
sg.update(sws) sws := scfg.getScrapeWork(cfg, swsPrev)
swsPrev = sws retry := sg.update(sws)
scfg.discoveryDuration.UpdateDuration(startTime) swsPrev = sws
scfg.discoveryDuration.UpdateDuration(startTime)
if !retry {
return
}
time.Sleep(2 * time.Second)
}
} }
updateScrapeWork(cfg) updateScrapeWork(cfg)
atomic.AddInt32(&PendingScrapeConfigs, -1) atomic.AddInt32(&PendingScrapeConfigs, -1)
@ -295,7 +301,7 @@ func (sg *scraperGroup) stop() {
sg.wg.Wait() sg.wg.Wait()
} }
func (sg *scraperGroup) update(sws []*ScrapeWork) { func (sg *scraperGroup) update(sws []*ScrapeWork) (retry bool) {
sg.mLock.Lock() sg.mLock.Lock()
defer sg.mLock.Unlock() defer sg.mLock.Unlock()
@ -352,6 +358,7 @@ func (sg *scraperGroup) update(sws []*ScrapeWork) {
sg.changesCount.Add(additionsCount + deletionsCount) sg.changesCount.Add(additionsCount + deletionsCount)
logger.Infof("%s: added targets: %d, removed targets: %d; total targets: %d", sg.name, additionsCount, deletionsCount, len(sg.m)) logger.Infof("%s: added targets: %d, removed targets: %d; total targets: %d", sg.name, additionsCount, deletionsCount, len(sg.m))
} }
return deletionsCount > 0 && len(sg.m) == 0
} }
type scraper struct { type scraper struct {


@ -68,12 +68,15 @@ type ScrapeWork struct {
// See also https://prometheus.io/docs/concepts/jobs_instances/ // See also https://prometheus.io/docs/concepts/jobs_instances/
Labels []prompbmarshal.Label Labels []prompbmarshal.Label
// Auth config
AuthConfig *promauth.Config
// ProxyURL HTTP proxy url // ProxyURL HTTP proxy url
ProxyURL proxy.URL ProxyURL proxy.URL
// Auth config for ProxyURL
ProxyAuthConfig *promauth.Config
// Auth config
AuthConfig *promauth.Config
// Optional `metric_relabel_configs`. // Optional `metric_relabel_configs`.
MetricRelabelConfigs *promrelabel.ParsedConfigs MetricRelabelConfigs *promrelabel.ParsedConfigs
@ -105,9 +108,10 @@ type ScrapeWork struct {
func (sw *ScrapeWork) key() string { func (sw *ScrapeWork) key() string {
// Do not take into account OriginalLabels. // Do not take into account OriginalLabels.
key := fmt.Sprintf("ScrapeURL=%s, ScrapeInterval=%s, ScrapeTimeout=%s, HonorLabels=%v, HonorTimestamps=%v, Labels=%s, "+ key := fmt.Sprintf("ScrapeURL=%s, ScrapeInterval=%s, ScrapeTimeout=%s, HonorLabels=%v, HonorTimestamps=%v, Labels=%s, "+
"AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v, "+ "ProxyURL=%s, ProxyAuthConfig=%s, AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v, "+
"ScrapeAlignInterval=%s, ScrapeOffset=%s", "ScrapeAlignInterval=%s, ScrapeOffset=%s",
sw.ScrapeURL, sw.ScrapeInterval, sw.ScrapeTimeout, sw.HonorLabels, sw.HonorTimestamps, sw.LabelsString(), sw.ScrapeURL, sw.ScrapeInterval, sw.ScrapeTimeout, sw.HonorLabels, sw.HonorTimestamps, sw.LabelsString(),
sw.ProxyURL.String(), sw.ProxyAuthConfig.String(),
sw.AuthConfig.String(), sw.MetricRelabelConfigs.String(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive, sw.StreamParse, sw.AuthConfig.String(), sw.MetricRelabelConfigs.String(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive, sw.StreamParse,
sw.ScrapeAlignInterval, sw.ScrapeOffset) sw.ScrapeAlignInterval, sw.ScrapeOffset)
return key return key
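Why the proxy fields are added to `key()` can be shown with a reduced example. It assumes the key is used to tell whether two scrape targets are the same; `work` is a hypothetical, trimmed-down stand-in for `ScrapeWork`, and its plain string fields replace the real `proxy.URL` and `promauth.Config` types.

```go
package main

import "fmt"

// work is a hypothetical stand-in for ScrapeWork with only three fields.
type work struct {
	ScrapeURL       string
	ProxyURL        string
	ProxyAuthConfig string
}

// key builds an identity string from every field that affects how the target is scraped.
// If a field is left out, two differently configured targets look identical.
func (w *work) key() string {
	return fmt.Sprintf("ScrapeURL=%s, ProxyURL=%s, ProxyAuthConfig=%s",
		w.ScrapeURL, w.ProxyURL, w.ProxyAuthConfig)
}

func main() {
	a := &work{ScrapeURL: "http://host:9100/metrics", ProxyURL: "http://proxy-a:3128"}
	b := &work{ScrapeURL: "http://host:9100/metrics", ProxyURL: "http://proxy-b:3128"}
	// The two targets differ only by proxy, so their keys must differ too.
	fmt.Println("same key:", a.key() == b.key())
}
```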
@ -173,9 +177,9 @@ type scrapeWork struct {
// It is used as a hint in order to reduce memory usage for body buffers. // It is used as a hint in order to reduce memory usage for body buffers.
prevBodyLen int prevBodyLen int
// prevRowsLen contains the number of rows scraped during the previous scrape. // prevLabelsLen contains the number of labels scraped during the previous scrape.
// It is used as a hint in order to reduce memory usage when parsing scrape responses. // It is used as a hint in order to reduce memory usage when parsing scrape responses.
prevRowsLen int prevLabelsLen int
} }
func (sw *scrapeWork) run(stopCh <-chan struct{}) { func (sw *scrapeWork) run(stopCh <-chan struct{}) {
@ -279,7 +283,7 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
scrapeDuration.Update(duration) scrapeDuration.Update(duration)
scrapeResponseSize.Update(float64(len(body.B))) scrapeResponseSize.Update(float64(len(body.B)))
up := 1 up := 1
wc := writeRequestCtxPool.Get(sw.prevRowsLen) wc := writeRequestCtxPool.Get(sw.prevLabelsLen)
if err != nil { if err != nil {
up = 0 up = 0
scrapesFailed.Inc() scrapesFailed.Inc()
@ -290,27 +294,15 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
srcRows := wc.rows.Rows srcRows := wc.rows.Rows
samplesScraped := len(srcRows) samplesScraped := len(srcRows)
scrapedSamples.Update(float64(samplesScraped)) scrapedSamples.Update(float64(samplesScraped))
if sw.Config.SampleLimit > 0 && samplesScraped > sw.Config.SampleLimit { for i := range srcRows {
srcRows = srcRows[:0] sw.addRowToTimeseries(wc, &srcRows[i], scrapeTimestamp, true)
}
samplesPostRelabeling := len(wc.writeRequest.Timeseries)
if sw.Config.SampleLimit > 0 && samplesPostRelabeling > sw.Config.SampleLimit {
wc.resetNoRows()
up = 0 up = 0
scrapesSkippedBySampleLimit.Inc() scrapesSkippedBySampleLimit.Inc()
} }
samplesPostRelabeling := 0
for i := range srcRows {
sw.addRowToTimeseries(wc, &srcRows[i], scrapeTimestamp, true)
if len(wc.labels) > 40000 {
// Limit the maximum size of wc.writeRequest.
// This should reduce memory usage when scraping targets with millions of metrics and/or labels.
// For example, when scraping /federate handler from Prometheus - see https://prometheus.io/docs/prometheus/latest/federation/
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
sw.updateSeriesAdded(wc)
startTime := time.Now()
sw.PushData(&wc.writeRequest)
pushDataDuration.UpdateDuration(startTime)
wc.resetNoRows()
}
}
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
sw.updateSeriesAdded(wc) sw.updateSeriesAdded(wc)
seriesAdded := sw.finalizeSeriesAdded(samplesPostRelabeling) seriesAdded := sw.finalizeSeriesAdded(samplesPostRelabeling)
sw.addAutoTimeseries(wc, "up", float64(up), scrapeTimestamp) sw.addAutoTimeseries(wc, "up", float64(up), scrapeTimestamp)
@ -321,7 +313,7 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
startTime := time.Now() startTime := time.Now()
sw.PushData(&wc.writeRequest) sw.PushData(&wc.writeRequest)
pushDataDuration.UpdateDuration(startTime) pushDataDuration.UpdateDuration(startTime)
sw.prevRowsLen = samplesScraped sw.prevLabelsLen = len(wc.labels)
wc.reset() wc.reset()
writeRequestCtxPool.Put(wc) writeRequestCtxPool.Put(wc)
// body must be released only after wc is released, since wc refers to body. // body must be released only after wc is released, since wc refers to body.
@ -335,7 +327,7 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
samplesScraped := 0 samplesScraped := 0
samplesPostRelabeling := 0 samplesPostRelabeling := 0
responseSize := int64(0) responseSize := int64(0)
wc := writeRequestCtxPool.Get(sw.prevRowsLen) wc := writeRequestCtxPool.Get(sw.prevLabelsLen)
sr, err := sw.GetStreamReader() sr, err := sw.GetStreamReader()
if err != nil { if err != nil {
@ -385,7 +377,7 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
startTime := time.Now() startTime := time.Now()
sw.PushData(&wc.writeRequest) sw.PushData(&wc.writeRequest)
pushDataDuration.UpdateDuration(startTime) pushDataDuration.UpdateDuration(startTime)
sw.prevRowsLen = len(wc.rows.Rows) sw.prevLabelsLen = len(wc.labels)
wc.reset() wc.reset()
writeRequestCtxPool.Put(wc) writeRequestCtxPool.Put(wc)
tsmGlobal.Update(sw.Config, sw.ScrapeGroup, up == 1, realTimestamp, int64(duration*1000), err) tsmGlobal.Update(sw.Config, sw.ScrapeGroup, up == 1, realTimestamp, int64(duration*1000), err)
@ -397,11 +389,11 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
// //
// Its logic has been copied from leveledbytebufferpool. // Its logic has been copied from leveledbytebufferpool.
type leveledWriteRequestCtxPool struct { type leveledWriteRequestCtxPool struct {
pools [30]sync.Pool pools [13]sync.Pool
} }
func (lwp *leveledWriteRequestCtxPool) Get(rowsCapacity int) *writeRequestCtx { func (lwp *leveledWriteRequestCtxPool) Get(labelsCapacity int) *writeRequestCtx {
id, capacityNeeded := lwp.getPoolIDAndCapacity(rowsCapacity) id, capacityNeeded := lwp.getPoolIDAndCapacity(labelsCapacity)
for i := 0; i < 2; i++ { for i := 0; i < 2; i++ {
if id < 0 || id >= len(lwp.pools) { if id < 0 || id >= len(lwp.pools) {
break break
@ -417,10 +409,12 @@ func (lwp *leveledWriteRequestCtxPool) Get(rowsCapacity int) *writeRequestCtx {
} }
func (lwp *leveledWriteRequestCtxPool) Put(wc *writeRequestCtx) { func (lwp *leveledWriteRequestCtxPool) Put(wc *writeRequestCtx) {
capacity := cap(wc.rows.Rows) capacity := cap(wc.labels)
id, _ := lwp.getPoolIDAndCapacity(capacity) id, poolCapacity := lwp.getPoolIDAndCapacity(capacity)
wc.reset() if capacity <= poolCapacity {
lwp.pools[id].Put(wc) wc.reset()
lwp.pools[id].Put(wc)
}
} }
func (lwp *leveledWriteRequestCtxPool) getPoolIDAndCapacity(size int) (int, int) { func (lwp *leveledWriteRequestCtxPool) getPoolIDAndCapacity(size int) (int, int) {
@ -430,7 +424,7 @@ func (lwp *leveledWriteRequestCtxPool) getPoolIDAndCapacity(size int) (int, int)
} }
size >>= 3 size >>= 3
id := bits.Len(uint(size)) id := bits.Len(uint(size))
if id > len(lwp.pools) { if id >= len(lwp.pools) {
id = len(lwp.pools) - 1 id = len(lwp.pools) - 1
} }
return id, (1 << (id + 3)) return id, (1 << (id + 3))
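The bucket computation and the `>=` bounds fix can be checked with a standalone sketch. `poolIDAndCapacity` and `poolsCount` are hypothetical names; the `size--` and clamp-to-zero prologue is an assumption, since that part of the function is not visible in the hunk.

```go
package main

import (
	"fmt"
	"math/bits"
)

// poolsCount matches the size of the pools array in the diff.
const poolsCount = 13

// poolIDAndCapacity maps a requested size to a pool index and that pool's capacity.
// The first three statements are an assumption about the hidden part of the function.
func poolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		size = 0
	}
	size >>= 3
	id := bits.Len(uint(size))
	// Clamp to the last pool. With ">" instead of ">=", id could equal poolsCount,
	// which is one past the last valid index.
	if id >= poolsCount {
		id = poolsCount - 1
	}
	return id, 1 << (id + 3)
}

func main() {
	for _, size := range []int{0, 8, 9, 32768, 32769, 1 << 20} {
		id, capacity := poolIDAndCapacity(size)
		fmt.Printf("size=%d -> pool=%d capacity=%d\n", size, id, capacity)
	}
}
```

Under these assumptions a size of 32769 yields `bits.Len(4096) == 13`, which the old `id > len(lwp.pools)` check would have let through as an index into a 13-element array. The `capacity <= poolCapacity` guard in `Put` complements this by not returning oversized contexts to the last pool.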


@ -332,7 +332,7 @@ func TestScrapeWorkScrapeInternalSuccess(t *testing.T) {
up 0 123 up 0 123
scrape_samples_scraped 2 123 scrape_samples_scraped 2 123
scrape_duration_seconds 0 123 scrape_duration_seconds 0 123
scrape_samples_post_metric_relabeling 0 123 scrape_samples_post_metric_relabeling 2 123
scrape_series_added 0 123 scrape_series_added 0 123
`) `)
} }


@ -2,7 +2,6 @@ package promscrape
import ( import (
"context" "context"
"crypto/tls"
"fmt" "fmt"
"net" "net"
"sync" "sync"
@ -10,6 +9,7 @@ import (
"time" "time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/proxy" "github.com/VictoriaMetrics/VictoriaMetrics/lib/proxy"
"github.com/VictoriaMetrics/fasthttp" "github.com/VictoriaMetrics/fasthttp"
"github.com/VictoriaMetrics/metrics" "github.com/VictoriaMetrics/metrics"
@ -49,8 +49,8 @@ var (
stdDialerOnce sync.Once stdDialerOnce sync.Once
) )
func newStatDialFunc(proxyURL proxy.URL, tlsConfig *tls.Config) (fasthttp.DialFunc, error) { func newStatDialFunc(proxyURL proxy.URL, ac *promauth.Config) (fasthttp.DialFunc, error) {
dialFunc, err := proxyURL.NewDialFunc(tlsConfig) dialFunc, err := proxyURL.NewDialFunc(ac)
if err != nil { if err != nil {
return nil, err return nil, err
} }


@ -18,7 +18,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel" "github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
) )
var maxDroppedTargets = flag.Int("promscrape.maxDroppedTargets", 1000, "The maximum number of `droppedTargets` shown at /api/v1/targets page. "+ var maxDroppedTargets = flag.Int("promscrape.maxDroppedTargets", 1000, "The maximum number of droppedTargets to show at /api/v1/targets page. "+
"Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. "+ "Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. "+
"Note that the increased number of tracked dropped targets may result in increased memory usage") "Note that the increased number of tracked dropped targets may result in increased memory usage")


@ -15,7 +15,7 @@ import (
) )
var maxLineLen = flagutil.NewBytes("import.maxLineLen", 100*1024*1024, "The maximum length in bytes of a single line accepted by /api/v1/import; "+ var maxLineLen = flagutil.NewBytes("import.maxLineLen", 100*1024*1024, "The maximum length in bytes of a single line accepted by /api/v1/import; "+
"the line length can be limited with `max_rows_per_line` query arg passed to /api/v1/export") "the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export")
// ParseStream parses /api/v1/import lines from req and calls callback for the parsed rows. // ParseStream parses /api/v1/import lines from req and calls callback for the parsed rows.
// //


@ -7,9 +7,12 @@ import (
"fmt" "fmt"
"net" "net"
"net/url" "net/url"
"strings"
"time" "time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/fasthttp" "github.com/VictoriaMetrics/fasthttp"
) )
@ -18,6 +21,17 @@ type URL struct {
url *url.URL url *url.URL
} }
// MustNewURL returns new URL for the given u.
func MustNewURL(u string) URL {
pu, err := url.Parse(u)
if err != nil {
logger.Panicf("BUG: cannot parse u=%q: %s", u, err)
}
return URL{
url: pu,
}
}
// URL returns the underlying url. // URL returns the underlying url.
func (u *URL) URL() *url.URL { func (u *URL) URL() *url.URL {
if u == nil || u.url == nil { if u == nil || u.url == nil {
@ -26,6 +40,15 @@ func (u *URL) URL() *url.URL {
return u.url return u.url
} }
// String returns string representation of u.
func (u *URL) String() string {
pu := u.URL()
if pu == nil {
return ""
}
return pu.String()
}
// MarshalYAML implements yaml.Marshaler interface. // MarshalYAML implements yaml.Marshaler interface.
func (u *URL) MarshalYAML() (interface{}, error) { func (u *URL) MarshalYAML() (interface{}, error) {
if u.url == nil { if u.url == nil {
@ -48,38 +71,72 @@ func (u *URL) UnmarshalYAML(unmarshal func(interface{}) error) error {
return nil return nil
} }
// NewDialFunc returns dial func for the given pu and tlsConfig. // NewDialFunc returns dial func for the given u and ac.
func (u *URL) NewDialFunc(tlsConfig *tls.Config) (fasthttp.DialFunc, error) { func (u *URL) NewDialFunc(ac *promauth.Config) (fasthttp.DialFunc, error) {
if u == nil || u.url == nil { if u == nil || u.url == nil {
return defaultDialFunc, nil return defaultDialFunc, nil
} }
pu := u.url pu := u.url
if pu.Scheme != "http" && pu.Scheme != "https" { if pu.Scheme != "http" && pu.Scheme != "https" {
return nil, fmt.Errorf("unknown scheme=%q for proxy_url=%q, must be http or https", pu.Scheme, pu) return nil, fmt.Errorf("unknown scheme=%q for proxy_url=%q, must be http or https", pu.Scheme, pu.Redacted())
} }
isTLS := pu.Scheme == "https"
proxyAddr := addMissingPort(pu.Host, isTLS)
var authHeader string var authHeader string
if ac != nil {
authHeader = ac.Authorization
}
if pu.User != nil && len(pu.User.Username()) > 0 { if pu.User != nil && len(pu.User.Username()) > 0 {
userPasswordEncoded := base64.StdEncoding.EncodeToString([]byte(pu.User.String())) userPasswordEncoded := base64.StdEncoding.EncodeToString([]byte(pu.User.String()))
authHeader = "Proxy-Authorization: Basic " + userPasswordEncoded + "\r\n" authHeader = "Basic " + userPasswordEncoded
}
if authHeader != "" {
authHeader = "Proxy-Authorization: " + authHeader + "\r\n"
}
var tlsCfg *tls.Config
if isTLS {
tlsCfg = ac.NewTLSConfig()
if !tlsCfg.InsecureSkipVerify && tlsCfg.ServerName == "" {
tlsCfg.ServerName = tlsServerName(proxyAddr)
}
} }
dialFunc := func(addr string) (net.Conn, error) { dialFunc := func(addr string) (net.Conn, error) {
proxyConn, err := defaultDialFunc(pu.Host) proxyConn, err := defaultDialFunc(proxyAddr)
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot connect to proxy %q: %w", pu, err) return nil, fmt.Errorf("cannot connect to proxy %q: %w", pu.Redacted(), err)
} }
if pu.Scheme == "https" { if isTLS {
proxyConn = tls.Client(proxyConn, tlsConfig) proxyConn = tls.Client(proxyConn, tlsCfg)
} }
conn, err := sendConnectRequest(proxyConn, addr, authHeader) conn, err := sendConnectRequest(proxyConn, proxyAddr, addr, authHeader)
if err != nil { if err != nil {
_ = proxyConn.Close() _ = proxyConn.Close()
return nil, fmt.Errorf("error when sending CONNECT request to proxy %q: %w", pu, err) return nil, fmt.Errorf("error when sending CONNECT request to proxy %q: %w", pu.Redacted(), err)
} }
return conn, nil return conn, nil
} }
return dialFunc, nil return dialFunc, nil
} }
func addMissingPort(addr string, isTLS bool) string {
if strings.IndexByte(addr, ':') >= 0 {
return addr
}
port := "80"
if isTLS {
port = "443"
}
return addr + ":" + port
}
func tlsServerName(addr string) string {
host, _, err := net.SplitHostPort(addr)
if err != nil {
return addr
}
return host
}
func defaultDialFunc(addr string) (net.Conn, error) { func defaultDialFunc(addr string) (net.Conn, error) {
network := "tcp4" network := "tcp4"
if netutil.TCP6Enabled() { if netutil.TCP6Enabled() {
@ -90,8 +147,8 @@ func defaultDialFunc(addr string) (net.Conn, error) {
} }
// sendConnectRequest sends CONNECT request to proxyConn for the given addr and authHeader and returns the established connection to dstAddr. // sendConnectRequest sends CONNECT request to proxyConn for the given addr and authHeader and returns the established connection to dstAddr.
func sendConnectRequest(proxyConn net.Conn, dstAddr, authHeader string) (net.Conn, error) { func sendConnectRequest(proxyConn net.Conn, proxyAddr, dstAddr, authHeader string) (net.Conn, error) {
req := "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + dstAddr + "\r\n" + authHeader + "\r\n" req := "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + proxyAddr + "\r\n" + authHeader + "\r\n"
if _, err := proxyConn.Write([]byte(req)); err != nil { if _, err := proxyConn.Write([]byte(req)); err != nil {
return nil, fmt.Errorf("cannot send CONNECT request for dstAddr=%q: %w", dstAddr, err) return nil, fmt.Errorf("cannot send CONNECT request for dstAddr=%q: %w", dstAddr, err)
} }
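The address handling and the CONNECT request from this file can be exercised without a network connection. In the sketch below `addMissingPort` and `tlsServerName` are copied from the diff, while `buildConnectRequest` is a hypothetical helper that only assembles the request string the way the new `NewDialFunc` and `sendConnectRequest` code does; the host names and credentials are made up.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net"
	"strings"
)

// addMissingPort appends the default port for the scheme when addr has none.
func addMissingPort(addr string, isTLS bool) string {
	if strings.IndexByte(addr, ':') >= 0 {
		return addr
	}
	port := "80"
	if isTLS {
		port = "443"
	}
	return addr + ":" + port
}

// tlsServerName returns the host part of addr, or addr itself if it has no port.
func tlsServerName(addr string) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return addr
	}
	return host
}

// buildConnectRequest (hypothetical) assembles the CONNECT request text.
// authorization is the bare header value, e.g. "Basic ...", or "" for none.
func buildConnectRequest(proxyAddr, dstAddr, authorization string) string {
	authHeader := ""
	if authorization != "" {
		authHeader = "Proxy-Authorization: " + authorization + "\r\n"
	}
	return "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + proxyAddr + "\r\n" + authHeader + "\r\n"
}

func main() {
	proxyAddr := addMissingPort("proxy.example.com", true)
	auth := "Basic " + base64.StdEncoding.EncodeToString([]byte("user:secret"))
	fmt.Printf("%q\n", buildConnectRequest(proxyAddr, "target.example.com:9100", auth))
	fmt.Println("TLS server name:", tlsServerName(proxyAddr))
}
```

Note that `addMissingPort` treats any colon as a port separator, so a bare IPv6 literal such as `[::1]` would be returned unchanged.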


@ -577,9 +577,21 @@ func (db *indexDB) createTSIDByName(dst *TSID, metricName []byte) error {
// on db.tb flush via invalidateTagCache flushCallback passed to OpenTable. // on db.tb flush via invalidateTagCache flushCallback passed to OpenTable.
atomic.AddUint64(&db.newTimeseriesCreated, 1) atomic.AddUint64(&db.newTimeseriesCreated, 1)
if logNewSeries {
logger.Infof("new series created: %s", mn.String())
}
return nil return nil
} }
// SetLogNewSeries updates new series logging.
//
// This function must be called before calling any storage functions.
func SetLogNewSeries(ok bool) {
logNewSeries = ok
}
var logNewSeries = false
func (db *indexDB) generateTSID(dst *TSID, metricName []byte, mn *MetricName) error { func (db *indexDB) generateTSID(dst *TSID, metricName []byte, mn *MetricName) error {
// Search the TSID in the external storage. // Search the TSID in the external storage.
// This is usually the db from the previous period. // This is usually the db from the previous period.
@ -2048,15 +2060,6 @@ func (is *indexSearch) getTagFilterWithMinMetricIDsCount(tfs *TagFilters, maxMet
metricIDs, _, err := is.getMetricIDsForTagFilter(tf, nil, maxMetrics) metricIDs, _, err := is.getMetricIDsForTagFilter(tf, nil, maxMetrics)
if err != nil { if err != nil {
if err == errFallbackToMetricNameMatch {
// Skip tag filters requiring to scan for too many metrics.
kb.B = append(kb.B[:0], uselessSingleTagFilterKeyPrefix)
kb.B = encoding.MarshalUint64(kb.B, uint64(maxMetrics))
kb.B = tf.Marshal(kb.B)
is.db.uselessTagFiltersCache.Set(kb.B, uselessTagFilterCacheValue)
uselessTagFilters++
continue
}
return nil, nil, fmt.Errorf("cannot find MetricIDs for tagFilter %s: %w", tf, err) return nil, nil, fmt.Errorf("cannot find MetricIDs for tagFilter %s: %w", tf, err)
} }
if metricIDs.Len() >= maxMetrics { if metricIDs.Len() >= maxMetrics {
@ -2306,7 +2309,7 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
// Fast path: found metricIDs by date range. // Fast path: found metricIDs by date range.
return nil return nil
} }
if err != errFallbackToMetricNameMatch { if err != errFallbackToGlobalSearch {
return err return err
} }
@ -2330,12 +2333,6 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
continue continue
} }
mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs) mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs)
if err == errFallbackToMetricNameMatch {
// The tag filter requires too many index scans. Postpone it,
// so tag filters with lower number of index scans may be applied.
tfsPostponed = append(tfsPostponed, tf)
continue
}
if err != nil { if err != nil {
return err return err
} }
@ -2345,11 +2342,8 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
if len(tfsPostponed) > 0 && successfulIntersects == 0 { if len(tfsPostponed) > 0 && successfulIntersects == 0 {
return is.updateMetricIDsByMetricNameMatch(metricIDs, minMetricIDs, tfsPostponed) return is.updateMetricIDsByMetricNameMatch(metricIDs, minMetricIDs, tfsPostponed)
} }
for i, tf := range tfsPostponed { for _, tf := range tfsPostponed {
mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs) mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs)
if err == errFallbackToMetricNameMatch {
return is.updateMetricIDsByMetricNameMatch(metricIDs, minMetricIDs, tfsPostponed[i:])
}
if err != nil { if err != nil {
return err return err
} }
@ -2363,7 +2357,6 @@ const (
uselessSingleTagFilterKeyPrefix = 0 uselessSingleTagFilterKeyPrefix = 0
uselessMultiTagFiltersKeyPrefix = 1 uselessMultiTagFiltersKeyPrefix = 1
uselessNegativeTagFilterKeyPrefix = 2 uselessNegativeTagFilterKeyPrefix = 2
uselessTagIntersectKeyPrefix = 3
) )
var uselessTagFilterCacheValue = []byte("1") var uselessTagFilterCacheValue = []byte("1")
@ -2375,29 +2368,28 @@ func (is *indexSearch) getMetricIDsForTagFilter(tf *tagFilter, filter *uint64set
metricIDs := &uint64set.Set{} metricIDs := &uint64set.Set{}
if len(tf.orSuffixes) > 0 { if len(tf.orSuffixes) > 0 {
// Fast path for orSuffixes - seek for rows for each value from orSuffixes. // Fast path for orSuffixes - seek for rows for each value from orSuffixes.
loopsCount, err := is.updateMetricIDsForOrSuffixesNoFilter(tf, maxMetrics, metricIDs) var loopsCount uint64
var err error
if filter != nil {
loopsCount, err = is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter)
} else {
loopsCount, err = is.updateMetricIDsForOrSuffixesNoFilter(tf, maxMetrics, metricIDs)
}
if err != nil { if err != nil {
if err == errFallbackToMetricNameMatch {
return nil, loopsCount, err
}
return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf) return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf)
} }
return metricIDs, loopsCount, nil return metricIDs, loopsCount, nil
} }
// Slow path - scan for all the rows with the given prefix. // Slow path - scan for all the rows with the given prefix.
maxLoopsCount := uint64(maxMetrics) * maxIndexScanSlowLoopsPerMetric loopsCount, err := is.getMetricIDsForTagFilterSlow(tf, filter, metricIDs.Add)
loopsCount, err := is.getMetricIDsForTagFilterSlow(tf, filter, maxLoopsCount, metricIDs.Add)
if err != nil { if err != nil {
if err == errFallbackToMetricNameMatch {
return nil, loopsCount, err
}
return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf) return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf)
} }
return metricIDs, loopsCount, nil return metricIDs, loopsCount, nil
} }
func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint64set.Set, maxLoopsCount uint64, f func(metricID uint64)) (uint64, error) { func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint64set.Set, f func(metricID uint64)) (uint64, error) {
if len(tf.orSuffixes) > 0 { if len(tf.orSuffixes) > 0 {
logger.Panicf("BUG: the getMetricIDsForTagFilterSlow must be called only for empty tf.orSuffixes; got %s", tf.orSuffixes) logger.Panicf("BUG: the getMetricIDsForTagFilterSlow must be called only for empty tf.orSuffixes; got %s", tf.orSuffixes)
} }
@ -2436,9 +2428,6 @@ func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint6
} }
mp.ParseMetricIDs() mp.ParseMetricIDs()
loopsCount += uint64(mp.MetricIDsLen()) loopsCount += uint64(mp.MetricIDsLen())
if loopsCount > maxLoopsCount {
return loopsCount, errFallbackToMetricNameMatch
}
if prevMatch && string(suffix) == string(prevMatchingSuffix) { if prevMatch && string(suffix) == string(prevMatchingSuffix) {
// Fast path: the same tag value found. // Fast path: the same tag value found.
// There is no need in checking it again with potentially // There is no need in checking it again with potentially
@ -2522,26 +2511,28 @@ func (is *indexSearch) updateMetricIDsForOrSuffixesNoFilter(tf *tagFilter, maxMe
return loopsCount, nil return loopsCount, nil
} }
func (is *indexSearch) updateMetricIDsForOrSuffixesWithFilter(tf *tagFilter, metricIDs, filter *uint64set.Set) error { func (is *indexSearch) updateMetricIDsForOrSuffixesWithFilter(tf *tagFilter, metricIDs, filter *uint64set.Set) (uint64, error) {
sortedFilter := filter.AppendTo(nil) sortedFilter := filter.AppendTo(nil)
kb := kbPool.Get() kb := kbPool.Get()
defer kbPool.Put(kb) defer kbPool.Put(kb)
var loopsCount uint64
for _, orSuffix := range tf.orSuffixes { for _, orSuffix := range tf.orSuffixes {
kb.B = append(kb.B[:0], tf.prefix...) kb.B = append(kb.B[:0], tf.prefix...)
kb.B = append(kb.B, orSuffix...) kb.B = append(kb.B, orSuffix...)
kb.B = append(kb.B, tagSeparatorChar) kb.B = append(kb.B, tagSeparatorChar)
if err := is.updateMetricIDsForOrSuffixWithFilter(kb.B, metricIDs, sortedFilter, tf.isNegative); err != nil { lc, err := is.updateMetricIDsForOrSuffixWithFilter(kb.B, metricIDs, sortedFilter, tf.isNegative)
return err if err != nil {
return loopsCount, err
} }
loopsCount += lc
} }
return nil return loopsCount, nil
} }
func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetrics int, metricIDs *uint64set.Set) (uint64, error) { func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetrics int, metricIDs *uint64set.Set) (uint64, error) {
ts := &is.ts ts := &is.ts
mp := &is.mp mp := &is.mp
mp.Reset() mp.Reset()
maxLoopsCount := uint64(maxMetrics) * maxIndexScanLoopsPerMetric
var loopsCount uint64 var loopsCount uint64
loopsPaceLimiter := 0 loopsPaceLimiter := 0
ts.Seek(prefix) ts.Seek(prefix)
@ -2560,9 +2551,6 @@ func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetr
return loopsCount, err return loopsCount, err
} }
loopsCount += uint64(mp.MetricIDsLen()) loopsCount += uint64(mp.MetricIDsLen())
if loopsCount > maxLoopsCount {
return loopsCount, errFallbackToMetricNameMatch
}
mp.ParseMetricIDs() mp.ParseMetricIDs()
metricIDs.AddMulti(mp.MetricIDs) metricIDs.AddMulti(mp.MetricIDs)
} }
@ -2572,16 +2560,15 @@ func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetr
return loopsCount, nil return loopsCount, nil
} }
func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metricIDs *uint64set.Set, sortedFilter []uint64, isNegative bool) error { func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metricIDs *uint64set.Set, sortedFilter []uint64, isNegative bool) (uint64, error) {
if len(sortedFilter) == 0 { if len(sortedFilter) == 0 {
return nil return 0, nil
} }
firstFilterMetricID := sortedFilter[0] firstFilterMetricID := sortedFilter[0]
lastFilterMetricID := sortedFilter[len(sortedFilter)-1] lastFilterMetricID := sortedFilter[len(sortedFilter)-1]
ts := &is.ts ts := &is.ts
mp := &is.mp mp := &is.mp
mp.Reset() mp.Reset()
maxLoopsCount := uint64(len(sortedFilter)) * maxIndexScanLoopsPerMetric
var loopsCount uint64 var loopsCount uint64
loopsPaceLimiter := 0 loopsPaceLimiter := 0
ts.Seek(prefix) ts.Seek(prefix)
@ -2590,17 +2577,18 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
for ts.NextItem() { for ts.NextItem() {
if loopsPaceLimiter&paceLimiterMediumIterationsMask == 0 { if loopsPaceLimiter&paceLimiterMediumIterationsMask == 0 {
if err := checkSearchDeadlineAndPace(is.deadline); err != nil { if err := checkSearchDeadlineAndPace(is.deadline); err != nil {
return err return loopsCount, err
} }
} }
loopsPaceLimiter++ loopsPaceLimiter++
item := ts.Item item := ts.Item
if !bytes.HasPrefix(item, prefix) { if !bytes.HasPrefix(item, prefix) {
return nil return loopsCount, nil
} }
if err := mp.InitOnlyTail(item, item[len(prefix):]); err != nil { if err := mp.InitOnlyTail(item, item[len(prefix):]); err != nil {
return err return loopsCount, err
} }
loopsCount += uint64(mp.MetricIDsLen())
firstMetricID, lastMetricID := mp.FirstAndLastMetricIDs() firstMetricID, lastMetricID := mp.FirstAndLastMetricIDs()
if lastMetricID < firstFilterMetricID { if lastMetricID < firstFilterMetricID {
// Skip the item, since it contains metricIDs lower // Skip the item, since it contains metricIDs lower
@ -2610,14 +2598,11 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
if firstMetricID > lastFilterMetricID { if firstMetricID > lastFilterMetricID {
// Stop searching, since the current item and all the subsequent items // Stop searching, since the current item and all the subsequent items
// contain metricIDs higher than metricIDs in sortedFilter. // contain metricIDs higher than metricIDs in sortedFilter.
return nil return loopsCount, nil
} }
sf = sortedFilter sf = sortedFilter
loopsCount += uint64(mp.MetricIDsLen())
if loopsCount > maxLoopsCount {
return errFallbackToMetricNameMatch
}
mp.ParseMetricIDs() mp.ParseMetricIDs()
matchingMetricIDs := mp.MetricIDs[:0]
for _, metricID = range mp.MetricIDs { for _, metricID = range mp.MetricIDs {
if len(sf) == 0 { if len(sf) == 0 {
break break
@ -2632,18 +2617,23 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
if metricID < sf[0] { if metricID < sf[0] {
continue continue
} }
if isNegative { matchingMetricIDs = append(matchingMetricIDs, metricID)
metricIDs.Del(metricID)
} else {
metricIDs.Add(metricID)
}
sf = sf[1:] sf = sf[1:]
} }
if len(matchingMetricIDs) > 0 {
if isNegative {
for _, metricID := range matchingMetricIDs {
metricIDs.Del(metricID)
}
} else {
metricIDs.AddMulti(matchingMetricIDs)
}
}
} }
if err := ts.Error(); err != nil { if err := ts.Error(); err != nil {
return fmt.Errorf("error when searching for tag filter prefix %q: %w", prefix, err) return loopsCount, fmt.Errorf("error when searching for tag filter prefix %q: %w", prefix, err)
} }
return nil return loopsCount, nil
} }
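The inner loop of `updateMetricIDsForOrSuffixWithFilter` intersects two sorted lists and, after this change, collects the matches so they can be applied in one batch. A minimal sketch of that intersection follows; `intersectSorted` is a hypothetical name, plain slices stand in for `uint64set.Set`, and `sort.Search` replaces the package's own `binarySearchUint64`.

```go
package main

import (
	"fmt"
	"sort"
)

// intersectSorted returns the items of ids that are also present in filter.
// Both slices must be sorted in ascending order.
func intersectSorted(ids, filter []uint64) []uint64 {
	var matching []uint64
	sf := filter
	for _, id := range ids {
		if len(sf) == 0 {
			break
		}
		if id > sf[0] {
			// Skip all filter entries smaller than id with a binary search.
			n := sort.Search(len(sf), func(i int) bool { return sf[i] >= id })
			sf = sf[n:]
			if len(sf) == 0 {
				break
			}
		}
		if id < sf[0] {
			continue
		}
		// id == sf[0]: record the match and advance the filter.
		matching = append(matching, id)
		sf = sf[1:]
	}
	return matching
}

func main() {
	ids := []uint64{1, 3, 5, 7, 9}
	filter := []uint64{3, 4, 7, 10}
	fmt.Println(intersectSorted(ids, filter))
}
```

Collecting matches first lets the caller issue a single `AddMulti` call for positive filters, as the diff does, instead of one `Add` call per metric ID.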
func binarySearchUint64(a []uint64, v uint64) uint { func binarySearchUint64(a []uint64, v uint64) uint {
@ -2660,7 +2650,7 @@ func binarySearchUint64(a []uint64, v uint64) uint {
return i return i
} }
var errFallbackToMetricNameMatch = errors.New("fall back to updateMetricIDsByMetricNameMatch because of too many index scan loops") var errFallbackToGlobalSearch = errors.New("fall back from per-day index search to global index search")
var errMissingMetricIDsForDate = errors.New("missing metricIDs for date") var errMissingMetricIDsForDate = errors.New("missing metricIDs for date")
@ -2725,11 +2715,11 @@ func (is *indexSearch) tryUpdatingMetricIDsForDateRange(metricIDs *uint64set.Set
maxDate := uint64(tr.MaxTimestamp) / msecPerDay maxDate := uint64(tr.MaxTimestamp) / msecPerDay
if maxDate < minDate { if maxDate < minDate {
// Per-day inverted index doesn't cover the selected date range. // Per-day inverted index doesn't cover the selected date range.
return errFallbackToMetricNameMatch return fmt.Errorf("maxDate=%d cannot be smaller than minDate=%d", maxDate, minDate)
} }
if maxDate-minDate > maxDaysForDateMetricIDs { if maxDate-minDate > maxDaysForDateMetricIDs {
// Too many dates must be covered. Give up, since it may be slow. // Too many dates must be covered. Give up, since it may be slow.
return errFallbackToMetricNameMatch return errFallbackToGlobalSearch
} }
if minDate == maxDate { if minDate == maxDate {
// Fast path - query only a single date. // Fast path - query only a single date.
@ -2759,14 +2749,14 @@ func (is *indexSearch) tryUpdatingMetricIDsForDateRange(metricIDs *uint64set.Set
return return
} }
if err != nil { if err != nil {
if err == errFallbackToMetricNameMatch { if err == errFallbackToGlobalSearch {
// The per-date search is too expensive. Probably it is faster to perform global search // The per-date search is too expensive. Probably it is faster to perform global search
// using metric name match. // using metric name match.
errGlobal = err errGlobal = err
return return
} }
dateStr := time.Unix(int64(date*24*3600), 0) dateStr := time.Unix(int64(date*24*3600), 0)
errGlobal = fmt.Errorf("cannot search for metricIDs for %s: %w", dateStr, err) errGlobal = fmt.Errorf("cannot search for metricIDs at %s: %w", dateStr, err)
return return
} }
if metricIDs.Len() < maxMetrics { if metricIDs.Len() < maxMetrics {
@ -2788,9 +2778,8 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
// These stats are usually collected from the previous queries. // These stats are usually collected from the previous queries.
// This way we limit the amount of work below by applying fast filters at first. // This way we limit the amount of work below by applying fast filters at first.
type tagFilterWithWeight struct { type tagFilterWithWeight struct {
tf *tagFilter tf *tagFilter
loopsCount uint64 loopsCount uint64
lastQueryTimestamp uint64
} }
tfws := make([]tagFilterWithWeight, len(tfs.tfs)) tfws := make([]tagFilterWithWeight, len(tfs.tfs))
currentTime := fasttime.UnixTimestamp() currentTime := fasttime.UnixTimestamp()
@ -2798,26 +2787,29 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
tf := &tfs.tfs[i] tf := &tfs.tfs[i]
loopsCount, lastQueryTimestamp := is.getLoopsCountAndTimestampForDateFilter(date, tf) loopsCount, lastQueryTimestamp := is.getLoopsCountAndTimestampForDateFilter(date, tf)
origLoopsCount := loopsCount origLoopsCount := loopsCount
if currentTime > lastQueryTimestamp+3*3600 { if loopsCount == 0 && tf.looksLikeHeavy() {
// Update stats once per 3 hours only for relatively fast tag filters. // Set high loopsCount for heavy tag filters instead of spending CPU time on their execution.
// There is no need in spending CPU resources on updating stats for slow tag filters. loopsCount = 11e6
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
}
if currentTime > lastQueryTimestamp+3600 {
// Update stats once per hour for relatively fast tag filters.
// There is no need in spending CPU resources on updating stats for heavy tag filters.
if loopsCount <= 10e6 { if loopsCount <= 10e6 {
loopsCount = 0 loopsCount = 0
} }
} }
if loopsCount == 0 { if loopsCount == 0 {
// Prevent a possible thundering herd issue when heavy tf is executed from multiple concurrent queries // Prevent a possible thundering herd issue when potentially heavy tf is executed from multiple concurrent queries
// by temporarily persisting its position in the tag filters list. // by temporarily persisting its position in the tag filters list.
if origLoopsCount == 0 { if origLoopsCount == 0 {
origLoopsCount = 10e6 origLoopsCount = 9e6
} }
lastQueryTimestamp = 0 is.storeLoopsCountForDateFilter(date, tf, origLoopsCount)
is.storeLoopsCountForDateFilter(date, tf, origLoopsCount, lastQueryTimestamp)
} }
tfws[i] = tagFilterWithWeight{ tfws[i] = tagFilterWithWeight{
tf: tf, tf: tf,
loopsCount: loopsCount, loopsCount: loopsCount,
lastQueryTimestamp: lastQueryTimestamp,
} }
} }
sort.Slice(tfws, func(i, j int) bool { sort.Slice(tfws, func(i, j int) bool {
@ -2829,7 +2821,6 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
}) })
// Populate metricIDs for the first non-negative filter. // Populate metricIDs for the first non-negative filter.
var tfsPostponed []*tagFilter
var metricIDs *uint64set.Set var metricIDs *uint64set.Set
tfwsRemaining := tfws[:0] tfwsRemaining := tfws[:0]
maxDateMetrics := maxMetrics * 50 maxDateMetrics := maxMetrics * 50
@ -2841,13 +2832,16 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
continue continue
} }
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, nil, tfs.commonPrefix, maxDateMetrics) m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, nil, tfs.commonPrefix, maxDateMetrics)
is.storeLoopsCountForDateFilter(date, tf, loopsCount, tfw.lastQueryTimestamp) if loopsCount > tfw.loopsCount {
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
}
if err != nil { if err != nil {
return nil, err return nil, err
} }
if m.Len() >= maxDateMetrics { if m.Len() >= maxDateMetrics {
// Too many time series found by a single tag filter. Postpone applying this filter via metricName match. // Too many time series found by a single tag filter. Postpone applying this filter.
tfsPostponed = append(tfsPostponed, tf) tfwsRemaining = append(tfwsRemaining, tfw)
tfw.loopsCount = loopsCount
continue continue
} }
metricIDs = m metricIDs = m
@ -2872,7 +2866,7 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
} }
if m.Len() >= maxDateMetrics { if m.Len() >= maxDateMetrics {
// Too many time series found for the given (date). Fall back to global search. // Too many time series found for the given (date). Fall back to global search.
return nil, errFallbackToMetricNameMatch return nil, errFallbackToGlobalSearch
} }
metricIDs = m metricIDs = m
} }
@ -2883,6 +2877,7 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
// when the initial tag filters significantly reduce the number of found metricIDs, // when the initial tag filters significantly reduce the number of found metricIDs,
// so the remaining filters could be performed via much faster metricName matching instead // so the remaining filters could be performed via much faster metricName matching instead
// of slow selecting of matching metricIDs. // of slow selecting of matching metricIDs.
var tfsPostponed []*tagFilter
for i := range tfwsRemaining { for i := range tfwsRemaining {
tfw := tfwsRemaining[i] tfw := tfwsRemaining[i]
tf := tfw.tf tf := tfw.tf
@ -2891,24 +2886,26 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
// Short circuit - there is no need in applying the remaining filters to an empty set. // Short circuit - there is no need in applying the remaining filters to an empty set.
break break
} }
if uint64(metricIDsLen)*maxIndexScanLoopsPerMetric < tfw.loopsCount { if tfw.loopsCount > uint64(metricIDsLen)*loopsCountPerMetricNameMatch {
// It should be faster performing metricName match on the remaining filters // It should be faster performing metricName match on the remaining filters
// instead of scanning big number of entries in the inverted index for these filters. // instead of scanning big number of entries in the inverted index for these filters.
tfsPostponed = append(tfsPostponed, tf) for i < len(tfwsRemaining) {
// Store stats for non-executed tf, since it could be updated during protection from thundering herd. tfw := tfwsRemaining[i]
is.storeLoopsCountForDateFilter(date, tf, tfw.loopsCount, tfw.lastQueryTimestamp) tf := tfw.tf
continue tfsPostponed = append(tfsPostponed, tf)
// Store stats for non-executed tf, since it could be updated during protection from thundering herd.
is.storeLoopsCountForDateFilter(date, tf, tfw.loopsCount)
i++
}
break
}
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, metricIDs, tfs.commonPrefix, 0)
if loopsCount > tfw.loopsCount {
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
} }
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, metricIDs, tfs.commonPrefix, maxDateMetrics)
is.storeLoopsCountForDateFilter(date, tf, loopsCount, tfw.lastQueryTimestamp)
if err != nil { if err != nil {
return nil, err return nil, err
} }
if m.Len() >= maxDateMetrics {
// Too many time series found by a single tag filter. Postpone applying this filter via metricName match.
tfsPostponed = append(tfsPostponed, tf)
continue
}
if tf.isNegative { if tf.isNegative {
metricIDs.Subtract(m) metricIDs.Subtract(m)
} else { } else {
@ -3092,9 +3089,9 @@ func (is *indexSearch) getMetricIDsForDateTagFilter(tf *tagFilter, date uint64,
kbPool.Put(kb) kbPool.Put(kb)
if err != nil { if err != nil {
// Set high loopsCount for failing filter, so it is moved to the end of filter list. // Set high loopsCount for failing filter, so it is moved to the end of filter list.
loopsCount = 1e9 loopsCount = 20e9
} }
if metricIDs.Len() >= maxMetrics { if filter == nil && metricIDs.Len() >= maxMetrics {
// Increase loopsCount for tag filter matching too many metrics, // Increase loopsCount for tag filter matching too many metrics,
// So next time it is moved to the end of filter list. // So next time it is moved to the end of filter list.
loopsCount *= 2 loopsCount *= 2
@ -3115,13 +3112,8 @@ func (is *indexSearch) getLoopsCountAndTimestampForDateFilter(date uint64, tf *t
return loopsCount, timestamp return loopsCount, timestamp
} }
func (is *indexSearch) storeLoopsCountForDateFilter(date uint64, tf *tagFilter, loopsCount, prevTimestamp uint64) { func (is *indexSearch) storeLoopsCountForDateFilter(date uint64, tf *tagFilter, loopsCount uint64) {
currentTimestamp := fasttime.UnixTimestamp() currentTimestamp := fasttime.UnixTimestamp()
if currentTimestamp < prevTimestamp+5 {
// The cache already contains quite fresh entry for the current (date, tf).
// Do not update it too frequently.
return
}
is.kb.B = appendDateTagFilterCacheKey(is.kb.B[:0], date, tf) is.kb.B = appendDateTagFilterCacheKey(is.kb.B[:0], date, tf)
kb := kbPool.Get() kb := kbPool.Get()
kb.B = encoding.MarshalUint64(kb.B[:0], loopsCount) kb.B = encoding.MarshalUint64(kb.B[:0], loopsCount)
@ -3196,63 +3188,28 @@ func (is *indexSearch) updateMetricIDsForPrefix(prefix []byte, metricIDs *uint64
return nil return nil
} }
// The maximum number of index scan loops. // The estimated number of index scan loops a single loop in updateMetricIDsByMetricNameMatch takes.
// Bigger number of loops is slower than updateMetricIDsByMetricNameMatch const loopsCountPerMetricNameMatch = 500
// over the found metrics.
const maxIndexScanLoopsPerMetric = 100
// The maximum number of slow index scan loops.
// Bigger number of loops is slower than updateMetricIDsByMetricNameMatch
// over the found metrics.
const maxIndexScanSlowLoopsPerMetric = 20
func (is *indexSearch) intersectMetricIDsWithTagFilter(tf *tagFilter, filter *uint64set.Set) (*uint64set.Set, error) { func (is *indexSearch) intersectMetricIDsWithTagFilter(tf *tagFilter, filter *uint64set.Set) (*uint64set.Set, error) {
if filter.Len() == 0 { if filter.Len() == 0 {
return nil, nil return nil, nil
} }
kb := &is.kb
filterLenRounded := (uint64(filter.Len()) / 1024) * 1024
kb.B = append(kb.B[:0], uselessTagIntersectKeyPrefix)
kb.B = encoding.MarshalUint64(kb.B, filterLenRounded)
kb.B = tf.Marshal(kb.B)
if len(is.db.uselessTagFiltersCache.Get(nil, kb.B)) > 0 {
// Skip useless work, since the intersection will return
// errFallbackToMetricNameMatch for the given filter.
return nil, errFallbackToMetricNameMatch
}
metricIDs, err := is.intersectMetricIDsWithTagFilterNocache(tf, filter)
if err == nil {
return metricIDs, err
}
if err != errFallbackToMetricNameMatch {
return nil, err
}
kb.B = append(kb.B[:0], uselessTagIntersectKeyPrefix)
kb.B = encoding.MarshalUint64(kb.B, filterLenRounded)
kb.B = tf.Marshal(kb.B)
is.db.uselessTagFiltersCache.Set(kb.B, uselessTagFilterCacheValue)
return nil, errFallbackToMetricNameMatch
}
func (is *indexSearch) intersectMetricIDsWithTagFilterNocache(tf *tagFilter, filter *uint64set.Set) (*uint64set.Set, error) {
metricIDs := filter metricIDs := filter
if !tf.isNegative { if !tf.isNegative {
metricIDs = &uint64set.Set{} metricIDs = &uint64set.Set{}
} }
if len(tf.orSuffixes) > 0 { if len(tf.orSuffixes) > 0 {
// Fast path for orSuffixes - seek for rows for each value from orSuffixes. // Fast path for orSuffixes - seek for rows for each value from orSuffixes.
if err := is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter); err != nil { _, err := is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter)
if err == errFallbackToMetricNameMatch { if err != nil {
return nil, err
}
return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf) return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf)
} }
return metricIDs, nil return metricIDs, nil
} }
// Slow path - scan for all the rows with the given prefix. // Slow path - scan for all the rows with the given prefix.
maxLoopsCount := uint64(filter.Len()) * maxIndexScanSlowLoopsPerMetric _, err := is.getMetricIDsForTagFilterSlow(tf, filter, func(metricID uint64) {
_, err := is.getMetricIDsForTagFilterSlow(tf, filter, maxLoopsCount, func(metricID uint64) {
if tf.isNegative { if tf.isNegative {
// filter must be equal to metricIDs // filter must be equal to metricIDs
metricIDs.Del(metricID) metricIDs.Del(metricID)
@ -3261,9 +3218,6 @@ func (is *indexSearch) intersectMetricIDsWithTagFilterNocache(tf *tagFilter, fil
} }
}) })
if err != nil { if err != nil {
if err == errFallbackToMetricNameMatch {
return nil, err
}
return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf) return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf)
} }
return metricIDs, nil return metricIDs, nil
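
The hunks above change how `getMetricIDsForDateAndFilters` chooses between scanning the inverted index and postponing filters to a metricName match: filters are ordered by their recorded `loopsCount`, and once a filter's `loopsCount` exceeds `len(metricIDs) * loopsCountPerMetricNameMatch`, that filter and all the filters after it are postponed. The following is a minimal standalone sketch of that decision, not the actual implementation. The `filterStat` type and the `splitFilters` helper are hypothetical names introduced for illustration, and the sketch keeps `metricIDsLen` fixed, whereas the real loop re-reads the set length after every applied filter.

```go
package main

import (
	"fmt"
	"sort"
)

// The value is taken from the diff; everything else here is illustrative.
const loopsCountPerMetricNameMatch = 500

// filterStat is a hypothetical stand-in for tagFilterWithWeight.
type filterStat struct {
	name       string
	loopsCount uint64
}

// splitFilters sorts filters by ascending loopsCount and splits them into
// filters worth executing via index scan and filters postponed to
// metricName match. Once one filter is postponed, all the following
// (more expensive) filters are postponed too.
func splitFilters(stats []filterStat, metricIDsLen int) (scan, postponed []filterStat) {
	// Sort a copy, so the caller's slice is left untouched.
	sorted := append([]filterStat(nil), stats...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].loopsCount < sorted[j].loopsCount
	})
	for i, fs := range sorted {
		if fs.loopsCount > uint64(metricIDsLen)*loopsCountPerMetricNameMatch {
			return sorted[:i], sorted[i:]
		}
	}
	return sorted, nil
}

func main() {
	stats := []filterStat{
		{"job", 1_000},
		{"path-regexp", 11_000_000},
		{"instance", 40_000},
	}
	scan, postponed := splitFilters(stats, 100)
	fmt.Println("scan:", scan)
	fmt.Println("postponed:", postponed)
}
```

With 100 already selected metricIDs the threshold is 50,000 loops, so `job` and `instance` are scanned and `path-regexp` is postponed.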


@ -248,6 +248,10 @@ type tagFilter struct {
graphiteReverseSuffix []byte graphiteReverseSuffix []byte
} }
func (tf *tagFilter) looksLikeHeavy() bool {
return tf.isRegexp && len(tf.orSuffixes) == 0
}
func (tf *tagFilter) isComposite() bool { func (tf *tagFilter) isComposite() bool {
k := tf.key k := tf.key
return len(k) > 0 && k[0] == compositeTagKeyPrefix return len(k) > 0 && k[0] == compositeTagKeyPrefix
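
The new `looksLikeHeavy` helper marks a filter as potentially heavy when it is a regexp that cannot be reduced to a list of literal alternatives (`orSuffixes`). The index search hunk earlier in this commit uses it to assign a high initial `loopsCount` of `11e6` to such filters when no stats exist yet. A reduced sketch of that interaction, assuming a trimmed-down `tagFilter` with `orSuffixes` as a string slice and a hypothetical `initialLoopsCount` helper that does not exist in the original code:

```go
package main

import "fmt"

// tagFilter is a reduced, hypothetical version of the struct in the diff;
// only the two fields read by looksLikeHeavy are kept.
type tagFilter struct {
	isRegexp   bool
	orSuffixes []string
}

// looksLikeHeavy is copied from the diff.
func (tf *tagFilter) looksLikeHeavy() bool {
	return tf.isRegexp && len(tf.orSuffixes) == 0
}

// initialLoopsCount is an illustrative helper: it returns the cached
// loopsCount, except that a heavy-looking filter without stats gets
// the high default used in the diff.
func initialLoopsCount(tf *tagFilter, cached uint64) uint64 {
	if cached == 0 && tf.looksLikeHeavy() {
		return 11e6
	}
	return cached
}

func main() {
	alternatives := &tagFilter{isRegexp: true, orSuffixes: []string{"foo", "bar"}}
	wildcard := &tagFilter{isRegexp: true}
	exact := &tagFilter{}

	fmt.Println(alternatives.looksLikeHeavy(), initialLoopsCount(alternatives, 0))
	fmt.Println(wildcard.looksLikeHeavy(), initialLoopsCount(wildcard, 0))
	fmt.Println(exact.looksLikeHeavy(), initialLoopsCount(exact, 0))
}
```

Only the regexp without literal alternatives is treated as heavy, so it sorts behind cheaper filters before it has ever been executed.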


@ -141,36 +141,32 @@ func (s *Set) AddMulti(a []uint64) {
if len(a) == 0 { if len(a) == 0 {
return return
} }
slowPath := false hiPrev := uint32(a[0] >> 32)
hi := uint32(a[0] >> 32) i := 0
for _, x := range a[1:] { for j, x := range a {
if hi != uint32(x>>32) { hi := uint32(x >> 32)
slowPath = true if hi == hiPrev {
break continue
} }
b32 := s.getOrCreateBucket32(hiPrev)
s.itemsCount += b32.addMulti(a[i:j])
hiPrev = hi
i = j
} }
if slowPath { b32 := s.getOrCreateBucket32(hiPrev)
for _, x := range a { s.itemsCount += b32.addMulti(a[i:])
s.Add(x) }
}
return func (s *Set) getOrCreateBucket32(hi uint32) *bucket32 {
}
// Fast path - all the items in a have identical higher 32 bits.
// Put them in a bulk into the corresponding bucket32.
bs := s.buckets bs := s.buckets
var b32 *bucket32
for i := range bs { for i := range bs {
if bs[i].hi == hi { if bs[i].hi == hi {
b32 = &bs[i] return &bs[i]
break
} }
} }
if b32 == nil { b32 := s.addBucket32()
b32 = s.addBucket32() b32.hi = hi
b32.hi = hi return b32
}
n := b32.addMulti(a)
s.itemsCount += n
} }
func (s *Set) addBucket32() *bucket32 { func (s *Set) addBucket32() *bucket32 {
@ -609,41 +605,32 @@ func (b *bucket32) addMulti(a []uint64) int {
if len(a) == 0 { if len(a) == 0 {
return 0 return 0
} }
hi := uint16(a[0] >> 16) count := 0
slowPath := false hiPrev := uint16(a[0] >> 16)
for _, x := range a[1:] { i := 0
if hi != uint16(x>>16) { for j, x := range a {
slowPath = true hi := uint16(x >> 16)
break if hi == hiPrev {
continue
} }
b16 := b.getOrCreateBucket16(hiPrev)
count += b16.addMulti(a[i:j])
hiPrev = hi
i = j
} }
if slowPath { b16 := b.getOrCreateBucket16(hiPrev)
count := 0 count += b16.addMulti(a[i:])
for _, x := range a { return count
if b.add(uint32(x)) { }
count++
} func (b *bucket32) getOrCreateBucket16(hi uint16) *bucket16 {
}
return count
}
// Fast path - all the items in a have identical higher 32+16 bits.
// Put them to a single bucket16 in a bulk.
var b16 *bucket16
his := b.b16his his := b.b16his
bs := b.buckets bs := b.buckets
if n := b.getHint(); n < uint32(len(his)) && his[n] == hi { n := binarySearch16(his, hi)
b16 = &bs[n] if n < 0 || n >= len(his) || his[n] != hi {
return b.addBucketAtPos(hi, n)
} }
if b16 == nil { return &bs[n]
n := binarySearch16(his, hi)
if n < 0 || n >= len(his) || his[n] != hi {
b16 = b.addBucketAtPos(hi, n)
} else {
b.setHint(n)
b16 = &bs[n]
}
}
return b16.addMulti(a)
} }
func (b *bucket32) addSlow(hi, lo uint16) bool { func (b *bucket32) addSlow(hi, lo uint16) bool {
@ -742,8 +729,8 @@ const (
type bucket16 struct { type bucket16 struct {
bits *[wordsPerBucket]uint64 bits *[wordsPerBucket]uint64
smallPool *[smallPoolSize]uint16
smallPoolLen int smallPoolLen int
smallPool [smallPoolSize]uint16
} }
const smallPoolSize = 56 const smallPoolSize = 56
@ -820,7 +807,14 @@ func (b *bucket16) intersect(a *bucket16) {
} }
func (b *bucket16) sizeBytes() uint64 { func (b *bucket16) sizeBytes() uint64 {
return uint64(unsafe.Sizeof(*b)) + uint64(unsafe.Sizeof(*b.bits)) n := unsafe.Sizeof(*b)
if b.bits != nil {
n += unsafe.Sizeof(*b.bits)
}
if b.smallPool != nil {
n += unsafe.Sizeof(*b.smallPool)
}
return uint64(n)
} }
func (b *bucket16) copyTo(dst *bucket16) { func (b *bucket16) copyTo(dst *bucket16) {
@ -831,23 +825,37 @@ func (b *bucket16) copyTo(dst *bucket16) {
dst.bits = &bits dst.bits = &bits
} }
dst.smallPoolLen = b.smallPoolLen dst.smallPoolLen = b.smallPoolLen
dst.smallPool = b.smallPool if b.smallPool != nil {
sp := dst.getOrCreateSmallPool()
*sp = *b.smallPool
}
}
func (b *bucket16) getOrCreateSmallPool() *[smallPoolSize]uint16 {
if b.smallPool == nil {
var sp [smallPoolSize]uint16
b.smallPool = &sp
}
return b.smallPool
} }
func (b *bucket16) add(x uint16) bool { func (b *bucket16) add(x uint16) bool {
if b.bits == nil { bits := b.bits
if bits == nil {
return b.addToSmallPool(x) return b.addToSmallPool(x)
} }
wordNum, bitMask := getWordNumBitMask(x) wordNum, bitMask := getWordNumBitMask(x)
word := &b.bits[wordNum] ok := bits[wordNum]&bitMask == 0
ok := *word&bitMask == 0 if ok {
*word |= bitMask bits[wordNum] |= bitMask
}
return ok return ok
} }
func (b *bucket16) addMulti(a []uint64) int { func (b *bucket16) addMulti(a []uint64) int {
count := 0 count := 0
if b.bits == nil { bits := b.bits
if bits == nil {
// Slow path // Slow path
for _, x := range a { for _, x := range a {
if b.add(uint16(x)) { if b.add(uint16(x)) {
@ -858,11 +866,10 @@ func (b *bucket16) addMulti(a []uint64) int {
// Fast path // Fast path
for _, x := range a { for _, x := range a {
wordNum, bitMask := getWordNumBitMask(uint16(x)) wordNum, bitMask := getWordNumBitMask(uint16(x))
word := &b.bits[wordNum] if bits[wordNum]&bitMask == 0 {
if *word&bitMask == 0 { bits[wordNum] |= bitMask
count++ count++
} }
*word |= bitMask
} }
} }
return count return count
@ -872,15 +879,16 @@ func (b *bucket16) addToSmallPool(x uint16) bool {
if b.hasInSmallPool(x) { if b.hasInSmallPool(x) {
return false return false
} }
if b.smallPoolLen < len(b.smallPool) { sp := b.getOrCreateSmallPool()
b.smallPool[b.smallPoolLen] = x if b.smallPoolLen < len(sp) {
sp[b.smallPoolLen] = x
b.smallPoolLen++ b.smallPoolLen++
return true return true
} }
b.smallPoolLen = 0 b.smallPoolLen = 0
var bits [wordsPerBucket]uint64 var bits [wordsPerBucket]uint64
b.bits = &bits b.bits = &bits
for _, v := range b.smallPool[:] { for _, v := range sp[:] {
b.add(v) b.add(v)
} }
b.add(x) b.add(x)
@ -896,7 +904,11 @@ func (b *bucket16) has(x uint16) bool {
} }
func (b *bucket16) hasInSmallPool(x uint16) bool { func (b *bucket16) hasInSmallPool(x uint16) bool {
for _, v := range b.smallPool[:b.smallPoolLen] { sp := b.smallPool
if sp == nil {
return false
}
for _, v := range sp[:b.smallPoolLen] {
if v == x { if v == x {
return true return true
} }
@ -916,9 +928,13 @@ func (b *bucket16) del(x uint16) bool {
} }
func (b *bucket16) delFromSmallPool(x uint16) bool { func (b *bucket16) delFromSmallPool(x uint16) bool {
for i, v := range b.smallPool[:b.smallPoolLen] { sp := b.smallPool
if sp == nil {
return false
}
for i, v := range sp[:b.smallPoolLen] {
if v == x { if v == x {
copy(b.smallPool[i:], b.smallPool[i+1:]) copy(sp[i:], sp[i+1:])
b.smallPoolLen-- b.smallPoolLen--
return true return true
} }
@ -929,11 +945,15 @@ func (b *bucket16) delFromSmallPool(x uint16) bool {
func (b *bucket16) appendTo(dst []uint64, hi uint32, hi16 uint16) []uint64 { func (b *bucket16) appendTo(dst []uint64, hi uint32, hi16 uint16) []uint64 {
hi64 := uint64(hi)<<32 | uint64(hi16)<<16 hi64 := uint64(hi)<<32 | uint64(hi16)<<16
if b.bits == nil { if b.bits == nil {
sp := b.smallPool
if sp == nil {
return dst
}
// Use smallPoolSorter instead of sort.Slice here in order to reduce memory allocations. // Use smallPoolSorter instead of sort.Slice here in order to reduce memory allocations.
sps := smallPoolSorterPool.Get().(*smallPoolSorter) sps := smallPoolSorterPool.Get().(*smallPoolSorter)
// Sort a copy of b.smallPool, since b must be readonly in order to prevent data races // Sort a copy of sp, since b must be readonly in order to prevent data races
// when b.appendTo is called from concurrent goroutines. // when b.appendTo is called from concurrent goroutines.
sps.smallPool = b.smallPool sps.smallPool = *sp
sps.a = sps.smallPool[:b.smallPoolLen] sps.a = sps.smallPool[:b.smallPoolLen]
if len(sps.a) > 1 && !sort.IsSorted(sps) { if len(sps.a) > 1 && !sort.IsSorted(sps) {
sort.Sort(sps) sort.Sort(sps)
@ -996,6 +1016,10 @@ func getWordNumBitMask(x uint16) (uint16, uint64) {
func binarySearch16(u16 []uint16, x uint16) int { func binarySearch16(u16 []uint16, x uint16) int {
// The code has been adapted from sort.Search. // The code has been adapted from sort.Search.
n := len(u16) n := len(u16)
if n > 0 && u16[n-1] < x {
// Fast path for values scanned in ascending order.
return n
}
i, j := 0, n i, j := 0, n
for i < j { for i < j {
h := int(uint(i+j) >> 1) h := int(uint(i+j) >> 1)
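
The reworked `AddMulti` and `addMulti` no longer fall back to item-by-item insertion when the input spans several buckets. Instead they walk the slice once and hand every maximal run of items sharing the same high bits to the matching bucket. A minimal sketch of that run-splitting loop, using a hypothetical `splitRuns` helper with a callback in place of the real `getOrCreateBucket32` and `addMulti` calls:

```go
package main

import "fmt"

// splitRuns calls f for every maximal run of consecutive items in a
// that share the same upper 32 bits. It mirrors the loop shape of
// Set.AddMulti in the diff; f stands in for the per-bucket bulk insert.
func splitRuns(a []uint64, f func(hi uint32, run []uint64)) {
	if len(a) == 0 {
		return
	}
	hiPrev := uint32(a[0] >> 32)
	i := 0
	for j, x := range a {
		hi := uint32(x >> 32)
		if hi == hiPrev {
			continue
		}
		// The high bits changed: flush the run collected so far.
		f(hiPrev, a[i:j])
		hiPrev = hi
		i = j
	}
	// Flush the trailing run.
	f(hiPrev, a[i:])
}

func main() {
	a := []uint64{1, 2, 1<<32 | 5, 1<<32 | 6, 3}
	splitRuns(a, func(hi uint32, run []uint64) {
		fmt.Printf("hi=%d items=%d\n", hi, len(run))
	})
}
```

The input does not have to be sorted: an unsorted slice only produces more, shorter runs, as with the trailing `3` in the example. Sorted input yields one run per bucket, which also fits the ascending-order fast path added to `binarySearch16`.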


@ -3,7 +3,7 @@ module github.com/VictoriaMetrics/fasthttp
go 1.13 go 1.13
require ( require (
github.com/klauspost/compress v1.11.3 github.com/klauspost/compress v1.11.12
github.com/valyala/bytebufferpool v1.0.0 github.com/valyala/bytebufferpool v1.0.0
github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a
) )


@ -1,5 +1,5 @@
github.com/klauspost/compress v1.11.3 h1:dB4Bn0tN3wdCzQxnS8r06kV74qN/TAfaIS0bVE8h3jc= github.com/klauspost/compress v1.11.12 h1:famVnQVu7QwryBN4jNseQdUKES71ZAOnB6UQQJPZvqk=
github.com/klauspost/compress v1.11.3/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs= github.com/klauspost/compress v1.11.12/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/valyala/bytebufferpool v1.0.0 h1:GqA5TC/0021Y/b9FG4Oi9Mr3q7XYx6KllzawFIhcdPw= github.com/valyala/bytebufferpool v1.0.0 h1:GqA5TC/0021Y/b9FG4Oi9Mr3q7XYx6KllzawFIhcdPw=
github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyCJ6HpOuEn7z0Csc= github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyCJ6HpOuEn7z0Csc=
github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a h1:0R4NLDRDZX6JcmhJgXi5E4b8Wg84ihbmUKp/GvSPEzc= github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a h1:0R4NLDRDZX6JcmhJgXi5E4b8Wg84ihbmUKp/GvSPEzc=

2
vendor/modules.txt vendored

@ -13,7 +13,7 @@ cloud.google.com/go/storage
# github.com/VictoriaMetrics/fastcache v1.5.8 # github.com/VictoriaMetrics/fastcache v1.5.8
## explicit ## explicit
github.com/VictoriaMetrics/fastcache github.com/VictoriaMetrics/fastcache
# github.com/VictoriaMetrics/fasthttp v1.0.13 # github.com/VictoriaMetrics/fasthttp v1.0.14
## explicit ## explicit
github.com/VictoriaMetrics/fasthttp github.com/VictoriaMetrics/fasthttp
github.com/VictoriaMetrics/fasthttp/fasthttputil github.com/VictoriaMetrics/fasthttp/fasthttputil