Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files

Aliaksandr Valialkin 2021-03-15 22:44:24 +02:00
commit 7d44cdd8ce
74 changed files with 3952 additions and 2314 deletions

259
README.md
@@ -170,6 +170,7 @@ Alphabetically sorted links to case studies:
* [Font used](#font-used)
* [Color Palette](#color-palette)
* [We kindly ask](#we-kindly-ask)
* [List of command-line flags](#list-of-command-line-flags)
## How to start VictoriaMetrics
@@ -182,7 +183,7 @@ The following command-line flags are used the most:
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup)
and how to [handle alerts](#alerting).
@@ -413,6 +414,10 @@ while VictoriaMetrics stores them with *milliseconds* precision.
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plugin-influxdb), [Juniper/open-nti](https://github.com/Juniper/open-nti)
or [Juniper/jtimon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag.
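As a rough illustration of the `extra_label` query arg described above, the following Python sketch builds a `/write` URL that carries one or more extra labels. The address `http://localhost:8428` and the helper name `build_write_url` are assumptions made for this example; only the `extra_label=name=value` format comes from the text above.

```python
from urllib.parse import urlencode


def build_write_url(base_url, extra_labels):
    """Return a /write URL with one extra_label=name=value arg per label.

    base_url and this helper are hypothetical and exist only for illustration.
    """
    # Each label becomes its own extra_label arg in name=value form.
    args = [("extra_label", f"{name}={value}") for name, value in extra_labels.items()]
    # safe="=" keeps the name=value separator readable instead of encoding it as %3D.
    return f"{base_url}/write?{urlencode(args, safe='=')}"


if __name__ == "__main__":
    print(build_write_url("http://localhost:8428", {"foo": "bar"}))
```

Influx line protocol data sent to the resulting URL would then get the `{foo="bar"}` label, as stated above.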
## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)
Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,
@@ -562,14 +567,17 @@ in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if y
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args in addition to Unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
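The relative-time and `round_digits` query args described above can be combined in a single request. The sketch below only assembles the request URL and does not contact a server. The address `http://localhost:8428` and the helper name `build_query_range_url` are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url, query, start, round_digits=None):
    """Build a /api/v1/query_range URL (hypothetical helper, illustration only).

    start may be a relative time such as "-30m"; round_digits is optional.
    """
    args = {"query": query, "start": start}
    if round_digits is not None:
        args["round_digits"] = round_digits
    # urlencode percent-encodes brackets and parentheses in the query expression.
    return f"{base_url}/api/v1/query_range?{urlencode(args)}"


if __name__ == "__main__":
    print(build_query_range_url(
        "http://localhost:8428",
        "avg_over_time(temperature[1h])",
        start="-30m",
        round_digits=2,
    ))
```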
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details:
* Any number of [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
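A minimal sketch of how the additional `match[]`, `start` and `end` args for `/api/v1/labels` might be assembled on the client side is shown below. The address and the helper name `build_labels_url` are hypothetical; the sketch only builds the URL.

```python
from urllib.parse import urlencode


def build_labels_url(base_url, selectors, start=None, end=None):
    """Build a /api/v1/labels URL with repeated match[] args (illustration only)."""
    # One match[] arg per time series selector.
    args = [("match[]", selector) for selector in selectors]
    if start is not None:
        args.append(("start", start))
    if end is not None:
        args.append(("end", end))
    return f"{base_url}/api/v1/labels?{urlencode(args)}"


if __name__ == "__main__":
    print(build_labels_url("http://localhost:8428", ['up{job="node"}', "process_cpu_seconds_total"], start="-1h"))
```

Note that `urlencode` percent-encodes the brackets in `match[]`; an HTTP server decodes them back when parsing the query string.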
Additionally VictoriaMetrics provides the following handlers:
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
@@ -1367,6 +1375,8 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default, `date` equals the current date,
while `topN` equals 10.
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
* VictoriaMetrics limits the number of labels per metric with the `-maxLabelsPerTimeseries` command-line flag.
This prevents ingestion of metrics with too many labels. It is recommended to [monitor](#monitoring) the `vm_metrics_with_dropped_labels_total`
metric in order to determine whether `-maxLabelsPerTimeseries` must be adjusted for your workload.
@@ -1538,3 +1548,248 @@ Files included in each folder:
* There should be sufficient clear space around the logo.
* Do not change spacing, alignment, or relative locations of the design elements.
* Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
## List of command-line flags
Pass `-help` to VictoriaMetrics in order to see the list of supported command-line flags with their description:
```
-bigMergeConcurrency int
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-dedup.minScrapeInterval duration
Remove superfluous samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
-denyQueriesOutsideRetention
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
-dryRun
Whether to check only -promscrape.config and then exit. Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-finalMergeDelay duration
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
-forceFlushAuthKey string
authKey, which must be passed in query string to /internal/force_flush pages
-forceMergeAuthKey string
authKey, which must be passed in query string to /internal/force_merge pages
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this delay the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if Influx line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-maxLabelsPerTimeseries int
The maximum number of labels accepted per time series. Superfluous labels are dropped (default 30)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
-opentsdbListenAddr string
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-precisionBits int
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
-promscrape.cluster.memberNum int
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
-promscrape.config.strictParse
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://victoriametrics.github.io/#relabeling for details
-retentionPeriod value
Data with timestamps outside the retentionPeriod is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
-search.cacheTimestampOffset duration
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
-search.disableCache
Whether to disable response caching. This may be useful during data backfilling
-search.latencyOffset duration
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
-search.maxExportDuration duration
The maximum duration for /api/v1/export call (default 720h0m0s)
-search.maxLookback duration
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values significantly exceeding horizontal resolution of the graph (default 30000)
-search.maxQueryDuration duration
The maximum duration for query execution (default 30s)
-search.maxQueryLen size
The maximum search query length in bytes
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
-search.maxQueueDuration duration
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
-search.maxStepForPointsAdjustment duration
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
-search.maxTagKeys int
The maximum number of tag keys returned from /api/v1/labels (default 100000)
-search.maxTagValueSuffixesPerSearch int
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
-search.maxTagValues int
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
-search.maxUniqueTimeseries int
The maximum number of unique time series each search can scan (default 300000)
-search.minStalenessInterval duration
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
-search.queryStats.lastQueriesCount int
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
-search.queryStats.minQueryDuration int
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
-search.resetCacheAuthKey string
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
-search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selfScrapeInstance string
Value for 'instance' label, which is added to self-scraped metrics (default "self")
-selfScrapeInterval duration
Interval for self-scraping own metrics at /metrics page
-selfScrapeJob string
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
-smallMergeConcurrency int
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
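Several flags in the list above take `size` values with optional `KB`, `MB`, `GB`, `KiB`, `MiB`, `GiB` suffixes and show their defaults in plain bytes. The following Python sketch shows one way to relate suffixed values to those byte counts. It assumes the conventional meaning of the suffixes (powers of 1000 for `KB`/`MB`/`GB`, powers of 1024 for `KiB`/`MiB`/`GiB`) and exact-case matching; the helper `parse_size` is hypothetical and is not the parser used by VictoriaMetrics.

```python
import re

# Assumed multipliers: decimal for KB/MB/GB, binary for KiB/MiB/GiB.
_MULTIPLIERS = {
    "": 1,
    "KB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
}

_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)(KB|MB|GB|KiB|MiB|GiB)?")


def parse_size(value):
    """Convert a string such as '100MiB' or '262144' to a number of bytes."""
    match = _SIZE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported size value: {value!r}")
    number, suffix = match.group(1), match.group(2) or ""
    return int(float(number) * _MULTIPLIERS[suffix])


if __name__ == "__main__":
    for text in ("100MiB", "32MiB", "256KiB", "16KiB"):
        print(text, "=", parse_size(text), "bytes")
```

Under these assumptions, `100MiB` corresponds to 104857600 bytes (the default shown for `-import.maxLineLen`) and `32MiB` to 33554432 bytes (the default shown for `-maxInsertRequestSize`).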

@@ -255,6 +255,41 @@ If each target is scraped by multiple `vmagent` instances, then data deduplicati
See [these docs](https://victoriametrics.github.io/#deduplication) for details.
## Scraping targets via a proxy
`vmagent` supports scraping targets via http and https proxies. The proxy address must be specified in the `proxy_url` option. For example, the following scrape config instructs
`vmagent` to scrape targets via the https proxy at `https://proxy-addr:1234`:
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
```
Proxy can be configured with the following optional settings:
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
For example:
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
  proxy_basic_auth:
    username: foobar
    password: secret
  proxy_tls_config:
    insecure_skip_verify: true
    cert_file: /path/to/cert
    key_file: /path/to/key
    ca_file: /path/to/ca
    server_name: real-server-name
```
## Monitoring
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
@@ -477,13 +512,16 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
-import.maxLineLen max_rows_per_line
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with max_rows_per_line query arg passed to /api/v1/export
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.maxLineSize value
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr http://<vmagent>:8429/write
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
@ -511,12 +549,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize value
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string
@ -527,9 +565,9 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize value
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
@ -538,6 +576,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
@ -546,45 +586,45 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval consul_sd_configs
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive disable_keepalive: true
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set disable_keepalive: true individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval dns_sd_configs
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval dockerswarm_sd_configs
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval ec2_sd_configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval eureka_sd_configs
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval gce_sd_configs
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kubernetes API server (default 10m0s)
-promscrape.kubernetesSDCheckInterval kubernetes_sd_configs
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets droppedTargets
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need to investigate labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize value
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval openstack_sd_configs
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse stream_parse: true
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set stream_parse: true individually per each `scrape_config` section in `-promscrape.config` for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors duplicate scrape target
Whether to suppress duplicate scrape target errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-remoteWrite.basicAuth.password array
@ -601,12 +641,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-remoteWrite.label array
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.maxBlockSize value
-remoteWrite.maxBlockSize size
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDiskUsagePerURL value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDiskUsagePerURL size
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-remoteWrite.proxyURL array
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
Supports array of values separated by comma or specified via multiple flags.
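
The `-promscrape.cluster.membersCount`, `-promscrape.cluster.memberNum` and `-promscrape.cluster.replicationFactor` flags listed above describe how scrape targets are spread across a cluster of scrapers. The following is only a rough sketch of one way such an assignment could work; the `shouldScrape` helper, the FNV hash and the "next N members" replication scheme are assumptions made for illustration and are not taken from the vmagent source.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shouldScrape reports whether the member with index memberNum is responsible
// for the target identified by key.
// Hypothetical helper: the hash function and the replication scheme are assumptions.
func shouldScrape(key string, membersCount, replicationFactor, memberNum int) bool {
	if membersCount <= 1 {
		// Cluster scraping is disabled: a single scraper handles all the targets.
		return true
	}
	h := fnv.New64a()
	h.Write([]byte(key))
	// The hash picks the first responsible member.
	idx := int(h.Sum64() % uint64(membersCount))
	// The following replicationFactor-1 members scrape the same target.
	for i := 0; i < replicationFactor; i++ {
		if (idx+i)%membersCount == memberNum {
			return true
		}
	}
	return false
}

func main() {
	targets := []string{"host1:9100", "host2:9100", "host3:9100", "host4:9100"}
	for _, target := range targets {
		for member := 0; member < 3; member++ {
			if shouldScrape(target, 3, 2, member) {
				fmt.Printf("member %d scrapes %s\n", member, target)
			}
		}
	}
}
```

With a replication factor above 1, every target is scraped by more than one member in this sketch, which is why the flag description asks for deduplication at the remote storage side.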

@ -23,6 +23,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
@ -40,7 +41,7 @@ var (
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to `http://<vmagent>:8429/write`")
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
@ -204,10 +205,8 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(http.StatusNoContent)
return true
case "/query":
// Emulate fake response for influx query.
// This is required for TSBS benchmark.
influxQueryRequests.Inc()
fmt.Fprintf(w, `{"results":[{"series":[{"values":[]}]}]}`)
influxutils.WriteDatabaseNames(w)
return true
case "/targets":
promscrapeTargetsRequests.Inc()
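
The hunk above replaces the hard-coded empty response for `/query` with a call to `influxutils.WriteDatabaseNames(w)`, whose body is not shown in this part of the diff. As a hedged sketch, a response in the shape InfluxDB 1.x returns for `SHOW DATABASES` could be built as follows; the `databaseNamesJSON` helper and the struct types are hypothetical and may differ from what `lib/influxutils` actually does.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The types below mirror the JSON shape of an InfluxDB 1.x SHOW DATABASES response.
// They are assumptions for illustration, not types from lib/influxutils.
type series struct {
	Name    string     `json:"name"`
	Columns []string   `json:"columns"`
	Values  [][]string `json:"values"`
}

type result struct {
	Series []series `json:"series"`
}

type response struct {
	Results []result `json:"results"`
}

// databaseNamesJSON returns a SHOW DATABASES-style response listing the given names.
func databaseNamesJSON(names []string) ([]byte, error) {
	values := make([][]string, 0, len(names))
	for _, name := range names {
		// Each database name goes into its own single-column row.
		values = append(values, []string{name})
	}
	resp := response{
		Results: []result{{
			Series: []series{{
				Name:    "databases",
				Columns: []string{"name"},
				Values:  values,
			}},
		}},
	}
	return json.Marshal(resp)
}

func main() {
	b, err := databaseNamesJSON([]string{"telegraf", "metrics"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

In the real binaries the list of names would presumably come from the `-influx.databaseNames` flag documented earlier in this commit.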

@ -232,7 +232,7 @@ The shortlist of configuration flags is the following:
How often to evaluate the rules (default 1m0s)
-external.alert.source string
External Alert Source allows overriding the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
e.g. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty '/api/v1/:groupID/alertID/status' is used
e.g. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty '/api/v1/:groupID/alertID/status' is used
-external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -external.label flags in order to add multiple labels.
Supports array of values separated by comma or specified via multiple flags.
@ -272,9 +272,9 @@ The shortlist of configuration flags is the following:
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string
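
Several flags in this commit switch their type from `value` to `size` and accept the optional suffixes KB, MB, GB, KiB, MiB and GiB. The sketch below shows one way such values could be converted to bytes. It assumes that KB/MB/GB are powers of 1000 and KiB/MiB/GiB are powers of 1024; the `parseSize` helper is hypothetical and is not the parser from `lib/flagutil`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts values such as "64MiB", "500MB" or "104857600" into bytes.
// Hypothetical helper: suffix handling here is an assumption for illustration.
func parseSize(s string) (int64, error) {
	suffixes := []struct {
		name string
		mult float64
	}{
		{"KiB", 1024}, {"MiB", 1024 * 1024}, {"GiB", 1024 * 1024 * 1024},
		{"KB", 1e3}, {"MB", 1e6}, {"GB", 1e9},
	}
	s = strings.TrimSpace(s)
	mult := 1.0
	for _, sf := range suffixes {
		if strings.HasSuffix(s, sf.name) {
			s = strings.TrimSuffix(s, sf.name)
			mult = sf.mult
			break
		}
	}
	// A value without a suffix is treated as a plain number of bytes.
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse size %q: %w", s, err)
	}
	return int64(f * mult), nil
}

func main() {
	for _, v := range []string{"104857600", "100MB", "1GiB"} {
		n, err := parseSize(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %d bytes\n", v, n)
	}
}
```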

@ -41,7 +41,7 @@ Rule files may contain %{ENV_VAR} placeholders, which are substituted by the cor
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows overriding the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
e.g. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty '/api/v1/:groupID/alertID/status' is used`)
e.g. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty '/api/v1/:groupID/alertID/status' is used`)
externalLabels = flagutil.NewArray("external.label", "Optional label in the form 'name=value' to add to all generated recording rules and alerts. "+
"Pass multiple -external.label flags in order to add multiple labels.")

@ -208,9 +208,9 @@ See the docs at https://victoriametrics.github.io/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string

@ -205,12 +205,12 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value
-maxBytesPerSecond size
The maximum upload speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-origin string

@ -19,6 +19,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
@ -34,7 +35,7 @@ import (
var (
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to `http://<victoriametrics>:8428/write`")
"This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
"Usually :4242 must be set. Doesn't work if empty")
@ -147,10 +148,8 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(http.StatusNoContent)
return true
case "/influx/query", "/query":
// Emulate fake response for influx query.
// This is required for TSBS benchmark.
influxQueryRequests.Inc()
fmt.Fprintf(w, `{"results":[{"series":[{"values":[]}]}]}`)
influxutils.WriteDatabaseNames(w)
return true
case "/prometheus/targets", "/targets":
promscrapeTargetsRequests.Inc()

@ -105,12 +105,12 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value
-maxBytesPerSecond size
The maximum download speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-skipBackupCompleteCheck

@ -968,6 +968,11 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
start -= offset
end := start
start = end - window
// Do not include data point with a timestamp matching the lower boundary of the window as Prometheus does.
start++
if end < start {
end = start
}
if err := exportHandler(w, []string{childQuery}, etf, start, end, "promapi", 0, false, deadline); err != nil {
return fmt.Errorf("error when exporting data for query=%q on the time range (start=%d, end=%d): %w", childQuery, start, end, err)
}
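
The lines added to `QueryHandler` above shift the lower bound of the export window by one millisecond, so that a sample sitting exactly on the lower boundary is left out. A standalone sketch of the same arithmetic, with a hypothetical `windowBounds` helper and millisecond timestamps assumed:

```go
package main

import "fmt"

// windowBounds returns the inclusive [start, end] range, in milliseconds,
// covering the window that ends at ts-offset. The lower boundary itself is excluded.
// Hypothetical helper mirroring the arithmetic from the diff.
func windowBounds(ts, window, offset int64) (start, end int64) {
	end = ts - offset
	start = end - window
	// Skip the sample whose timestamp equals the lower boundary of the window.
	start++
	if end < start {
		// A zero-length window would otherwise yield an inverted range.
		end = start
	}
	return start, end
}

func main() {
	samples := []int64{5000, 5001, 7500, 10000}
	start, end := windowBounds(10000, 5000, 0)
	for _, t := range samples {
		if t >= start && t <= end {
			fmt.Println("included:", t)
		}
	}
}
```

In this sketch the sample at 5000 falls exactly on the lower boundary and is skipped, while 5001, 7500 and 10000 are exported.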
@ -1017,6 +1022,7 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline,
LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilters: etf,
}
result, err := promql.Exec(&ec, query, true)
@ -1121,6 +1127,7 @@ func queryRangeHandler(startTime time.Time, w http.ResponseWriter, query string,
Deadline: deadline,
MayCache: mayCache,
LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilters: etf,
}
result, err := promql.Exec(&ec, query, false)
@ -1297,6 +1304,18 @@ func getMatchesFromRequest(r *http.Request) []string {
return matches
}
func getRoundDigits(r *http.Request) int {
s := r.FormValue("round_digits")
if len(s) == 0 {
return 100
}
n, err := strconv.Atoi(s)
if err != nil {
return 100
}
return n
}
func getLatencyOffsetMilliseconds() int64 {
d := latencyOffset.Milliseconds()
if d <= 1000 {

@ -98,11 +98,14 @@ type EvalConfig struct {
// LookbackDelta is analogous to `-query.lookback-delta` from Prometheus.
LookbackDelta int64
timestamps []int64
timestampsOnce sync.Once
// How many decimal digits after the point to leave in response.
RoundDigits int
// EnforcedTagFilters are additional label filters applied to the query.
EnforcedTagFilters []storage.TagFilter
timestamps []int64
timestampsOnce sync.Once
}
// newEvalConfig returns new EvalConfig copy from src.
@ -114,6 +117,7 @@ func newEvalConfig(src *EvalConfig) *EvalConfig {
ec.Deadline = src.Deadline
ec.MayCache = src.MayCache
ec.LookbackDelta = src.LookbackDelta
ec.RoundDigits = src.RoundDigits
ec.EnforcedTagFilters = src.EnforcedTagFilters
// do not copy src.timestamps - they must be generated again.

@ -12,6 +12,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/querystats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/metricsql"
@ -72,6 +73,14 @@ func Exec(ec *EvalConfig, q string, isFirstPointOnly bool) ([]netstorage.Result,
if err != nil {
return nil, err
}
if n := ec.RoundDigits; n < 100 {
for i := range result {
values := result[i].Values
for j, v := range values {
values[j] = decimal.RoundToDecimalDigits(v, n)
}
}
}
return result, err
}
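
The loop added to `Exec` rounds every returned value with `decimal.RoundToDecimalDigits` when `round_digits` is below 100, which is the default meaning "no rounding". The body of that function is not part of this diff, so the following is only a minimal sketch of decimal rounding under that convention; `roundToDecimalDigits` here is a hypothetical stand-in and may differ from the real implementation in edge cases.

```go
package main

import (
	"fmt"
	"math"
)

// roundToDecimalDigits rounds f to the given number of decimal digits after the point.
// Values of digits outside (-100, 100) leave f unchanged, mirroring the default of 100.
// Hypothetical stand-in, not the function from lib/decimal.
func roundToDecimalDigits(f float64, digits int) float64 {
	if digits <= -100 || digits >= 100 {
		return f
	}
	m := math.Pow10(digits)
	return math.Round(f*m) / m
}

func main() {
	values := []float64{1.23456, 2.5, 1234.5678}
	for _, v := range values {
		fmt.Println(roundToDecimalDigits(v, 2), roundToDecimalDigits(v, 100))
	}
}
```

A request such as `/api/v1/query?query=...&round_digits=2` would then return values with at most two digits after the decimal point.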

@ -61,6 +61,7 @@ func TestExecSuccess(t *testing.T) {
End: end,
Step: step,
Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""),
RoundDigits: 100,
}
for i := 0; i < 5; i++ {
result, err := Exec(ec, q, false)
@ -3653,6 +3654,210 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r1, r2, r3, r4}
f(q, resultExpected)
})
t.Run(`prometheus_buckets(overlapped ranges)`, func(t *testing.T) {
t.Parallel()
q := `sort(prometheus_buckets((
alias(label_set(90, "foo", "bar", "vmrange", "0...0"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.2"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0.2...0.25"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.26"), "xxx"),
alias(label_set(time()/100, "foo", "bar", "vmrange", "0.2...40"), "xxx"),
alias(label_set(time()/10, "foo", "bar", "vmrange", "40...Inf"), "xxx"),
)))`
r1 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{90, 90, 90, 90, 90, 90},
Timestamps: timestampsExpected,
}
r1.MetricName.MetricGroup = []byte("xxx")
r1.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0"),
},
}
r2 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{140, 150, 160, 170, 180, 190},
Timestamps: timestampsExpected,
}
r2.MetricName.MetricGroup = []byte("xxx")
r2.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.2"),
},
}
r3 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{190, 210, 230, 250, 270, 290},
Timestamps: timestampsExpected,
}
r3.MetricName.MetricGroup = []byte("xxx")
r3.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.25"),
},
}
r4 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{240, 270, 300, 330, 360, 390},
Timestamps: timestampsExpected,
}
r4.MetricName.MetricGroup = []byte("xxx")
r4.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.26"),
},
}
r5 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{250, 282, 314, 346, 378, 410},
Timestamps: timestampsExpected,
}
r5.MetricName.MetricGroup = []byte("xxx")
r5.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("40"),
},
}
r6 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{350, 402, 454, 506, 558, 610},
Timestamps: timestampsExpected,
}
r6.MetricName.MetricGroup = []byte("xxx")
r6.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("Inf"),
},
}
resultExpected := []netstorage.Result{r1, r2, r3, r4, r5, r6}
f(q, resultExpected)
})
t.Run(`prometheus_buckets(overlapped ranges at the end)`, func(t *testing.T) {
t.Parallel()
q := `sort(prometheus_buckets((
alias(label_set(90, "foo", "bar", "vmrange", "0...0"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.2"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0.2...0.25"), "xxx"),
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.25"), "xxx"),
alias(label_set(time()/100, "foo", "bar", "vmrange", "0.2...40"), "xxx"),
alias(label_set(time()/10, "foo", "bar", "vmrange", "40...Inf"), "xxx"),
)))`
r1 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{90, 90, 90, 90, 90, 90},
Timestamps: timestampsExpected,
}
r1.MetricName.MetricGroup = []byte("xxx")
r1.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0"),
},
}
r2 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{140, 150, 160, 170, 180, 190},
Timestamps: timestampsExpected,
}
r2.MetricName.MetricGroup = []byte("xxx")
r2.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.2"),
},
}
r3 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{190, 210, 230, 250, 270, 290},
Timestamps: timestampsExpected,
}
r3.MetricName.MetricGroup = []byte("xxx")
r3.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("0.25"),
},
}
r4 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{200, 222, 244, 266, 288, 310},
Timestamps: timestampsExpected,
}
r4.MetricName.MetricGroup = []byte("xxx")
r4.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("40"),
},
}
r5 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{300, 342, 384, 426, 468, 510},
Timestamps: timestampsExpected,
}
r5.MetricName.MetricGroup = []byte("xxx")
r5.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("bar"),
},
{
Key: []byte("le"),
Value: []byte("Inf"),
},
}
resultExpected := []netstorage.Result{r1, r2, r3, r4, r5}
f(q, resultExpected)
})
t.Run(`median_over_time()`, func(t *testing.T) {
t.Parallel()
q := `median_over_time({})`
@ -6375,6 +6580,7 @@ func TestExecError(t *testing.T) {
End: 2000,
Step: 100,
Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""),
RoundDigits: 100,
}
for i := 0; i < 4; i++ {
rv, err := Exec(ec, q, false)


@ -538,6 +538,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
// Do not drop trailing data points for queries, which return 2 or 1 point (aka instant queries).
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
canDropLastSample := rc.CanDropLastSample && len(rc.Timestamps) > 2
f := rc.Func
for _, tEnd := range rc.Timestamps {
tStart := tEnd - window
ni = seekFirstTimestampIdxAfter(timestamps[i:], tStart, ni)
@ -577,7 +578,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
rfa.realNextValue = nan
}
rfa.currTimestamp = tEnd
value := rc.Func(rfa)
value := f(rfa)
rfa.idx++
dstValues = append(dstValues, value)
}
@ -643,12 +644,12 @@ func getScrapeInterval(timestamps []int64) int64 {
return int64(maxSilenceInterval)
}
// Estimate scrape interval as 0.6 quantile for the first 100 intervals.
// Estimate scrape interval as 0.6 quantile for the first 20 intervals.
h := histogram.GetFast()
tsPrev := timestamps[0]
timestamps = timestamps[1:]
if len(timestamps) > 100 {
timestamps = timestamps[:100]
if len(timestamps) > 20 {
timestamps = timestamps[:20]
}
for _, ts := range timestamps {
h.Update(float64(ts - tsPrev))
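The hunk above estimates the scrape interval as the 0.6 quantile of the gaps between the first 20 timestamps. A minimal standalone sketch of that idea follows. It is not the code from the diff: `estimateScrapeInterval` is a hypothetical helper, it swaps the fast histogram for an exact sort-based nearest-rank quantile, and it returns 0 for series with fewer than two samples (an assumption made only for this illustration).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// estimateScrapeInterval returns the 0.6 quantile of the gaps between
// consecutive timestamps, looking at no more than maxIntervals gaps.
// Hypothetical helper for illustration; timestamps are in milliseconds.
func estimateScrapeInterval(timestamps []int64, maxIntervals int) int64 {
	if len(timestamps) < 2 {
		// Assumption for this sketch: no estimate is possible, return 0.
		return 0
	}
	if len(timestamps) > maxIntervals+1 {
		timestamps = timestamps[:maxIntervals+1]
	}
	gaps := make([]float64, 0, len(timestamps)-1)
	for i := 1; i < len(timestamps); i++ {
		gaps = append(gaps, float64(timestamps[i]-timestamps[i-1]))
	}
	sort.Float64s(gaps)
	// Nearest-rank quantile: the smallest gap such that at least 60%
	// of all gaps are less than or equal to it.
	idx := int(math.Ceil(0.6*float64(len(gaps)))) - 1
	if idx < 0 {
		idx = 0
	}
	return int64(gaps[idx])
}

func main() {
	// Regular 10s scrapes with a single 60s pause in the middle.
	ts := []int64{0, 10000, 20000, 30000, 90000, 100000}
	fmt.Println(estimateScrapeInterval(ts, 20))
}
```

Using a quantile instead of the mean keeps a single long pause from inflating the estimate: in the example the 60s gap is ignored and the result stays at 10000 ms.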


@ -518,6 +518,7 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
sort.Slice(xss, func(i, j int) bool { return xss[i].end < xss[j].end })
xssNew := make([]x, 0, len(xss)+2)
var xsPrev x
uniqTs := make(map[string]*timeseries, len(xss))
for _, xs := range xss {
ts := xs.ts
if isZeroTS(ts) {
@ -525,7 +526,8 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
xsPrev = xs
continue
}
if xs.start != xsPrev.end {
if xs.start != xsPrev.end && uniqTs[xs.startStr] == nil {
uniqTs[xs.startStr] = xs.ts
xssNew = append(xssNew, x{
endStr: xs.startStr,
end: xs.start,
@ -533,7 +535,14 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
})
}
ts.MetricName.AddTag("le", xs.endStr)
prevTs := uniqTs[xs.endStr]
if prevTs != nil {
// the end of the current bucket is not unique, need to merge it with the existing bucket.
mergeNonOverlappingTimeseries(prevTs, xs.ts)
} else {
xssNew = append(xssNew, xs)
uniqTs[xs.endStr] = xs.ts
}
xsPrev = xs
}
if !math.IsInf(xsPrev.end, 1) {


@ -12,9 +12,9 @@ import (
)
var (
lastQueriesCount = flag.Int("search.queryStats.lastQueriesCount", 20000, "Query stats for `/api/v1/status/top_queries` is tracked on this number of last queries. "+
lastQueriesCount = flag.Int("search.queryStats.lastQueriesCount", 20000, "Query stats for /api/v1/status/top_queries is tracked on this number of last queries. "+
"Zero value disables query stats tracking")
minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", 0, "The minimum duration for queries to track in query stats at `/api/v1/status/top_queries`. "+
minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", 0, "The minimum duration for queries to track in query stats at /api/v1/status/top_queries. "+
"Queries with lower duration are ignored in query stats")
)


@ -37,6 +37,8 @@ var (
bigMergeConcurrency = flag.Int("bigMergeConcurrency", 0, "The maximum number of CPU cores to use for big merges. Default value is used if set to 0")
smallMergeConcurrency = flag.Int("smallMergeConcurrency", 0, "The maximum number of CPU cores to use for small merges. Default value is used if set to 0")
logNewSeries = flag.Bool("logNewSeries", false, "Whether to log new series. This option is for debug purposes only. It can lead to performance issues "+
"when big number of new series are ingested into VictoriaMetrics")
denyQueriesOutsideRetention = flag.Bool("denyQueriesOutsideRetention", false, "Whether to deny queries outside of the configured -retentionPeriod. "+
"When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. "+
"This may be useful when multiple data sources with distinct retentions are hidden behind query-tee")
@ -72,6 +74,7 @@ func InitWithoutMetrics(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
}
resetResponseCacheIfNeeded = resetCacheIfNeeded
storage.SetLogNewSeries(*logNewSeries)
storage.SetFinalMergeDelay(*finalMergeDelay)
storage.SetBigMergeWorkersCount(*bigMergeConcurrency)
storage.SetSmallMergeWorkersCount(*smallMergeConcurrency)

File diff suppressed because it is too large


@ -4,7 +4,7 @@ DOCKER_NAMESPACE := victoriametrics
ROOT_IMAGE ?= alpine:3.13.2
CERTS_IMAGE := alpine:3.13.2
GO_BUILDER_IMAGE := golang:1.16.0
GO_BUILDER_IMAGE := golang:1.16.2
BUILDER_IMAGE := local/builder:2.0.0-$(shell echo $(GO_BUILDER_IMAGE) | tr : _)
BASE_IMAGE := local/base:1.1.3-$(shell echo $(ROOT_IMAGE) | tr : _)-$(shell echo $(CERTS_IMAGE) | tr : _)


@ -122,6 +122,16 @@ groups:
description: "High rate of slow inserts on \"{{ $labels.instance }}\" may be a sign of resource exhaustion
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series."
- alert: ProcessNearFDLimits
expr: process_open_fds / process_max_fds > 0.8
for: 10m
labels:
severity: critical
annotations:
summary: "Number of free file descriptors is less than 20% for \"{{ $labels.job }}\"(\"{{ $labels.instance }}\") for the last 10m"
description: "Exhausting OS file descriptors limit can cause severe degradation of the process.
Consider increasing the limit as soon as possible."
# Alerts group for vmagent assumes that Grafana dashboard
# https://grafana.com/grafana/dashboards/12683 is installed.
# Pls update the `dashboard` annotation according to your setup.


@ -70,8 +70,7 @@ services:
- '--rule=/etc/alerts/*.yml'
# display source of alerts in grafana
- '-external.url=http://127.0.0.1:3000' #grafana outside container
- '--external.alert.source=explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":"{{$$expr|quotesEscape|pathEscape}}"},{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' ## when copypaste the line be aware of '$$' for escaping in '$expr'
networks:
- '--external.alert.source=explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":"{{$$expr|quotesEscape|crlfEscape|queryEscape}}"},{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' ## when copypaste the line be aware of '$$' for escaping in '$expr' networks:
- vm_net
restart: always
alertmanager:


@ -27,6 +27,7 @@
* [Observability, Availability & DORAs Research Program](https://medium.com/alteos-tech-blog/observability-availability-and-doras-research-program-85deb6680e78)
* [Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator](https://www.percona.com/blog/2021/02/12/tame-kubernetes-costs-with-percona-monitoring-and-management-and-prometheus-operator/)
* [Prometheus Victoria Metrics On AWS ECS](https://dalefro.medium.com/prometheus-victoria-metrics-on-aws-ecs-62448e266090)
* [Monitoring with Prometheus, Grafana, AlertManager and VictoriaMetrics](https://www.sensedia.com/post/monitoring-with-prometheus-alertmanager)
## Our articles

62
docs/BestPractices.md Normal file

@ -0,0 +1,62 @@
# VM best practices
VictoriaMetrics is a fast, cost-effective and scalable monitoring solution and time series database. It can be used as a long-term, remote storage for Prometheus which allows it to gather metrics from different systems and store them in a single location or separate them for different purposes (short-, long-term, responsibility zones etc).
## Install Recommendation
There is no need to tune VictoriaMetrics because it uses reasonable defaults for command-line flags. These flags are automatically adjusted for the available CPU and RAM resources. There is no need for Operating System tuning because VictoriaMetrics is optimized for default OS settings. The only recommended change is to increase the limit on the [number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a), so Prometheus instances could establish more connections to VictoriaMetrics (65535 is a standard production value).
## Filesystem Considerations
The recommended filesystem is ext4. If you plan to store more than 1TB of data on ext4 partition or plan to extend it to more than 16TB, then the following options are recommended to pass to mkfs.ext4:
mkfs.ext4 ... -O 64bit,huge_file,extent -T huge
## Operating System
When configuring VictoriaMetrics, the best practice is to use the latest Ubuntu OS version.
## VictoriaMetrics Versions
Always update VictoriaMetrics instances in the environment to avoid version and build mismatch that will result in differences in performance and operational features. It is strongly recommended that you keep VictoriaMetrics in the environment up-to-date and install all VictoriaMetrics updates as soon as they are available. The most recent releases are listed on [this page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
## Upgrade
It is safe to upgrade VictoriaMetrics to new versions unless the [release notes](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) say otherwise. It is safe to skip multiple versions during the upgrade unless release notes say otherwise. It is recommended to perform regular upgrades to the latest version, since it may contain important bug fixes, performance optimizations or new features.
It is also safe to downgrade to the previous version unless release notes say otherwise.
The following steps must be performed during the upgrade / downgrade process:
* Send SIGINT signal to VictoriaMetrics process so that it is stopped gracefully.
* Wait until the process stops. This can take a few seconds.
* Start the upgraded VictoriaMetrics.
Prometheus doesn't drop data during the VictoriaMetrics restart. See [this article](https://grafana.com/blog/2019/03/25/whats-new-in-prometheus-2.8-wal-based-remote-write/) for details.
## Security
Do not forget to protect sensitive endpoints in VictoriaMetrics when exposing them to untrusted networks such as the internet. Please consider setting the following command-line flags:
* -tls, -tlsCertFile and -tlsKeyFile for switching from HTTP to HTTPS.
* -httpAuth.username and -httpAuth.password for protecting all the HTTP endpoints with [HTTP Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication).
* -deleteAuthKey for protecting /api/v1/admin/tsdb/delete_series endpoint. See how to [delete time series](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-delete-time-series).
* -snapshotAuthKey for protecting /snapshot* endpoints. See [how to work with snapshots](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots).
* -forceMergeAuthKey for protecting /internal/force_merge endpoint. See [force merge docs](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#forced-merge).
* -search.resetCacheAuthKey for protecting /internal/resetRollupResultCache endpoint. See [backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#backfilling) for more details.
Explicitly set internal network interface to TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats. For example, substitute -graphiteListenAddr=:2003 with -graphiteListenAddr=<internal_iface_ip>:2003.
It is preferable to authorize all incoming requests from untrusted networks with [vmauth](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmauth/README.md) or a similar auth proxy.
## Backup Recommendations
VictoriaMetrics supports backups via [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md) and [vmrestore](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmrestore/README.md) tools. We also provide the vmbackuper tool for our paid, enterprise subscribers - see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for additional details.
## Networking
Network usage: outbound traffic is negligible. Ingress traffic is ~100 bytes per ingested data point via [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). The actual ingress bandwidth usage depends on the average number of labels per ingested metric and the average size of label values. A higher number of per-metric labels and longer label values result in higher ingress bandwidth.
## Storage Considerations
Storage space: VictoriaMetrics needs less than a byte per data point on average. So, ~260GB is required to store a month-long insert stream of 100K data points per second. The actual storage size depends largely on data randomness (entropy). Higher randomness means higher storage size requirements. Read [this article](https://medium.com/faun/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932) for details.
## RAM
RAM size: VictoriaMetrics needs less than 1KB per active time series. Therefore, ~1GB of RAM is required for 1M active time series. Time series are considered active if new data points have been added recently or if they have been recently queried. The number of active time series may be obtained from vm_cache_entries{type="storage/hour_metric_ids"} metric exported on the /metrics page. VictoriaMetrics stores various caches in RAM. Memory size for these caches may be limited with -memory.allowedPercent or -memory.allowedBytes flags.
## CPU
CPU cores: VictoriaMetrics needs one CPU core per 300K inserted data points per second. So, ~4 CPU cores are required for processing the insert stream of 1M data points per second. The ingestion rate may be lower for high cardinality data or for time series with a high number of labels. See [this article](https://valyala.medium.com/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) for details. If you see lower numbers per CPU core, it is likely that the active time series info doesn't fit in your caches and you will need more RAM to lower CPU usage.
## Technical Support and Services
If you have questions about installing or using this software, please check this and other documents first. Answers to the most frequently asked questions can be found on the Technical Papers webpage or in VictoriaMetrics community channels. If you need further assistance with VictoriaMetrics, please contact us at info@victoriametrics.com - we'll be happy to help.
Following VictoriaMetrics best practices allows for the optimal configuration of our fast and scalable monitoring solution and time series database while minimizing or avoiding downtime or performance issues during installation and software usage. Our best practices also allow you to quickly troubleshoot any issues that might arise.


@ -6,13 +6,24 @@
- `histogram_avg(buckets)` - returns the average value for the given buckets.
- `histogram_stdvar(buckets)` - returns standard variance for the given buckets.
- `histogram_stddev(buckets)` - returns standard deviation for the given buckets.
* FEATURE: reduce median query duration by up to 2x. See https://github.com/VictoriaMetrics/VictoriaMetrics/commit/18fe0ff14bc78860c5569e2b70de1db78fac61be
* FEATURE: export `vm_available_memory_bytes` and `vm_available_cpu_cores` metrics, which show the number of available RAM and available CPU cores for VictoriaMetrics apps.
* FEATURE: vmagent: add ability to replicate scrape targets among `vmagent` instances in the cluster with `-promscrape.cluster.replicationFactor` command-line flag. See [these docs](https://victoriametrics.github.io/vmagent.html#scraping-big-number-of-targets).
* FATURE: vmagent: accept `scrape_offset` option at `scrape_config`. This option may be useful when scrapes must start at the specified offset of every scrape interval. See [these docs](https://victoriametrics.github.io/vmagent.html#troubleshooting) for details.
* FEATURE: vmagent: accept `scrape_offset` option at `scrape_config`. This option may be useful when scrapes must start at the specified offset of every scrape interval. See [these docs](https://victoriametrics.github.io/vmagent.html#troubleshooting) for details.
* FEATURE: vmagent: support `proxy_tls_config`, `proxy_basic_auth`, `proxy_bearer_token` and `proxy_bearer_token_file` options at `scrape_config` section for configuring proxies specified via `proxy_url`. See [these docs](https://victoriametrics.github.io/vmagent.html#scraping-targets-via-a-proxy).
* FEATURE: vmauth: allow using regexp paths in `url_map`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1112) for details.
* FEATURE: accept `round_digits` query arg at `/api/v1/query` and `/api/v1/query_range` handlers. This option can be set at Prometheus datasource in Grafana for limiting the number of digits after the decimal point in response values.
* FEATURE: add `-influx.databaseNames` command-line flag, which can be used for accepting data from some Telegraf plugins such as [fluentd plugin](https://github.com/fangli/fluent-plugin-influxdb). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124).
* FEATURE: add `-logNewSeries` command-line flag, which can be used for debugging the source of time series churn rate.
* BUGFIX: vmagent: prevent from high CPU usage bug during failing scrapes with small `scrape_timeout` (less than a few seconds).
* BUGFIX: vmagent: reduce memory usage when Kubernetes service discovery is used in big number of distinct jobs by sharing the cache. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
* BUGFIX: vmagent: reduce memory usage when Kubernetes service discovery is used in big number of distinct scrape config jobs by sharing Kubernetes object cache. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
* BUGFIX: vmagent: apply `sample_limit` only after `metric_relabel_configs` are applied as Prometheus does. Previously the `sample_limit` was applied before metrics relabeling.
* BUGFIX: vmagent: properly apply `tls_config`, `basic_auth` and `bearer_token` to proxy connections if `proxy_url` option is set. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
* BUGFIX: vmagent: properly scrape targets via https proxy specified in `proxy_url` if `insecure_skip_verify` flag isn't set in `tls_config` section. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
* BUGFIX: avoid `duplicate time series` error if `prometheus_buckets()` covers a time range with distinct set of buckets.
* BUGFIX: prevent exponent overflow when processing extremely small values close to zero such as `2.964393875E-314`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1114
* BUGFIX: do not include datapoints with a timestamp `t-d` when returning results from `/api/v1/query?query=m[d]&time=t` as Prometheus does.
# [v1.55.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.55.1)


@ -338,7 +338,7 @@ Please see [Monitoring K8S with VictoriaMetrics](https://docs.google.com/present
Numbers:
- Active time series: ~2500 Million
- Active time series: ~25 Million
- Datapoints: ~20 Trillion
- Ingestion rate: ~1800k/s
- Disk usage: ~20 TB


@ -373,7 +373,7 @@ for protecting from user errors such as accidental data deletion.
The following steps must be performed for each `vmstorage` node for creating a backup:
1. Create an instant snapshot by navigating to `/snapshot/create` HTTP handler. It will create snapshot and return its name.
2. Archive the created snapshot from `<-storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://victoriametrics.github.io/vbackup.html).
2. Archive the created snapshot from `<-storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://victoriametrics.github.io/vmbackup.html).
The archival process doesn't interfere with `vmstorage` work, so it may be performed at any suitable time.
3. Delete unused snapshots via `/snapshot/delete?snapshot=<snapshot_name>` or `/snapshot/delete_all` in order to free up occupied storage space.


@ -1,3 +0,0 @@
# MetricsQL
The page has been moved to [MetricsQL](https://victoriametrics.github.io/MetricsQL.html).


@ -170,6 +170,7 @@ Alphabetically sorted links to case studies:
* [Font used](#font-used)
* [Color Palette](#color-palette)
* [We kindly ask](#we-kindly-ask)
* [List of command-line flags](#list-of-command-line-flags)
## How to start VictoriaMetrics
@ -182,7 +183,7 @@ The following command-line flags are used the most:
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup)
and how to [handle alerts](#alerting).
@ -413,6 +414,10 @@ while VictoriaMetrics stores them with *milliseconds* precision.
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plugin-influxdb), [Juniper/open-nti](https://github.com/Juniper/open-nti)
or [Juniper/jtimon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag.
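As an illustration of the `extra_label` query arg described above, the following sketch builds (but does not send) a write request in Go. `newInfluxWriteRequest` is a hypothetical helper, the `localhost:8428` address assumes the default `-httpListenAddr`, and the Influx line is made-up sample data.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newInfluxWriteRequest builds a POST request to the /write endpoint.
// Every entry of extraLabels must have the form "name=value" and is
// passed as a separate extra_label query arg. Hypothetical helper.
func newInfluxWriteRequest(baseURL, line string, extraLabels ...string) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	u.Path = "/write"
	q := url.Values{}
	for _, label := range extraLabels {
		q.Add("extra_label", label)
	}
	// Encode percent-encodes "=" inside values; the server decodes it
	// back to the same name=value pair.
	u.RawQuery = q.Encode()
	return http.NewRequest(http.MethodPost, u.String(), strings.NewReader(line))
}

func main() {
	req, err := newInfluxWriteRequest(
		"http://localhost:8428",              // assumed default listen address
		"measurement,tag1=value1 field1=123", // sample Influx line
		"foo=bar",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The request can then be sent with `http.DefaultClient.Do(req)`; with the `foo=bar` extra label, all the ingested metrics would get the `{foo="bar"}` label.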
## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)
Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,
@ -562,14 +567,17 @@ in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if y
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args additionally to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details:
* Any number of [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
Additionally VictoriaMetrics provides the following handlers:
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
@ -1367,6 +1375,8 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default `date` equals to the current date,
while `topN` equals to 10.
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
* VictoriaMetrics limits the number of labels per each metric with `-maxLabelsPerTimeseries` command-line flag.
This prevents ingestion of metrics with too many labels. It is recommended to [monitor](#monitoring) the `vm_metrics_with_dropped_labels_total`
metric in order to determine whether `-maxLabelsPerTimeseries` must be adjusted for your workload.
@ -1538,3 +1548,248 @@ Files included in each folder:
* There should be sufficient clear space around the logo.
* Do not change spacing, alignment, or relative locations of the design elements.
* Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
## List of command-line flags
Pass `-help` to VictoriaMetrics in order to see the list of supported command-line flags with their description:
```
-bigMergeConcurrency int
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-dedup.minScrapeInterval duration
Remove superflouos samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
-denyQueriesOutsideRetention
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
-dryRun
Whether to check only -promscrape.config and then exit. Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-finalMergeDelay duration
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
-forceFlushAuthKey string
authKey, which must be passed in query string to /internal/force_flush pages
-forceMergeAuthKey string
authKey, which must be passed in query string to /internal/force_merge pages
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealy the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if Influx line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (e.g. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-maxLabelsPerTimeseries int
The maximum number of labels accepted per time series. Superfluous labels are dropped (default 30)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
-opentsdbListenAddr string
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (e.g. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (e.g. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-precisionBits int
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
-promscrape.cluster.memberNum int
The index of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
-promscrape.config.strictParse
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connections. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need to investigate labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://victoriametrics.github.io/#relabeling for details
-retentionPeriod value
Data with timestamps outside the retentionPeriod is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
-search.cacheTimestampOffset duration
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
-search.disableCache
Whether to disable response caching. This may be useful during data backfilling
-search.latencyOffset duration
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
-search.maxExportDuration duration
The maximum duration for /api/v1/export call (default 720h0m0s)
-search.maxLookback duration
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values significantly exceeding horizontal resolution of the graph (default 30000)
-search.maxQueryDuration duration
The maximum duration for query execution (default 30s)
-search.maxQueryLen size
The maximum search query length in bytes
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
-search.maxQueueDuration duration
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
-search.maxStepForPointsAdjustment duration
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
-search.maxTagKeys int
The maximum number of tag keys returned from /api/v1/labels (default 100000)
-search.maxTagValueSuffixesPerSearch int
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
-search.maxTagValues int
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
-search.maxUniqueTimeseries int
The maximum number of unique time series each search can scan (default 300000)
-search.minStalenessInterval duration
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
-search.queryStats.lastQueriesCount int
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
-search.queryStats.minQueryDuration int
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
-search.resetCacheAuthKey string
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
-search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selfScrapeInstance string
Value for 'instance' label, which is added to self-scraped metrics (default "self")
-selfScrapeInterval duration
Interval for self-scraping own metrics at /metrics page
-selfScrapeJob string
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
-smallMergeConcurrency int
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
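Several flags above take a `size` value with an optional suffix. The following Go sketch illustrates how such values could be interpreted. It is not the VictoriaMetrics implementation: the `parseSize` helper is hypothetical, and it assumes that `KB`, `MB`, `GB` are decimal multiples while `KiB`, `MiB`, `GiB` are binary multiples.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sizeSuffixes lists the suffixes named in the flag descriptions.
// Assumption: KB/MB/GB are decimal and KiB/MiB/GiB are binary multiples.
var sizeSuffixes = []struct {
	suffix string
	factor int64
}{
	{"KiB", 1 << 10},
	{"MiB", 1 << 20},
	{"GiB", 1 << 30},
	{"KB", 1000},
	{"MB", 1000 * 1000},
	{"GB", 1000 * 1000 * 1000},
}

// parseSize converts a value such as "32MiB" or "16384" into bytes.
// A value without a suffix is treated as a plain number of bytes.
func parseSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	factor := int64(1)
	for _, x := range sizeSuffixes {
		if strings.HasSuffix(s, x.suffix) {
			s = strings.TrimSuffix(s, x.suffix)
			factor = x.factor
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse size %q: %w", s, err)
	}
	return n * factor, nil
}

func main() {
	for _, v := range []string{"16384", "256KiB", "32MiB", "100MiB", "1GB"} {
		n, err := parseSize(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %d bytes\n", v, n)
	}
}
```

Under this assumption, the defaults 262144, 33554432 and 104857600 listed above correspond to 256KiB, 32MiB and 100MiB.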

@ -255,6 +255,41 @@ If each target is scraped by multiple `vmagent` instances, then data deduplicati
See [these docs](https://victoriametrics.github.io/#deduplication) for details.
## Scraping targets via a proxy
`vmagent` supports scraping targets via http and https proxies. The proxy address must be specified in the `proxy_url` option. For example, the following scrape config instructs `vmagent` to scrape
targets via the https proxy at `https://proxy-addr:1234`:
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
```
Proxy can be configured with the following optional settings:
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
For example:
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
  proxy_basic_auth:
    username: foobar
    password: secret
  proxy_tls_config:
    insecure_skip_verify: true
    cert_file: /path/to/cert
    key_file: /path/to/key
    ca_file: /path/to/ca
    server_name: real-server-name
```
## Monitoring
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
@ -477,13 +512,16 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
-import.maxLineLen max_rows_per_line
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with max_rows_per_line query arg passed to /api/v1/export
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.maxLineSize value
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr http://<vmagent>:8429/write
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
@ -511,12 +549,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize value
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string
@ -527,9 +565,9 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (e.g. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize value
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (e.g. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
@ -538,6 +576,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
The index of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
@ -546,45 +586,45 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consulSDCheckInterval consul_sd_configs
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive disable_keepalive: true
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set disable_keepalive: true individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connections. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval dns_sd_configs
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval dockerswarm_sd_configs
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval ec2_sd_configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval eureka_sd_configs
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval gce_sd_configs
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kuberntes API server (default 10m0s)
-promscrape.kubernetesSDCheckInterval kubernetes_sd_configs
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets droppedTargets
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need to investigate labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize value
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval openstack_sd_configs
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse stream_parse: true
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set stream_parse: true individually per each `scrape_config` section in `-promscrape.config` for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors duplicate scrape target
Whether to suppress duplicate scrape target errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-remoteWrite.basicAuth.password array
@ -601,12 +641,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-remoteWrite.label array
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.maxBlockSize value
-remoteWrite.maxBlockSize size
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDiskUsagePerURL value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDiskUsagePerURL size
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-remoteWrite.proxyURL array
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
Supports array of values separated by comma or specified via multiple flags.

@ -232,7 +232,7 @@ The shortlist of configuration flags is the following:
How often to evaluate the rules (default 1m0s)
-external.alert.source string
External Alert Source allows overriding the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
e.g. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty, '/api/v1/:groupID/alertID/status' is used
-external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports array of values separated by comma or specified via multiple flags.
@ -272,9 +272,9 @@ The shortlist of configuration flags is the following:
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string

@ -208,9 +208,9 @@ See the docs at https://victoriametrics.github.io/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-memory.allowedBytes value
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-metricsAuthKey string

@ -205,12 +205,12 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value
-maxBytesPerSecond size
The maximum upload speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-origin string

@ -105,12 +105,12 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
-maxBytesPerSecond value
-maxBytesPerSecond size
The maximum download speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
-skipBackupCompleteCheck

go.mod

@ -7,7 +7,7 @@ require (
// Do not use the original github.com/valyala/fasthttp because of issues
// like https://github.com/valyala/fasthttp/commit/996610f021ff45fdc98c2ce7884d5fa4e7f9199b
github.com/VictoriaMetrics/fasthttp v1.0.13
github.com/VictoriaMetrics/fasthttp v1.0.14
github.com/VictoriaMetrics/metrics v1.15.2
github.com/VictoriaMetrics/metricsql v0.14.0
github.com/aws/aws-sdk-go v1.37.26

go.sum

@ -82,8 +82,8 @@ github.com/Shopify/sarama v1.19.0/go.mod h1:FVkBWblsNy7DGZRfXLU0O9RCGt5g3g3yEuWX
github.com/Shopify/toxiproxy v2.1.4+incompatible/go.mod h1:OXgGpZ6Cli1/URJOF1DMxUHB2q5Ap20/P/eIdh4G0pI=
github.com/VictoriaMetrics/fastcache v1.5.8 h1:XW+YVx9lEXITBVv35ugK9OyotdNJVcbza69o3jmqWuI=
github.com/VictoriaMetrics/fastcache v1.5.8/go.mod h1:SiMZNgwEPJ9qWLshu9tyuE6bKc9ZWYhcNV/L7jurprQ=
github.com/VictoriaMetrics/fasthttp v1.0.13 h1:5JNS4vSPdN4QyfcpAg3Y1Wznf0uXEuSOFpeIlFw3MgM=
github.com/VictoriaMetrics/fasthttp v1.0.13/go.mod h1:3SeUL4zwB/p/a9aEeRc6gdlbrtNHXBJR6N376EgiSHU=
github.com/VictoriaMetrics/fasthttp v1.0.14 h1:iWCdHg7JQ1SO0xvPAgw3QFpFT3he+Ugdshg+1clN6CQ=
github.com/VictoriaMetrics/fasthttp v1.0.14/go.mod h1:eDVgYyGts3xXpYpVGDxQ3ZlQKW5TSvOqfc9FryjH1JA=
github.com/VictoriaMetrics/metrics v1.12.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE=
github.com/VictoriaMetrics/metrics v1.15.2 h1:w/GD8L9tm+gvx1oZvAofRRXwammiicdI0jgLghA2Gdo=
github.com/VictoriaMetrics/metrics v1.15.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE=
@ -507,7 +507,6 @@ github.com/klauspost/compress v1.4.0/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0
github.com/klauspost/compress v1.9.5/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
github.com/klauspost/compress v1.10.7/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.11.0/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.11.3/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.11.12 h1:famVnQVu7QwryBN4jNseQdUKES71ZAOnB6UQQJPZvqk=
github.com/klauspost/compress v1.11.12/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/cpuid v0.0.0-20170728055534-ae7887de9fa5/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=

@ -9,7 +9,7 @@ import (
// NewBytes returns new `bytes` flag with the given name, defaultValue and description.
func NewBytes(name string, defaultValue int, description string) *Bytes {
description += "\nSupports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB"
description += "\nSupports the following optional suffixes for `size` values: KB, MB, GB, KiB, MiB, GiB"
b := Bytes{
N: defaultValue,
valueString: fmt.Sprintf("%d", defaultValue),
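
The suffix list in the description above can be illustrated with a small parser. This is a minimal sketch, not the `flagutil` implementation: `parseSize` and `sizeSuffixes` are hypothetical names, and the sketch assumes decimal suffixes (KB, MB, GB) mean powers of 1000 while binary suffixes (KiB, MiB, GiB) mean powers of 1024.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sizeSuffixes lists the suffixes named in the flag description together with
// the multiplier assumed for each of them in this sketch.
var sizeSuffixes = []struct {
	suffix string
	factor float64
}{
	{"KiB", 1 << 10},
	{"MiB", 1 << 20},
	{"GiB", 1 << 30},
	{"KB", 1e3},
	{"MB", 1e6},
	{"GB", 1e9},
}

// parseSize is a hypothetical helper that converts values such as "16MiB"
// or "500MB" into a number of bytes. Values without a suffix are treated
// as plain byte counts.
func parseSize(s string) (int, error) {
	for _, ss := range sizeSuffixes {
		if strings.HasSuffix(s, ss.suffix) {
			f, err := strconv.ParseFloat(strings.TrimSuffix(s, ss.suffix), 64)
			if err != nil {
				return 0, fmt.Errorf("cannot parse %q: %w", s, err)
			}
			return int(f * ss.factor), nil
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse %q: %w", s, err)
	}
	return int(n), nil
}

func main() {
	for _, s := range []string{"16MiB", "500MB", "8388608"} {
		n, err := parseSize(s)
		fmt.Println(s, n, err)
	}
}
```

Under these assumptions `16MiB` corresponds to the `-promscrape.maxScrapeSize` default of 16777216, and `500MB` corresponds to the 500000000 minimum mentioned for `-remoteWrite.maxDiskUsagePerURL`.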

@ -12,6 +12,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/metrics"
@ -48,6 +49,8 @@ func writePrometheusMetrics(w io.Writer) {
fmt.Fprintf(w, "vm_app_version{version=%q, short_version=%q} 1\n", buildinfo.Version,
versionRe.FindString(buildinfo.Version))
fmt.Fprintf(w, "vm_allowed_memory_bytes %d\n", memory.Allowed())
fmt.Fprintf(w, "vm_available_memory_bytes %d\n", memory.Allowed()+memory.Remaining())
fmt.Fprintf(w, "vm_available_cpu_cores %d\n", cgroup.AvailableCPUs())
// Export start time and uptime in seconds
fmt.Fprintf(w, "vm_app_start_timestamp %d\n", startTime.Unix())

@ -0,0 +1,29 @@
package influxutils
import (
"fmt"
"net/http"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
)
var influxDatabaseNames = flagutil.NewArray("influx.databaseNames", "Comma-separated list of database names to return from /query and /influx/query API. "+
"This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb")
// WriteDatabaseNames writes influxDatabaseNames to w.
func WriteDatabaseNames(w http.ResponseWriter) {
// Emulate fake response for influx query.
// This is required for TSBS benchmark and some Telegraf plugins.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
w.Header().Set("Content-Type", "application/json; charset=utf-8")
dbNames := *influxDatabaseNames
if len(dbNames) == 0 {
dbNames = []string{"_internal"}
}
dbs := make([]string, len(dbNames))
for i := range dbNames {
dbs[i] = fmt.Sprintf(`[%q]`, dbNames[i])
}
fmt.Fprintf(w, `{"results":[{"statement_id":0,"series":[{"name":"databases","columns":["name"],"values":[%s]}]}]}`, strings.Join(dbs, ","))
}
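
To make the emulated response easier to inspect, the formatting logic of `WriteDatabaseNames` can be pulled into a pure function. This is a sketch for illustration only: `databasesResponse` is a hypothetical name, and the database names in `main` are made up. Note that `%q` produces Go string quoting, which matches JSON quoting for plain ASCII names; names with unusual characters may need a real JSON encoder.

```go
package main

import (
	"fmt"
	"strings"
)

// databasesResponse builds the JSON body returned for `SHOW DATABASES`,
// following the same steps as WriteDatabaseNames in the diff above.
// It falls back to "_internal" when no names are configured.
func databasesResponse(dbNames []string) string {
	if len(dbNames) == 0 {
		dbNames = []string{"_internal"}
	}
	dbs := make([]string, len(dbNames))
	for i, name := range dbNames {
		// Each database name becomes a single-element JSON array.
		dbs[i] = fmt.Sprintf(`[%q]`, name)
	}
	return fmt.Sprintf(`{"results":[{"statement_id":0,"series":[{"name":"databases","columns":["name"],"values":[%s]}]}]}`,
		strings.Join(dbs, ","))
}

func main() {
	fmt.Println(databasesResponse(nil))
	fmt.Println(databasesResponse([]string{"telegraf", "metrics"}))
}
```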

@ -15,7 +15,9 @@ import (
// ...
// pools[n] is for capacities from 2^(n+2)+1 to 2^(n+3)
//
var pools [30]sync.Pool
// Limit the maximum capacity to 2^18, since there are no performance benefits
// in caching byte slices with bigger capacities.
var pools [17]sync.Pool
// Get returns byte buffer with the given capacity.
func Get(capacity int) *bytesutil.ByteBuffer {
@ -37,10 +39,12 @@ func Get(capacity int) *bytesutil.ByteBuffer {
// Put returns bb to the pool.
func Put(bb *bytesutil.ByteBuffer) {
capacity := cap(bb.B)
id, _ := getPoolIDAndCapacity(capacity)
id, poolCapacity := getPoolIDAndCapacity(capacity)
if capacity <= poolCapacity {
bb.Reset()
pools[id].Put(bb)
}
}
func getPoolIDAndCapacity(size int) (int, int) {
size--
@ -49,7 +53,7 @@ func getPoolIDAndCapacity(size int) (int, int) {
}
size >>= 3
id := bits.Len(uint(size))
if id > len(pools) {
if id >= len(pools) {
id = len(pools) - 1
}
return id, (1 << (id + 3))
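
The effect of the `>=` fix and the new guard in `Put` can be checked with a standalone copy of the pool-index math. This is a sketch under stated assumptions: `poolsCount`, `poolIDAndCapacity` and `shouldPool` are hypothetical names, and the `size < 0` clamp is inferred from the hunk context rather than shown in full. With the old `id > len(pools)` check, an `id` equal to `len(pools)` would have passed through and indexed past the end of the array.

```go
package main

import (
	"fmt"
	"math/bits"
)

// poolsCount mirrors the new length of the pools array in the diff.
const poolsCount = 17

// poolIDAndCapacity returns the pool index for the given size and the
// capacity of buffers stored in that pool.
func poolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		size = 0
	}
	size >>= 3
	id := bits.Len(uint(size))
	if id >= poolsCount {
		// Clamp oversized requests to the last pool.
		id = poolsCount - 1
	}
	return id, 1 << (id + 3)
}

// shouldPool mirrors the guard added to Put: a buffer is returned to the pool
// only if its capacity does not exceed the capacity of the selected pool.
func shouldPool(capacity int) bool {
	_, poolCapacity := poolIDAndCapacity(capacity)
	return capacity <= poolCapacity
}

func main() {
	for _, size := range []int{1, 8, 9, 4096, 1 << 20} {
		id, c := poolIDAndCapacity(size)
		fmt.Printf("size=%d id=%d poolCapacity=%d pooled=%v\n", size, id, c, shouldPool(size))
	}
}
```

In this sketch a 1 MiB buffer maps to the last pool, whose computed capacity is `1 << 19`, so `shouldPool` rejects it and the buffer is left to the garbage collector.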

@ -65,13 +65,16 @@ func (ac *Config) tlsCertificateString() string {
// NewTLSConfig returns new TLS config for the given ac.
func (ac *Config) NewTLSConfig() *tls.Config {
tlsCfg := &tls.Config{
RootCAs: ac.TLSRootCA,
ClientSessionCache: tls.NewLRUClientSessionCache(0),
}
if ac == nil {
return tlsCfg
}
if ac.TLSCertificate != nil {
// Do not set tlsCfg.GetClientCertificate, since tlsCfg.Certificates should work OK.
tlsCfg.Certificates = []tls.Certificate{*ac.TLSCertificate}
}
tlsCfg.RootCAs = ac.TLSRootCA
tlsCfg.ServerName = ac.TLSServerName
tlsCfg.InsecureSkipVerify = ac.TLSInsecureSkipVerify
return tlsCfg
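
The reordering above makes `NewTLSConfig` safe to call on a nil `*Config`, because no field of `ac` is read before the nil check. A minimal sketch of the same pattern follows; `authConfig` and `newTLSConfig` are hypothetical stand-ins for the real `promauth` types, and the client certificate handling is omitted.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// authConfig is a reduced, hypothetical version of promauth.Config.
type authConfig struct {
	TLSRootCA             *x509.CertPool
	TLSServerName         string
	TLSInsecureSkipVerify bool
}

// newTLSConfig returns a usable TLS config even when ac is nil.
// Fields of ac are accessed only after the nil check.
func (ac *authConfig) newTLSConfig() *tls.Config {
	tlsCfg := &tls.Config{
		ClientSessionCache: tls.NewLRUClientSessionCache(0),
	}
	if ac == nil {
		return tlsCfg
	}
	tlsCfg.RootCAs = ac.TLSRootCA
	tlsCfg.ServerName = ac.TLSServerName
	tlsCfg.InsecureSkipVerify = ac.TLSInsecureSkipVerify
	return tlsCfg
}

func main() {
	// Calling a pointer-receiver method on a nil pointer is valid Go
	// as long as the method does not dereference the receiver.
	var none *authConfig
	fmt.Println(none.newTLSConfig() != nil)

	cfg := (&authConfig{TLSServerName: "foobar"}).newTLSConfig()
	fmt.Println(cfg.ServerName)
}
```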

@ -27,11 +27,11 @@ var (
"It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control")
disableKeepAlive = flag.Bool("promscrape.disableKeepAlive", false, "Whether to disable HTTP keep-alive connections when scraping all the targets. "+
"This may be useful when targets have no support for HTTP keep-alive connection. "+
"It is possible to set `disable_keepalive: true` individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. "+
"It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. "+
"Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets")
streamParse = flag.Bool("promscrape.streamParse", false, "Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful "+
"for reducing memory usage when millions of metrics are exposed per each scrape target. "+
"It is possible to set `stream_parse: true` individually per each `scrape_config` section in `-promscrape.config` for fine grained control")
"It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control")
)
type client struct {
@ -67,7 +67,7 @@ func newClient(sw *ScrapeWork) *client {
host += ":443"
}
}
dialFunc, err := newStatDialFunc(sw.ProxyURL, tlsCfg)
dialFunc, err := newStatDialFunc(sw.ProxyURL, sw.ProxyAuthConfig)
if err != nil {
logger.Fatalf("cannot create dial func: %s", err)
}

@ -115,6 +115,10 @@ type ScrapeConfig struct {
StreamParse bool `yaml:"stream_parse,omitempty"`
ScrapeAlignInterval time.Duration `yaml:"scrape_align_interval,omitempty"`
ScrapeOffset time.Duration `yaml:"scrape_offset,omitempty"`
ProxyTLSConfig *promauth.TLSConfig `yaml:"proxy_tls_config,omitempty"`
ProxyBasicAuth *promauth.BasicAuthConfig `yaml:"proxy_basic_auth,omitempty"`
ProxyBearerToken string `yaml:"proxy_bearer_token,omitempty"`
ProxyBearerTokenFile string `yaml:"proxy_bearer_token_file,omitempty"`
// This is set in loadConfig
swc *scrapeWorkConfig
@ -247,7 +251,7 @@ func (cfg *Config) getKubernetesSDScrapeWork(prev []*ScrapeWork) []*ScrapeWork {
target := metaLabels["__address__"]
sw, err := sc.swc.getScrapeWork(target, nil, metaLabels)
if err != nil {
logger.Errorf("cannot create kubernetes_sd_config target target %q for job_name %q: %s", target, sc.swc.jobName, err)
logger.Errorf("cannot create kubernetes_sd_config target %q for job_name %q: %s", target, sc.swc.jobName, err)
return nil
}
return sw
@ -543,6 +547,10 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
if err != nil {
return nil, fmt.Errorf("cannot parse auth config for `job_name` %q: %w", jobName, err)
}
proxyAC, err := promauth.NewConfig(baseDir, sc.ProxyBasicAuth, sc.ProxyBearerToken, sc.ProxyBearerTokenFile, sc.ProxyTLSConfig)
if err != nil {
return nil, fmt.Errorf("cannot parse proxy auth config for `job_name` %q: %w", jobName, err)
}
relabelConfigs, err := promrelabel.ParseRelabelConfigs(sc.RelabelConfigs)
if err != nil {
return nil, fmt.Errorf("cannot parse `relabel_configs` for `job_name` %q: %w", jobName, err)
@ -559,6 +567,7 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
scheme: scheme,
params: params,
proxyURL: sc.ProxyURL,
proxyAuthConfig: proxyAC,
authConfig: ac,
honorLabels: honorLabels,
honorTimestamps: honorTimestamps,
@ -583,6 +592,7 @@ type scrapeWorkConfig struct {
scheme string
params map[string][]string
proxyURL proxy.URL
proxyAuthConfig *promauth.Config
authConfig *promauth.Config
honorLabels bool
honorTimestamps bool
@ -849,6 +859,7 @@ func (swc *scrapeWorkConfig) getScrapeWork(target string, extraLabels, metaLabel
OriginalLabels: originalLabels,
Labels: labels,
ProxyURL: swc.proxyURL,
ProxyAuthConfig: swc.proxyAuthConfig,
AuthConfig: swc.authConfig,
MetricRelabelConfigs: swc.metricRelabelConfigs,
SampleLimit: swc.sampleLimit,

@ -10,6 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/proxy"
)
func TestNeedSkipScrapeWork(t *testing.T) {
@ -154,6 +155,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "blackbox",
}}
if !reflect.DeepEqual(sws, swsExpected) {
@ -548,6 +550,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
{
@ -587,6 +590,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
{
@ -626,6 +630,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -679,6 +684,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -729,6 +735,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -748,6 +755,10 @@ scrape_configs:
p: ["x&y", "="]
xaa:
bearer_token: xyz
proxy_url: http://foo.bar
proxy_basic_auth:
username: foo
password: bar
static_configs:
- targets: ["foo.bar", "aaa"]
labels:
@ -801,6 +812,10 @@ scrape_configs:
AuthConfig: &promauth.Config{
Authorization: "Bearer xyz",
},
ProxyAuthConfig: &promauth.Config{
Authorization: "Basic Zm9vOmJhcg==",
},
ProxyURL: proxy.MustNewURL("http://foo.bar"),
jobNameOriginal: "foo",
},
{
@ -842,6 +857,10 @@ scrape_configs:
AuthConfig: &promauth.Config{
Authorization: "Bearer xyz",
},
ProxyAuthConfig: &promauth.Config{
Authorization: "Basic Zm9vOmJhcg==",
},
ProxyURL: proxy.MustNewURL("http://foo.bar"),
jobNameOriginal: "foo",
},
{
@ -877,6 +896,7 @@ scrape_configs:
TLSServerName: "foobar",
TLSInsecureSkipVerify: true,
},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "qwer",
},
})
@ -955,6 +975,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -1017,6 +1038,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -1060,6 +1082,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -1100,6 +1123,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
MetricRelabelConfigs: mustParseRelabelConfigs(`
- source_labels: [foo]
target_label: abc
@ -1145,6 +1169,7 @@ scrape_configs:
AuthConfig: &promauth.Config{
Authorization: "Basic eHl6OnNlY3JldC1wYXNz",
},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -1184,6 +1209,7 @@ scrape_configs:
AuthConfig: &promauth.Config{
Authorization: "Bearer secret-pass",
},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -1229,6 +1255,7 @@ scrape_configs:
AuthConfig: &promauth.Config{
TLSCertificate: &snakeoilCert,
},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "foo",
},
})
@ -1291,6 +1318,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
jobNameOriginal: "aaa",
},
})
@ -1352,6 +1380,7 @@ scrape_configs:
},
},
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
SampleLimit: 100,
DisableKeepAlive: true,
DisableCompression: true,
@ -1398,6 +1427,7 @@ scrape_configs:
},
jobNameOriginal: "path wo slash",
AuthConfig: &promauth.Config{},
ProxyAuthConfig: &promauth.Config{},
},
})
}
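
The expected `Authorization` values in the tests above follow the standard HTTP Basic scheme: the base64 encoding of `username:password`. A small sketch that reproduces them is shown below; `basicAuthHeader` is a hypothetical helper, not a function from `promauth`.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// basicAuthHeader builds the value of the Authorization header
// for HTTP Basic authentication.
func basicAuthHeader(username, password string) string {
	token := base64.StdEncoding.EncodeToString([]byte(username + ":" + password))
	return "Basic " + token
}

func main() {
	// Credentials taken from the proxy_basic_auth test case above.
	fmt.Println(basicAuthHeader("foo", "bar"))
}
```

For the `proxy_basic_auth` test case with username `foo` and password `bar`, this yields the `Basic Zm9vOmJhcg==` value expected in `ProxyAuthConfig`.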

@ -15,7 +15,7 @@ import (
// SDCheckInterval is check interval for Consul service discovery.
var SDCheckInterval = flag.Duration("promscrape.consulSDCheckInterval", 30*time.Second, "Interval for checking for changes in Consul. "+
"This works only if `consul_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if consul_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details")
// consulWatcher is a watcher for consul api, updates services map in background with long-polling.

@ -1,21 +1,15 @@
package kubernetes
import (
"flag"
"fmt"
"net"
"net/http"
"net/url"
"os"
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
)
var apiServerTimeout = flag.Duration("promscrape.kubernetes.apiServerTimeout", 30*time.Minute, "How frequently to reload the full state from Kubernetes API server")
// apiConfig contains config for API server
type apiConfig struct {
aw *apiWatcher
@ -36,6 +30,11 @@ func getAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFu
}
func newAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFunc) (*apiConfig, error) {
switch sdc.Role {
case "node", "pod", "service", "endpoints", "endpointslices", "ingress":
default:
return nil, fmt.Errorf("unexpected `role`: %q; must be one of `node`, `pod`, `service`, `endpoints`, `endpointslices` or `ingress`", sdc.Role)
}
ac, err := promauth.NewConfig(baseDir, sdc.BasicAuth, sdc.BearerToken, sdc.BearerTokenFile, sdc.TLSConfig)
if err != nil {
return nil, fmt.Errorf("cannot parse auth config: %w", err)
@ -75,20 +74,7 @@ func newAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFu
for strings.HasSuffix(apiServer, "/") {
apiServer = apiServer[:len(apiServer)-1]
}
var proxy func(*http.Request) (*url.URL, error)
if proxyURL := sdc.ProxyURL.URL(); proxyURL != nil {
proxy = http.ProxyURL(proxyURL)
}
client := &http.Client{
Transport: &http.Transport{
TLSClientConfig: ac.NewTLSConfig(),
Proxy: proxy,
TLSHandshakeTimeout: 10 * time.Second,
IdleConnTimeout: *apiServerTimeout,
},
Timeout: *apiServerTimeout,
}
aw := newAPIWatcher(client, apiServer, ac.Authorization, sdc.Namespaces.Names, sdc.Selectors, swcFunc)
aw := newAPIWatcher(apiServer, ac, sdc, swcFunc)
cfg := &apiConfig{
aw: aw,
}
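
The role check added to `newAPIConfig` rejects unknown roles before any watcher is created. The same validation can be sketched as a standalone function; `validateRole` is a hypothetical name used only for this illustration.

```go
package main

import "fmt"

// validateRole returns an error for any role outside the set
// accepted by the switch statement in the diff above.
func validateRole(role string) error {
	switch role {
	case "node", "pod", "service", "endpoints", "endpointslices", "ingress":
		return nil
	default:
		return fmt.Errorf("unexpected `role`: %q; must be one of `node`, `pod`, `service`, `endpoints`, `endpointslices` or `ingress`", role)
	}
}

func main() {
	for _, role := range []string{"pod", "endpointslices", "deployment"} {
		fmt.Printf("%s: %v\n", role, validateRole(role))
	}
}
```

Failing early here means a misspelled `role` is reported when the config is loaded instead of silently producing no targets.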

@ -1,9 +1,9 @@
package kubernetes
import (
"context"
"encoding/json"
"errors"
"flag"
"fmt"
"io"
"io/ioutil"
@ -16,9 +16,12 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/metrics"
)
var apiServerTimeout = flag.Duration("promscrape.kubernetes.apiServerTimeout", 30*time.Minute, "How frequently to reload the full state from Kubernetes API server")
// WatchEvent is a watch event returned from API server endpoints if `watch=1` query arg is set.
//
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
@ -30,282 +33,75 @@ type WatchEvent struct {
// object is any Kubernetes object.
type object interface {
key() string
getTargetLabels(aw *apiWatcher) []map[string]string
getTargetLabels(gw *groupWatcher) []map[string]string
}
// parseObjectFunc must parse object from the given data.
type parseObjectFunc func(data []byte) (object, error)
// parseObjectListFunc must parse objectList from the given data.
type parseObjectListFunc func(data []byte) (map[string]object, ListMeta, error)
// parseObjectListFunc must parse objectList from the given r.
type parseObjectListFunc func(r io.Reader) (map[string]object, ListMeta, error)
// apiWatcher is used for watching for Kubernetes object changes and caching their latest states.
type apiWatcher struct {
// The client used for watching for object changes
client *http.Client
role string
// Kubernetes API server address in the form http://api-server
apiServer string
// The contents for `Authorization` HTTP request header
authorization string
// Namespaces to watch
namespaces []string
// Selectors to apply during watch
selectors []Selector
// Constructor for creating ScrapeWork objects from labels.
// Constructor for creating ScrapeWork objects from labels
swcFunc ScrapeWorkConstructorFunc
// mu protects watchersByURL
mu sync.Mutex
gw *groupWatcher
// a map of watchers keyed by request urls
watchersByURL map[string]*urlWatcher
// swos contains a map of ScrapeWork objects for the given apiWatcher
swosByKey map[string][]interface{}
swosByKeyLock sync.Mutex
stopFunc func()
stopCtx context.Context
wg sync.WaitGroup
swosCount *metrics.Counter
}
func newAPIWatcher(apiServer string, ac *promauth.Config, sdc *SDConfig, swcFunc ScrapeWorkConstructorFunc) *apiWatcher {
namespaces := sdc.Namespaces.Names
selectors := sdc.Selectors
proxyURL := sdc.ProxyURL.URL()
gw := getGroupWatcher(apiServer, ac, namespaces, selectors, proxyURL)
return &apiWatcher{
role: sdc.Role,
swcFunc: swcFunc,
gw: gw,
swosByKey: make(map[string][]interface{}),
swosCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_scrape_works{role=%q}`, sdc.Role)),
}
}
func (aw *apiWatcher) mustStop() {
aw.stopFunc()
aw.wg.Wait()
aw.gw.unsubscribeAPIWatcher(aw)
aw.reloadScrapeWorks(make(map[string][]interface{}))
}
func newAPIWatcher(client *http.Client, apiServer, authorization string, namespaces []string, selectors []Selector, swcFunc ScrapeWorkConstructorFunc) *apiWatcher {
stopCtx, stopFunc := context.WithCancel(context.Background())
return &apiWatcher{
apiServer: apiServer,
authorization: authorization,
client: client,
namespaces: namespaces,
selectors: selectors,
swcFunc: swcFunc,
watchersByURL: make(map[string]*urlWatcher),
stopFunc: stopFunc,
stopCtx: stopCtx,
}
func (aw *apiWatcher) reloadScrapeWorks(swosByKey map[string][]interface{}) {
aw.swosByKeyLock.Lock()
aw.swosCount.Add(len(swosByKey) - len(aw.swosByKey))
aw.swosByKey = swosByKey
aw.swosByKeyLock.Unlock()
}
// getScrapeWorkObjectsForRole returns all the ScrapeWork objects for the given role.
func (aw *apiWatcher) getScrapeWorkObjectsForRole(role string) []interface{} {
aw.startWatchersForRole(role)
var swos []interface{}
aw.mu.Lock()
for _, uw := range aw.watchersByURL {
if uw.role != role {
continue
}
uw.mu.Lock()
for _, swosLocal := range uw.swosByKey {
swos = append(swos, swosLocal...)
}
uw.mu.Unlock()
}
aw.mu.Unlock()
return swos
}
// getObjectByRole returns an object with the given (namespace, name) key and the given role.
func (aw *apiWatcher) getObjectByRole(role, namespace, name string) object {
if aw == nil {
return nil
}
key := namespace + "/" + name
aw.startWatchersForRole(role)
var o object
aw.mu.Lock()
for _, uw := range aw.watchersByURL {
if uw.role != role {
continue
}
o = uw.objectsByKey.get(key)
if o != nil {
break
}
}
aw.mu.Unlock()
return o
}
func (aw *apiWatcher) startWatchersForRole(role string) {
parseObject, parseObjectList := getObjectParsersForRole(role)
paths := getAPIPaths(role, aw.namespaces, aw.selectors)
for _, path := range paths {
apiURL := aw.apiServer + path
aw.startWatcherForURL(role, apiURL, parseObject, parseObjectList)
}
}
func (aw *apiWatcher) startWatcherForURL(role, apiURL string, parseObject parseObjectFunc, parseObjectList parseObjectListFunc) {
aw.mu.Lock()
if aw.watchersByURL[apiURL] != nil {
// Watcher for the given path already exists.
aw.mu.Unlock()
return
}
uw := aw.newURLWatcher(role, apiURL, parseObject, parseObjectList)
aw.watchersByURL[apiURL] = uw
aw.mu.Unlock()
uw.watchersCount.Inc()
uw.watchersCreated.Inc()
uw.reloadObjects()
aw.wg.Add(1)
go func() {
defer aw.wg.Done()
logger.Infof("started watcher for %q", apiURL)
uw.watchForUpdates()
logger.Infof("stopped watcher for %q", apiURL)
uw.objectsByKey.decRef()
aw.mu.Lock()
delete(aw.watchersByURL, apiURL)
aw.mu.Unlock()
uw.watchersCount.Dec()
uw.watchersStopped.Inc()
}()
}
// needStop returns true if aw must be stopped.
func (aw *apiWatcher) needStop() bool {
select {
case <-aw.stopCtx.Done():
return true
default:
return false
}
}
// doRequest performs an HTTP request to the given requestURL.
func (aw *apiWatcher) doRequest(requestURL string) (*http.Response, error) {
req, err := http.NewRequestWithContext(aw.stopCtx, "GET", requestURL, nil)
if err != nil {
logger.Fatalf("cannot create a request for %q: %s", requestURL, err)
}
if aw.authorization != "" {
req.Header.Set("Authorization", aw.authorization)
}
return aw.client.Do(req)
}
// urlWatcher watches for an apiURL and updates object states in objectsByKey.
type urlWatcher struct {
role string
apiURL string
parseObject parseObjectFunc
parseObjectList parseObjectListFunc
// objectsByKey contains the latest state for objects obtained from apiURL
objectsByKey *objectsMap
// mu protects swosByKey and resourceVersion
mu sync.Mutex
swosByKey map[string][]interface{}
resourceVersion string
// the parent apiWatcher
aw *apiWatcher
watchersCount *metrics.Counter
watchersCreated *metrics.Counter
watchersStopped *metrics.Counter
}
func (aw *apiWatcher) newURLWatcher(role, apiURL string, parseObject parseObjectFunc, parseObjectList parseObjectListFunc) *urlWatcher {
return &urlWatcher{
role: role,
apiURL: apiURL,
parseObject: parseObject,
parseObjectList: parseObjectList,
objectsByKey: sharedObjectsGlobal.getByAPIURL(role, apiURL),
swosByKey: make(map[string][]interface{}),
aw: aw,
watchersCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers{role=%q}`, role)),
watchersCreated: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers_created_total{role=%q}`, role)),
watchersStopped: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers_stopped_total{role=%q}`, role)),
}
}
// Limit the concurrency for per-role objects reloading to 1.
//
// This should reduce memory usage when a large number of watchers simultaneously receive an update for objects of the same role.
var reloadObjectsLocksByRole = map[string]*sync.Mutex{
"node": {},
"pod": {},
"service": {},
"endpoints": {},
"endpointslices": {},
"ingress": {},
}
func (uw *urlWatcher) resetResourceVersion() {
uw.mu.Lock()
uw.resourceVersion = ""
uw.mu.Unlock()
}
// reloadObjects reloads objects to the latest state and returns resourceVersion for the latest state.
func (uw *urlWatcher) reloadObjects() string {
lock := reloadObjectsLocksByRole[uw.role]
lock.Lock()
defer lock.Unlock()
uw.mu.Lock()
resourceVersion := uw.resourceVersion
uw.mu.Unlock()
if resourceVersion != "" {
// Fast path - objects have been already reloaded by concurrent goroutines.
return resourceVersion
}
aw := uw.aw
requestURL := uw.apiURL
resp, err := aw.doRequest(requestURL)
if err != nil {
if !aw.needStop() {
logger.Errorf("error when performing a request to %q: %s", requestURL, err)
}
return ""
}
body, _ := ioutil.ReadAll(resp.Body)
_ = resp.Body.Close()
if resp.StatusCode != http.StatusOK {
logger.Errorf("unexpected status code for request to %q: %d; want %d; response: %q", requestURL, resp.StatusCode, http.StatusOK, body)
return ""
}
objectsByKey, metadata, err := uw.parseObjectList(body)
if err != nil {
if !aw.needStop() {
logger.Errorf("cannot parse response from %q: %s", requestURL, err)
}
return ""
}
uw.objectsByKey.reload(objectsByKey)
swosByKey := make(map[string][]interface{})
for k, o := range objectsByKey {
labels := o.getTargetLabels(aw)
func (aw *apiWatcher) setScrapeWorks(key string, labels []map[string]string) {
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
aw.swosByKeyLock.Lock()
if len(swos) > 0 {
swosByKey[k] = swos
aw.swosCount.Add(len(swos) - len(aw.swosByKey[key]))
aw.swosByKey[key] = swos
} else {
aw.swosCount.Add(-len(aw.swosByKey[key]))
delete(aw.swosByKey, key)
}
aw.swosByKeyLock.Unlock()
}
uw.mu.Lock()
uw.swosByKey = swosByKey
uw.resourceVersion = metadata.ResourceVersion
uw.mu.Unlock()
return metadata.ResourceVersion
func (aw *apiWatcher) removeScrapeWorks(key string) {
aw.swosByKeyLock.Lock()
aw.swosCount.Add(-len(aw.swosByKey[key]))
delete(aw.swosByKey, key)
aw.swosByKeyLock.Unlock()
}
func getScrapeWorkObjectsForLabels(swcFunc ScrapeWorkConstructorFunc, labelss []map[string]string) []interface{} {
@ -320,11 +116,362 @@ func getScrapeWorkObjectsForLabels(swcFunc ScrapeWorkConstructorFunc, labelss []
return swos
}
// getScrapeWorkObjects returns all the ScrapeWork objects for the given aw.
func (aw *apiWatcher) getScrapeWorkObjects() []interface{} {
aw.gw.startWatchersForRole(aw.role, aw)
aw.swosByKeyLock.Lock()
defer aw.swosByKeyLock.Unlock()
size := 0
for _, swosLocal := range aw.swosByKey {
size += len(swosLocal)
}
swos := make([]interface{}, 0, size)
for _, swosLocal := range aw.swosByKey {
swos = append(swos, swosLocal...)
}
return swos
}
// groupWatcher watches for Kubernetes objects on the given apiServer with the given namespaces,
// selectors and authorization using the given client.
type groupWatcher struct {
apiServer string
namespaces []string
selectors []Selector
authorization string
client *http.Client
mu sync.Mutex
m map[string]*urlWatcher
}
func newGroupWatcher(apiServer string, ac *promauth.Config, namespaces []string, selectors []Selector, proxyURL *url.URL) *groupWatcher {
var proxy func(*http.Request) (*url.URL, error)
if proxyURL != nil {
proxy = http.ProxyURL(proxyURL)
}
client := &http.Client{
Transport: &http.Transport{
TLSClientConfig: ac.NewTLSConfig(),
Proxy: proxy,
TLSHandshakeTimeout: 10 * time.Second,
IdleConnTimeout: *apiServerTimeout,
},
Timeout: *apiServerTimeout,
}
return &groupWatcher{
apiServer: apiServer,
authorization: ac.Authorization,
namespaces: namespaces,
selectors: selectors,
client: client,
m: make(map[string]*urlWatcher),
}
}
func getGroupWatcher(apiServer string, ac *promauth.Config, namespaces []string, selectors []Selector, proxyURL *url.URL) *groupWatcher {
key := fmt.Sprintf("apiServer=%s, namespaces=%s, selectors=%s, proxyURL=%v, authConfig=%s",
apiServer, namespaces, selectorsKey(selectors), proxyURL, ac.String())
groupWatchersLock.Lock()
gw := groupWatchers[key]
if gw == nil {
gw = newGroupWatcher(apiServer, ac, namespaces, selectors, proxyURL)
groupWatchers[key] = gw
}
groupWatchersLock.Unlock()
return gw
}
func selectorsKey(selectors []Selector) string {
var sb strings.Builder
for _, s := range selectors {
fmt.Fprintf(&sb, "{role=%q, label=%q, field=%q}", s.Role, s.Label, s.Field)
}
return sb.String()
}
var (
groupWatchersLock sync.Mutex
groupWatchers = make(map[string]*groupWatcher)
_ = metrics.NewGauge(`vm_promscrape_discovery_kubernetes_group_watchers`, func() float64 {
groupWatchersLock.Lock()
n := len(groupWatchers)
groupWatchersLock.Unlock()
return float64(n)
})
)
// getObjectByRole returns an object with the given (namespace, name) key and the given role.
func (gw *groupWatcher) getObjectByRole(role, namespace, name string) object {
if gw == nil {
// this is needed for testing
return nil
}
key := namespace + "/" + name
gw.startWatchersForRole(role, nil)
gw.mu.Lock()
defer gw.mu.Unlock()
for _, uw := range gw.m {
if uw.role != role {
continue
}
uw.mu.Lock()
o := uw.objectsByKey[key]
uw.mu.Unlock()
if o != nil {
return o
}
}
return nil
}
func (gw *groupWatcher) startWatchersForRole(role string, aw *apiWatcher) {
paths := getAPIPaths(role, gw.namespaces, gw.selectors)
for _, path := range paths {
apiURL := gw.apiServer + path
gw.mu.Lock()
uw := gw.m[apiURL]
if uw == nil {
uw = newURLWatcher(role, apiURL, gw)
gw.m[apiURL] = uw
}
gw.mu.Unlock()
uw.subscribeAPIWatcher(aw)
}
}
func (gw *groupWatcher) reloadScrapeWorksForAPIWatchers(aws []*apiWatcher, objectsByKey map[string]object) {
if len(aws) == 0 {
return
}
swosByKey := make([]map[string][]interface{}, len(aws))
for i := range aws {
swosByKey[i] = make(map[string][]interface{})
}
for key, o := range objectsByKey {
labels := o.getTargetLabels(gw)
for i, aw := range aws {
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
if len(swos) > 0 {
swosByKey[i][key] = swos
}
}
}
for i, aw := range aws {
aw.reloadScrapeWorks(swosByKey[i])
}
}
// doRequest performs an HTTP request to the given requestURL.
func (gw *groupWatcher) doRequest(requestURL string) (*http.Response, error) {
req, err := http.NewRequest("GET", requestURL, nil)
if err != nil {
logger.Fatalf("cannot create a request for %q: %s", requestURL, err)
}
if gw.authorization != "" {
req.Header.Set("Authorization", gw.authorization)
}
return gw.client.Do(req)
}
func (gw *groupWatcher) unsubscribeAPIWatcher(aw *apiWatcher) {
gw.mu.Lock()
for _, uw := range gw.m {
uw.unsubscribeAPIWatcher(aw)
}
gw.mu.Unlock()
}
// urlWatcher watches for an apiURL and updates object states in objectsByKey.
type urlWatcher struct {
role string
apiURL string
gw *groupWatcher
parseObject parseObjectFunc
parseObjectList parseObjectListFunc
// mu protects aws, awsPending, objectsByKey and resourceVersion
mu sync.Mutex
// aws contains registered apiWatcher objects
aws map[*apiWatcher]struct{}
// awsPending contains pending apiWatcher objects, which must be moved to aws in a batch
awsPending map[*apiWatcher]struct{}
// objectsByKey contains the latest state for objects obtained from apiURL
objectsByKey map[string]object
resourceVersion string
objectsCount *metrics.Counter
objectsAdded *metrics.Counter
objectsRemoved *metrics.Counter
objectsUpdated *metrics.Counter
}
func newURLWatcher(role, apiURL string, gw *groupWatcher) *urlWatcher {
parseObject, parseObjectList := getObjectParsersForRole(role)
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers{role=%q}`, role)).Inc()
uw := &urlWatcher{
role: role,
apiURL: apiURL,
gw: gw,
parseObject: parseObject,
parseObjectList: parseObjectList,
aws: make(map[*apiWatcher]struct{}),
awsPending: make(map[*apiWatcher]struct{}),
objectsByKey: make(map[string]object),
objectsCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects{role=%q}`, role)),
objectsAdded: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_added_total{role=%q}`, role)),
objectsRemoved: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_removed_total{role=%q}`, role)),
objectsUpdated: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_updated_total{role=%q}`, role)),
}
logger.Infof("started %s watcher for %q", uw.role, uw.apiURL)
go uw.watchForUpdates()
go uw.processPendingSubscribers()
return uw
}
func (uw *urlWatcher) subscribeAPIWatcher(aw *apiWatcher) {
if aw == nil {
return
}
uw.mu.Lock()
if _, ok := uw.aws[aw]; !ok {
if _, ok := uw.awsPending[aw]; !ok {
uw.awsPending[aw] = struct{}{}
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Inc()
}
}
uw.mu.Unlock()
}
func (uw *urlWatcher) unsubscribeAPIWatcher(aw *apiWatcher) {
uw.mu.Lock()
if _, ok := uw.aws[aw]; ok {
delete(uw.aws, aw)
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="permanent"}`, uw.role)).Dec()
} else if _, ok := uw.awsPending[aw]; ok {
delete(uw.awsPending, aw)
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Dec()
}
uw.mu.Unlock()
}
func (uw *urlWatcher) processPendingSubscribers() {
t := time.NewTicker(time.Second)
for range t.C {
var awsPending []*apiWatcher
var objectsByKey map[string]object
uw.mu.Lock()
if len(uw.awsPending) > 0 {
awsPending = getAPIWatchers(uw.awsPending)
for _, aw := range awsPending {
if _, ok := uw.aws[aw]; ok {
logger.Panicf("BUG: aw=%p already exists in uw.aws", aw)
}
uw.aws[aw] = struct{}{}
delete(uw.awsPending, aw)
}
objectsByKey = make(map[string]object, len(uw.objectsByKey))
for key, o := range uw.objectsByKey {
objectsByKey[key] = o
}
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Add(-len(awsPending))
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="permanent"}`, uw.role)).Add(len(awsPending))
uw.mu.Unlock()
uw.gw.reloadScrapeWorksForAPIWatchers(awsPending, objectsByKey)
}
}
func (uw *urlWatcher) setResourceVersion(resourceVersion string) {
uw.mu.Lock()
uw.resourceVersion = resourceVersion
uw.mu.Unlock()
}
// reloadObjects reloads objects to the latest state and returns resourceVersion for the latest state.
func (uw *urlWatcher) reloadObjects() string {
uw.mu.Lock()
resourceVersion := uw.resourceVersion
uw.mu.Unlock()
if resourceVersion != "" {
// Fast path - there is no need to reload the objects.
return resourceVersion
}
requestURL := uw.apiURL
resp, err := uw.gw.doRequest(requestURL)
if err != nil {
logger.Errorf("cannot perform request to %q: %s", requestURL, err)
return ""
}
if resp.StatusCode != http.StatusOK {
body, _ := ioutil.ReadAll(resp.Body)
_ = resp.Body.Close()
logger.Errorf("unexpected status code for request to %q: %d; want %d; response: %q", requestURL, resp.StatusCode, http.StatusOK, body)
return ""
}
objectsByKey, metadata, err := uw.parseObjectList(resp.Body)
_ = resp.Body.Close()
if err != nil {
logger.Errorf("cannot parse objects from %q: %s", requestURL, err)
return ""
}
uw.mu.Lock()
var updated, removed, added int
for key := range uw.objectsByKey {
if o, ok := objectsByKey[key]; ok {
uw.objectsByKey[key] = o
updated++
} else {
delete(uw.objectsByKey, key)
removed++
}
}
for key, o := range objectsByKey {
if _, ok := uw.objectsByKey[key]; !ok {
uw.objectsByKey[key] = o
added++
}
}
uw.objectsUpdated.Add(updated)
uw.objectsRemoved.Add(removed)
uw.objectsAdded.Add(added)
uw.objectsCount.Add(added - removed)
uw.resourceVersion = metadata.ResourceVersion
aws := getAPIWatchers(uw.aws)
uw.mu.Unlock()
uw.gw.reloadScrapeWorksForAPIWatchers(aws, objectsByKey)
logger.Infof("reloaded %d objects from %q", len(objectsByKey), requestURL)
return metadata.ResourceVersion
}
func getAPIWatchers(awsMap map[*apiWatcher]struct{}) []*apiWatcher {
aws := make([]*apiWatcher, 0, len(awsMap))
for aw := range awsMap {
aws = append(aws, aw)
}
return aws
}
// watchForUpdates watches for object updates starting from uw.resourceVersion and updates the corresponding objects to the latest state.
//
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
func (uw *urlWatcher) watchForUpdates() {
aw := uw.aw
backoffDelay := time.Second
maxBackoffDelay := 30 * time.Second
backoffSleep := func() {
@ -339,25 +486,19 @@ func (uw *urlWatcher) watchForUpdates() {
if strings.Contains(apiURL, "?") {
delimiter = "&"
}
timeoutSeconds := time.Duration(0.9 * float64(aw.client.Timeout)).Seconds()
apiURL += delimiter + "watch=1&timeoutSeconds=" + strconv.Itoa(int(timeoutSeconds))
timeoutSeconds := time.Duration(0.9 * float64(uw.gw.client.Timeout)).Seconds()
apiURL += delimiter + "watch=1&allowWatchBookmarks=true&timeoutSeconds=" + strconv.Itoa(int(timeoutSeconds))
for {
if aw.needStop() {
return
}
resourceVersion := uw.reloadObjects()
requestURL := apiURL
if resourceVersion != "" {
requestURL += "&resourceVersion=" + url.QueryEscape(resourceVersion)
}
resp, err := aw.doRequest(requestURL)
if err != nil {
if aw.needStop() {
return
}
logger.Errorf("error when performing a request to %q: %s", requestURL, err)
if resourceVersion == "" {
backoffSleep()
continue
}
requestURL := apiURL + "&resourceVersion=" + url.QueryEscape(resourceVersion)
resp, err := uw.gw.doRequest(requestURL)
if err != nil {
logger.Errorf("cannot perform request to %q: %s", requestURL, err)
backoffSleep()
uw.resetResourceVersion()
continue
}
if resp.StatusCode != http.StatusOK {
@ -367,24 +508,20 @@ func (uw *urlWatcher) watchForUpdates() {
if resp.StatusCode == 410 {
// There is no need for sleep on 410 error. See https://kubernetes.io/docs/reference/using-api/api-concepts/#410-gone-responses
backoffDelay = time.Second
uw.setResourceVersion("")
} else {
backoffSleep()
}
uw.resetResourceVersion()
continue
}
backoffDelay = time.Second
err = uw.readObjectUpdateStream(resp.Body)
_ = resp.Body.Close()
if err != nil {
if aw.needStop() {
return
}
if !errors.Is(err, io.EOF) {
logger.Errorf("error when reading WatchEvent stream from %q: %s", requestURL, err)
}
backoffSleep()
uw.resetResourceVersion()
continue
}
}
@ -392,41 +529,79 @@ func (uw *urlWatcher) watchForUpdates() {
// readObjectUpdateStream reads Kubernetes watch events from r and updates locally cached objects according to the received events.
func (uw *urlWatcher) readObjectUpdateStream(r io.Reader) error {
aw := uw.aw
d := json.NewDecoder(r)
var we WatchEvent
for {
if err := d.Decode(&we); err != nil {
return err
}
switch we.Type {
case "ADDED", "MODIFIED":
o, err := uw.parseObject(we.Object)
if err != nil {
return err
}
key := o.key()
switch we.Type {
case "ADDED", "MODIFIED":
uw.objectsByKey.update(key, o)
labels := o.getTargetLabels(aw)
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
uw.mu.Lock()
if len(swos) > 0 {
uw.swosByKey[key] = swos
if _, ok := uw.objectsByKey[key]; !ok {
uw.objectsCount.Inc()
uw.objectsAdded.Inc()
} else {
delete(uw.swosByKey, key)
uw.objectsUpdated.Inc()
}
uw.objectsByKey[key] = o
aws := getAPIWatchers(uw.aws)
uw.mu.Unlock()
labels := o.getTargetLabels(uw.gw)
for _, aw := range aws {
aw.setScrapeWorks(key, labels)
}
case "DELETED":
uw.objectsByKey.remove(key)
o, err := uw.parseObject(we.Object)
if err != nil {
return err
}
key := o.key()
uw.mu.Lock()
delete(uw.swosByKey, key)
if _, ok := uw.objectsByKey[key]; ok {
uw.objectsCount.Dec()
uw.objectsRemoved.Inc()
delete(uw.objectsByKey, key)
}
aws := getAPIWatchers(uw.aws)
uw.mu.Unlock()
for _, aw := range aws {
aw.removeScrapeWorks(key)
}
case "BOOKMARK":
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
bm, err := parseBookmark(we.Object)
if err != nil {
return fmt.Errorf("cannot parse bookmark from %q: %w", we.Object, err)
}
uw.setResourceVersion(bm.Metadata.ResourceVersion)
default:
return fmt.Errorf("unexpected WatchEvent type %q for role %q", we.Type, uw.role)
}
}
}
// Bookmark is a bookmark from Kubernetes Watch API.
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
type Bookmark struct {
Metadata struct {
ResourceVersion string
}
}
func parseBookmark(data []byte) (*Bookmark, error) {
var bm Bookmark
if err := json.Unmarshal(data, &bm); err != nil {
return nil, err
}
return &bm, nil
}
func getAPIPaths(role string, namespaces []string, selectors []Selector) []string {
objectName := getObjectNameByRole(role)
if objectName == "nodes" || len(namespaces) == 0 {
@ -521,105 +696,3 @@ func getObjectParsersForRole(role string) (parseObjectFunc, parseObjectListFunc)
return nil, nil
}
}
type objectsMap struct {
mu sync.Mutex
refCount int
m map[string]object
objectsAdded *metrics.Counter
objectsRemoved *metrics.Counter
objectsCount *metrics.Counter
}
func (om *objectsMap) incRef() {
om.mu.Lock()
om.refCount++
om.mu.Unlock()
}
func (om *objectsMap) decRef() {
om.mu.Lock()
om.refCount--
if om.refCount < 0 {
logger.Panicf("BUG: refCount cannot be smaller than 0; got %d", om.refCount)
}
if om.refCount == 0 {
// Free up memory occupied by om.m
om.objectsRemoved.Add(len(om.m))
om.objectsCount.Add(-len(om.m))
om.m = make(map[string]object)
}
om.mu.Unlock()
}
func (om *objectsMap) reload(m map[string]object) {
om.mu.Lock()
om.objectsAdded.Add(len(m))
om.objectsRemoved.Add(len(om.m))
om.objectsCount.Add(len(m) - len(om.m))
for k := range om.m {
delete(om.m, k)
}
for k, o := range m {
om.m[k] = o
}
om.mu.Unlock()
}
func (om *objectsMap) update(key string, o object) {
om.mu.Lock()
if om.m[key] == nil {
om.objectsAdded.Inc()
om.objectsCount.Inc()
}
om.m[key] = o
om.mu.Unlock()
}
func (om *objectsMap) remove(key string) {
om.mu.Lock()
if om.m[key] != nil {
om.objectsRemoved.Inc()
om.objectsCount.Dec()
delete(om.m, key)
}
om.mu.Unlock()
}
func (om *objectsMap) get(key string) object {
om.mu.Lock()
o, ok := om.m[key]
om.mu.Unlock()
if !ok {
return nil
}
return o
}
type sharedObjects struct {
mu sync.Mutex
oms map[string]*objectsMap
}
func (so *sharedObjects) getByAPIURL(role, apiURL string) *objectsMap {
so.mu.Lock()
om := so.oms[apiURL]
if om == nil {
om = &objectsMap{
m: make(map[string]object),
objectsCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects{role=%q}`, role)),
objectsAdded: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_added_total{role=%q}`, role)),
objectsRemoved: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_removed_total{role=%q}`, role)),
}
so.oms[apiURL] = om
}
so.mu.Unlock()
om.incRef()
return om
}
var sharedObjectsGlobal = &sharedObjects{
oms: make(map[string]*objectsMap),
}
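
Note on the reworked `reloadObjects` in this file: instead of swapping the whole map, it diffs the freshly listed objects against the cached ones and counts updated, removed and added keys for the metrics. The standalone sketch below illustrates only that bookkeeping. It assumes plain string values in place of the commit's `object` interface, and `reloadCounts` is a hypothetical name that does not appear in the commit.

```go
package main

import "fmt"

// reloadCounts brings cache in line with fresh and reports how many keys
// were updated, removed and added. This is an illustrative helper, not
// code from the commit.
func reloadCounts(cache, fresh map[string]string) (updated, removed, added int) {
	// First pass: refresh keys that still exist and drop keys that are gone.
	// Deleting from a map while ranging over it is safe in Go.
	for key := range cache {
		if v, ok := fresh[key]; ok {
			cache[key] = v
			updated++
		} else {
			delete(cache, key)
			removed++
		}
	}
	// Second pass: insert keys that were not cached before.
	for key, v := range fresh {
		if _, ok := cache[key]; !ok {
			cache[key] = v
			added++
		}
	}
	return updated, removed, added
}

func main() {
	cache := map[string]string{"default/a": "v1", "default/b": "v1"}
	fresh := map[string]string{"default/b": "v2", "default/c": "v1"}
	updated, removed, added := reloadCounts(cache, fresh)
	fmt.Println(updated, removed, added, cache)
}
```

Counting `added - removed` this way lets a gauge-like counter track the number of live objects without recomputing the map size under the lock.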


@ -160,3 +160,15 @@ func TestGetAPIPaths(t *testing.T) {
"/apis/networking.k8s.io/v1beta1/namespaces/y/ingresses?labelSelector=cde%2Cbaaa&fieldSelector=abc",
})
}
func TestParseBookmark(t *testing.T) {
data := `{"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"} }`
bm, err := parseBookmark([]byte(data))
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
expectedResourceVersion := "12746"
if bm.Metadata.ResourceVersion != expectedResourceVersion {
t.Fatalf("unexpected resourceVersion; got %q; want %q", bm.Metadata.ResourceVersion, expectedResourceVersion)
}
}


@ -3,6 +3,7 @@ package kubernetes
import (
"encoding/json"
"fmt"
"io"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
)
@ -11,10 +12,11 @@ func (eps *Endpoints) key() string {
return eps.Metadata.key()
}
func parseEndpointsList(data []byte) (map[string]object, ListMeta, error) {
func parseEndpointsList(r io.Reader) (map[string]object, ListMeta, error) {
var epsl EndpointsList
if err := json.Unmarshal(data, &epsl); err != nil {
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointsList from %q: %w", data, err)
d := json.NewDecoder(r)
if err := d.Decode(&epsl); err != nil {
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointsList: %w", err)
}
objectsByKey := make(map[string]object)
for _, eps := range epsl.Items {
@ -88,17 +90,17 @@ type EndpointPort struct {
// getTargetLabels returns labels for each endpoint in eps.
//
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints
func (eps *Endpoints) getTargetLabels(aw *apiWatcher) []map[string]string {
func (eps *Endpoints) getTargetLabels(gw *groupWatcher) []map[string]string {
var svc *Service
if o := aw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
if o := gw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
svc = o.(*Service)
}
podPortsSeen := make(map[*Pod][]int)
var ms []map[string]string
for _, ess := range eps.Subsets {
for _, epp := range ess.Ports {
ms = appendEndpointLabelsForAddresses(ms, aw, podPortsSeen, eps, ess.Addresses, epp, svc, "true")
ms = appendEndpointLabelsForAddresses(ms, aw, podPortsSeen, eps, ess.NotReadyAddresses, epp, svc, "false")
ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.Addresses, epp, svc, "true")
ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.NotReadyAddresses, epp, svc, "false")
}
}
@ -133,11 +135,11 @@ func (eps *Endpoints) getTargetLabels(aw *apiWatcher) []map[string]string {
return ms
}
func appendEndpointLabelsForAddresses(ms []map[string]string, aw *apiWatcher, podPortsSeen map[*Pod][]int, eps *Endpoints,
func appendEndpointLabelsForAddresses(ms []map[string]string, gw *groupWatcher, podPortsSeen map[*Pod][]int, eps *Endpoints,
eas []EndpointAddress, epp EndpointPort, svc *Service, ready string) []map[string]string {
for _, ea := range eas {
var p *Pod
if o := aw.getObjectByRole("pod", ea.TargetRef.Namespace, ea.TargetRef.Name); o != nil {
if o := gw.getObjectByRole("pod", ea.TargetRef.Namespace, ea.TargetRef.Name); o != nil {
p = o.(*Pod)
}
m := getEndpointLabelsForAddressAndPort(podPortsSeen, eps, ea, epp, p, svc, ready)


@ -1,6 +1,7 @@
package kubernetes
import (
"bytes"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParseEndpointsListFailure(t *testing.T) {
f := func(s string) {
t.Helper()
objectsByKey, _, err := parseEndpointsList([]byte(s))
r := bytes.NewBufferString(s)
objectsByKey, _, err := parseEndpointsList(r)
if err == nil {
t.Fatalf("expecting non-nil error")
}
@ -78,7 +80,8 @@ func TestParseEndpointsListSuccess(t *testing.T) {
]
}
`
objectsByKey, meta, err := parseEndpointsList([]byte(data))
r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseEndpointsList(r)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}


@ -3,6 +3,7 @@ package kubernetes
import (
"encoding/json"
"fmt"
"io"
"strconv"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
@ -12,10 +13,11 @@ func (eps *EndpointSlice) key() string {
return eps.Metadata.key()
}
func parseEndpointSliceList(data []byte) (map[string]object, ListMeta, error) {
func parseEndpointSliceList(r io.Reader) (map[string]object, ListMeta, error) {
var epsl EndpointSliceList
if err := json.Unmarshal(data, &epsl); err != nil {
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointSliceList from %q: %w", data, err)
d := json.NewDecoder(r)
if err := d.Decode(&epsl); err != nil {
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointSliceList: %w", err)
}
objectsByKey := make(map[string]object)
for _, eps := range epsl.Items {
@ -35,16 +37,16 @@ func parseEndpointSlice(data []byte) (object, error) {
// getTargetLabels returns labels for eps.
//
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpointslices
func (eps *EndpointSlice) getTargetLabels(aw *apiWatcher) []map[string]string {
func (eps *EndpointSlice) getTargetLabels(gw *groupWatcher) []map[string]string {
var svc *Service
if o := aw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
if o := gw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
svc = o.(*Service)
}
podPortsSeen := make(map[*Pod][]int)
var ms []map[string]string
for _, ess := range eps.Endpoints {
var p *Pod
if o := aw.getObjectByRole("pod", ess.TargetRef.Namespace, ess.TargetRef.Name); o != nil {
if o := gw.getObjectByRole("pod", ess.TargetRef.Namespace, ess.TargetRef.Name); o != nil {
p = o.(*Pod)
}
for _, epp := range eps.Ports {


@ -1,6 +1,7 @@
package kubernetes
import (
"bytes"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -9,7 +10,8 @@ import (
func TestParseEndpointSliceListFail(t *testing.T) {
f := func(data string) {
objectsByKey, _, err := parseEndpointSliceList([]byte(data))
r := bytes.NewBufferString(data)
objectsByKey, _, err := parseEndpointSliceList(r)
if err == nil {
t.Errorf("unexpected result, test must fail! data: %s", data)
}
@ -175,7 +177,8 @@ func TestParseEndpointSliceListSuccess(t *testing.T) {
}
]
}`
objectsByKey, meta, err := parseEndpointSliceList([]byte(data))
r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseEndpointSliceList(r)
if err != nil {
t.Errorf("cannot parse data for EndpointSliceList: %v", err)
return


@ -3,16 +3,18 @@ package kubernetes
import (
"encoding/json"
"fmt"
"io"
)
func (ig *Ingress) key() string {
return ig.Metadata.key()
}
func parseIngressList(data []byte) (map[string]object, ListMeta, error) {
func parseIngressList(r io.Reader) (map[string]object, ListMeta, error) {
var igl IngressList
if err := json.Unmarshal(data, &igl); err != nil {
return nil, igl.Metadata, fmt.Errorf("cannot unmarshal IngressList from %q: %w", data, err)
d := json.NewDecoder(r)
if err := d.Decode(&igl); err != nil {
return nil, igl.Metadata, fmt.Errorf("cannot unmarshal IngressList: %w", err)
}
objectsByKey := make(map[string]object)
for _, ig := range igl.Items {
@ -85,7 +87,7 @@ type HTTPIngressPath struct {
// getTargetLabels returns labels for ig.
//
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ingress
func (ig *Ingress) getTargetLabels(aw *apiWatcher) []map[string]string {
func (ig *Ingress) getTargetLabels(gw *groupWatcher) []map[string]string {
tlsHosts := make(map[string]bool)
for _, tls := range ig.Spec.TLS {
for _, host := range tls.Hosts {


@ -1,6 +1,7 @@
package kubernetes
import (
"bytes"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParseIngressListFailure(t *testing.T) {
f := func(s string) {
t.Helper()
objectsByKey, _, err := parseIngressList([]byte(s))
r := bytes.NewBufferString(s)
objectsByKey, _, err := parseIngressList(r)
if err == nil {
t.Fatalf("expecting non-nil error")
}
@ -70,7 +72,8 @@ func TestParseIngressListSuccess(t *testing.T) {
}
]
}`
objectsByKey, meta, err := parseIngressList([]byte(data))
r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseIngressList(r)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}


@ -48,12 +48,7 @@ func (sdc *SDConfig) GetScrapeWorkObjects(baseDir string, swcFunc ScrapeWorkCons
if err != nil {
return nil, fmt.Errorf("cannot create API config: %w", err)
}
switch sdc.Role {
case "node", "pod", "service", "endpoints", "endpointslices", "ingress":
return cfg.aw.getScrapeWorkObjectsForRole(sdc.Role), nil
default:
return nil, fmt.Errorf("unexpected `role`: %q; must be one of `node`, `pod`, `service`, `endpoints`, `endpointslices` or `ingress`; skipping it", sdc.Role)
}
return cfg.aw.getScrapeWorkObjects(), nil
}
// MustStop stops further usage for sdc.


@ -3,6 +3,7 @@ package kubernetes
import (
"encoding/json"
"fmt"
"io"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
)
@ -12,10 +13,11 @@ func (n *Node) key() string {
return n.Metadata.key()
}
func parseNodeList(data []byte) (map[string]object, ListMeta, error) {
func parseNodeList(r io.Reader) (map[string]object, ListMeta, error) {
var nl NodeList
if err := json.Unmarshal(data, &nl); err != nil {
return nil, nl.Metadata, fmt.Errorf("cannot unmarshal NodeList from %q: %w", data, err)
d := json.NewDecoder(r)
if err := d.Decode(&nl); err != nil {
return nil, nl.Metadata, fmt.Errorf("cannot unmarshal NodeList: %w", err)
}
objectsByKey := make(map[string]object)
for _, n := range nl.Items {
@ -74,7 +76,7 @@ type NodeDaemonEndpoints struct {
// getTargetLabels returns labels for the given n.
//
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#node
func (n *Node) getTargetLabels(aw *apiWatcher) []map[string]string {
func (n *Node) getTargetLabels(gw *groupWatcher) []map[string]string {
addr := getNodeAddr(n.Status.Addresses)
if len(addr) == 0 {
// Skip node without address


@ -1,6 +1,7 @@
package kubernetes
import (
"bytes"
"reflect"
"sort"
"strconv"
@ -13,7 +14,8 @@ import (
func TestParseNodeListFailure(t *testing.T) {
f := func(s string) {
t.Helper()
objectsByKey, _, err := parseNodeList([]byte(s))
r := bytes.NewBufferString(s)
objectsByKey, _, err := parseNodeList(r)
if err == nil {
t.Fatalf("expecting non-nil error")
}
@ -229,7 +231,8 @@ func TestParseNodeListSuccess(t *testing.T) {
]
}
`
objectsByKey, meta, err := parseNodeList([]byte(data))
r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseNodeList(r)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}


@ -3,6 +3,7 @@ package kubernetes
import (
"encoding/json"
"fmt"
"io"
"strconv"
"strings"
@ -13,10 +14,11 @@ func (p *Pod) key() string {
return p.Metadata.key()
}
func parsePodList(data []byte) (map[string]object, ListMeta, error) {
func parsePodList(r io.Reader) (map[string]object, ListMeta, error) {
var pl PodList
if err := json.Unmarshal(data, &pl); err != nil {
return nil, pl.Metadata, fmt.Errorf("cannot unmarshal PodList from %q: %w", data, err)
d := json.NewDecoder(r)
if err := d.Decode(&pl); err != nil {
return nil, pl.Metadata, fmt.Errorf("cannot unmarshal PodList: %w", err)
}
objectsByKey := make(map[string]object)
for _, p := range pl.Items {
@ -95,7 +97,7 @@ type PodCondition struct {
// getTargetLabels returns labels for each port of the given p.
//
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod
func (p *Pod) getTargetLabels(aw *apiWatcher) []map[string]string {
func (p *Pod) getTargetLabels(gw *groupWatcher) []map[string]string {
if len(p.Status.PodIP) == 0 {
// Skip pod without IP
return nil


@ -1,6 +1,7 @@
package kubernetes
import (
"bytes"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParsePodListFailure(t *testing.T) {
f := func(s string) {
t.Helper()
objectsByKey, _, err := parsePodList([]byte(s))
r := bytes.NewBufferString(s)
objectsByKey, _, err := parsePodList(r)
if err == nil {
t.Fatalf("expecting non-nil error")
}
@ -227,7 +229,8 @@ func TestParsePodListSuccess(t *testing.T) {
]
}
`
objectsByKey, meta, err := parsePodList([]byte(data))
r := bytes.NewBufferString(data)
objectsByKey, meta, err := parsePodList(r)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}


@ -3,6 +3,7 @@ package kubernetes
import (
"encoding/json"
"fmt"
"io"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
)
@ -11,10 +12,11 @@ func (s *Service) key() string {
return s.Metadata.key()
}
func parseServiceList(data []byte) (map[string]object, ListMeta, error) {
func parseServiceList(r io.Reader) (map[string]object, ListMeta, error) {
var sl ServiceList
if err := json.Unmarshal(data, &sl); err != nil {
return nil, sl.Metadata, fmt.Errorf("cannot unmarshal ServiceList from %q: %w", data, err)
d := json.NewDecoder(r)
if err := d.Decode(&sl); err != nil {
return nil, sl.Metadata, fmt.Errorf("cannot unmarshal ServiceList: %w", err)
}
objectsByKey := make(map[string]object)
for _, s := range sl.Items {
@ -69,7 +71,7 @@ type ServicePort struct {
// getTargetLabels returns labels for each port of the given s.
//
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#service
func (s *Service) getTargetLabels(aw *apiWatcher) []map[string]string {
func (s *Service) getTargetLabels(gw *groupWatcher) []map[string]string {
host := fmt.Sprintf("%s.%s.svc", s.Metadata.Name, s.Metadata.Namespace)
var ms []map[string]string
for _, sp := range s.Spec.Ports {


@ -1,6 +1,7 @@
package kubernetes
import (
"bytes"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
@ -10,7 +11,8 @@ import (
func TestParseServiceListFailure(t *testing.T) {
f := func(s string) {
t.Helper()
objectsByKey, _, err := parseServiceList([]byte(s))
r := bytes.NewBufferString(s)
objectsByKey, _, err := parseServiceList(r)
if err == nil {
t.Fatalf("expecting non-nil error")
}
@ -88,7 +90,8 @@ func TestParseServiceListSuccess(t *testing.T) {
]
}
`
objectsByKey, meta, err := parseServiceList([]byte(data))
r := bytes.NewBufferString(data)
objectsByKey, meta, err := parseServiceList(r)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}


@ -66,7 +66,7 @@ func NewClient(apiServer string, ac *promauth.Config, proxyURL proxy.URL) (*Clie
hostPort := string(u.Host())
isTLS := string(u.Scheme()) == "https"
if isTLS && ac != nil {
if isTLS {
tlsCfg = ac.NewTLSConfig()
}
if !strings.Contains(hostPort, ":") {
@ -77,7 +77,7 @@ func NewClient(apiServer string, ac *promauth.Config, proxyURL proxy.URL) (*Clie
hostPort = net.JoinHostPort(hostPort, port)
}
if dialFunc == nil {
dialFunc, err = proxyURL.NewDialFunc(tlsCfg)
dialFunc, err = proxyURL.NewDialFunc(ac)
if err != nil {
return nil, err
}


@ -21,29 +21,29 @@ var (
fileSDCheckInterval = flag.Duration("promscrape.fileSDCheckInterval", 30*time.Second, "Interval for checking for changes in 'file_sd_config'. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details")
kubernetesSDCheckInterval = flag.Duration("promscrape.kubernetesSDCheckInterval", 30*time.Second, "Interval for checking for changes in Kubernetes API server. "+
"This works only if `kubernetes_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details")
openstackSDCheckInterval = flag.Duration("promscrape.openstackSDCheckInterval", 30*time.Second, "Interval for checking for changes in openstack API server. "+
"This works only if `openstack_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if openstack_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details")
eurekaSDCheckInterval = flag.Duration("promscrape.eurekaSDCheckInterval", 30*time.Second, "Interval for checking for changes in eureka. "+
"This works only if `eureka_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if eureka_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details")
dnsSDCheckInterval = flag.Duration("promscrape.dnsSDCheckInterval", 30*time.Second, "Interval for checking for changes in dns. "+
"This works only if `dns_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if dns_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details")
ec2SDCheckInterval = flag.Duration("promscrape.ec2SDCheckInterval", time.Minute, "Interval for checking for changes in ec2. "+
"This works only if `ec2_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if ec2_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details")
gceSDCheckInterval = flag.Duration("promscrape.gceSDCheckInterval", time.Minute, "Interval for checking for changes in gce. "+
"This works only if `gce_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if gce_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details")
dockerswarmSDCheckInterval = flag.Duration("promscrape.dockerswarmSDCheckInterval", 30*time.Second, "Interval for checking for changes in dockerswarm. "+
"This works only if `dockerswarm_sd_configs` is configured in '-promscrape.config' file. "+
"This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. "+
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details")
promscrapeConfigFile = flag.String("promscrape.config", "", "Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. "+
"See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details")
suppressDuplicateScrapeTargetErrors = flag.Bool("promscrape.suppressDuplicateScrapeTargetErrors", false, "Whether to suppress `duplicate scrape target` errors; "+
suppressDuplicateScrapeTargetErrors = flag.Bool("promscrape.suppressDuplicateScrapeTargetErrors", false, "Whether to suppress 'duplicate scrape target' errors; "+
"see https://victoriametrics.github.io/vmagent.html#troubleshooting for details")
)
@ -231,11 +231,17 @@ func (scfg *scrapeConfig) run() {
cfg := <-scfg.cfgCh
var swsPrev []*ScrapeWork
updateScrapeWork := func(cfg *Config) {
for {
startTime := time.Now()
sws := scfg.getScrapeWork(cfg, swsPrev)
sg.update(sws)
retry := sg.update(sws)
swsPrev = sws
scfg.discoveryDuration.UpdateDuration(startTime)
if !retry {
return
}
time.Sleep(2 * time.Second)
}
}
updateScrapeWork(cfg)
atomic.AddInt32(&PendingScrapeConfigs, -1)
@ -295,7 +301,7 @@ func (sg *scraperGroup) stop() {
sg.wg.Wait()
}
func (sg *scraperGroup) update(sws []*ScrapeWork) {
func (sg *scraperGroup) update(sws []*ScrapeWork) (retry bool) {
sg.mLock.Lock()
defer sg.mLock.Unlock()
@ -352,6 +358,7 @@ func (sg *scraperGroup) update(sws []*ScrapeWork) {
sg.changesCount.Add(additionsCount + deletionsCount)
logger.Infof("%s: added targets: %d, removed targets: %d; total targets: %d", sg.name, additionsCount, deletionsCount, len(sg.m))
}
return deletionsCount > 0 && len(sg.m) == 0
}
type scraper struct {


@ -68,12 +68,15 @@ type ScrapeWork struct {
// See also https://prometheus.io/docs/concepts/jobs_instances/
Labels []prompbmarshal.Label
// Auth config
AuthConfig *promauth.Config
// ProxyURL HTTP proxy url
ProxyURL proxy.URL
// Auth config for ProxyURL
ProxyAuthConfig *promauth.Config
// Auth config
AuthConfig *promauth.Config
// Optional `metric_relabel_configs`.
MetricRelabelConfigs *promrelabel.ParsedConfigs
@ -105,9 +108,10 @@ type ScrapeWork struct {
func (sw *ScrapeWork) key() string {
// Do not take into account OriginalLabels.
key := fmt.Sprintf("ScrapeURL=%s, ScrapeInterval=%s, ScrapeTimeout=%s, HonorLabels=%v, HonorTimestamps=%v, Labels=%s, "+
"AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v, "+
"ProxyURL=%s, ProxyAuthConfig=%s, AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v, "+
"ScrapeAlignInterval=%s, ScrapeOffset=%s",
sw.ScrapeURL, sw.ScrapeInterval, sw.ScrapeTimeout, sw.HonorLabels, sw.HonorTimestamps, sw.LabelsString(),
sw.ProxyURL.String(), sw.ProxyAuthConfig.String(),
sw.AuthConfig.String(), sw.MetricRelabelConfigs.String(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive, sw.StreamParse,
sw.ScrapeAlignInterval, sw.ScrapeOffset)
return key
@ -173,9 +177,9 @@ type scrapeWork struct {
// It is used as a hint in order to reduce memory usage for body buffers.
prevBodyLen int
// prevRowsLen contains the number rows scraped during the previous scrape.
// prevLabelsLen contains the number of labels scraped during the previous scrape.
// It is used as a hint in order to reduce memory usage when parsing scrape responses.
prevRowsLen int
prevLabelsLen int
}
func (sw *scrapeWork) run(stopCh <-chan struct{}) {
@ -279,7 +283,7 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
scrapeDuration.Update(duration)
scrapeResponseSize.Update(float64(len(body.B)))
up := 1
wc := writeRequestCtxPool.Get(sw.prevRowsLen)
wc := writeRequestCtxPool.Get(sw.prevLabelsLen)
if err != nil {
up = 0
scrapesFailed.Inc()
@ -290,27 +294,15 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
srcRows := wc.rows.Rows
samplesScraped := len(srcRows)
scrapedSamples.Update(float64(samplesScraped))
if sw.Config.SampleLimit > 0 && samplesScraped > sw.Config.SampleLimit {
srcRows = srcRows[:0]
for i := range srcRows {
sw.addRowToTimeseries(wc, &srcRows[i], scrapeTimestamp, true)
}
samplesPostRelabeling := len(wc.writeRequest.Timeseries)
if sw.Config.SampleLimit > 0 && samplesPostRelabeling > sw.Config.SampleLimit {
wc.resetNoRows()
up = 0
scrapesSkippedBySampleLimit.Inc()
}
samplesPostRelabeling := 0
for i := range srcRows {
sw.addRowToTimeseries(wc, &srcRows[i], scrapeTimestamp, true)
if len(wc.labels) > 40000 {
// Limit the maximum size of wc.writeRequest.
// This should reduce memory usage when scraping targets with millions of metrics and/or labels.
// For example, when scraping /federate handler from Prometheus - see https://prometheus.io/docs/prometheus/latest/federation/
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
sw.updateSeriesAdded(wc)
startTime := time.Now()
sw.PushData(&wc.writeRequest)
pushDataDuration.UpdateDuration(startTime)
wc.resetNoRows()
}
}
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
sw.updateSeriesAdded(wc)
seriesAdded := sw.finalizeSeriesAdded(samplesPostRelabeling)
sw.addAutoTimeseries(wc, "up", float64(up), scrapeTimestamp)
@ -321,7 +313,7 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
startTime := time.Now()
sw.PushData(&wc.writeRequest)
pushDataDuration.UpdateDuration(startTime)
sw.prevRowsLen = samplesScraped
sw.prevLabelsLen = len(wc.labels)
wc.reset()
writeRequestCtxPool.Put(wc)
// body must be released only after wc is released, since wc refers to body.
@ -335,7 +327,7 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
samplesScraped := 0
samplesPostRelabeling := 0
responseSize := int64(0)
wc := writeRequestCtxPool.Get(sw.prevRowsLen)
wc := writeRequestCtxPool.Get(sw.prevLabelsLen)
sr, err := sw.GetStreamReader()
if err != nil {
@ -385,7 +377,7 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
startTime := time.Now()
sw.PushData(&wc.writeRequest)
pushDataDuration.UpdateDuration(startTime)
sw.prevRowsLen = len(wc.rows.Rows)
sw.prevLabelsLen = len(wc.labels)
wc.reset()
writeRequestCtxPool.Put(wc)
tsmGlobal.Update(sw.Config, sw.ScrapeGroup, up == 1, realTimestamp, int64(duration*1000), err)
@ -397,11 +389,11 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
//
// Its logic has been copied from leveledbytebufferpool.
type leveledWriteRequestCtxPool struct {
pools [30]sync.Pool
pools [13]sync.Pool
}
func (lwp *leveledWriteRequestCtxPool) Get(rowsCapacity int) *writeRequestCtx {
id, capacityNeeded := lwp.getPoolIDAndCapacity(rowsCapacity)
func (lwp *leveledWriteRequestCtxPool) Get(labelsCapacity int) *writeRequestCtx {
id, capacityNeeded := lwp.getPoolIDAndCapacity(labelsCapacity)
for i := 0; i < 2; i++ {
if id < 0 || id >= len(lwp.pools) {
break
@ -417,11 +409,13 @@ func (lwp *leveledWriteRequestCtxPool) Get(rowsCapacity int) *writeRequestCtx {
}
func (lwp *leveledWriteRequestCtxPool) Put(wc *writeRequestCtx) {
capacity := cap(wc.rows.Rows)
id, _ := lwp.getPoolIDAndCapacity(capacity)
capacity := cap(wc.labels)
id, poolCapacity := lwp.getPoolIDAndCapacity(capacity)
if capacity <= poolCapacity {
wc.reset()
lwp.pools[id].Put(wc)
}
}
func (lwp *leveledWriteRequestCtxPool) getPoolIDAndCapacity(size int) (int, int) {
size--
@ -430,7 +424,7 @@ func (lwp *leveledWriteRequestCtxPool) getPoolIDAndCapacity(size int) (int, int)
}
size >>= 3
id := bits.Len(uint(size))
if id > len(lwp.pools) {
if id >= len(lwp.pools) {
id = len(lwp.pools) - 1
}
return id, (1 << (id + 3))
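
The leveled pool maps a requested capacity to a power-of-two bucket, starting at 8. The sketch below reproduces that arithmetic as a standalone function so the bucket boundaries can be checked by hand. The `poolsCount` constant and the `poolIDAndCapacity` name are hypothetical, and the `size < 0` guard is an assumption, since the hunk elides those lines.

```go
package main

import (
	"fmt"
	"math/bits"
)

// poolsCount matches the `pools [13]sync.Pool` array size from the diff.
const poolsCount = 13

// poolIDAndCapacity returns the bucket index for size and the capacity of that bucket.
// Bucket 0 holds up to 8 entries, bucket 1 up to 16, and so on.
func poolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		// Assumed guard: treat non-positive sizes as the smallest bucket.
		size = 0
	}
	size >>= 3
	id := bits.Len(uint(size))
	if id >= poolsCount {
		// Clamp to the last valid index. Comparing with `>` would let
		// id == poolsCount through, which is outside the array.
		id = poolsCount - 1
	}
	return id, 1 << (id + 3)
}

func main() {
	for _, size := range []int{1, 8, 9, 17, 32768, 40000} {
		id, capacity := poolIDAndCapacity(size)
		fmt.Printf("size=%d id=%d capacity=%d\n", size, id, capacity)
	}
}
```

With 13 buckets the largest bucket capacity is `1 << 15`, so a larger request is clamped to a bucket smaller than the request itself. That is consistent with the new `capacity <= poolCapacity` check in `Put`, which skips returning oversized contexts to the pool.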


@ -332,7 +332,7 @@ func TestScrapeWorkScrapeInternalSuccess(t *testing.T) {
up 0 123
scrape_samples_scraped 2 123
scrape_duration_seconds 0 123
scrape_samples_post_metric_relabeling 0 123
scrape_samples_post_metric_relabeling 2 123
scrape_series_added 0 123
`)
}


@ -2,7 +2,6 @@ package promscrape
import (
"context"
"crypto/tls"
"fmt"
"net"
"sync"
@ -10,6 +9,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/proxy"
"github.com/VictoriaMetrics/fasthttp"
"github.com/VictoriaMetrics/metrics"
@ -49,8 +49,8 @@ var (
stdDialerOnce sync.Once
)
func newStatDialFunc(proxyURL proxy.URL, tlsConfig *tls.Config) (fasthttp.DialFunc, error) {
dialFunc, err := proxyURL.NewDialFunc(tlsConfig)
func newStatDialFunc(proxyURL proxy.URL, ac *promauth.Config) (fasthttp.DialFunc, error) {
dialFunc, err := proxyURL.NewDialFunc(ac)
if err != nil {
return nil, err
}


@ -18,7 +18,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
)
var maxDroppedTargets = flag.Int("promscrape.maxDroppedTargets", 1000, "The maximum number of `droppedTargets` shown at /api/v1/targets page. "+
var maxDroppedTargets = flag.Int("promscrape.maxDroppedTargets", 1000, "The maximum number of droppedTargets to show at /api/v1/targets page. "+
"Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. "+
"Note that the increased number of tracked dropped targets may result in increased memory usage")


@ -15,7 +15,7 @@ import (
)
var maxLineLen = flagutil.NewBytes("import.maxLineLen", 100*1024*1024, "The maximum length in bytes of a single line accepted by /api/v1/import; "+
"the line length can be limited with `max_rows_per_line` query arg passed to /api/v1/export")
"the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export")
// ParseStream parses /api/v1/import lines from req and calls callback for the parsed rows.
//


@ -7,9 +7,12 @@ import (
"fmt"
"net"
"net/url"
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/fasthttp"
)
@ -18,6 +21,17 @@ type URL struct {
url *url.URL
}
// MustNewURL returns new URL for the given u.
func MustNewURL(u string) URL {
pu, err := url.Parse(u)
if err != nil {
logger.Panicf("BUG: cannot parse u=%q: %s", u, err)
}
return URL{
url: pu,
}
}
// URL returns the underlying url.
func (u *URL) URL() *url.URL {
if u == nil || u.url == nil {
@ -26,6 +40,15 @@ func (u *URL) URL() *url.URL {
return u.url
}
// String returns string representation of u.
func (u *URL) String() string {
pu := u.URL()
if pu == nil {
return ""
}
return pu.String()
}
// MarshalYAML implements yaml.Marshaler interface.
func (u *URL) MarshalYAML() (interface{}, error) {
if u.url == nil {
@ -48,38 +71,72 @@ func (u *URL) UnmarshalYAML(unmarshal func(interface{}) error) error {
return nil
}
// NewDialFunc returns dial func for the given pu and tlsConfig.
func (u *URL) NewDialFunc(tlsConfig *tls.Config) (fasthttp.DialFunc, error) {
// NewDialFunc returns dial func for the given u and ac.
func (u *URL) NewDialFunc(ac *promauth.Config) (fasthttp.DialFunc, error) {
if u == nil || u.url == nil {
return defaultDialFunc, nil
}
pu := u.url
if pu.Scheme != "http" && pu.Scheme != "https" {
return nil, fmt.Errorf("unknown scheme=%q for proxy_url=%q, must be http or https", pu.Scheme, pu)
return nil, fmt.Errorf("unknown scheme=%q for proxy_url=%q, must be http or https", pu.Scheme, pu.Redacted())
}
isTLS := pu.Scheme == "https"
proxyAddr := addMissingPort(pu.Host, isTLS)
var authHeader string
if ac != nil {
authHeader = ac.Authorization
}
if pu.User != nil && len(pu.User.Username()) > 0 {
userPasswordEncoded := base64.StdEncoding.EncodeToString([]byte(pu.User.String()))
authHeader = "Proxy-Authorization: Basic " + userPasswordEncoded + "\r\n"
authHeader = "Basic " + userPasswordEncoded
}
if authHeader != "" {
authHeader = "Proxy-Authorization: " + authHeader + "\r\n"
}
var tlsCfg *tls.Config
if isTLS {
tlsCfg = ac.NewTLSConfig()
if !tlsCfg.InsecureSkipVerify && tlsCfg.ServerName == "" {
tlsCfg.ServerName = tlsServerName(proxyAddr)
}
}
dialFunc := func(addr string) (net.Conn, error) {
proxyConn, err := defaultDialFunc(pu.Host)
proxyConn, err := defaultDialFunc(proxyAddr)
if err != nil {
return nil, fmt.Errorf("cannot connect to proxy %q: %w", pu, err)
return nil, fmt.Errorf("cannot connect to proxy %q: %w", pu.Redacted(), err)
}
if pu.Scheme == "https" {
proxyConn = tls.Client(proxyConn, tlsConfig)
if isTLS {
proxyConn = tls.Client(proxyConn, tlsCfg)
}
conn, err := sendConnectRequest(proxyConn, addr, authHeader)
conn, err := sendConnectRequest(proxyConn, proxyAddr, addr, authHeader)
if err != nil {
_ = proxyConn.Close()
return nil, fmt.Errorf("error when sending CONNECT request to proxy %q: %w", pu, err)
return nil, fmt.Errorf("error when sending CONNECT request to proxy %q: %w", pu.Redacted(), err)
}
return conn, nil
}
return dialFunc, nil
}
func addMissingPort(addr string, isTLS bool) string {
if strings.IndexByte(addr, ':') >= 0 {
return addr
}
port := "80"
if isTLS {
port = "443"
}
return addr + ":" + port
}
func tlsServerName(addr string) string {
host, _, err := net.SplitHostPort(addr)
if err != nil {
return addr
}
return host
}
func defaultDialFunc(addr string) (net.Conn, error) {
network := "tcp4"
if netutil.TCP6Enabled() {
@ -90,8 +147,8 @@ func defaultDialFunc(addr string) (net.Conn, error) {
}
// sendConnectRequest sends CONNECT request to proxyConn for the given addr and authHeader and returns the established connection to dstAddr.
func sendConnectRequest(proxyConn net.Conn, dstAddr, authHeader string) (net.Conn, error) {
req := "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + dstAddr + "\r\n" + authHeader + "\r\n"
func sendConnectRequest(proxyConn net.Conn, proxyAddr, dstAddr, authHeader string) (net.Conn, error) {
req := "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + proxyAddr + "\r\n" + authHeader + "\r\n"
if _, err := proxyConn.Write([]byte(req)); err != nil {
return nil, fmt.Errorf("cannot send CONNECT request for dstAddr=%q: %w", dstAddr, err)
}
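
The reworked proxy code above builds the `Proxy-Authorization` header in one place, adds a default port to the proxy address, and sends the proxy address (not the destination) in the `Host` header of the `CONNECT` request. The following sketch assembles the same request text without opening any connection. `addMissingPort` follows the diff; `proxyAuthHeader` and `buildConnectRequest` are hypothetical helper names introduced for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// addMissingPort appends the default port for the scheme if addr has no port.
func addMissingPort(addr string, isTLS bool) string {
	if strings.IndexByte(addr, ':') >= 0 {
		return addr
	}
	port := "80"
	if isTLS {
		port = "443"
	}
	return addr + ":" + port
}

// proxyAuthHeader wraps an Authorization value such as "Basic ..." into
// a complete Proxy-Authorization header line. It returns "" for an empty value.
func proxyAuthHeader(authorization string) string {
	if authorization == "" {
		return ""
	}
	return "Proxy-Authorization: " + authorization + "\r\n"
}

// buildConnectRequest returns the CONNECT request text in the same layout as the diff.
func buildConnectRequest(proxyAddr, dstAddr, authHeader string) string {
	return "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + proxyAddr + "\r\n" + authHeader + "\r\n"
}

func main() {
	proxyAddr := addMissingPort("proxy.local", false)
	creds := base64.StdEncoding.EncodeToString([]byte("user:pass"))
	req := buildConnectRequest(proxyAddr, "example.com:443", proxyAuthHeader("Basic "+creds))
	fmt.Printf("%q\n", req)
}
```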


@ -577,9 +577,21 @@ func (db *indexDB) createTSIDByName(dst *TSID, metricName []byte) error {
// on db.tb flush via invalidateTagCache flushCallback passed to OpenTable.
atomic.AddUint64(&db.newTimeseriesCreated, 1)
if logNewSeries {
logger.Infof("new series created: %s", mn.String())
}
return nil
}
// SetLogNewSeries updates new series logging.
//
// This function must be called before calling any storage functions.
func SetLogNewSeries(ok bool) {
logNewSeries = ok
}
var logNewSeries = false
func (db *indexDB) generateTSID(dst *TSID, metricName []byte, mn *MetricName) error {
// Search the TSID in the external storage.
// This is usually the db from the previous period.
@ -2048,15 +2060,6 @@ func (is *indexSearch) getTagFilterWithMinMetricIDsCount(tfs *TagFilters, maxMet
metricIDs, _, err := is.getMetricIDsForTagFilter(tf, nil, maxMetrics)
if err != nil {
if err == errFallbackToMetricNameMatch {
// Skip tag filters requiring to scan for too many metrics.
kb.B = append(kb.B[:0], uselessSingleTagFilterKeyPrefix)
kb.B = encoding.MarshalUint64(kb.B, uint64(maxMetrics))
kb.B = tf.Marshal(kb.B)
is.db.uselessTagFiltersCache.Set(kb.B, uselessTagFilterCacheValue)
uselessTagFilters++
continue
}
return nil, nil, fmt.Errorf("cannot find MetricIDs for tagFilter %s: %w", tf, err)
}
if metricIDs.Len() >= maxMetrics {
@ -2306,7 +2309,7 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
// Fast path: found metricIDs by date range.
return nil
}
if err != errFallbackToMetricNameMatch {
if err != errFallbackToGlobalSearch {
return err
}
@ -2330,12 +2333,6 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
continue
}
mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs)
if err == errFallbackToMetricNameMatch {
// The tag filter requires too many index scans. Postpone it,
// so tag filters with lower number of index scans may be applied.
tfsPostponed = append(tfsPostponed, tf)
continue
}
if err != nil {
return err
}
@ -2345,11 +2342,8 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
if len(tfsPostponed) > 0 && successfulIntersects == 0 {
return is.updateMetricIDsByMetricNameMatch(metricIDs, minMetricIDs, tfsPostponed)
}
for i, tf := range tfsPostponed {
for _, tf := range tfsPostponed {
mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs)
if err == errFallbackToMetricNameMatch {
return is.updateMetricIDsByMetricNameMatch(metricIDs, minMetricIDs, tfsPostponed[i:])
}
if err != nil {
return err
}
@ -2363,7 +2357,6 @@ const (
uselessSingleTagFilterKeyPrefix = 0
uselessMultiTagFiltersKeyPrefix = 1
uselessNegativeTagFilterKeyPrefix = 2
uselessTagIntersectKeyPrefix = 3
)
var uselessTagFilterCacheValue = []byte("1")
@ -2375,29 +2368,28 @@ func (is *indexSearch) getMetricIDsForTagFilter(tf *tagFilter, filter *uint64set
metricIDs := &uint64set.Set{}
if len(tf.orSuffixes) > 0 {
// Fast path for orSuffixes - seek for rows for each value from orSuffixes.
loopsCount, err := is.updateMetricIDsForOrSuffixesNoFilter(tf, maxMetrics, metricIDs)
if err != nil {
if err == errFallbackToMetricNameMatch {
return nil, loopsCount, err
var loopsCount uint64
var err error
if filter != nil {
loopsCount, err = is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter)
} else {
loopsCount, err = is.updateMetricIDsForOrSuffixesNoFilter(tf, maxMetrics, metricIDs)
}
if err != nil {
return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf)
}
return metricIDs, loopsCount, nil
}
// Slow path - scan for all the rows with the given prefix.
maxLoopsCount := uint64(maxMetrics) * maxIndexScanSlowLoopsPerMetric
loopsCount, err := is.getMetricIDsForTagFilterSlow(tf, filter, maxLoopsCount, metricIDs.Add)
loopsCount, err := is.getMetricIDsForTagFilterSlow(tf, filter, metricIDs.Add)
if err != nil {
if err == errFallbackToMetricNameMatch {
return nil, loopsCount, err
}
return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf)
}
return metricIDs, loopsCount, nil
}
func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint64set.Set, maxLoopsCount uint64, f func(metricID uint64)) (uint64, error) {
func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint64set.Set, f func(metricID uint64)) (uint64, error) {
if len(tf.orSuffixes) > 0 {
logger.Panicf("BUG: the getMetricIDsForTagFilterSlow must be called only for empty tf.orSuffixes; got %s", tf.orSuffixes)
}
@ -2436,9 +2428,6 @@ func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint6
}
mp.ParseMetricIDs()
loopsCount += uint64(mp.MetricIDsLen())
if loopsCount > maxLoopsCount {
return loopsCount, errFallbackToMetricNameMatch
}
if prevMatch && string(suffix) == string(prevMatchingSuffix) {
// Fast path: the same tag value found.
// There is no need in checking it again with potentially
@ -2522,26 +2511,28 @@ func (is *indexSearch) updateMetricIDsForOrSuffixesNoFilter(tf *tagFilter, maxMe
return loopsCount, nil
}
func (is *indexSearch) updateMetricIDsForOrSuffixesWithFilter(tf *tagFilter, metricIDs, filter *uint64set.Set) error {
func (is *indexSearch) updateMetricIDsForOrSuffixesWithFilter(tf *tagFilter, metricIDs, filter *uint64set.Set) (uint64, error) {
sortedFilter := filter.AppendTo(nil)
kb := kbPool.Get()
defer kbPool.Put(kb)
var loopsCount uint64
for _, orSuffix := range tf.orSuffixes {
kb.B = append(kb.B[:0], tf.prefix...)
kb.B = append(kb.B, orSuffix...)
kb.B = append(kb.B, tagSeparatorChar)
if err := is.updateMetricIDsForOrSuffixWithFilter(kb.B, metricIDs, sortedFilter, tf.isNegative); err != nil {
return err
lc, err := is.updateMetricIDsForOrSuffixWithFilter(kb.B, metricIDs, sortedFilter, tf.isNegative)
if err != nil {
return loopsCount, err
}
loopsCount += lc
}
return nil
return loopsCount, nil
}
func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetrics int, metricIDs *uint64set.Set) (uint64, error) {
ts := &is.ts
mp := &is.mp
mp.Reset()
maxLoopsCount := uint64(maxMetrics) * maxIndexScanLoopsPerMetric
var loopsCount uint64
loopsPaceLimiter := 0
ts.Seek(prefix)
@ -2560,9 +2551,6 @@ func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetr
return loopsCount, err
}
loopsCount += uint64(mp.MetricIDsLen())
if loopsCount > maxLoopsCount {
return loopsCount, errFallbackToMetricNameMatch
}
mp.ParseMetricIDs()
metricIDs.AddMulti(mp.MetricIDs)
}
@ -2572,16 +2560,15 @@ func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetr
return loopsCount, nil
}
func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metricIDs *uint64set.Set, sortedFilter []uint64, isNegative bool) error {
func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metricIDs *uint64set.Set, sortedFilter []uint64, isNegative bool) (uint64, error) {
if len(sortedFilter) == 0 {
return nil
return 0, nil
}
firstFilterMetricID := sortedFilter[0]
lastFilterMetricID := sortedFilter[len(sortedFilter)-1]
ts := &is.ts
mp := &is.mp
mp.Reset()
maxLoopsCount := uint64(len(sortedFilter)) * maxIndexScanLoopsPerMetric
var loopsCount uint64
loopsPaceLimiter := 0
ts.Seek(prefix)
@ -2590,17 +2577,18 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
for ts.NextItem() {
if loopsPaceLimiter&paceLimiterMediumIterationsMask == 0 {
if err := checkSearchDeadlineAndPace(is.deadline); err != nil {
return err
return loopsCount, err
}
}
loopsPaceLimiter++
item := ts.Item
if !bytes.HasPrefix(item, prefix) {
return nil
return loopsCount, nil
}
if err := mp.InitOnlyTail(item, item[len(prefix):]); err != nil {
return err
return loopsCount, err
}
loopsCount += uint64(mp.MetricIDsLen())
firstMetricID, lastMetricID := mp.FirstAndLastMetricIDs()
if lastMetricID < firstFilterMetricID {
// Skip the item, since it contains metricIDs lower
@ -2610,14 +2598,11 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
if firstMetricID > lastFilterMetricID {
// Stop searching, since the current item and all the subsequent items
// contain metricIDs higher than metricIDs in sortedFilter.
return nil
return loopsCount, nil
}
sf = sortedFilter
loopsCount += uint64(mp.MetricIDsLen())
if loopsCount > maxLoopsCount {
return errFallbackToMetricNameMatch
}
mp.ParseMetricIDs()
matchingMetricIDs := mp.MetricIDs[:0]
for _, metricID = range mp.MetricIDs {
if len(sf) == 0 {
break
@ -2632,18 +2617,23 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
if metricID < sf[0] {
continue
}
if isNegative {
metricIDs.Del(metricID)
} else {
metricIDs.Add(metricID)
}
matchingMetricIDs = append(matchingMetricIDs, metricID)
sf = sf[1:]
}
if len(matchingMetricIDs) > 0 {
if isNegative {
for _, metricID := range matchingMetricIDs {
metricIDs.Del(metricID)
}
} else {
metricIDs.AddMulti(matchingMetricIDs)
}
}
}
if err := ts.Error(); err != nil {
return fmt.Errorf("error when searching for tag filter prefix %q: %w", prefix, err)
return loopsCount, fmt.Errorf("error when searching for tag filter prefix %q: %w", prefix, err)
}
return nil
return loopsCount, nil
}
func binarySearchUint64(a []uint64, v uint64) uint {
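
The filtered scan above walks the sorted metricIDs of an index item against a sorted filter, collects the matches into a slice, and only then applies them in bulk (`AddMulti`, or `Del` per ID for negative filters). A minimal sketch of the intersection step is shown here. It assumes both inputs are sorted in ascending order without duplicates, uses `sort.Search` from the standard library in place of the package's own `binarySearchUint64`, and `intersectSorted` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// intersectSorted returns the values present in both ids and sortedFilter.
// Both slices must be sorted in ascending order.
func intersectSorted(ids, sortedFilter []uint64) []uint64 {
	var matching []uint64
	sf := sortedFilter
	for _, id := range ids {
		if len(sf) == 0 {
			break
		}
		if id > sf[0] {
			// Skip filter entries smaller than id with a binary search.
			n := sort.Search(len(sf), func(i int) bool { return sf[i] >= id })
			sf = sf[n:]
			if len(sf) == 0 {
				break
			}
		}
		if id < sf[0] {
			// id is missing from the filter.
			continue
		}
		// id == sf[0]: record the match and advance the filter.
		matching = append(matching, id)
		sf = sf[1:]
	}
	return matching
}

func main() {
	ids := []uint64{1, 3, 5, 7, 9}
	filter := []uint64{3, 4, 7, 10}
	fmt.Println(intersectSorted(ids, filter))
}
```

Collecting matches first keeps the per-ID work inside the loop small and lets the set be updated once per index item.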
@ -2660,7 +2650,7 @@ func binarySearchUint64(a []uint64, v uint64) uint {
return i
}
var errFallbackToMetricNameMatch = errors.New("fall back to updateMetricIDsByMetricNameMatch because of too many index scan loops")
var errFallbackToGlobalSearch = errors.New("fall back from per-day index search to global index search")
var errMissingMetricIDsForDate = errors.New("missing metricIDs for date")
@ -2725,11 +2715,11 @@ func (is *indexSearch) tryUpdatingMetricIDsForDateRange(metricIDs *uint64set.Set
maxDate := uint64(tr.MaxTimestamp) / msecPerDay
if maxDate < minDate {
// Per-day inverted index doesn't cover the selected date range.
return errFallbackToMetricNameMatch
return fmt.Errorf("maxDate=%d cannot be smaller than minDate=%d", maxDate, minDate)
}
if maxDate-minDate > maxDaysForDateMetricIDs {
// Too many dates must be covered. Give up, since it may be slow.
return errFallbackToMetricNameMatch
return errFallbackToGlobalSearch
}
if minDate == maxDate {
// Fast path - query only a single date.
@ -2759,14 +2749,14 @@ func (is *indexSearch) tryUpdatingMetricIDsForDateRange(metricIDs *uint64set.Set
return
}
if err != nil {
if err == errFallbackToMetricNameMatch {
if err == errFallbackToGlobalSearch {
// The per-date search is too expensive. Probably it is faster to perform global search
// using metric name match.
errGlobal = err
return
}
dateStr := time.Unix(int64(date*24*3600), 0)
errGlobal = fmt.Errorf("cannot search for metricIDs for %s: %w", dateStr, err)
errGlobal = fmt.Errorf("cannot search for metricIDs at %s: %w", dateStr, err)
return
}
if metricIDs.Len() < maxMetrics {
@ -2790,7 +2780,6 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
type tagFilterWithWeight struct {
tf *tagFilter
loopsCount uint64
lastQueryTimestamp uint64
}
tfws := make([]tagFilterWithWeight, len(tfs.tfs))
currentTime := fasttime.UnixTimestamp()
@ -2798,26 +2787,29 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
tf := &tfs.tfs[i]
loopsCount, lastQueryTimestamp := is.getLoopsCountAndTimestampForDateFilter(date, tf)
origLoopsCount := loopsCount
if currentTime > lastQueryTimestamp+3*3600 {
// Update stats once per 3 hours only for relatively fast tag filters.
// There is no need in spending CPU resources on updating stats for slow tag filters.
if loopsCount == 0 && tf.looksLikeHeavy() {
// Set high loopsCount for heavy tag filters instead of spending CPU time on their execution.
loopsCount = 11e6
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
}
if currentTime > lastQueryTimestamp+3600 {
// Update stats once per hour for relatively fast tag filters.
// There is no need in spending CPU resources on updating stats for heavy tag filters.
if loopsCount <= 10e6 {
loopsCount = 0
}
}
if loopsCount == 0 {
// Prevent from possible thundering herd issue when heavy tf is executed from multiple concurrent queries
// Prevent from possible thundering herd issue when potentially heavy tf is executed from multiple concurrent queries
// by temporary persisting its position in the tag filters list.
if origLoopsCount == 0 {
origLoopsCount = 10e6
origLoopsCount = 9e6
}
lastQueryTimestamp = 0
is.storeLoopsCountForDateFilter(date, tf, origLoopsCount, lastQueryTimestamp)
is.storeLoopsCountForDateFilter(date, tf, origLoopsCount)
}
tfws[i] = tagFilterWithWeight{
tf: tf,
loopsCount: loopsCount,
lastQueryTimestamp: lastQueryTimestamp,
}
}
sort.Slice(tfws, func(i, j int) bool {
@ -2829,7 +2821,6 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
})
// Populate metricIDs for the first non-negative filter.
var tfsPostponed []*tagFilter
var metricIDs *uint64set.Set
tfwsRemaining := tfws[:0]
maxDateMetrics := maxMetrics * 50
@ -2841,13 +2832,16 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
continue
}
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, nil, tfs.commonPrefix, maxDateMetrics)
is.storeLoopsCountForDateFilter(date, tf, loopsCount, tfw.lastQueryTimestamp)
if loopsCount > tfw.loopsCount {
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
}
if err != nil {
return nil, err
}
if m.Len() >= maxDateMetrics {
// Too many time series found by a single tag filter. Postpone applying this filter via metricName match.
tfsPostponed = append(tfsPostponed, tf)
// Too many time series found by a single tag filter. Postpone applying this filter.
tfwsRemaining = append(tfwsRemaining, tfw)
tfw.loopsCount = loopsCount
continue
}
metricIDs = m
@ -2872,7 +2866,7 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
}
if m.Len() >= maxDateMetrics {
// Too many time series found for the given (date). Fall back to global search.
return nil, errFallbackToMetricNameMatch
return nil, errFallbackToGlobalSearch
}
metricIDs = m
}
@ -2883,6 +2877,7 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
// when the initial tag filters significantly reduce the number of found metricIDs,
// so the remaining filters could be performed via much faster metricName matching instead
// of slow selecting of matching metricIDs.
var tfsPostponed []*tagFilter
for i := range tfwsRemaining {
tfw := tfwsRemaining[i]
tf := tfw.tf
@ -2891,24 +2886,26 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
// Short circuit - there is no need in applying the remaining filters to an empty set.
break
}
if uint64(metricIDsLen)*maxIndexScanLoopsPerMetric < tfw.loopsCount {
if tfw.loopsCount > uint64(metricIDsLen)*loopsCountPerMetricNameMatch {
// It should be faster performing metricName match on the remaining filters
// instead of scanning big number of entries in the inverted index for these filters.
for i < len(tfwsRemaining) {
tfw := tfwsRemaining[i]
tf := tfw.tf
tfsPostponed = append(tfsPostponed, tf)
// Store stats for non-executed tf, since it could be updated during protection from thundering herd.
is.storeLoopsCountForDateFilter(date, tf, tfw.loopsCount, tfw.lastQueryTimestamp)
continue
is.storeLoopsCountForDateFilter(date, tf, tfw.loopsCount)
i++
}
break
}
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, metricIDs, tfs.commonPrefix, 0)
if loopsCount > tfw.loopsCount {
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
}
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, metricIDs, tfs.commonPrefix, maxDateMetrics)
is.storeLoopsCountForDateFilter(date, tf, loopsCount, tfw.lastQueryTimestamp)
if err != nil {
return nil, err
}
if m.Len() >= maxDateMetrics {
// Too many time series found by a single tag filter. Postpone applying this filter via metricName match.
tfsPostponed = append(tfsPostponed, tf)
continue
}
if tf.isNegative {
metricIDs.Subtract(m)
} else {
@ -3092,9 +3089,9 @@ func (is *indexSearch) getMetricIDsForDateTagFilter(tf *tagFilter, date uint64,
kbPool.Put(kb)
if err != nil {
// Set high loopsCount for failing filter, so it is moved to the end of filter list.
loopsCount = 1e9
loopsCount = 20e9
}
if metricIDs.Len() >= maxMetrics {
if filter == nil && metricIDs.Len() >= maxMetrics {
// Increase loopsCount for tag filter matching too many metrics,
// So next time it is moved to the end of filter list.
loopsCount *= 2
@ -3115,13 +3112,8 @@ func (is *indexSearch) getLoopsCountAndTimestampForDateFilter(date uint64, tf *t
return loopsCount, timestamp
}
func (is *indexSearch) storeLoopsCountForDateFilter(date uint64, tf *tagFilter, loopsCount, prevTimestamp uint64) {
func (is *indexSearch) storeLoopsCountForDateFilter(date uint64, tf *tagFilter, loopsCount uint64) {
currentTimestamp := fasttime.UnixTimestamp()
if currentTimestamp < prevTimestamp+5 {
// The cache already contains quite fresh entry for the current (date, tf).
// Do not update it too frequently.
return
}
is.kb.B = appendDateTagFilterCacheKey(is.kb.B[:0], date, tf)
kb := kbPool.Get()
kb.B = encoding.MarshalUint64(kb.B[:0], loopsCount)
@ -3196,63 +3188,28 @@ func (is *indexSearch) updateMetricIDsForPrefix(prefix []byte, metricIDs *uint64
return nil
}
// The maximum number of index scan loops.
// Bigger number of loops is slower than updateMetricIDsByMetricNameMatch
// over the found metrics.
const maxIndexScanLoopsPerMetric = 100
// The maximum number of slow index scan loops.
// Bigger number of loops is slower than updateMetricIDsByMetricNameMatch
// over the found metrics.
const maxIndexScanSlowLoopsPerMetric = 20
// The estimated number of index scan loops a single loop in updateMetricIDsByMetricNameMatch takes.
const loopsCountPerMetricNameMatch = 500
func (is *indexSearch) intersectMetricIDsWithTagFilter(tf *tagFilter, filter *uint64set.Set) (*uint64set.Set, error) {
if filter.Len() == 0 {
return nil, nil
}
kb := &is.kb
filterLenRounded := (uint64(filter.Len()) / 1024) * 1024
kb.B = append(kb.B[:0], uselessTagIntersectKeyPrefix)
kb.B = encoding.MarshalUint64(kb.B, filterLenRounded)
kb.B = tf.Marshal(kb.B)
if len(is.db.uselessTagFiltersCache.Get(nil, kb.B)) > 0 {
// Skip useless work, since the intersection will return
// errFallbackToMetricNameMatch for the given filter.
return nil, errFallbackToMetricNameMatch
}
metricIDs, err := is.intersectMetricIDsWithTagFilterNocache(tf, filter)
if err == nil {
return metricIDs, err
}
if err != errFallbackToMetricNameMatch {
return nil, err
}
kb.B = append(kb.B[:0], uselessTagIntersectKeyPrefix)
kb.B = encoding.MarshalUint64(kb.B, filterLenRounded)
kb.B = tf.Marshal(kb.B)
is.db.uselessTagFiltersCache.Set(kb.B, uselessTagFilterCacheValue)
return nil, errFallbackToMetricNameMatch
}
func (is *indexSearch) intersectMetricIDsWithTagFilterNocache(tf *tagFilter, filter *uint64set.Set) (*uint64set.Set, error) {
metricIDs := filter
if !tf.isNegative {
metricIDs = &uint64set.Set{}
}
if len(tf.orSuffixes) > 0 {
// Fast path for orSuffixes - seek for rows for each value from orSuffixes.
if err := is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter); err != nil {
if err == errFallbackToMetricNameMatch {
return nil, err
}
_, err := is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter)
if err != nil {
return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf)
}
return metricIDs, nil
}
// Slow path - scan for all the rows with the given prefix.
maxLoopsCount := uint64(filter.Len()) * maxIndexScanSlowLoopsPerMetric
_, err := is.getMetricIDsForTagFilterSlow(tf, filter, maxLoopsCount, func(metricID uint64) {
_, err := is.getMetricIDsForTagFilterSlow(tf, filter, func(metricID uint64) {
if tf.isNegative {
// filter must be equal to metricIDs
metricIDs.Del(metricID)
@ -3261,9 +3218,6 @@ func (is *indexSearch) intersectMetricIDsWithTagFilterNocache(tf *tagFilter, fil
}
})
if err != nil {
if err == errFallbackToMetricNameMatch {
return nil, err
}
return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf)
}
return metricIDs, nil
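
The hunks above replace the `maxIndexScanLoopsPerMetric` check with a comparison against `loopsCountPerMetricNameMatch`: once a filter's recorded `loopsCount` exceeds the number of already found metricIDs multiplied by 500, that filter and every filter after it are postponed to metricName matching. A minimal sketch of that decision follows. The helper `firstPostponed` is hypothetical, it assumes the filters are already sorted by ascending `loopsCount`, and it keeps `metricIDsLen` fixed, whereas the real loop re-reads it after each applied filter.

```go
package main

import "fmt"

// loopsCountPerMetricNameMatch mirrors the constant introduced in this diff.
const loopsCountPerMetricNameMatch = 500

// firstPostponed is a hypothetical helper, not part of the diff.
// It returns the index of the first filter whose recorded loopsCount makes
// an index scan look more expensive than metricName matching over
// metricIDsLen candidates. Filters at that index and after it would be postponed.
func firstPostponed(loopsCounts []uint64, metricIDsLen int) int {
	for i, lc := range loopsCounts {
		if lc > uint64(metricIDsLen)*loopsCountPerMetricNameMatch {
			return i
		}
	}
	return len(loopsCounts)
}

func main() {
	// Assumed per-filter costs, sorted in ascending order.
	loopsCounts := []uint64{1000, 40000, 11000000}
	n := firstPostponed(loopsCounts, 100)
	fmt.Println("run via index:", loopsCounts[:n])
	fmt.Println("postponed:", loopsCounts[n:])
}
```

With 100 candidate metricIDs the threshold is 50000 loops, so only the last filter in this example is postponed.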


@ -248,6 +248,10 @@ type tagFilter struct {
graphiteReverseSuffix []byte
}
func (tf *tagFilter) looksLikeHeavy() bool {
return tf.isRegexp && len(tf.orSuffixes) == 0
}
func (tf *tagFilter) isComposite() bool {
k := tf.key
return len(k) > 0 && k[0] == compositeTagKeyPrefix
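
`looksLikeHeavy` feeds the weighting logic in `getMetricIDsForDateAndFilters`: a filter with no recorded stats that looks heavy gets a `loopsCount` of 11e6 up front, and stats older than one hour are re-measured only when the recorded value is at most 10e6. The sketch below restates that rule as a pure function. The name `effectiveLoopsCount` and its parameters are assumptions made for illustration; the diff implements the same steps inline and also persists the values in a cache.

```go
package main

import "fmt"

// effectiveLoopsCount is a hypothetical restatement of the weighting rule.
// cached is the previously stored loopsCount (0 means no stats yet),
// heavy is the result of tf.looksLikeHeavy(), and ageSeconds is the time
// since the stats were last stored.
// A return value of 0 means the filter should be executed and re-measured.
func effectiveLoopsCount(cached uint64, heavy bool, ageSeconds uint64) uint64 {
	lc := cached
	if lc == 0 && heavy {
		// Seed a high cost instead of executing the heavy filter.
		lc = 11e6
	}
	if ageSeconds > 3600 && lc <= 10e6 {
		// Refresh stats once per hour, but only for relatively fast filters.
		lc = 0
	}
	return lc
}

func main() {
	fmt.Println(effectiveLoopsCount(0, true, 0))
	fmt.Println(effectiveLoopsCount(500, false, 7200))
	fmt.Println(effectiveLoopsCount(500, false, 10))
}
```

Because 11e6 is above the 10e6 refresh threshold, a filter seeded as heavy is not re-executed just to refresh its stats.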


@ -141,36 +141,32 @@ func (s *Set) AddMulti(a []uint64) {
if len(a) == 0 {
return
}
slowPath := false
hi := uint32(a[0] >> 32)
for _, x := range a[1:] {
if hi != uint32(x>>32) {
slowPath = true
break
hiPrev := uint32(a[0] >> 32)
i := 0
for j, x := range a {
hi := uint32(x >> 32)
if hi == hiPrev {
continue
}
b32 := s.getOrCreateBucket32(hiPrev)
s.itemsCount += b32.addMulti(a[i:j])
hiPrev = hi
i = j
}
if slowPath {
for _, x := range a {
s.Add(x)
b32 := s.getOrCreateBucket32(hiPrev)
s.itemsCount += b32.addMulti(a[i:])
}
return
}
// Fast path - all the items in a have identical higher 32 bits.
// Put them in a bulk into the corresponding bucket32.
func (s *Set) getOrCreateBucket32(hi uint32) *bucket32 {
bs := s.buckets
var b32 *bucket32
for i := range bs {
if bs[i].hi == hi {
b32 = &bs[i]
break
return &bs[i]
}
}
if b32 == nil {
b32 = s.addBucket32()
b32 := s.addBucket32()
b32.hi = hi
}
n := b32.addMulti(a)
s.itemsCount += n
return b32
}
func (s *Set) addBucket32() *bucket32 {
@ -609,41 +605,32 @@ func (b *bucket32) addMulti(a []uint64) int {
if len(a) == 0 {
return 0
}
hi := uint16(a[0] >> 16)
slowPath := false
for _, x := range a[1:] {
if hi != uint16(x>>16) {
slowPath = true
break
}
}
if slowPath {
count := 0
for _, x := range a {
if b.add(uint32(x)) {
count++
hiPrev := uint16(a[0] >> 16)
i := 0
for j, x := range a {
hi := uint16(x >> 16)
if hi == hiPrev {
continue
}
b16 := b.getOrCreateBucket16(hiPrev)
count += b16.addMulti(a[i:j])
hiPrev = hi
i = j
}
b16 := b.getOrCreateBucket16(hiPrev)
count += b16.addMulti(a[i:])
return count
}
// Fast path - all the items in a have identical higher 32+16 bits.
// Put them to a single bucket16 in a bulk.
var b16 *bucket16
func (b *bucket32) getOrCreateBucket16(hi uint16) *bucket16 {
his := b.b16his
bs := b.buckets
if n := b.getHint(); n < uint32(len(his)) && his[n] == hi {
b16 = &bs[n]
}
if b16 == nil {
n := binarySearch16(his, hi)
if n < 0 || n >= len(his) || his[n] != hi {
b16 = b.addBucketAtPos(hi, n)
} else {
b.setHint(n)
b16 = &bs[n]
return b.addBucketAtPos(hi, n)
}
}
return b16.addMulti(a)
return &bs[n]
}
func (b *bucket32) addSlow(hi, lo uint16) bool {
@ -742,8 +729,8 @@ const (
type bucket16 struct {
bits *[wordsPerBucket]uint64
smallPool *[smallPoolSize]uint16
smallPoolLen int
smallPool [smallPoolSize]uint16
}
const smallPoolSize = 56
@ -820,7 +807,14 @@ func (b *bucket16) intersect(a *bucket16) {
}
func (b *bucket16) sizeBytes() uint64 {
return uint64(unsafe.Sizeof(*b)) + uint64(unsafe.Sizeof(*b.bits))
n := unsafe.Sizeof(*b)
if b.bits != nil {
n += unsafe.Sizeof(*b.bits)
}
if b.smallPool != nil {
n += unsafe.Sizeof(*b.smallPool)
}
return uint64(n)
}
func (b *bucket16) copyTo(dst *bucket16) {
@ -831,23 +825,37 @@ func (b *bucket16) copyTo(dst *bucket16) {
dst.bits = &bits
}
dst.smallPoolLen = b.smallPoolLen
dst.smallPool = b.smallPool
if b.smallPool != nil {
sp := dst.getOrCreateSmallPool()
*sp = *b.smallPool
}
}
func (b *bucket16) getOrCreateSmallPool() *[smallPoolSize]uint16 {
if b.smallPool == nil {
var sp [smallPoolSize]uint16
b.smallPool = &sp
}
return b.smallPool
}
func (b *bucket16) add(x uint16) bool {
if b.bits == nil {
bits := b.bits
if bits == nil {
return b.addToSmallPool(x)
}
wordNum, bitMask := getWordNumBitMask(x)
word := &b.bits[wordNum]
ok := *word&bitMask == 0
*word |= bitMask
ok := bits[wordNum]&bitMask == 0
if ok {
bits[wordNum] |= bitMask
}
return ok
}
func (b *bucket16) addMulti(a []uint64) int {
count := 0
if b.bits == nil {
bits := b.bits
if bits == nil {
// Slow path
for _, x := range a {
if b.add(uint16(x)) {
@ -858,11 +866,10 @@ func (b *bucket16) addMulti(a []uint64) int {
// Fast path
for _, x := range a {
wordNum, bitMask := getWordNumBitMask(uint16(x))
word := &b.bits[wordNum]
if *word&bitMask == 0 {
if bits[wordNum]&bitMask == 0 {
bits[wordNum] |= bitMask
count++
}
*word |= bitMask
}
}
return count
@ -872,15 +879,16 @@ func (b *bucket16) addToSmallPool(x uint16) bool {
if b.hasInSmallPool(x) {
return false
}
if b.smallPoolLen < len(b.smallPool) {
b.smallPool[b.smallPoolLen] = x
sp := b.getOrCreateSmallPool()
if b.smallPoolLen < len(sp) {
sp[b.smallPoolLen] = x
b.smallPoolLen++
return true
}
b.smallPoolLen = 0
var bits [wordsPerBucket]uint64
b.bits = &bits
for _, v := range b.smallPool[:] {
for _, v := range sp[:] {
b.add(v)
}
b.add(x)
@ -896,7 +904,11 @@ func (b *bucket16) has(x uint16) bool {
}
func (b *bucket16) hasInSmallPool(x uint16) bool {
for _, v := range b.smallPool[:b.smallPoolLen] {
sp := b.smallPool
if sp == nil {
return false
}
for _, v := range sp[:b.smallPoolLen] {
if v == x {
return true
}
@ -916,9 +928,13 @@ func (b *bucket16) del(x uint16) bool {
}
func (b *bucket16) delFromSmallPool(x uint16) bool {
for i, v := range b.smallPool[:b.smallPoolLen] {
sp := b.smallPool
if sp == nil {
return false
}
for i, v := range sp[:b.smallPoolLen] {
if v == x {
copy(b.smallPool[i:], b.smallPool[i+1:])
copy(sp[i:], sp[i+1:])
b.smallPoolLen--
return true
}
@ -929,11 +945,15 @@ func (b *bucket16) delFromSmallPool(x uint16) bool {
func (b *bucket16) appendTo(dst []uint64, hi uint32, hi16 uint16) []uint64 {
hi64 := uint64(hi)<<32 | uint64(hi16)<<16
if b.bits == nil {
sp := b.smallPool
if sp == nil {
return dst
}
// Use smallPoolSorter instead of sort.Slice here in order to reduce memory allocations.
sps := smallPoolSorterPool.Get().(*smallPoolSorter)
// Sort a copy of b.smallPool, since b must be readonly in order to prevent from data races
// Sort a copy of sp, since b must be readonly in order to prevent from data races
// when b.appendTo is called from concurrent goroutines.
sps.smallPool = b.smallPool
sps.smallPool = *sp
sps.a = sps.smallPool[:b.smallPoolLen]
if len(sps.a) > 1 && !sort.IsSorted(sps) {
sort.Sort(sps)
@ -996,6 +1016,10 @@ func getWordNumBitMask(x uint16) (uint16, uint64) {
func binarySearch16(u16 []uint16, x uint16) int {
// The code has been adapted from sort.Search.
n := len(u16)
if n > 0 && u16[n-1] < x {
// Fast path for values scanned in ascending order.
return n
}
i, j := 0, n
for i < j {
h := int(uint(i+j) >> 1)
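
The reworked `Set.AddMulti` and `bucket32.addMulti` no longer fall back to item-by-item insertion when the input spans several buckets. They walk the slice once and hand each run of items with identical high bits to the matching bucket in bulk. The sketch below isolates that run-splitting step for the upper 32 bits. `splitByHigh32` is a hypothetical helper that returns the runs instead of adding them to buckets.

```go
package main

import "fmt"

// splitByHigh32 is a hypothetical helper, not part of the diff.
// It splits a into consecutive runs whose items share the same upper 32 bits,
// using the same hiPrev/i/j bookkeeping as the new AddMulti loop.
// The returned runs are sub-slices of a; nothing is copied.
func splitByHigh32(a []uint64) [][]uint64 {
	if len(a) == 0 {
		return nil
	}
	var runs [][]uint64
	hiPrev := uint32(a[0] >> 32)
	i := 0
	for j, x := range a {
		hi := uint32(x >> 32)
		if hi == hiPrev {
			continue
		}
		// The run a[i:j] ended; start a new one at j.
		runs = append(runs, a[i:j])
		hiPrev = hi
		i = j
	}
	// Flush the last run.
	return append(runs, a[i:])
}

func main() {
	a := []uint64{1, 2, 1<<32 | 5, 1<<32 | 6, 7}
	for _, run := range splitByHigh32(a) {
		fmt.Println(run)
	}
}
```

Runs are split only on changes between neighbouring items, so unsorted input yields more, shorter runs but is still handled correctly.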


@ -3,7 +3,7 @@ module github.com/VictoriaMetrics/fasthttp
go 1.13
require (
github.com/klauspost/compress v1.11.3
github.com/klauspost/compress v1.11.12
github.com/valyala/bytebufferpool v1.0.0
github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a
)


@ -1,5 +1,5 @@
github.com/klauspost/compress v1.11.3 h1:dB4Bn0tN3wdCzQxnS8r06kV74qN/TAfaIS0bVE8h3jc=
github.com/klauspost/compress v1.11.3/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/klauspost/compress v1.11.12 h1:famVnQVu7QwryBN4jNseQdUKES71ZAOnB6UQQJPZvqk=
github.com/klauspost/compress v1.11.12/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
github.com/valyala/bytebufferpool v1.0.0 h1:GqA5TC/0021Y/b9FG4Oi9Mr3q7XYx6KllzawFIhcdPw=
github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyCJ6HpOuEn7z0Csc=
github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a h1:0R4NLDRDZX6JcmhJgXi5E4b8Wg84ihbmUKp/GvSPEzc=

vendor/modules.txt

@ -13,7 +13,7 @@ cloud.google.com/go/storage
# github.com/VictoriaMetrics/fastcache v1.5.8
## explicit
github.com/VictoriaMetrics/fastcache
# github.com/VictoriaMetrics/fasthttp v1.0.13
# github.com/VictoriaMetrics/fasthttp v1.0.14
## explicit
github.com/VictoriaMetrics/fasthttp
github.com/VictoriaMetrics/fasthttp/fasthttputil