mirror of
https://github.com/VictoriaMetrics/VictoriaMetrics.git
synced 2024-12-01 14:47:38 +00:00
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files
This commit is contained in:
commit
7d44cdd8ce
74 changed files with 3952 additions and 2314 deletions
259
README.md
|
@ -170,6 +170,7 @@ Alphabetically sorted links to case studies:
|
|||
* [Font used](#font-used)
|
||||
* [Color Palette](#color-palette)
|
||||
* [We kindly ask](#we-kindly-ask)
|
||||
* [List of command-line flags](#list-of-command-line-flags)
|
||||
|
||||
|
||||
## How to start VictoriaMetrics
|
||||
|
@ -182,7 +183,7 @@ The following command-line flags are used the most:
|
|||
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
|
||||
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
|
||||
|
||||
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values.
|
||||
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
|
||||
|
||||
See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup)
|
||||
and how to [handle alerts](#alerting).
|
||||
|
@ -413,6 +414,10 @@ while VictoriaMetrics stores them with *milliseconds* precision.
|
|||
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
|
||||
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
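As a minimal sketch, the URL for such a write request could be assembled as shown below. The address `localhost:8428` and the helper name `write_url` are assumptions for illustration; only the `/write` path and the `extra_label=name=value` query arg come from the text above.

```python
from urllib.parse import urlencode

# Assumed address of a local single-node VictoriaMetrics instance.
BASE_URL = "http://localhost:8428"


def write_url(extra_labels):
    """Build a /write URL that asks for the given labels to be added to all ingested series."""
    # Each label is passed as its own extra_label=name=value query arg.
    args = [("extra_label", f"{name}={value}") for name, value in extra_labels.items()]
    # safe="=" keeps the name=value pair readable instead of encoding "=" as %3D.
    return f"{BASE_URL}/write?{urlencode(args, safe='=')}"


url = write_url({"foo": "bar"})
print(url)
```

The Influx line protocol payload itself would then be sent in the body of a POST request to this URL.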
|
||||
|
||||
Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plugin-influxdb), [Juniper/open-nti](https://github.com/Juniper/open-nti)
|
||||
or [Juniper/jtimon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
|
||||
A comma-separated list of expected databases can be passed to VictoriaMetrics via the `-influx.databaseNames` command-line flag.
|
||||
|
||||
## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)
|
||||
|
||||
Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,
|
||||
|
@ -562,14 +567,17 @@ in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if y
|
|||
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args in addition to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
|
||||
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
|
||||
|
||||
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
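The two query args described above can be combined in one request. The following sketch only builds the URL; the address `localhost:8428` and the helper name `query_range_url` are assumptions, while the handler path, `start=-30m` and `round_digits=2` are taken from the examples above.

```python
from urllib.parse import urlencode

# Assumed address of a local single-node VictoriaMetrics instance.
BASE_URL = "http://localhost:8428"


def query_range_url(query, start, round_digits=None):
    """Build a /api/v1/query_range URL with a relative start time and optional rounding."""
    args = {"query": query, "start": start}
    if round_digits is not None:
        # Ask for response values rounded to this many digits after the decimal point.
        args["round_digits"] = round_digits
    # urlencode percent-encodes the parentheses and brackets in the query expression.
    return f"{BASE_URL}/api/v1/query_range?{urlencode(args)}"


url = query_range_url("avg_over_time(temperature[1h])", "-30m", round_digits=2)
print(url)
```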
|
||||
|
||||
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
|
||||
|
||||
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
|
||||
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details:
|
||||
|
||||
* Any number of [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
|
||||
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
|
||||
|
||||
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
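A request to `/api/v1/labels` that uses these extra args might be built as in the sketch below. The address `localhost:8428`, the helper name `labels_url`, the selector and the timestamps are all hypothetical values chosen for illustration.

```python
from urllib.parse import urlencode

# Assumed address of a local single-node VictoriaMetrics instance.
BASE_URL = "http://localhost:8428"


def labels_url(selectors, start=None, end=None):
    """Build a /api/v1/labels URL limited by series selectors and an optional time range."""
    # match[] may be repeated, so the args are kept as a list of pairs.
    args = [("match[]", selector) for selector in selectors]
    if start is not None:
        args.append(("start", start))
    if end is not None:
        args.append(("end", end))
    return f"{BASE_URL}/api/v1/labels?{urlencode(args)}"


# Hypothetical selector and unix timestamps covering one hour.
url = labels_url(['up{job="node"}'], start=1600000000, end=1600003600)
print(url)
```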
|
||||
|
||||
Additionally VictoriaMetrics provides the following handlers:
|
||||
|
||||
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
|
||||
|
@ -1367,6 +1375,8 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
|
|||
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default, `date` equals the current date,
|
||||
while `topN` equals 10.
|
||||
|
||||
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
|
||||
|
||||
* VictoriaMetrics limits the number of labels per each metric with `-maxLabelsPerTimeseries` command-line flag.
|
||||
This prevents ingestion of metrics with too many labels. It is recommended to [monitor](#monitoring) the `vm_metrics_with_dropped_labels_total`
|
||||
metric in order to determine whether `-maxLabelsPerTimeseries` must be adjusted for your workload.
|
||||
|
@ -1538,3 +1548,248 @@ Files included in each folder:
|
|||
* There should be sufficient clear space around the logo.
|
||||
* Do not change spacing, alignment, or relative locations of the design elements.
|
||||
* Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
|
||||
|
||||
|
||||
## List of command-line flags
|
||||
|
||||
Pass `-help` to VictoriaMetrics in order to see the list of supported command-line flags with their description:
|
||||
|
||||
```
|
||||
-bigMergeConcurrency int
|
||||
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
|
||||
-csvTrimTimestamp duration
|
||||
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-dedup.minScrapeInterval duration
|
||||
Remove superfluous samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
|
||||
-deleteAuthKey string
|
||||
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
|
||||
-denyQueriesOutsideRetention
|
||||
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
|
||||
-dryRun
|
||||
Whether to check only -promscrape.config and then exit. Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-finalMergeDelay duration
|
||||
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
|
||||
-forceFlushAuthKey string
|
||||
authKey, which must be passed in query string to /internal/force_flush pages
|
||||
-forceMergeAuthKey string
|
||||
authKey, which must be passed in query string to /internal/force_merge pages
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-graphiteListenAddr string
|
||||
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
|
||||
-graphiteTrimTimestamp duration
|
||||
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address to listen for http connections (default ":8428")
|
||||
-import.maxLineLen size
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.databaseNames array
|
||||
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-influx.maxLineSize size
|
||||
The maximum size in bytes for a single Influx line during parsing
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr string
|
||||
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
|
||||
-influxMeasurementFieldSeparator string
|
||||
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
|
||||
-influxSkipMeasurement
|
||||
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
|
||||
-influxSkipSingleField
|
||||
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if Influx line contains only a single field
|
||||
-influxTrimTimestamp duration
|
||||
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-insert.maxQueueDuration duration
|
||||
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxConcurrentInserts int
|
||||
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
|
||||
-maxInsertRequestSize size
|
||||
The maximum size in bytes of a single Prometheus remote_write API request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-maxLabelsPerTimeseries int
|
||||
The maximum number of labels accepted per time series. Superfluous labels are dropped (default 30)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It overrides httpAuth settings
|
||||
-opentsdbHTTPListenAddr string
|
||||
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbListenAddr string
|
||||
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-opentsdbhttp.maxInsertRequestSize size
|
||||
The maximum size of OpenTSDB HTTP put request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-opentsdbhttpTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It overrides httpAuth settings
|
||||
-precisionBits int
|
||||
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
|
||||
-promscrape.cluster.memberNum int
|
||||
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
|
||||
-promscrape.cluster.membersCount int
|
||||
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
|
||||
-promscrape.cluster.replicationFactor int
|
||||
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
|
||||
-promscrape.config string
|
||||
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
|
||||
-promscrape.config.dryRun
|
||||
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
|
||||
-promscrape.config.strictParse
|
||||
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
|
||||
-promscrape.configCheckInterval duration
|
||||
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
-promscrape.consulSDCheckInterval duration
|
||||
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
|
||||
-promscrape.disableCompression
|
||||
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.disableKeepAlive
|
||||
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
|
||||
-promscrape.discovery.concurrency int
|
||||
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
|
||||
-promscrape.discovery.concurrentWaitTime duration
|
||||
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
|
||||
-promscrape.dnsSDCheckInterval duration
|
||||
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
|
||||
-promscrape.ec2SDCheckInterval duration
|
||||
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
|
||||
-promscrape.eurekaSDCheckInterval duration
|
||||
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
|
||||
-promscrape.fileSDCheckInterval duration
|
||||
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
|
||||
-promscrape.gceSDCheckInterval duration
|
||||
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
|
||||
-promscrape.kubernetes.apiServerTimeout duration
|
||||
How frequently to reload the full state from Kubernetes API server (default 30m0s)
|
||||
-promscrape.kubernetesSDCheckInterval duration
|
||||
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
|
||||
-promscrape.maxDroppedTargets int
|
||||
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
|
||||
-promscrape.maxScrapeSize size
|
||||
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
|
||||
-promscrape.openstackSDCheckInterval duration
|
||||
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
|
||||
-promscrape.streamParse
|
||||
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.suppressDuplicateScrapeTargetErrors
|
||||
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
|
||||
-promscrape.suppressScrapeErrors
|
||||
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
|
||||
-relabelConfig string
|
||||
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://victoriametrics.github.io/#relabeling for details
|
||||
-retentionPeriod value
|
||||
Data with timestamps outside the retentionPeriod is automatically deleted
|
||||
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
|
||||
-search.cacheTimestampOffset duration
|
||||
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
|
||||
-search.disableCache
|
||||
Whether to disable response caching. This may be useful during data backfilling
|
||||
-search.latencyOffset duration
|
||||
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
|
||||
-search.logSlowQueryDuration duration
|
||||
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
|
||||
-search.maxConcurrentRequests int
|
||||
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
|
||||
-search.maxExportDuration duration
|
||||
The maximum duration for /api/v1/export call (default 720h0m0s)
|
||||
-search.maxLookback duration
|
||||
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
|
||||
-search.maxPointsPerTimeseries int
|
||||
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values significantly exceeding horizontal resolution of the graph (default 30000)
|
||||
-search.maxQueryDuration duration
|
||||
The maximum duration for query execution (default 30s)
|
||||
-search.maxQueryLen size
|
||||
The maximum search query length in bytes
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
|
||||
-search.maxQueueDuration duration
|
||||
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
|
||||
-search.maxStalenessInterval duration
|
||||
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
|
||||
-search.maxStepForPointsAdjustment duration
|
||||
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
|
||||
-search.maxTagKeys int
|
||||
The maximum number of tag keys returned from /api/v1/labels (default 100000)
|
||||
-search.maxTagValueSuffixesPerSearch int
|
||||
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
|
||||
-search.maxTagValues int
|
||||
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
|
||||
-search.maxUniqueTimeseries int
|
||||
The maximum number of unique time series each search can scan (default 300000)
|
||||
-search.minStalenessInterval duration
|
||||
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
|
||||
-search.queryStats.lastQueriesCount int
|
||||
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
|
||||
-search.queryStats.minQueryDuration int
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
|
||||
-search.resetCacheAuthKey string
|
||||
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
|
||||
-search.treatDotsAsIsInRegexps
|
||||
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
|
||||
-selfScrapeInstance string
|
||||
Value for 'instance' label, which is added to self-scraped metrics (default "self")
|
||||
-selfScrapeInterval duration
|
||||
Interval for self-scraping own metrics at /metrics page
|
||||
-selfScrapeJob string
|
||||
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
|
||||
-smallMergeConcurrency int
|
||||
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
|
||||
-snapshotAuthKey string
|
||||
authKey, which must be passed in query string to /snapshot* pages
|
||||
-storageDataPath string
|
||||
Path to storage data (default "victoria-metrics-data")
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
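Several flags in the list above accept size values with optional suffixes (KB, MB, GB, KiB, MiB, GiB) and print their defaults in plain bytes. The sketch below shows how such values relate to the printed defaults. It assumes that decimal suffixes are powers of 1000 and binary suffixes are powers of 1024; the function `parse_size` is a hypothetical helper, not the parser VictoriaMetrics uses, and it is case-sensitive for simplicity.

```python
import re

# Assumption: KB/MB/GB are powers of 1000, KiB/MiB/GiB are powers of 1024.
SIZE_UNITS = {
    "": 1,
    "KB": 1000,
    "MB": 1000**2,
    "GB": 1000**3,
    "KiB": 1024,
    "MiB": 1024**2,
    "GiB": 1024**3,
}


def parse_size(value):
    """Convert a size string such as '100MiB' or '16384' into a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([A-Za-z]*)", value.strip())
    if match is None or match.group(2) not in SIZE_UNITS:
        raise ValueError(f"unsupported size value: {value!r}")
    number, suffix = match.groups()
    return int(float(number) * SIZE_UNITS[suffix])


# Under the assumption above, the -import.maxLineLen default of 104857600 bytes is 100MiB.
print(parse_size("100MiB"))
```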
|
||||
|
|
|
@ -255,6 +255,41 @@ If each target is scraped by multiple `vmagent` instances, then data deduplicati
|
|||
See [these docs](https://victoriametrics.github.io/#deduplication) for details.
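To picture why deduplication is needed, the sketch below distributes targets among scrapers in the spirit of the `-promscrape.cluster.membersCount`, `-promscrape.cluster.memberNum` and `-promscrape.cluster.replicationFactor` flags. This is an illustration only: the CRC32 hash, the function names and the target addresses are assumptions, and the actual assignment logic in `vmagent` may differ.

```python
import zlib


def owners(target, members_count, replication_factor=1):
    """Return the member numbers that would scrape the target in this sketch."""
    # CRC32 is an arbitrary stable hash picked for illustration, not vmagent's choice.
    first = zlib.crc32(target.encode()) % members_count
    # With replication, the next members (wrapping around) scrape the same target too.
    return [(first + i) % members_count for i in range(replication_factor)]


def should_scrape(target, member_num, members_count, replication_factor=1):
    """Tell whether the member with the given number scrapes the target."""
    return member_num in owners(target, members_count, replication_factor)


# Hypothetical targets split among 3 members with a replication factor of 2.
for target in ["host-1:9100", "host-2:9100", "host-3:9100"]:
    print(target, owners(target, members_count=3, replication_factor=2))
```

With a replication factor of 2, every target is scraped by two members, so each sample arrives twice at the remote storage.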
|
||||
|
||||
|
||||
## Scraping targets via a proxy
|
||||
|
||||
`vmagent` supports scraping targets via http and https proxies. The proxy address must be specified in the `proxy_url` option. For example, the following scrape config instructs `vmagent`
|
||||
to scrape targets via the https proxy at `https://proxy-addr:1234`:
|
||||
|
||||
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
```
|
||||
|
||||
The proxy can be configured with the following optional settings:
|
||||
|
||||
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
|
||||
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
|
||||
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
|
||||
|
||||
For example:
|
||||
|
||||
```yml
scrape_configs:
- job_name: foo
  proxy_url: https://proxy-addr:1234
  proxy_basic_auth:
    username: foobar
    password: secret
  proxy_tls_config:
    insecure_skip_verify: true
    cert_file: /path/to/cert
    key_file: /path/to/key
    ca_file: /path/to/ca
    server_name: real-server-name
```
|
||||
|
||||
|
||||
## Monitoring
|
||||
|
||||
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
|
||||
|
@ -477,13 +512,16 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
|
||||
-import.maxLineLen max_rows_per_line
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with max_rows_per_line query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.maxLineSize value
|
||||
-import.maxLineLen size
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.databaseNames array
|
||||
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-influx.maxLineSize size
|
||||
The maximum size in bytes for a single Influx line during parsing
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr http://<vmagent>:8429/write
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr string
|
||||
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
|
||||
-influxMeasurementFieldSeparator string
|
||||
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
|
||||
|
@ -511,12 +549,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxConcurrentInserts int
|
||||
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
|
||||
-maxInsertRequestSize value
|
||||
-maxInsertRequestSize size
|
||||
The maximum size in bytes of a single Prometheus remote_write API request
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-memory.allowedBytes value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
|
@ -527,9 +565,9 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-opentsdbhttp.maxInsertRequestSize value
|
||||
-opentsdbhttp.maxInsertRequestSize size
|
||||
The maximum size of OpenTSDB HTTP put request
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-opentsdbhttpTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-pprofAuthKey string
|
||||
|
@ -538,6 +576,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
|
||||
-promscrape.cluster.membersCount int
|
||||
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
|
||||
-promscrape.cluster.replicationFactor int
|
||||
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
|
||||
-promscrape.config string
|
||||
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
|
||||
-promscrape.config.dryRun
|
||||
|
@ -546,45 +586,45 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
|
||||
-promscrape.configCheckInterval duration
|
||||
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
-promscrape.consulSDCheckInterval consul_sd_configs
|
||||
-promscrape.consulSDCheckInterval duration
|
||||
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
|
||||
-promscrape.disableCompression
|
||||
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.disableKeepAlive disable_keepalive: true
|
||||
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set disable_keepalive: true individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
|
||||
-promscrape.disableKeepAlive
|
||||
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
|
||||
-promscrape.discovery.concurrency int
|
||||
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
|
||||
-promscrape.discovery.concurrentWaitTime duration
|
||||
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
|
||||
-promscrape.dnsSDCheckInterval dns_sd_configs
|
||||
-promscrape.dnsSDCheckInterval duration
|
||||
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
|
||||
-promscrape.dockerswarmSDCheckInterval dockerswarm_sd_configs
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
|
||||
-promscrape.ec2SDCheckInterval ec2_sd_configs
|
||||
-promscrape.ec2SDCheckInterval duration
|
||||
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
|
||||
-promscrape.eurekaSDCheckInterval eureka_sd_configs
|
||||
-promscrape.eurekaSDCheckInterval duration
|
||||
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
|
||||
-promscrape.fileSDCheckInterval duration
|
||||
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
|
||||
-promscrape.gceSDCheckInterval gce_sd_configs
|
||||
-promscrape.gceSDCheckInterval duration
|
||||
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
|
||||
-promscrape.kubernetes.apiServerTimeout duration
|
||||
How frequently to reload the full state from Kubernetes API server (default 10m0s)
|
||||
-promscrape.kubernetesSDCheckInterval kubernetes_sd_configs
|
||||
How frequently to reload the full state from Kubernetes API server (default 30m0s)
|
||||
-promscrape.kubernetesSDCheckInterval duration
|
||||
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
|
||||
-promscrape.maxDroppedTargets droppedTargets
|
||||
-promscrape.maxDroppedTargets int
|
||||
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
|
||||
-promscrape.maxScrapeSize value
|
||||
-promscrape.maxScrapeSize size
|
||||
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
|
||||
-promscrape.openstackSDCheckInterval openstack_sd_configs
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
|
||||
-promscrape.openstackSDCheckInterval duration
|
||||
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
|
||||
-promscrape.streamParse stream_parse: true
|
||||
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set stream_parse: true individually per each `scrape_config` section in `-promscrape.config` for fine grained control
|
||||
-promscrape.suppressDuplicateScrapeTargetErrors duplicate scrape target
|
||||
Whether to suppress duplicate scrape target errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
|
||||
-promscrape.streamParse
|
||||
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.suppressDuplicateScrapeTargetErrors
|
||||
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
|
||||
-promscrape.suppressScrapeErrors
|
||||
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
|
||||
-remoteWrite.basicAuth.password array
|
||||
|
@ -601,12 +641,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
-remoteWrite.label array
|
||||
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple flags to metrics before sending them to remote storage
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.maxBlockSize value
|
||||
-remoteWrite.maxBlockSize size
|
||||
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
|
||||
-remoteWrite.maxDiskUsagePerURL value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
|
||||
-remoteWrite.maxDiskUsagePerURL size
|
||||
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-remoteWrite.proxyURL array
|
||||
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
|
|
|
@ -23,6 +23,7 @@ import (
|
|||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
|
||||
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
|
||||
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
|
||||
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
|
||||
|
@ -40,7 +41,7 @@ var (
|
|||
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
|
||||
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
|
||||
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
|
||||
"This flag isn't needed when ingesting data over HTTP - just send it to `http://<vmagent>:8429/write`")
|
||||
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
|
||||
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
|
||||
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
|
||||
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
|
||||
|
@ -204,10 +205,8 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
|
|||
w.WriteHeader(http.StatusNoContent)
|
||||
return true
|
||||
case "/query":
|
||||
// Emulate fake response for influx query.
|
||||
// This is required for TSBS benchmark.
|
||||
influxQueryRequests.Inc()
|
||||
fmt.Fprintf(w, `{"results":[{"series":[{"values":[]}]}]}`)
|
||||
influxutils.WriteDatabaseNames(w)
|
||||
return true
|
||||
case "/targets":
|
||||
promscrapeTargetsRequests.Inc()
|
||||
|
|
|
@ -232,7 +232,7 @@ The shortlist of configuration flags is the following:
|
|||
How often to evaluate the rules (default 1m0s)
|
||||
-external.alert.source string
|
||||
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
|
||||
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
|
||||
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
|
||||
-external.label array
|
||||
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
|
@ -272,9 +272,9 @@ The shortlist of configuration flags is the following:
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-memory.allowedBytes value
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
|
|
|
@ -41,7 +41,7 @@ Rule files may contain %{ENV_VAR} placeholders, which are substituted by the cor
|
|||
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
|
||||
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
|
||||
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
|
||||
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`)
|
||||
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`)
|
||||
externalLabels = flagutil.NewArray("external.label", "Optional label in the form 'name=value' to add to all generated recording rules and alerts. "+
|
||||
"Pass multiple -label flags in order to add multiple label sets.")
|
||||
|
||||
|
|
|
@ -208,9 +208,9 @@ See the docs at https://victoriametrics.github.io/vmauth.html .
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-memory.allowedBytes value
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
|
|
|
@ -205,12 +205,12 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxBytesPerSecond value
|
||||
-maxBytesPerSecond size
|
||||
The maximum upload speed. There is no limit if it is set to 0
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-origin string
|
||||
|
|
|
@ -19,6 +19,7 @@ import (
|
|||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/vmimport"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
|
||||
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
|
||||
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
|
||||
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
|
||||
|
@ -34,7 +35,7 @@ import (
|
|||
var (
|
||||
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
|
||||
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
|
||||
"This flag isn't needed when ingesting data over HTTP - just send it to `http://<victoriametrics>:8428/write`")
|
||||
"This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write")
|
||||
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
|
||||
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
|
||||
"Usually :4242 must be set. Doesn't work if empty")
|
||||
|
@ -147,10 +148,8 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
|
|||
w.WriteHeader(http.StatusNoContent)
|
||||
return true
|
||||
case "/influx/query", "/query":
|
||||
// Emulate fake response for influx query.
|
||||
// This is required for TSBS benchmark.
|
||||
influxQueryRequests.Inc()
|
||||
fmt.Fprintf(w, `{"results":[{"series":[{"values":[]}]}]}`)
|
||||
influxutils.WriteDatabaseNames(w)
|
||||
return true
|
||||
case "/prometheus/targets", "/targets":
|
||||
promscrapeTargetsRequests.Inc()
|
||||
|
|
|
@ -105,12 +105,12 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxBytesPerSecond value
|
||||
-maxBytesPerSecond size
|
||||
The maximum download speed. There is no limit if it is set to 0
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-skipBackupCompleteCheck
|
||||
|
|
|
@ -968,6 +968,11 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
		start -= offset
		end := start
		start = end - window
		// Do not include data point with a timestamp matching the lower boundary of the window as Prometheus does.
		start++
		if end < start {
			end = start
		}
		if err := exportHandler(w, []string{childQuery}, etf, start, end, "promapi", 0, false, deadline); err != nil {
			return fmt.Errorf("error when exporting data for query=%q on the time range (start=%d, end=%d): %w", childQuery, start, end, err)
}
|
||||
|
@ -1017,6 +1022,7 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
|
|||
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
|
||||
Deadline: deadline,
|
||||
LookbackDelta: lookbackDelta,
|
||||
RoundDigits: getRoundDigits(r),
|
||||
EnforcedTagFilters: etf,
|
||||
}
|
||||
result, err := promql.Exec(&ec, query, true)
|
||||
|
@ -1121,6 +1127,7 @@ func queryRangeHandler(startTime time.Time, w http.ResponseWriter, query string,
|
|||
Deadline: deadline,
|
||||
MayCache: mayCache,
|
||||
LookbackDelta: lookbackDelta,
|
||||
RoundDigits: getRoundDigits(r),
|
||||
EnforcedTagFilters: etf,
|
||||
}
|
||||
result, err := promql.Exec(&ec, query, false)
|
||||
|
@ -1297,6 +1304,18 @@ func getMatchesFromRequest(r *http.Request) []string {
	return matches
}

func getRoundDigits(r *http.Request) int {
	s := r.FormValue("round_digits")
	if len(s) == 0 {
		return 100
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 100
	}
	return n
}

func getLatencyOffsetMilliseconds() int64 {
	d := latencyOffset.Milliseconds()
	if d <= 1000 {
|
||||
|
|
|
@ -98,11 +98,14 @@ type EvalConfig struct {
|
|||
// LookbackDelta is analog to `-query.lookback-delta` from Prometheus.
|
||||
LookbackDelta int64
|
||||
|
||||
timestamps []int64
|
||||
timestampsOnce sync.Once
|
||||
// How many decimal digits after the point to leave in response.
|
||||
RoundDigits int
|
||||
|
||||
// EnforcedTagFilters used for apply additional label filters to query.
|
||||
EnforcedTagFilters []storage.TagFilter
|
||||
|
||||
timestamps []int64
|
||||
timestampsOnce sync.Once
|
||||
}
|
||||
|
||||
// newEvalConfig returns new EvalConfig copy from src.
|
||||
|
@ -114,6 +117,7 @@ func newEvalConfig(src *EvalConfig) *EvalConfig {
|
|||
ec.Deadline = src.Deadline
|
||||
ec.MayCache = src.MayCache
|
||||
ec.LookbackDelta = src.LookbackDelta
|
||||
ec.RoundDigits = src.RoundDigits
|
||||
ec.EnforcedTagFilters = src.EnforcedTagFilters
|
||||
|
||||
// do not copy src.timestamps - they must be generated again.
|
||||
|
|
|
@ -12,6 +12,7 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/querystats"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
"github.com/VictoriaMetrics/metricsql"
|
||||
|
@ -72,6 +73,14 @@ func Exec(ec *EvalConfig, q string, isFirstPointOnly bool) ([]netstorage.Result,
|
|||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if n := ec.RoundDigits; n < 100 {
|
||||
for i := range result {
|
||||
values := result[i].Values
|
||||
for j, v := range values {
|
||||
values[j] = decimal.RoundToDecimalDigits(v, n)
|
||||
}
|
||||
}
|
||||
}
|
||||
return result, err
|
||||
}
|
||||
|
||||
|
|
|
@ -61,6 +61,7 @@ func TestExecSuccess(t *testing.T) {
|
|||
End: end,
|
||||
Step: step,
|
||||
Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""),
|
||||
RoundDigits: 100,
|
||||
}
|
||||
for i := 0; i < 5; i++ {
|
||||
result, err := Exec(ec, q, false)
|
||||
|
@ -3653,6 +3654,210 @@ func TestExecSuccess(t *testing.T) {
|
|||
resultExpected := []netstorage.Result{r1, r2, r3, r4}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`prometheus_buckets(overlapped ranges)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(prometheus_buckets((
|
||||
alias(label_set(90, "foo", "bar", "vmrange", "0...0"), "xxx"),
|
||||
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.2"), "xxx"),
|
||||
alias(label_set(time()/20, "foo", "bar", "vmrange", "0.2...0.25"), "xxx"),
|
||||
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.26"), "xxx"),
|
||||
alias(label_set(time()/100, "foo", "bar", "vmrange", "0.2...40"), "xxx"),
|
||||
alias(label_set(time()/10, "foo", "bar", "vmrange", "40...Inf"), "xxx"),
|
||||
)))`
|
||||
r1 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{90, 90, 90, 90, 90, 90},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r1.MetricName.MetricGroup = []byte("xxx")
|
||||
r1.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("0"),
|
||||
},
|
||||
}
|
||||
r2 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{140, 150, 160, 170, 180, 190},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r2.MetricName.MetricGroup = []byte("xxx")
|
||||
r2.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("0.2"),
|
||||
},
|
||||
}
|
||||
r3 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{190, 210, 230, 250, 270, 290},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r3.MetricName.MetricGroup = []byte("xxx")
|
||||
r3.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("0.25"),
|
||||
},
|
||||
}
|
||||
r4 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{240, 270, 300, 330, 360, 390},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r4.MetricName.MetricGroup = []byte("xxx")
|
||||
r4.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("0.26"),
|
||||
},
|
||||
}
|
||||
r5 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{250, 282, 314, 346, 378, 410},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r5.MetricName.MetricGroup = []byte("xxx")
|
||||
r5.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("40"),
|
||||
},
|
||||
}
|
||||
r6 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{350, 402, 454, 506, 558, 610},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r6.MetricName.MetricGroup = []byte("xxx")
|
||||
r6.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("Inf"),
|
||||
},
|
||||
}
|
||||
|
||||
resultExpected := []netstorage.Result{r1, r2, r3, r4, r5, r6}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`prometheus_buckets(overlapped ranges at the end)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(prometheus_buckets((
|
||||
alias(label_set(90, "foo", "bar", "vmrange", "0...0"), "xxx"),
|
||||
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.2"), "xxx"),
|
||||
alias(label_set(time()/20, "foo", "bar", "vmrange", "0.2...0.25"), "xxx"),
|
||||
alias(label_set(time()/20, "foo", "bar", "vmrange", "0...0.25"), "xxx"),
|
||||
alias(label_set(time()/100, "foo", "bar", "vmrange", "0.2...40"), "xxx"),
|
||||
alias(label_set(time()/10, "foo", "bar", "vmrange", "40...Inf"), "xxx"),
|
||||
)))`
|
||||
r1 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{90, 90, 90, 90, 90, 90},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r1.MetricName.MetricGroup = []byte("xxx")
|
||||
r1.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("0"),
|
||||
},
|
||||
}
|
||||
r2 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{140, 150, 160, 170, 180, 190},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r2.MetricName.MetricGroup = []byte("xxx")
|
||||
r2.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("0.2"),
|
||||
},
|
||||
}
|
||||
r3 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{190, 210, 230, 250, 270, 290},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r3.MetricName.MetricGroup = []byte("xxx")
|
||||
r3.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("0.25"),
|
||||
},
|
||||
}
|
||||
r4 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{200, 222, 244, 266, 288, 310},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r4.MetricName.MetricGroup = []byte("xxx")
|
||||
r4.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("40"),
|
||||
},
|
||||
}
|
||||
r5 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{300, 342, 384, 426, 468, 510},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r5.MetricName.MetricGroup = []byte("xxx")
|
||||
r5.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
{
|
||||
Key: []byte("le"),
|
||||
Value: []byte("Inf"),
|
||||
},
|
||||
}
|
||||
|
||||
resultExpected := []netstorage.Result{r1, r2, r3, r4, r5}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`median_over_time()`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `median_over_time({})`
|
||||
|
@ -6375,6 +6580,7 @@ func TestExecError(t *testing.T) {
|
|||
End: 2000,
|
||||
Step: 100,
|
||||
Deadline: searchutils.NewDeadline(time.Now(), time.Minute, ""),
|
||||
RoundDigits: 100,
|
||||
}
|
||||
for i := 0; i < 4; i++ {
|
||||
rv, err := Exec(ec, q, false)
|
||||
|
|
|
@ -538,6 +538,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
|
|||
// Do not drop trailing data points for queries, which return 2 or 1 point (aka instant queries).
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
|
||||
canDropLastSample := rc.CanDropLastSample && len(rc.Timestamps) > 2
|
||||
f := rc.Func
|
||||
for _, tEnd := range rc.Timestamps {
|
||||
tStart := tEnd - window
|
||||
ni = seekFirstTimestampIdxAfter(timestamps[i:], tStart, ni)
|
||||
|
@ -577,7 +578,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
|
|||
rfa.realNextValue = nan
|
||||
}
|
||||
rfa.currTimestamp = tEnd
|
||||
value := rc.Func(rfa)
|
||||
value := f(rfa)
|
||||
rfa.idx++
|
||||
dstValues = append(dstValues, value)
|
||||
}
|
||||
|
@ -643,12 +644,12 @@ func getScrapeInterval(timestamps []int64) int64 {
|
|||
return int64(maxSilenceInterval)
|
||||
}
|
||||
|
||||
// Estimate scrape interval as 0.6 quantile for the first 100 intervals.
|
||||
// Estimate scrape interval as 0.6 quantile for the first 20 intervals.
|
||||
h := histogram.GetFast()
|
||||
tsPrev := timestamps[0]
|
||||
timestamps = timestamps[1:]
|
||||
if len(timestamps) > 100 {
|
||||
timestamps = timestamps[:100]
|
||||
if len(timestamps) > 20 {
|
||||
timestamps = timestamps[:20]
|
||||
}
|
||||
for _, ts := range timestamps {
|
||||
h.Update(float64(ts - tsPrev))
|
||||
|
|
|
@ -518,6 +518,7 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
|
|||
sort.Slice(xss, func(i, j int) bool { return xss[i].end < xss[j].end })
|
||||
xssNew := make([]x, 0, len(xss)+2)
|
||||
var xsPrev x
|
||||
uniqTs := make(map[string]*timeseries, len(xss))
|
||||
for _, xs := range xss {
|
||||
ts := xs.ts
|
||||
if isZeroTS(ts) {
|
||||
|
@ -525,7 +526,8 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
|
|||
xsPrev = xs
|
||||
continue
|
||||
}
|
||||
if xs.start != xsPrev.end {
|
||||
if xs.start != xsPrev.end && uniqTs[xs.startStr] == nil {
|
||||
uniqTs[xs.startStr] = xs.ts
|
||||
xssNew = append(xssNew, x{
|
||||
endStr: xs.startStr,
|
||||
end: xs.start,
|
||||
|
@ -533,7 +535,14 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
|
|||
})
|
||||
}
|
||||
ts.MetricName.AddTag("le", xs.endStr)
|
||||
prevTs := uniqTs[xs.endStr]
|
||||
if prevTs != nil {
|
||||
// the end of the current bucket is not unique, need to merge it with the existing bucket.
|
||||
mergeNonOverlappingTimeseries(prevTs, xs.ts)
|
||||
} else {
|
||||
xssNew = append(xssNew, xs)
|
||||
uniqTs[xs.endStr] = xs.ts
|
||||
}
|
||||
xsPrev = xs
|
||||
}
|
||||
if !math.IsInf(xsPrev.end, 1) {
|
||||
|
|
|
@ -12,9 +12,9 @@ import (
|
|||
)
|
||||
|
||||
var (
|
||||
lastQueriesCount = flag.Int("search.queryStats.lastQueriesCount", 20000, "Query stats for `/api/v1/status/top_queries` is tracked on this number of last queries. "+
|
||||
lastQueriesCount = flag.Int("search.queryStats.lastQueriesCount", 20000, "Query stats for /api/v1/status/top_queries is tracked on this number of last queries. "+
|
||||
"Zero value disables query stats tracking")
|
||||
minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", 0, "The minimum duration for queries to track in query stats at `/api/v1/status/top_queries`. "+
|
||||
minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", 0, "The minimum duration for queries to track in query stats at /api/v1/status/top_queries. "+
|
||||
"Queries with lower duration are ignored in query stats")
|
||||
)
|
||||
|
||||
|
|
|
@ -37,6 +37,8 @@ var (
|
|||
bigMergeConcurrency = flag.Int("bigMergeConcurrency", 0, "The maximum number of CPU cores to use for big merges. Default value is used if set to 0")
|
||||
smallMergeConcurrency = flag.Int("smallMergeConcurrency", 0, "The maximum number of CPU cores to use for small merges. Default value is used if set to 0")
|
||||
|
||||
logNewSeries = flag.Bool("logNewSeries", false, "Whether to log new series. This option is for debug purposes only. It can lead to performance issues "+
|
||||
"when big number of new series are ingested into VictoriaMetrics")
|
||||
denyQueriesOutsideRetention = flag.Bool("denyQueriesOutsideRetention", false, "Whether to deny queries outside of the configured -retentionPeriod. "+
|
||||
"When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. "+
|
||||
"This may be useful when multiple data sources with distinct retentions are hidden behind query-tee")
|
||||
|
@ -72,6 +74,7 @@ func InitWithoutMetrics(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
|
|||
}
|
||||
|
||||
resetResponseCacheIfNeeded = resetCacheIfNeeded
|
||||
storage.SetLogNewSeries(*logNewSeries)
|
||||
storage.SetFinalMergeDelay(*finalMergeDelay)
|
||||
storage.SetBigMergeWorkersCount(*bigMergeConcurrency)
|
||||
storage.SetSmallMergeWorkersCount(*smallMergeConcurrency)
|
||||
|
|
File diff suppressed because it is too large
|
@ -4,7 +4,7 @@ DOCKER_NAMESPACE := victoriametrics
|
|||
|
||||
ROOT_IMAGE ?= alpine:3.13.2
|
||||
CERTS_IMAGE := alpine:3.13.2
|
||||
GO_BUILDER_IMAGE := golang:1.16.0
|
||||
GO_BUILDER_IMAGE := golang:1.16.2
|
||||
BUILDER_IMAGE := local/builder:2.0.0-$(shell echo $(GO_BUILDER_IMAGE) | tr : _)
|
||||
BASE_IMAGE := local/base:1.1.3-$(shell echo $(ROOT_IMAGE) | tr : _)-$(shell echo $(CERTS_IMAGE) | tr : _)
|
||||
|
||||
|
|
|
@ -122,6 +122,16 @@ groups:
|
|||
description: "High rate of slow inserts on \"{{ $labels.instance }}\" may be a sign of resource exhaustion
|
||||
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series."
|
||||
|
||||
- alert: ProcessNearFDLimits
|
||||
expr: process_open_fds / process_max_fds > 0.8
|
||||
for: 10m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Number of free file descriptors is less than 20% for \"{{ $labels.job }}\"(\"{{ $labels.instance }}\") for the last 10m"
|
||||
description: "Exhausting OS file descriptors limit can cause severe degradation of the process.
|
||||
Consider to increase the limit as fast as possible."
|
||||
|
||||
# Alerts group for vmagent assumes that Grafana dashboard
|
||||
# https://grafana.com/grafana/dashboards/12683 is installed.
|
||||
# Pls update the `dashboard` annotation according to your setup.
|
||||
|
|
|
@ -70,8 +70,7 @@ services:
|
|||
- '--rule=/etc/alerts/*.yml'
|
||||
# display source of alerts in grafana
|
||||
- '-external.url=http://127.0.0.1:3000' #grafana outside container
|
||||
- '--external.alert.source=explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":"{{$$expr|quotesEscape|pathEscape}}"},{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' ## when copypaste the line be aware of '$$' for escaping in '$expr'
|
||||
networks:
|
||||
- '--external.alert.source=explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":"{{$$expr|quotesEscape|crlfEscape|queryEscape}}"},{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' ## when copypaste the line be aware of '$$' for escaping in '$expr' networks:
|
||||
- vm_net
|
||||
restart: always
|
||||
alertmanager:
|
||||
|
|
|
@ -27,6 +27,7 @@
|
|||
* [Observability, Availability & DORA’s Research Program](https://medium.com/alteos-tech-blog/observability-availability-and-doras-research-program-85deb6680e78)
|
||||
* [Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator](https://www.percona.com/blog/2021/02/12/tame-kubernetes-costs-with-percona-monitoring-and-management-and-prometheus-operator/)
|
||||
* [Prometheus Victoria Metrics On AWS ECS](https://dalefro.medium.com/prometheus-victoria-metrics-on-aws-ecs-62448e266090)
|
||||
* [Monitoring with Prometheus, Grafana, AlertManager and VictoriaMetrics](https://www.sensedia.com/post/monitoring-with-prometheus-alertmanager)
|
||||
|
||||
|
||||
## Our articles
|
||||
|
|
62
docs/BestPractices.md
Normal file
|
@ -0,0 +1,62 @@
|
|||
# VM best practices
|
||||
|
||||
VictoriaMetrics is a fast, cost-effective and scalable monitoring solution and time series database. It can be used as a long-term, remote storage for Prometheus which allows it to gather metrics from different systems and store them in a single location or separate them for different purposes (short-, long-term, responsibility zones etc).
|
||||
|
||||
## Install Recommendation
|
||||
There is no need to tune VictoriaMetrics because it uses reasonable defaults for command-line flags. These flags are automatically adjusted for the available CPU and RAM resources. There is no need for Operating System tuning because VictoriaMetrics is optimized for default OS settings. The only recommended change is to increase the limit on the [number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a), so that Prometheus instances can establish more connections to VictoriaMetrics (65535 is a standard production value).
|
||||
## Filesystem Considerations
|
||||
|
||||
The recommended filesystem is ext4. If you plan to store more than 1TB of data on ext4 partition or plan to extend it to more than 16TB, then the following options are recommended to pass to mkfs.ext4:
|
||||
mkfs.ext4 ... -O 64bit,huge_file,extent -T huge
|
||||
|
||||
## Operating System
|
||||
When configuring VictoriaMetrics, the best practice is to use the latest Ubuntu OS version.
|
||||
|
||||
## VictoriaMetrics Versions
|
||||
Always update VictoriaMetrics instances in the environment to avoid version and build mismatches, which result in differences in performance and operational features. It is strongly recommended to keep VictoriaMetrics up-to-date and to install updates as soon as they are available. The most recent releases are published on [this page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
|
||||
|
||||
## Upgrade
|
||||
It is safe to upgrade VictoriaMetrics to new versions unless the [release notes](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) say otherwise. It is safe to skip multiple versions during the upgrade unless release notes say otherwise. It is recommended to perform regular upgrades to the latest version, since it may contain important bug fixes, performance optimizations or new features.
|
||||
It is also safe to downgrade to the previous version unless release notes say otherwise.
|
||||
The following steps must be performed during the upgrade / downgrade process:
|
||||
|
||||
* Send SIGINT signal to VictoriaMetrics process so that it is stopped gracefully.
|
||||
* Wait until the process stops. This can take a few seconds.
|
||||
* Start the upgraded VictoriaMetrics.
|
||||
|
||||
Prometheus doesn't drop data during the VictoriaMetrics restart. See [this article](https://grafana.com/blog/2019/03/25/whats-new-in-prometheus-2.8-wal-based-remote-write/) for details.
|
||||
|
||||
## Security
|
||||
Do not forget to protect sensitive endpoints in VictoriaMetrics when exposing them to untrusted networks such as the internet. Please consider setting the following command-line flags:
|
||||
|
||||
* -tls, -tlsCertFile and -tlsKeyFile for switching from HTTP to HTTPS.
|
||||
* -httpAuth.username and -httpAuth.password for protecting all the HTTP endpoints with [HTTP Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication).
|
||||
* -deleteAuthKey for protecting /api/v1/admin/tsdb/delete_series endpoint. See how to [delete time series](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-delete-time-series).
|
||||
* -snapshotAuthKey for protecting /snapshot* endpoints. See [how to work with snapshots](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#how-to-work-with-snapshots).
|
||||
* -forceMergeAuthKey for protecting /internal/force_merge endpoint. See [force merge docs](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#forced-merge).
|
||||
* -search.resetCacheAuthKey for protecting /internal/resetRollupResultCache endpoint. See [backfilling](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#backfilling) for more details.
|
||||
|
||||
Explicitly bind the TCP and UDP ports used for data ingestion in Graphite and OpenTSDB formats to an internal network interface. For example, substitute -graphiteListenAddr=:2003 with -graphiteListenAddr=<internal_iface_ip>:2003.
|
||||
It is preferable to authorize all incoming requests from untrusted networks with [vmauth](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmauth/README.md) or a similar auth proxy.
|
||||
|
||||
## Backup Recommendations
|
||||
VictoriaMetrics supports backups via [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md) and [vmrestore](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmrestore/README.md) tools. We also provide the vmbackuper tool for our paid, enterprise subscribers - see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for additional details.
|
||||
|
||||
## Networking
|
||||
Network usage: outbound traffic is negligible. Ingress traffic is ~100 bytes per ingested data point via [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). The actual ingress bandwidth usage depends on the average number of labels per ingested metric and the average size of label values. A higher number of per-metric labels and longer label values result in higher ingress bandwidth.
|
||||
|
||||
## Storage Considerations
|
||||
Storage space: VictoriaMetrics needs less than a byte per data point on average. So, ~260GB is required to store a month-long insert stream of 100K data points per second. The actual storage size depends largely on data randomness (entropy). Higher randomness means higher storage size requirements. Read [this article](https://medium.com/faun/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932) for details.
|
||||
|
||||
## RAM
|
||||
RAM size: VictoriaMetrics needs less than 1KB per active time series. Therefore, ~1GB of RAM is required for 1M active time series. Time series are considered active if new data points have been added recently or if they have been recently queried. The number of active time series may be obtained from vm_cache_entries{type="storage/hour_metric_ids"} metric exported on the /metrics page. VictoriaMetrics stores various caches in RAM. Memory size for these caches may be limited with -memory.allowedPercent or -memory.allowedBytes flags.
|
||||
|
||||
## CPU
|
||||
CPU cores: VictoriaMetrics needs one CPU core per 300K inserted data points per second. So, ~4 CPU cores are required for processing the insert stream of 1M data points per second. The ingestion rate may be lower for high cardinality data or for time series with a high number of labels. See [this article](https://valyala.medium.com/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) for details. If you see lower numbers per CPU core, it is likely that the active time series info doesn't fit in your caches and you will need more RAM to lower CPU usage.
|
||||
|
||||
## Technical Support and Services
|
||||
If you have questions about installing or using this software, please check this and other documents first. Answers to the most frequently asked questions can be found on the Technical Papers webpage or in VictoriaMetrics community channels. If you need further assistance with VictoriaMetrics, please contact us at info@victoriametrics.com - we’ll be happy to help.
|
||||
|
||||
Following VictoriaMetrics best practices allows for the optimal configuration of our fast and scalable monitoring solution and time series database while minimizing or avoiding downtime or performance issues during installation and software usage. Our best practices also allow you to quickly troubleshoot any issues that might arise.
|
||||
|
||||
|
|
@ -6,13 +6,24 @@
|
|||
- `histogram_avg(buckets)` - returns the average value for the given buckets.
|
||||
- `histogram_stdvar(buckets)` - returns standard variance for the given buckets.
|
||||
- `histogram_stddev(buckets)` - returns standard deviation for the given buckets.
|
||||
* FEATURE: reduce median query duration by up to 2x. See https://github.com/VictoriaMetrics/VictoriaMetrics/commit/18fe0ff14bc78860c5569e2b70de1db78fac61be
|
||||
* FEATURE: export `vm_available_memory_bytes` and `vm_available_cpu_cores` metrics, which show the number of available RAM and available CPU cores for VictoriaMetrics apps.
|
||||
* FEATURE: vmagent: add ability to replicate scrape targets among `vmagent` instances in the cluster with `-promscrape.cluster.replicationFactor` command-line flag. See [these docs](https://victoriametrics.github.io/vmagent.html#scraping-big-number-of-targets).
|
||||
* FATURE: vmagent: accept `scrape_offset` option at `scrape_config`. This option may be useful when scrapes must start at the specified offset of every scrape interval. See [these docs](https://victoriametrics.github.io/vmagent.html#troubleshooting) for details.
|
||||
* FEATURE: vmagent: accept `scrape_offset` option at `scrape_config`. This option may be useful when scrapes must start at the specified offset of every scrape interval. See [these docs](https://victoriametrics.github.io/vmagent.html#troubleshooting) for details.
|
||||
* FEATURE: vmagent: support `proxy_tls_config`, `proxy_basic_auth`, `proxy_bearer_token` and `proxy_bearer_token_file` options at `scrape_config` section for configuring proxies specified via `proxy_url`. See [these docs](https://victoriametrics.github.io/vmagent.html#scraping-targets-via-a-proxy).
|
||||
* FEATURE: vmauth: allow using regexp paths in `url_map`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1112) for details.
|
||||
* FEATURE: accept `round_digits` query arg at `/api/v1/query` and `/api/v1/query_range` handlers. This option can be set at Prometheus datasource in Grafana for limiting the number of digits after the decimal point in response values.
|
||||
* FEATURE: add `-influx.databaseNames` command-line flag, which can be used for accepting data from some Telegraf plugins such as [fluentd plugin](https://github.com/fangli/fluent-plugin-influxdb). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124).
|
||||
* FEATURE: add `-logNewSeries` command-line flag, which can be used for debugging the source of time series churn rate.
|
||||
|
||||
* BUGFIX: vmagent: prevent high CPU usage during failing scrapes with small `scrape_timeout` (less than a few seconds).
|
||||
* BUGFIX: vmagent: reduce memory usage when Kubernetes service discovery is used in big number of distinct jobs by sharing the cache. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
|
||||
* BUGFIX: vmagent: reduce memory usage when Kubernetes service discovery is used in big number of distinct scrape config jobs by sharing Kubernetes object cache. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
|
||||
* BUGFIX: vmagent: apply `sample_limit` only after `metric_relabel_configs` are applied as Prometheus does. Previously the `sample_limit` was applied before metrics relabeling.
|
||||
* BUGFIX: vmagent: properly apply `tls_config`, `basic_auth` and `bearer_token` to proxy connections if `proxy_url` option is set. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
|
||||
* BUGFIX: vmagent: properly scrape targets via https proxy specified in `proxy_url` if `insecure_skip_verify` flag isn't set in `tls_config` section. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
|
||||
* BUGFIX: avoid `duplicate time series` error if `prometheus_buckets()` covers a time range with distinct set of buckets.
|
||||
* BUGFIX: prevent exponent overflow when processing extremely small values close to zero such as `2.964393875E-314`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1114
|
||||
* BUGFIX: do not include datapoints with a timestamp `t-d` when returning results from `/api/v1/query?query=m[d]&time=t` as Prometheus does.
|
||||
|
||||
|
||||
# [v1.55.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.55.1)
|
||||
|
|
|
@ -338,7 +338,7 @@ Please see [Monitoring K8S with VictoriaMetrics](https://docs.google.com/present
|
|||
|
||||
Numbers:
|
||||
|
||||
- Active time series: ~2500 Million
|
||||
- Active time series: ~25 Million
|
||||
- Datapoints: ~20 Trillion
|
||||
- Ingestion rate: ~1800k/s
|
||||
- Disk usage: ~20 TB
|
||||
|
|
|
@ -373,7 +373,7 @@ for protecting from user errors such as accidental data deletion.
|
|||
The following steps must be performed for each `vmstorage` node for creating a backup:
|
||||
|
||||
1. Create an instant snapshot by navigating to `/snapshot/create` HTTP handler. It will create snapshot and return its name.
|
||||
2. Archive the created snapshot from `<-storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://victoriametrics.github.io/vbackup.html).
|
||||
2. Archive the created snapshot from `<-storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://victoriametrics.github.io/vmbackup.html).
|
||||
The archival process doesn't interfere with `vmstorage` work, so it may be performed at any suitable time.
|
||||
3. Delete unused snapshots via `/snapshot/delete?snapshot=<snapshot_name>` or `/snapshot/delete_all` in order to free up occupied storage space.
|
||||
|
||||
|
|
|
@ -1,3 +0,0 @@
|
|||
# MetricsQL
|
||||
|
||||
The page has been moved to [MetricsQL](https://victoriametrics.github.io/MetricsQL.html).
|
|
@ -170,6 +170,7 @@ Alphabetically sorted links to case studies:
|
|||
* [Font used](#font-used)
|
||||
* [Color Palette](#color-palette)
|
||||
* [We kindly ask](#we-kindly-ask)
|
||||
* [List of command-line flags](#list-of-command-line-flags)
|
||||
|
||||
|
||||
## How to start VictoriaMetrics
|
||||
|
@ -182,7 +183,7 @@ The following command-line flags are used the most:
|
|||
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
|
||||
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
|
||||
|
||||
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values.
|
||||
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
|
||||
|
||||
See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup)
|
||||
and how to [handle alerts](#alerting).
|
||||
|
@ -413,6 +414,10 @@ while VictoriaMetrics stores them with *milliseconds* precision.
|
|||
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
|
||||
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
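A minimal sketch of how a client might build such a write URL. The base address `http://localhost:8428` and the helper name `write_url` are illustrative assumptions; only the `extra_label=name=value` query arg comes from the text above.

```python
from urllib.parse import urlencode


def write_url(base, extra_labels):
    """Build a /write URL that adds the given labels to all written series.

    `base` and this helper are hypothetical names used for illustration.
    """
    # One extra_label=name=value pair per label; '=' is left unescaped for readability.
    pairs = [("extra_label", f"{name}={value}") for name, value in extra_labels.items()]
    return f"{base}/write?{urlencode(pairs, safe='=')}"


print(write_url("http://localhost:8428", {"foo": "bar"}))
```

The resulting URL can then be used as the target for Influx line protocol data sent over HTTP.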
|
||||
|
||||
Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plugin-influxdb), [Juniper/open-nti](https://github.com/Juniper/open-nti)
|
||||
or [Juniper/jtimon](https://github.com/Juniper/jtimon) send a `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
|
||||
Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag.
|
||||
|
||||
## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)
|
||||
|
||||
Enable Graphite receiver in VictoriaMetrics by setting `-graphiteListenAddr` command line flag. For instance,
|
||||
|
@ -562,14 +567,17 @@ in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if y
|
|||
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args in addition to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
|
||||
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
|
||||
|
||||
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
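The query args described above can be combined in one request. The following is a rough sketch that only builds the URL; the base address `http://localhost:8428` and the helper name `query_range_url` are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_range_url(base, query, start="-30m", round_digits=None):
    """Build a /api/v1/query_range URL with a relative start time.

    `base` and this helper are hypothetical; `start` and `round_digits`
    are the query args described in the text.
    """
    params = {"query": query, "start": start}
    if round_digits is not None:
        # Ask the server to round response values to this many decimal digits.
        params["round_digits"] = str(round_digits)
    return f"{base}/api/v1/query_range?{urlencode(params)}"


url = query_range_url(
    "http://localhost:8428", "avg_over_time(temperature[1h])", round_digits=2
)
print(url)
```

`urlencode` percent-encodes the brackets and parentheses in the query expression, so the URL is safe to pass to any HTTP client.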
|
||||
|
||||
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
|
||||
|
||||
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
|
||||
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details:
|
||||
|
||||
* Any number of [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
|
||||
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
|
||||
|
||||
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
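As a hedged illustration of the additional args listed above, the sketch below builds a `/api/v1/labels` URL with several `match[]` selectors and an optional time range. The base address, the helper name `labels_url` and the sample selectors are assumptions, not taken from the docs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def labels_url(base, selectors=(), start=None, end=None):
    """Build a /api/v1/labels URL limited by selectors and a time range.

    `base` and this helper are hypothetical names used for illustration.
    """
    # Each selector becomes its own match[] query arg.
    params = [("match[]", selector) for selector in selectors]
    if start is not None:
        params.append(("start", start))
    if end is not None:
        params.append(("end", end))
    query = urlencode(params)
    return f"{base}/api/v1/labels" + (f"?{query}" if query else "")


print(labels_url(
    "http://localhost:8428",
    selectors=['up{job="node"}', "process_cpu_seconds_total"],
    start="2021-03-01T00:00:00Z",
    end="2021-03-02T00:00:00Z",
))
```

The same pattern would apply to `/api/v1/label/.../values` by changing the path.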
|
||||
|
||||
Additionally VictoriaMetrics provides the following handlers:
|
||||
|
||||
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
|
||||
|
@ -1367,6 +1375,8 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
|
|||
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default `date` equals the current date,
|
||||
while `topN` equals 10.
|
||||
|
||||
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
|
||||
|
||||
* VictoriaMetrics limits the number of labels per each metric with `-maxLabelsPerTimeseries` command-line flag.
|
||||
This prevents ingesting metrics with too many labels. We recommend [monitoring](#monitoring) the `vm_metrics_with_dropped_labels_total`
|
||||
metric in order to determine whether `-maxLabelsPerTimeseries` must be adjusted for your workload.
|
||||
|
@ -1538,3 +1548,248 @@ Files included in each folder:
|
|||
* There should be sufficient clear space around the logo.
|
||||
* Do not change spacing, alignment, or relative locations of the design elements.
|
||||
* Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
|
||||
|
||||
|
||||
## List of command-line flags
|
||||
|
||||
Pass `-help` to VictoriaMetrics in order to see the list of supported command-line flags with their description:
|
||||
|
||||
```
|
||||
-bigMergeConcurrency int
|
||||
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
|
||||
-csvTrimTimestamp duration
|
||||
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-dedup.minScrapeInterval duration
|
||||
Remove superfluous samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
|
||||
-deleteAuthKey string
|
||||
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
|
||||
-denyQueriesOutsideRetention
|
||||
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
|
||||
-dryRun
|
||||
Whether to check only -promscrape.config and then exit. Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-finalMergeDelay duration
|
||||
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
|
||||
-forceFlushAuthKey string
|
||||
authKey, which must be passed in query string to /internal/force_flush pages
|
||||
-forceMergeAuthKey string
|
||||
authKey, which must be passed in query string to /internal/force_merge pages
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-graphiteListenAddr string
|
||||
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
|
||||
-graphiteTrimTimestamp duration
|
||||
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address to listen for http connections (default ":8428")
|
||||
-import.maxLineLen size
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.databaseNames array
|
||||
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-influx.maxLineSize size
|
||||
The maximum size in bytes for a single Influx line during parsing
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr string
|
||||
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
|
||||
-influxMeasurementFieldSeparator string
|
||||
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
|
||||
-influxSkipMeasurement
|
||||
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
|
||||
-influxSkipSingleField
|
||||
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if Influx line contains only a single field
|
||||
-influxTrimTimestamp duration
|
||||
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-insert.maxQueueDuration duration
|
||||
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxConcurrentInserts int
|
||||
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
|
||||
-maxInsertRequestSize size
|
||||
The maximum size in bytes of a single Prometheus remote_write API request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-maxLabelsPerTimeseries int
|
||||
The maximum number of labels accepted per time series. Superfluous labels are dropped (default 30)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It overrides httpAuth settings
|
||||
-opentsdbHTTPListenAddr string
|
||||
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbListenAddr string
|
||||
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-opentsdbhttp.maxInsertRequestSize size
|
||||
The maximum size of OpenTSDB HTTP put request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-opentsdbhttpTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It overrides httpAuth settings
|
||||
-precisionBits int
|
||||
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
|
||||
-promscrape.cluster.memberNum int
|
||||
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
|
||||
-promscrape.cluster.membersCount int
|
||||
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
|
||||
-promscrape.cluster.replicationFactor int
|
||||
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 2, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
|
||||
-promscrape.config string
|
||||
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
|
||||
-promscrape.config.dryRun
|
||||
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
|
||||
-promscrape.config.strictParse
|
||||
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
|
||||
-promscrape.configCheckInterval duration
|
||||
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
-promscrape.consulSDCheckInterval duration
|
||||
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
|
||||
-promscrape.disableCompression
|
||||
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.disableKeepAlive
|
||||
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connections. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
|
||||
-promscrape.discovery.concurrency int
|
||||
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
|
||||
-promscrape.discovery.concurrentWaitTime duration
|
||||
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
|
||||
-promscrape.dnsSDCheckInterval duration
|
||||
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
|
||||
-promscrape.ec2SDCheckInterval duration
|
||||
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
|
||||
-promscrape.eurekaSDCheckInterval duration
|
||||
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
|
||||
-promscrape.fileSDCheckInterval duration
|
||||
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
|
||||
-promscrape.gceSDCheckInterval duration
|
||||
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
|
||||
-promscrape.kubernetes.apiServerTimeout duration
|
||||
How frequently to reload the full state from Kubernetes API server (default 30m0s)
|
||||
-promscrape.kubernetesSDCheckInterval duration
|
||||
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
|
||||
-promscrape.maxDroppedTargets int
|
||||
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need to investigate labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
|
||||
-promscrape.maxScrapeSize size
|
||||
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
|
||||
-promscrape.openstackSDCheckInterval duration
|
||||
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
|
||||
-promscrape.streamParse
|
||||
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.suppressDuplicateScrapeTargetErrors
|
||||
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
|
||||
-promscrape.suppressScrapeErrors
|
||||
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
|
||||
-relabelConfig string
|
||||
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://victoriametrics.github.io/#relabeling for details
|
||||
-retentionPeriod value
|
||||
Data with timestamps outside the retentionPeriod is automatically deleted
|
||||
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
|
||||
-search.cacheTimestampOffset duration
|
||||
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
|
||||
-search.disableCache
|
||||
Whether to disable response caching. This may be useful during data backfilling
|
||||
-search.latencyOffset duration
|
||||
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
|
||||
-search.logSlowQueryDuration duration
|
||||
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
|
||||
-search.maxConcurrentRequests int
|
||||
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
|
||||
-search.maxExportDuration duration
|
||||
The maximum duration for /api/v1/export call (default 720h0m0s)
|
||||
-search.maxLookback duration
|
||||
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
|
||||
-search.maxPointsPerTimeseries int
|
||||
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values significantly exceeding horizontal resolution of the graph (default 30000)
|
||||
-search.maxQueryDuration duration
|
||||
The maximum duration for query execution (default 30s)
|
||||
-search.maxQueryLen size
|
||||
The maximum search query length in bytes
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
|
||||
-search.maxQueueDuration duration
|
||||
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
|
||||
-search.maxStalenessInterval duration
|
||||
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
|
||||
-search.maxStepForPointsAdjustment duration
|
||||
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
|
||||
-search.maxTagKeys int
|
||||
The maximum number of tag keys returned from /api/v1/labels (default 100000)
|
||||
-search.maxTagValueSuffixesPerSearch int
|
||||
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
|
||||
-search.maxTagValues int
|
||||
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
|
||||
-search.maxUniqueTimeseries int
|
||||
The maximum number of unique time series each search can scan (default 300000)
|
||||
-search.minStalenessInterval duration
|
||||
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
|
||||
-search.queryStats.lastQueriesCount int
|
||||
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
|
||||
-search.queryStats.minQueryDuration int
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
|
||||
-search.resetCacheAuthKey string
|
||||
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
|
||||
-search.treatDotsAsIsInRegexps
|
||||
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
|
||||
-selfScrapeInstance string
|
||||
Value for 'instance' label, which is added to self-scraped metrics (default "self")
|
||||
-selfScrapeInterval duration
|
||||
Interval for self-scraping own metrics at /metrics page
|
||||
-selfScrapeJob string
|
||||
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
|
||||
-smallMergeConcurrency int
|
||||
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
|
||||
-snapshotAuthKey string
|
||||
authKey, which must be passed in query string to /snapshot* pages
|
||||
-storageDataPath string
|
||||
Path to storage data (default "victoria-metrics-data")
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
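The suffix rules for `-retentionPeriod` in the flag list above can be sketched as a small parser. This is an illustration only, not the actual flag parser: the function name `parse_retention` is hypothetical, and accepting fractional amounts is an assumption, since the help text does not say whether they are allowed.

```python
import re

# Suffixes taken from the -retentionPeriod help text.
_UNITS = {"h": "hours", "d": "days", "w": "weeks", "y": "years"}


def parse_retention(value):
    """Split a -retentionPeriod value into (amount, unit).

    A value without a suffix is counted in months, as the help text states.
    Fractional amounts are accepted here as an assumption.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([hdwy]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    amount, suffix = match.groups()
    return float(amount), _UNITS.get(suffix, "months")


for raw in ("1", "30d", "2y"):
    print(raw, "->", parse_retention(raw))
```

Returning the unit by name instead of converting to seconds avoids guessing how long a month or a year is taken to be.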
|
||||
|
|
108
docs/vmagent.md
|
@ -255,6 +255,41 @@ If each target is scraped by multiple `vmagent` instances, then data deduplicati
|
|||
See [these docs](https://victoriametrics.github.io/#deduplication) for details.
|
||||
|
||||
|
||||
## Scraping targets via a proxy
|
||||
|
||||
`vmagent` supports scraping targets via http and https proxies. Proxy address must be specified in `proxy_url` option. For example, the following scrape config instructs
|
||||
target scraping via https proxy at `https://proxy-addr:1234`:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: foo
|
||||
proxy_url: https://proxy-addr:1234
|
||||
```
|
||||
|
||||
Proxy can be configured with the following optional settings:
|
||||
|
||||
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
|
||||
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
|
||||
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
|
||||
|
||||
For example:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: foo
|
||||
proxy_url: https://proxy-addr:1234
|
||||
proxy_basic_auth:
|
||||
username: foobar
|
||||
password: secret
|
||||
proxy_tls_config:
|
||||
insecure_skip_verify: true
|
||||
cert_file: /path/to/cert
|
||||
key_file: /path/to/key
|
||||
ca_file: /path/to/ca
|
||||
server_name: real-server-name
|
||||
```
|
||||
|
||||
|
||||
## Monitoring
|
||||
|
||||
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
|
||||
|
@ -477,13 +512,16 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
|
||||
-import.maxLineLen max_rows_per_line
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with max_rows_per_line query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.maxLineSize value
|
||||
-import.maxLineLen size
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.databaseNames array
|
||||
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-influx.maxLineSize size
|
||||
The maximum size in bytes for a single Influx line during parsing
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr http://<vmagent>:8429/write
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr string
|
||||
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
|
||||
-influxMeasurementFieldSeparator string
|
||||
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
|
||||
|
@ -511,12 +549,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxConcurrentInserts int
|
||||
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tightly coupled with -insert.maxQueueDuration (default 16)
|
||||
-maxInsertRequestSize value
|
||||
-maxInsertRequestSize size
|
||||
The maximum size in bytes of a single Prometheus remote_write API request
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-memory.allowedBytes value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
|
@ -527,9 +565,9 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-opentsdbhttp.maxInsertRequestSize value
|
||||
-opentsdbhttp.maxInsertRequestSize size
|
||||
The maximum size of OpenTSDB HTTP put request
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-opentsdbhttpTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-pprofAuthKey string
|
||||
|
@ -538,6 +576,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
The number of this member in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
|
||||
-promscrape.cluster.membersCount int
|
||||
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
|
||||
-promscrape.cluster.replicationFactor int
|
||||
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 1, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
|
||||
-promscrape.config string
|
||||
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
|
||||
-promscrape.config.dryRun
|
||||
|
@ -546,45 +586,45 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
|
||||
-promscrape.configCheckInterval duration
|
||||
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
-promscrape.consulSDCheckInterval consul_sd_configs
|
||||
-promscrape.consulSDCheckInterval duration
|
||||
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
|
||||
-promscrape.disableCompression
|
||||
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.disableKeepAlive disable_keepalive: true
|
||||
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set disable_keepalive: true individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
|
||||
-promscrape.disableKeepAlive
|
||||
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets have no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
|
||||
-promscrape.discovery.concurrency int
|
||||
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
|
||||
-promscrape.discovery.concurrentWaitTime duration
|
||||
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
|
||||
-promscrape.dnsSDCheckInterval dns_sd_configs
|
||||
-promscrape.dnsSDCheckInterval duration
|
||||
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
|
||||
-promscrape.dockerswarmSDCheckInterval dockerswarm_sd_configs
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
|
||||
-promscrape.ec2SDCheckInterval ec2_sd_configs
|
||||
-promscrape.ec2SDCheckInterval duration
|
||||
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
|
||||
-promscrape.eurekaSDCheckInterval eureka_sd_configs
|
||||
-promscrape.eurekaSDCheckInterval duration
|
||||
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
|
||||
-promscrape.fileSDCheckInterval duration
|
||||
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
|
||||
-promscrape.gceSDCheckInterval gce_sd_configs
|
||||
-promscrape.gceSDCheckInterval duration
|
||||
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
|
||||
-promscrape.kubernetes.apiServerTimeout duration
|
||||
How frequently to reload the full state from Kubernetes API server (default 10m0s)
|
||||
-promscrape.kubernetesSDCheckInterval kubernetes_sd_configs
|
||||
How frequently to reload the full state from Kubernetes API server (default 30m0s)
|
||||
-promscrape.kubernetesSDCheckInterval duration
|
||||
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
|
||||
-promscrape.maxDroppedTargets droppedTargets
|
||||
-promscrape.maxDroppedTargets int
|
||||
The maximum number of droppedTargets shown at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
|
||||
-promscrape.maxScrapeSize value
|
||||
-promscrape.maxScrapeSize size
|
||||
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
|
||||
-promscrape.openstackSDCheckInterval openstack_sd_configs
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
|
||||
-promscrape.openstackSDCheckInterval duration
|
||||
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
|
||||
-promscrape.streamParse stream_parse: true
|
||||
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set stream_parse: true individually per each `scrape_config` section in `-promscrape.config` for fine grained control
|
||||
-promscrape.suppressDuplicateScrapeTargetErrors duplicate scrape target
|
||||
Whether to suppress duplicate scrape target errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
|
||||
-promscrape.streamParse
|
||||
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.suppressDuplicateScrapeTargetErrors
|
||||
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
|
||||
-promscrape.suppressScrapeErrors
|
||||
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
|
||||
-remoteWrite.basicAuth.password array
|
||||
|
@ -601,12 +641,12 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
|
|||
-remoteWrite.label array
|
||||
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.maxBlockSize value
|
||||
-remoteWrite.maxBlockSize size
|
||||
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
|
||||
-remoteWrite.maxDiskUsagePerURL value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
|
||||
-remoteWrite.maxDiskUsagePerURL size
|
||||
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-remoteWrite.proxyURL array
|
||||
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
|
|
|
@ -232,7 +232,7 @@ The shortlist of configuration flags is the following:
|
|||
How often to evaluate the rules (default 1m0s)
|
||||
-external.alert.source string
|
||||
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
|
||||
e.g. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty '/api/v1/:groupID/alertID/status' is used
|
||||
e.g. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty '/api/v1/:groupID/alertID/status' is used
|
||||
-external.label array
|
||||
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -external.label flags in order to add multiple label sets.
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
|
@ -272,9 +272,9 @@ The shortlist of configuration flags is the following:
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-memory.allowedBytes value
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
|
|
|
@ -208,9 +208,9 @@ See the docs at https://victoriametrics.github.io/vmauth.html .
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-memory.allowedBytes value
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
|
|
|
@ -205,12 +205,12 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxBytesPerSecond value
|
||||
-maxBytesPerSecond size
|
||||
The maximum upload speed. There is no limit if it is set to 0
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-origin string
|
||||
|
|
|
@ -105,12 +105,12 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
|
|||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
|
||||
-maxBytesPerSecond value
|
||||
-maxBytesPerSecond size
|
||||
The maximum download speed. There is no limit if it is set to 0
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes value
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
|
||||
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
|
||||
-skipBackupCompleteCheck
|
||||
|
|
2
go.mod
|
@ -7,7 +7,7 @@ require (
|
|||
|
||||
// Do not use the original github.com/valyala/fasthttp because of issues
|
||||
// like https://github.com/valyala/fasthttp/commit/996610f021ff45fdc98c2ce7884d5fa4e7f9199b
|
||||
github.com/VictoriaMetrics/fasthttp v1.0.13
|
||||
github.com/VictoriaMetrics/fasthttp v1.0.14
|
||||
github.com/VictoriaMetrics/metrics v1.15.2
|
||||
github.com/VictoriaMetrics/metricsql v0.14.0
|
||||
github.com/aws/aws-sdk-go v1.37.26
|
||||
|
|
5
go.sum
|
@ -82,8 +82,8 @@ github.com/Shopify/sarama v1.19.0/go.mod h1:FVkBWblsNy7DGZRfXLU0O9RCGt5g3g3yEuWX
|
|||
github.com/Shopify/toxiproxy v2.1.4+incompatible/go.mod h1:OXgGpZ6Cli1/URJOF1DMxUHB2q5Ap20/P/eIdh4G0pI=
|
||||
github.com/VictoriaMetrics/fastcache v1.5.8 h1:XW+YVx9lEXITBVv35ugK9OyotdNJVcbza69o3jmqWuI=
|
||||
github.com/VictoriaMetrics/fastcache v1.5.8/go.mod h1:SiMZNgwEPJ9qWLshu9tyuE6bKc9ZWYhcNV/L7jurprQ=
|
||||
github.com/VictoriaMetrics/fasthttp v1.0.13 h1:5JNS4vSPdN4QyfcpAg3Y1Wznf0uXEuSOFpeIlFw3MgM=
|
||||
github.com/VictoriaMetrics/fasthttp v1.0.13/go.mod h1:3SeUL4zwB/p/a9aEeRc6gdlbrtNHXBJR6N376EgiSHU=
|
||||
github.com/VictoriaMetrics/fasthttp v1.0.14 h1:iWCdHg7JQ1SO0xvPAgw3QFpFT3he+Ugdshg+1clN6CQ=
|
||||
github.com/VictoriaMetrics/fasthttp v1.0.14/go.mod h1:eDVgYyGts3xXpYpVGDxQ3ZlQKW5TSvOqfc9FryjH1JA=
|
||||
github.com/VictoriaMetrics/metrics v1.12.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE=
|
||||
github.com/VictoriaMetrics/metrics v1.15.2 h1:w/GD8L9tm+gvx1oZvAofRRXwammiicdI0jgLghA2Gdo=
|
||||
github.com/VictoriaMetrics/metrics v1.15.2/go.mod h1:Z1tSfPfngDn12bTfZSCqArT3OPY3u88J12hSoOhuiRE=
|
||||
|
@ -507,7 +507,6 @@ github.com/klauspost/compress v1.4.0/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0
|
|||
github.com/klauspost/compress v1.9.5/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
|
||||
github.com/klauspost/compress v1.10.7/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
|
||||
github.com/klauspost/compress v1.11.0/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
|
||||
github.com/klauspost/compress v1.11.3/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
|
||||
github.com/klauspost/compress v1.11.12 h1:famVnQVu7QwryBN4jNseQdUKES71ZAOnB6UQQJPZvqk=
|
||||
github.com/klauspost/compress v1.11.12/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
|
||||
github.com/klauspost/cpuid v0.0.0-20170728055534-ae7887de9fa5/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
|
||||
|
|
|
@ -9,7 +9,7 @@ import (
|
|||
|
||||
// NewBytes returns new `bytes` flag with the given name, defaultValue and description.
|
||||
func NewBytes(name string, defaultValue int, description string) *Bytes {
|
||||
description += "\nSupports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB"
|
||||
description += "\nSupports the following optional suffixes for `size` values: KB, MB, GB, KiB, MiB, GiB"
|
||||
b := Bytes{
|
||||
N: defaultValue,
|
||||
valueString: fmt.Sprintf("%d", defaultValue),
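The help text above lists the optional suffixes accepted by `size` flags. A minimal sketch of how such a value could be parsed follows. The `parseSize` name and the multipliers (decimal for KB/MB/GB, binary for KiB/MiB/GiB) are assumptions made for illustration; the diff does not show the actual parsing code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sizeSuffixes maps each suffix from the help text to a multiplier.
// Assumption: KB/MB/GB are powers of 1000 and KiB/MiB/GiB are powers of 1024.
var sizeSuffixes = []struct {
	suffix string
	mult   int64
}{
	{"KiB", 1 << 10},
	{"MiB", 1 << 20},
	{"GiB", 1 << 30},
	{"KB", 1000},
	{"MB", 1000 * 1000},
	{"GB", 1000 * 1000 * 1000},
}

// parseSize is a hypothetical helper that converts values such as
// "16MiB" or "500MB" into a number of bytes. Values without a suffix
// are treated as plain byte counts.
func parseSize(s string) (int64, error) {
	for _, sf := range sizeSuffixes {
		if strings.HasSuffix(s, sf.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, sf.suffix), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("cannot parse %q: %w", s, err)
			}
			return n * sf.mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	for _, v := range []string{"262144", "16MiB", "500MB"} {
		n, err := parseSize(v)
		fmt.Println(v, n, err)
	}
}
```

Under these assumptions, `16MiB` corresponds to the 16777216 default shown for `-promscrape.maxScrapeSize`, and `500MB` corresponds to the 500000000 minimum mentioned for `-remoteWrite.maxDiskUsagePerURL`.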
|
||||
|
|
|
@ -12,6 +12,7 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
|
@ -48,6 +49,8 @@ func writePrometheusMetrics(w io.Writer) {
|
|||
fmt.Fprintf(w, "vm_app_version{version=%q, short_version=%q} 1\n", buildinfo.Version,
|
||||
versionRe.FindString(buildinfo.Version))
|
||||
fmt.Fprintf(w, "vm_allowed_memory_bytes %d\n", memory.Allowed())
|
||||
fmt.Fprintf(w, "vm_available_memory_bytes %d\n", memory.Allowed()+memory.Remaining())
|
||||
fmt.Fprintf(w, "vm_available_cpu_cores %d\n", cgroup.AvailableCPUs())
|
||||
|
||||
// Export start time and uptime in seconds
|
||||
fmt.Fprintf(w, "vm_app_start_timestamp %d\n", startTime.Unix())
|
||||
|
|
29
lib/influxutils/influxutils.go
Normal file
|
@ -0,0 +1,29 @@
|
|||
package influxutils

import (
	"fmt"
	"net/http"
	"strings"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
)

var influxDatabaseNames = flagutil.NewArray("influx.databaseNames", "Comma-separated list of database names to return from /query and /influx/query API. "+
	"This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb")

// WriteDatabaseNames writes influxDatabaseNames to w.
func WriteDatabaseNames(w http.ResponseWriter) {
	// Emulate fake response for influx query.
	// This is required for TSBS benchmark and some Telegraf plugins.
	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	dbNames := *influxDatabaseNames
	if len(dbNames) == 0 {
		dbNames = []string{"_internal"}
	}
	dbs := make([]string, len(dbNames))
	for i := range dbNames {
		dbs[i] = fmt.Sprintf(`[%q]`, dbNames[i])
	}
	fmt.Fprintf(w, `{"results":[{"statement_id":0,"series":[{"name":"databases","columns":["name"],"values":[%s]}]}]}`, strings.Join(dbs, ","))
}
|
|
@ -15,7 +15,9 @@ import (
|
|||
// ...
|
||||
// pools[n] is for capacities from 2^(n+2)+1 to 2^(n+3)
|
||||
//
|
||||
var pools [30]sync.Pool
|
||||
// Limit the maximum capacity to 2^18, since there are no performance benefits
|
||||
// in caching byte slices with bigger capacities.
|
||||
var pools [17]sync.Pool
|
||||
|
||||
// Get returns byte buffer with the given capacity.
|
||||
func Get(capacity int) *bytesutil.ByteBuffer {
|
||||
|
@ -37,10 +39,12 @@ func Get(capacity int) *bytesutil.ByteBuffer {
|
|||
// Put returns bb to the pool.
|
||||
func Put(bb *bytesutil.ByteBuffer) {
|
||||
capacity := cap(bb.B)
|
||||
id, _ := getPoolIDAndCapacity(capacity)
|
||||
id, poolCapacity := getPoolIDAndCapacity(capacity)
|
||||
if capacity <= poolCapacity {
|
||||
bb.Reset()
|
||||
pools[id].Put(bb)
|
||||
}
|
||||
}
|
||||
|
||||
func getPoolIDAndCapacity(size int) (int, int) {
|
||||
size--
|
||||
|
@ -49,7 +53,7 @@ func getPoolIDAndCapacity(size int) (int, int) {
|
|||
}
|
||||
size >>= 3
|
||||
id := bits.Len(uint(size))
|
||||
if id > len(pools) {
|
||||
if id >= len(pools) {
|
||||
id = len(pools) - 1
|
||||
}
|
||||
return id, (1 << (id + 3))
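The effect of the two changes above (clamping with `>=` and skipping oversized buffers in `Put`) can be checked with a small standalone sketch. The names `poolIDAndCapacity` and `shouldPool` are hypothetical, and the guard for negative sizes is an assumption, since that part of the function is not visible in the diff.

```go
package main

import (
	"fmt"
	"math/bits"
)

// poolsCount mirrors `var pools [17]sync.Pool` from the diff.
const poolsCount = 17

// poolIDAndCapacity follows the visible parts of getPoolIDAndCapacity.
func poolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		size = 0 // assumption: this guard is elided between the diff hunks
	}
	size >>= 3
	id := bits.Len(uint(size))
	if id >= poolsCount {
		// With `>` instead of `>=`, id == poolsCount would index past the array.
		id = poolsCount - 1
	}
	return id, 1 << (id + 3)
}

// shouldPool reports whether Put would return a buffer of the given capacity to a pool.
func shouldPool(capacity int) bool {
	_, poolCapacity := poolIDAndCapacity(capacity)
	return capacity <= poolCapacity
}

func main() {
	for _, size := range []int{1, 8, 9, 17, 1 << 30} {
		id, poolCapacity := poolIDAndCapacity(size)
		fmt.Printf("size=%d id=%d poolCapacity=%d pooled=%v\n", size, id, poolCapacity, shouldPool(size))
	}
}
```

Buffers larger than the capacity of the last pool are simply dropped instead of being cached, which is the behavior the new `if capacity <= poolCapacity` check introduces.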
|
||||
|
|
|
@ -65,13 +65,16 @@ func (ac *Config) tlsCertificateString() string {
|
|||
// NewTLSConfig returns new TLS config for the given ac.
|
||||
func (ac *Config) NewTLSConfig() *tls.Config {
|
||||
tlsCfg := &tls.Config{
|
||||
RootCAs: ac.TLSRootCA,
|
||||
ClientSessionCache: tls.NewLRUClientSessionCache(0),
|
||||
}
|
||||
if ac == nil {
|
||||
return tlsCfg
|
||||
}
|
||||
if ac.TLSCertificate != nil {
|
||||
// Do not set tlsCfg.GetClientCertificate, since tlsCfg.Certificates should work OK.
|
||||
tlsCfg.Certificates = []tls.Certificate{*ac.TLSCertificate}
|
||||
}
|
||||
tlsCfg.RootCAs = ac.TLSRootCA
|
||||
tlsCfg.ServerName = ac.TLSServerName
|
||||
tlsCfg.InsecureSkipVerify = ac.TLSInsecureSkipVerify
|
||||
return tlsCfg
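The `ac == nil` check works because Go allows calling a method on a nil pointer receiver as long as the method does not dereference it. A minimal sketch, using a hypothetical trimmed-down `Config` type (the certificate field is omitted for brevity):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// Config is a hypothetical stand-in for the auth config type in the diff.
type Config struct {
	TLSRootCA             *x509.CertPool
	TLSServerName         string
	TLSInsecureSkipVerify bool
}

// NewTLSConfig returns a usable TLS config even when ac is nil.
func (ac *Config) NewTLSConfig() *tls.Config {
	tlsCfg := &tls.Config{
		ClientSessionCache: tls.NewLRUClientSessionCache(0),
	}
	if ac == nil {
		// No auth config: return defaults without touching ac's fields.
		return tlsCfg
	}
	tlsCfg.RootCAs = ac.TLSRootCA
	tlsCfg.ServerName = ac.TLSServerName
	tlsCfg.InsecureSkipVerify = ac.TLSInsecureSkipVerify
	return tlsCfg
}

func main() {
	var missing *Config
	fmt.Println(missing.NewTLSConfig() != nil)

	configured := &Config{TLSServerName: "real-server-name"}
	fmt.Println(configured.NewTLSConfig().ServerName)
}
```

Moving the `RootCAs` assignment below the nil check is what makes the nil case safe: in the old layout, `ac.TLSRootCA` was read while building the struct literal, before any check could run.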
|
||||
|
|
|
@ -27,11 +27,11 @@ var (
|
|||
"It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control")
|
||||
disableKeepAlive = flag.Bool("promscrape.disableKeepAlive", false, "Whether to disable HTTP keep-alive connections when scraping all the targets. "+
|
||||
"This may be useful when targets has no support for HTTP keep-alive connection. "+
|
||||
"It is possible to set `disable_keepalive: true` individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. "+
|
||||
"It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. "+
|
||||
"Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets")
|
||||
streamParse = flag.Bool("promscrape.streamParse", false, "Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful "+
|
||||
"for reducing memory usage when millions of metrics are exposed per each scrape target. "+
|
||||
"It is posible to set `stream_parse: true` individually per each `scrape_config` section in `-promscrape.config` for fine grained control")
|
||||
"It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control")
|
||||
)
|
||||
|
||||
type client struct {
|
||||
|
@@ -67,7 +67,7 @@ func newClient(sw *ScrapeWork) *client {
 			host += ":443"
 		}
 	}
-	dialFunc, err := newStatDialFunc(sw.ProxyURL, tlsCfg)
+	dialFunc, err := newStatDialFunc(sw.ProxyURL, sw.ProxyAuthConfig)
 	if err != nil {
 		logger.Fatalf("cannot create dial func: %s", err)
 	}
|
|
|
@@ -115,6 +115,10 @@ type ScrapeConfig struct {
 	StreamParse bool `yaml:"stream_parse,omitempty"`
 	ScrapeAlignInterval time.Duration `yaml:"scrape_align_interval,omitempty"`
 	ScrapeOffset time.Duration `yaml:"scrape_offset,omitempty"`
+	ProxyTLSConfig *promauth.TLSConfig `yaml:"proxy_tls_config,omitempty"`
+	ProxyBasicAuth *promauth.BasicAuthConfig `yaml:"proxy_basic_auth,omitempty"`
+	ProxyBearerToken string `yaml:"proxy_bearer_token,omitempty"`
+	ProxyBearerTokenFile string `yaml:"proxy_bearer_token_file,omitempty"`
 
 	// This is set in loadConfig
 	swc *scrapeWorkConfig
@@ -247,7 +251,7 @@ func (cfg *Config) getKubernetesSDScrapeWork(prev []*ScrapeWork) []*ScrapeWork {
 				target := metaLabels["__address__"]
 				sw, err := sc.swc.getScrapeWork(target, nil, metaLabels)
 				if err != nil {
-					logger.Errorf("cannot create kubernetes_sd_config target target %q for job_name %q: %s", target, sc.swc.jobName, err)
+					logger.Errorf("cannot create kubernetes_sd_config target %q for job_name %q: %s", target, sc.swc.jobName, err)
 					return nil
 				}
 				return sw
@@ -543,6 +547,10 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
 	if err != nil {
 		return nil, fmt.Errorf("cannot parse auth config for `job_name` %q: %w", jobName, err)
 	}
+	proxyAC, err := promauth.NewConfig(baseDir, sc.ProxyBasicAuth, sc.ProxyBearerToken, sc.ProxyBearerTokenFile, sc.ProxyTLSConfig)
+	if err != nil {
+		return nil, fmt.Errorf("cannot parse proxy auth config for `job_name` %q: %w", jobName, err)
+	}
 	relabelConfigs, err := promrelabel.ParseRelabelConfigs(sc.RelabelConfigs)
 	if err != nil {
 		return nil, fmt.Errorf("cannot parse `relabel_configs` for `job_name` %q: %w", jobName, err)
@@ -559,6 +567,7 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
 		scheme: scheme,
 		params: params,
 		proxyURL: sc.ProxyURL,
+		proxyAuthConfig: proxyAC,
 		authConfig: ac,
 		honorLabels: honorLabels,
 		honorTimestamps: honorTimestamps,
@@ -583,6 +592,7 @@ type scrapeWorkConfig struct {
 	scheme string
 	params map[string][]string
 	proxyURL proxy.URL
+	proxyAuthConfig *promauth.Config
 	authConfig *promauth.Config
 	honorLabels bool
 	honorTimestamps bool
@@ -849,6 +859,7 @@ func (swc *scrapeWorkConfig) getScrapeWork(target string, extraLabels, metaLabel
 		OriginalLabels: originalLabels,
 		Labels: labels,
 		ProxyURL: swc.proxyURL,
+		ProxyAuthConfig: swc.proxyAuthConfig,
 		AuthConfig: swc.authConfig,
 		MetricRelabelConfigs: swc.metricRelabelConfigs,
 		SampleLimit: swc.sampleLimit,
|
|
|
@ -10,6 +10,7 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/proxy"
|
||||
)
|
||||
|
||||
func TestNeedSkipScrapeWork(t *testing.T) {
|
||||
|
@ -154,6 +155,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "blackbox",
|
||||
}}
|
||||
if !reflect.DeepEqual(sws, swsExpected) {
|
||||
|
@ -548,6 +550,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
{
|
||||
|
@ -587,6 +590,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
{
|
||||
|
@ -626,6 +630,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -679,6 +684,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -729,6 +735,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@@ -748,6 +755,10 @@ scrape_configs:
     p: ["x&y", "="]
     xaa:
   bearer_token: xyz
+  proxy_url: http://foo.bar
+  proxy_basic_auth:
+    username: foo
+    password: bar
   static_configs:
   - targets: ["foo.bar", "aaa"]
     labels:
@@ -801,6 +812,10 @@ scrape_configs:
 			AuthConfig: &promauth.Config{
 				Authorization: "Bearer xyz",
 			},
+			ProxyAuthConfig: &promauth.Config{
+				Authorization: "Basic Zm9vOmJhcg==",
+			},
+			ProxyURL: proxy.MustNewURL("http://foo.bar"),
 			jobNameOriginal: "foo",
 		},
 		{
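Review note: the expected `Authorization: "Basic Zm9vOmJhcg=="` value in the test above is the standard HTTP Basic encoding of the `proxy_basic_auth` credentials `foo` / `bar`. A minimal sketch of how such a header value is derived; `basicAuthHeader` is a hypothetical helper written for illustration, not a function from the repository.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// basicAuthHeader returns the value for an HTTP "Authorization" header
// using the Basic scheme: base64("username:password") prefixed with "Basic ".
// The helper name is hypothetical; it only illustrates the encoding.
func basicAuthHeader(username, password string) string {
	token := base64.StdEncoding.EncodeToString([]byte(username + ":" + password))
	return "Basic " + token
}

func main() {
	fmt.Println(basicAuthHeader("foo", "bar")) // prints "Basic Zm9vOmJhcg=="
}
```

The same encoding explains the `Basic eHl6OnNlY3JldC1wYXNz` value used elsewhere in these tests, which corresponds to `xyz` / `secret-pass`.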
@@ -842,6 +857,10 @@ scrape_configs:
 			AuthConfig: &promauth.Config{
 				Authorization: "Bearer xyz",
 			},
+			ProxyAuthConfig: &promauth.Config{
+				Authorization: "Basic Zm9vOmJhcg==",
+			},
+			ProxyURL: proxy.MustNewURL("http://foo.bar"),
 			jobNameOriginal: "foo",
 		},
 		{
@ -877,6 +896,7 @@ scrape_configs:
|
|||
TLSServerName: "foobar",
|
||||
TLSInsecureSkipVerify: true,
|
||||
},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "qwer",
|
||||
},
|
||||
})
|
||||
|
@ -955,6 +975,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -1017,6 +1038,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -1060,6 +1082,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -1100,6 +1123,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
MetricRelabelConfigs: mustParseRelabelConfigs(`
|
||||
- source_labels: [foo]
|
||||
target_label: abc
|
||||
|
@ -1145,6 +1169,7 @@ scrape_configs:
|
|||
AuthConfig: &promauth.Config{
|
||||
Authorization: "Basic eHl6OnNlY3JldC1wYXNz",
|
||||
},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -1184,6 +1209,7 @@ scrape_configs:
|
|||
AuthConfig: &promauth.Config{
|
||||
Authorization: "Bearer secret-pass",
|
||||
},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -1229,6 +1255,7 @@ scrape_configs:
|
|||
AuthConfig: &promauth.Config{
|
||||
TLSCertificate: &snakeoilCert,
|
||||
},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "foo",
|
||||
},
|
||||
})
|
||||
|
@ -1291,6 +1318,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
jobNameOriginal: "aaa",
|
||||
},
|
||||
})
|
||||
|
@ -1352,6 +1380,7 @@ scrape_configs:
|
|||
},
|
||||
},
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
SampleLimit: 100,
|
||||
DisableKeepAlive: true,
|
||||
DisableCompression: true,
|
||||
|
@ -1398,6 +1427,7 @@ scrape_configs:
|
|||
},
|
||||
jobNameOriginal: "path wo slash",
|
||||
AuthConfig: &promauth.Config{},
|
||||
ProxyAuthConfig: &promauth.Config{},
|
||||
},
|
||||
})
|
||||
}
|
||||
|
|
|
@@ -15,7 +15,7 @@ import (
 
 // SDCheckInterval is the check interval for Consul service discovery.
 var SDCheckInterval = flag.Duration("promscrape.consulSDCheckInterval", 30*time.Second, "Interval for checking for changes in Consul. "+
-	"This works only if `consul_sd_configs` is configured in '-promscrape.config' file. "+
+	"This works only if consul_sd_configs is configured in '-promscrape.config' file. "+
 	"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details")
 
 // consulWatcher is a watcher for consul api, updates services map in background with long-polling.
|
|
|
@@ -1,21 +1,15 @@
 package kubernetes
 
 import (
-	"flag"
 	"fmt"
 	"net"
-	"net/http"
-	"net/url"
 	"os"
 	"strings"
-	"time"
 
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
 )
 
-var apiServerTimeout = flag.Duration("promscrape.kubernetes.apiServerTimeout", 30*time.Minute, "How frequently to reload the full state from Kuberntes API server")
-
 // apiConfig contains config for API server
 type apiConfig struct {
 	aw *apiWatcher
@@ -36,6 +30,11 @@ func getAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFu
 }
 
 func newAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFunc) (*apiConfig, error) {
+	switch sdc.Role {
+	case "node", "pod", "service", "endpoints", "endpointslices", "ingress":
+	default:
+		return nil, fmt.Errorf("unexpected `role`: %q; must be one of `node`, `pod`, `service`, `endpoints`, `endpointslices` or `ingress`", sdc.Role)
+	}
 	ac, err := promauth.NewConfig(baseDir, sdc.BasicAuth, sdc.BearerToken, sdc.BearerTokenFile, sdc.TLSConfig)
 	if err != nil {
 		return nil, fmt.Errorf("cannot parse auth config: %w", err)
@@ -75,20 +74,7 @@ func newAPIConfig(sdc *SDConfig, baseDir string, swcFunc ScrapeWorkConstructorFu
 	for strings.HasSuffix(apiServer, "/") {
 		apiServer = apiServer[:len(apiServer)-1]
 	}
-	var proxy func(*http.Request) (*url.URL, error)
-	if proxyURL := sdc.ProxyURL.URL(); proxyURL != nil {
-		proxy = http.ProxyURL(proxyURL)
-	}
-	client := &http.Client{
-		Transport: &http.Transport{
-			TLSClientConfig: ac.NewTLSConfig(),
-			Proxy: proxy,
-			TLSHandshakeTimeout: 10 * time.Second,
-			IdleConnTimeout: *apiServerTimeout,
-		},
-		Timeout: *apiServerTimeout,
-	}
-	aw := newAPIWatcher(client, apiServer, ac.Authorization, sdc.Namespaces.Names, sdc.Selectors, swcFunc)
+	aw := newAPIWatcher(apiServer, ac, sdc, swcFunc)
 	cfg := &apiConfig{
 		aw: aw,
 	}
|
|
|
@ -1,9 +1,9 @@
|
|||
package kubernetes
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"io/ioutil"
|
||||
|
@@ -16,9 +16,12 @@ import (
 	"time"
 
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
 	"github.com/VictoriaMetrics/metrics"
 )
 
+var apiServerTimeout = flag.Duration("promscrape.kubernetes.apiServerTimeout", 30*time.Minute, "How frequently to reload the full state from Kubernetes API server")
+
 // WatchEvent is a watch event returned from API server endpoints if `watch=1` query arg is set.
 //
 // See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
@ -30,282 +33,75 @@ type WatchEvent struct {
|
|||
// object is any Kubernetes object.
|
||||
type object interface {
|
||||
key() string
|
||||
getTargetLabels(aw *apiWatcher) []map[string]string
|
||||
getTargetLabels(gw *groupWatcher) []map[string]string
|
||||
}
|
||||
|
||||
// parseObjectFunc must parse object from the given data.
|
||||
type parseObjectFunc func(data []byte) (object, error)
|
||||
|
||||
// parseObjectListFunc must parse objectList from the given data.
|
||||
type parseObjectListFunc func(data []byte) (map[string]object, ListMeta, error)
|
||||
// parseObjectListFunc must parse objectList from the given r.
|
||||
type parseObjectListFunc func(r io.Reader) (map[string]object, ListMeta, error)
|
||||
|
||||
// apiWatcher is used for watching for Kubernetes object changes and caching their latest states.
|
||||
type apiWatcher struct {
|
||||
// The client used for watching for object changes
|
||||
client *http.Client
|
||||
role string
|
||||
|
||||
// Kubernetes API server address in the form http://api-server
|
||||
apiServer string
|
||||
|
||||
// The contents for `Authorization` HTTP request header
|
||||
authorization string
|
||||
|
||||
// Namespaces to watch
|
||||
namespaces []string
|
||||
|
||||
// Selectors to apply during watch
|
||||
selectors []Selector
|
||||
|
||||
// Constructor for creating ScrapeWork objects from labels.
|
||||
// Constructor for creating ScrapeWork objects from labels
|
||||
swcFunc ScrapeWorkConstructorFunc
|
||||
|
||||
// mu protects watchersByURL
|
||||
mu sync.Mutex
|
||||
gw *groupWatcher
|
||||
|
||||
// a map of watchers keyed by request urls
|
||||
watchersByURL map[string]*urlWatcher
|
||||
// swosByKey contains ScrapeWork objects for the given apiWatcher, keyed by object key
|
||||
swosByKey map[string][]interface{}
|
||||
swosByKeyLock sync.Mutex
|
||||
|
||||
stopFunc func()
|
||||
stopCtx context.Context
|
||||
wg sync.WaitGroup
|
||||
swosCount *metrics.Counter
|
||||
}
|
||||
|
||||
func newAPIWatcher(apiServer string, ac *promauth.Config, sdc *SDConfig, swcFunc ScrapeWorkConstructorFunc) *apiWatcher {
|
||||
namespaces := sdc.Namespaces.Names
|
||||
selectors := sdc.Selectors
|
||||
proxyURL := sdc.ProxyURL.URL()
|
||||
gw := getGroupWatcher(apiServer, ac, namespaces, selectors, proxyURL)
|
||||
return &apiWatcher{
|
||||
role: sdc.Role,
|
||||
swcFunc: swcFunc,
|
||||
gw: gw,
|
||||
swosByKey: make(map[string][]interface{}),
|
||||
swosCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_scrape_works{role=%q}`, sdc.Role)),
|
||||
}
|
||||
}
|
||||
|
||||
func (aw *apiWatcher) mustStop() {
|
||||
aw.stopFunc()
|
||||
aw.wg.Wait()
|
||||
aw.gw.unsubscribeAPIWatcher(aw)
|
||||
aw.reloadScrapeWorks(make(map[string][]interface{}))
|
||||
}
|
||||
|
||||
func newAPIWatcher(client *http.Client, apiServer, authorization string, namespaces []string, selectors []Selector, swcFunc ScrapeWorkConstructorFunc) *apiWatcher {
|
||||
stopCtx, stopFunc := context.WithCancel(context.Background())
|
||||
return &apiWatcher{
|
||||
apiServer: apiServer,
|
||||
authorization: authorization,
|
||||
client: client,
|
||||
namespaces: namespaces,
|
||||
selectors: selectors,
|
||||
swcFunc: swcFunc,
|
||||
|
||||
watchersByURL: make(map[string]*urlWatcher),
|
||||
|
||||
stopFunc: stopFunc,
|
||||
stopCtx: stopCtx,
|
||||
}
|
||||
func (aw *apiWatcher) reloadScrapeWorks(swosByKey map[string][]interface{}) {
|
||||
aw.swosByKeyLock.Lock()
|
||||
aw.swosCount.Add(len(swosByKey) - len(aw.swosByKey))
|
||||
aw.swosByKey = swosByKey
|
||||
aw.swosByKeyLock.Unlock()
|
||||
}
|
||||
|
||||
// getScrapeWorkObjectsForRole returns all the ScrapeWork objects for the given role.
|
||||
func (aw *apiWatcher) getScrapeWorkObjectsForRole(role string) []interface{} {
|
||||
aw.startWatchersForRole(role)
|
||||
var swos []interface{}
|
||||
aw.mu.Lock()
|
||||
for _, uw := range aw.watchersByURL {
|
||||
if uw.role != role {
|
||||
continue
|
||||
}
|
||||
uw.mu.Lock()
|
||||
for _, swosLocal := range uw.swosByKey {
|
||||
swos = append(swos, swosLocal...)
|
||||
}
|
||||
uw.mu.Unlock()
|
||||
}
|
||||
aw.mu.Unlock()
|
||||
return swos
|
||||
}
|
||||
|
||||
// getObjectByRole returns an object with the given (namespace, name) key and the given role.
|
||||
func (aw *apiWatcher) getObjectByRole(role, namespace, name string) object {
|
||||
if aw == nil {
|
||||
return nil
|
||||
}
|
||||
key := namespace + "/" + name
|
||||
aw.startWatchersForRole(role)
|
||||
var o object
|
||||
aw.mu.Lock()
|
||||
for _, uw := range aw.watchersByURL {
|
||||
if uw.role != role {
|
||||
continue
|
||||
}
|
||||
o = uw.objectsByKey.get(key)
|
||||
if o != nil {
|
||||
break
|
||||
}
|
||||
}
|
||||
aw.mu.Unlock()
|
||||
return o
|
||||
}
|
||||
|
||||
func (aw *apiWatcher) startWatchersForRole(role string) {
|
||||
parseObject, parseObjectList := getObjectParsersForRole(role)
|
||||
paths := getAPIPaths(role, aw.namespaces, aw.selectors)
|
||||
for _, path := range paths {
|
||||
apiURL := aw.apiServer + path
|
||||
aw.startWatcherForURL(role, apiURL, parseObject, parseObjectList)
|
||||
}
|
||||
}
|
||||
|
||||
func (aw *apiWatcher) startWatcherForURL(role, apiURL string, parseObject parseObjectFunc, parseObjectList parseObjectListFunc) {
|
||||
aw.mu.Lock()
|
||||
if aw.watchersByURL[apiURL] != nil {
|
||||
// Watcher for the given path already exists.
|
||||
aw.mu.Unlock()
|
||||
return
|
||||
}
|
||||
uw := aw.newURLWatcher(role, apiURL, parseObject, parseObjectList)
|
||||
aw.watchersByURL[apiURL] = uw
|
||||
aw.mu.Unlock()
|
||||
|
||||
uw.watchersCount.Inc()
|
||||
uw.watchersCreated.Inc()
|
||||
uw.reloadObjects()
|
||||
aw.wg.Add(1)
|
||||
go func() {
|
||||
defer aw.wg.Done()
|
||||
logger.Infof("started watcher for %q", apiURL)
|
||||
uw.watchForUpdates()
|
||||
logger.Infof("stopped watcher for %q", apiURL)
|
||||
uw.objectsByKey.decRef()
|
||||
|
||||
aw.mu.Lock()
|
||||
delete(aw.watchersByURL, apiURL)
|
||||
aw.mu.Unlock()
|
||||
uw.watchersCount.Dec()
|
||||
uw.watchersStopped.Inc()
|
||||
}()
|
||||
}
|
||||
|
||||
// needStop returns true if aw must be stopped.
|
||||
func (aw *apiWatcher) needStop() bool {
|
||||
select {
|
||||
case <-aw.stopCtx.Done():
|
||||
return true
|
||||
default:
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
// doRequest performs http request to the given requestURL.
|
||||
func (aw *apiWatcher) doRequest(requestURL string) (*http.Response, error) {
|
||||
req, err := http.NewRequestWithContext(aw.stopCtx, "GET", requestURL, nil)
|
||||
if err != nil {
|
||||
logger.Fatalf("cannot create a request for %q: %s", requestURL, err)
|
||||
}
|
||||
if aw.authorization != "" {
|
||||
req.Header.Set("Authorization", aw.authorization)
|
||||
}
|
||||
return aw.client.Do(req)
|
||||
}
|
||||
|
||||
// urlWatcher watches for an apiURL and updates object states in objectsByKey.
|
||||
type urlWatcher struct {
|
||||
role string
|
||||
apiURL string
|
||||
|
||||
parseObject parseObjectFunc
|
||||
parseObjectList parseObjectListFunc
|
||||
|
||||
// objectsByKey contains the latest state for objects obtained from apiURL
|
||||
objectsByKey *objectsMap
|
||||
|
||||
// mu protects swosByKey and resourceVersion
|
||||
mu sync.Mutex
|
||||
swosByKey map[string][]interface{}
|
||||
resourceVersion string
|
||||
|
||||
// the parent apiWatcher
|
||||
aw *apiWatcher
|
||||
|
||||
watchersCount *metrics.Counter
|
||||
watchersCreated *metrics.Counter
|
||||
watchersStopped *metrics.Counter
|
||||
}
|
||||
|
||||
func (aw *apiWatcher) newURLWatcher(role, apiURL string, parseObject parseObjectFunc, parseObjectList parseObjectListFunc) *urlWatcher {
|
||||
return &urlWatcher{
|
||||
role: role,
|
||||
apiURL: apiURL,
|
||||
|
||||
parseObject: parseObject,
|
||||
parseObjectList: parseObjectList,
|
||||
|
||||
objectsByKey: sharedObjectsGlobal.getByAPIURL(role, apiURL),
|
||||
swosByKey: make(map[string][]interface{}),
|
||||
|
||||
aw: aw,
|
||||
|
||||
watchersCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers{role=%q}`, role)),
|
||||
watchersCreated: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers_created_total{role=%q}`, role)),
|
||||
watchersStopped: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers_stopped_total{role=%q}`, role)),
|
||||
}
|
||||
}
|
||||
|
||||
// Limit the concurrency for per-role objects reloading to 1.
|
||||
//
|
||||
// This should reduce memory usage when a large number of watchers simultaneously receive an update for objects of the same role.
|
||||
var reloadObjectsLocksByRole = map[string]*sync.Mutex{
|
||||
"node": {},
|
||||
"pod": {},
|
||||
"service": {},
|
||||
"endpoints": {},
|
||||
"endpointslices": {},
|
||||
"ingress": {},
|
||||
}
|
||||
|
||||
func (uw *urlWatcher) resetResourceVersion() {
|
||||
uw.mu.Lock()
|
||||
uw.resourceVersion = ""
|
||||
uw.mu.Unlock()
|
||||
}
|
||||
|
||||
// reloadObjects reloads objects to the latest state and returns resourceVersion for the latest state.
|
||||
func (uw *urlWatcher) reloadObjects() string {
|
||||
lock := reloadObjectsLocksByRole[uw.role]
|
||||
lock.Lock()
|
||||
defer lock.Unlock()
|
||||
|
||||
uw.mu.Lock()
|
||||
resourceVersion := uw.resourceVersion
|
||||
uw.mu.Unlock()
|
||||
if resourceVersion != "" {
|
||||
// Fast path - objects have been already reloaded by concurrent goroutines.
|
||||
return resourceVersion
|
||||
}
|
||||
|
||||
aw := uw.aw
|
||||
requestURL := uw.apiURL
|
||||
resp, err := aw.doRequest(requestURL)
|
||||
if err != nil {
|
||||
if !aw.needStop() {
|
||||
logger.Errorf("error when performing a request to %q: %s", requestURL, err)
|
||||
}
|
||||
return ""
|
||||
}
|
||||
body, _ := ioutil.ReadAll(resp.Body)
|
||||
_ = resp.Body.Close()
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
logger.Errorf("unexpected status code for request to %q: %d; want %d; response: %q", requestURL, resp.StatusCode, http.StatusOK, body)
|
||||
return ""
|
||||
}
|
||||
objectsByKey, metadata, err := uw.parseObjectList(body)
|
||||
if err != nil {
|
||||
if !aw.needStop() {
|
||||
logger.Errorf("cannot parse response from %q: %s", requestURL, err)
|
||||
}
|
||||
return ""
|
||||
}
|
||||
uw.objectsByKey.reload(objectsByKey)
|
||||
swosByKey := make(map[string][]interface{})
|
||||
for k, o := range objectsByKey {
|
||||
labels := o.getTargetLabels(aw)
|
||||
func (aw *apiWatcher) setScrapeWorks(key string, labels []map[string]string) {
|
||||
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
|
||||
aw.swosByKeyLock.Lock()
|
||||
if len(swos) > 0 {
|
||||
swosByKey[k] = swos
|
||||
aw.swosCount.Add(len(swos) - len(aw.swosByKey[key]))
|
||||
aw.swosByKey[key] = swos
|
||||
} else {
|
||||
aw.swosCount.Add(-len(aw.swosByKey[key]))
|
||||
delete(aw.swosByKey, key)
|
||||
}
|
||||
aw.swosByKeyLock.Unlock()
|
||||
}
|
||||
uw.mu.Lock()
|
||||
uw.swosByKey = swosByKey
|
||||
uw.resourceVersion = metadata.ResourceVersion
|
||||
uw.mu.Unlock()
|
||||
|
||||
return metadata.ResourceVersion
|
||||
func (aw *apiWatcher) removeScrapeWorks(key string) {
|
||||
aw.swosByKeyLock.Lock()
|
||||
aw.swosCount.Add(-len(aw.swosByKey[key]))
|
||||
delete(aw.swosByKey, key)
|
||||
aw.swosByKeyLock.Unlock()
|
||||
}
|
||||
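Review note: the new `setScrapeWorks` and `removeScrapeWorks` methods keep the `swosCount` counter in step with the `swosByKey` map by adding the size difference on every change. A minimal sketch of that bookkeeping, with a plain `int` standing in for the metrics counter; the `registry` type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical, reduced model of the per-watcher state in the diff.
// count stands in for the metrics counter and must always equal
// the total number of values stored in byKey.
type registry struct {
	mu    sync.Mutex
	byKey map[string][]string
	count int
}

// set replaces the values for key. An empty slice removes the key.
// The counter is adjusted by the difference between new and old sizes.
func (r *registry) set(key string, vals []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(vals) > 0 {
		r.count += len(vals) - len(r.byKey[key])
		r.byKey[key] = vals
	} else {
		r.count -= len(r.byKey[key])
		delete(r.byKey, key)
	}
}

// remove drops key and subtracts its values from the counter.
func (r *registry) remove(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.count -= len(r.byKey[key])
	delete(r.byKey, key)
}

func main() {
	r := &registry{byKey: make(map[string][]string)}
	r.set("default/a", []string{"x", "y"})
	r.set("default/a", []string{"z"})
	r.set("default/b", []string{"p", "q", "r"})
	fmt.Println(r.count) // prints 4
	r.remove("default/b")
	fmt.Println(r.count) // prints 1
}
```

Reading the old length from the map before overwriting is what makes repeated updates for the same key safe: the counter never double-counts a key.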
|
||||
func getScrapeWorkObjectsForLabels(swcFunc ScrapeWorkConstructorFunc, labelss []map[string]string) []interface{} {
|
||||
|
@ -320,11 +116,362 @@ func getScrapeWorkObjectsForLabels(swcFunc ScrapeWorkConstructorFunc, labelss []
|
|||
return swos
|
||||
}
|
||||
|
||||
// getScrapeWorkObjects returns all the ScrapeWork objects for the given aw.
|
||||
func (aw *apiWatcher) getScrapeWorkObjects() []interface{} {
|
||||
aw.gw.startWatchersForRole(aw.role, aw)
|
||||
aw.swosByKeyLock.Lock()
|
||||
defer aw.swosByKeyLock.Unlock()
|
||||
|
||||
size := 0
|
||||
for _, swosLocal := range aw.swosByKey {
|
||||
size += len(swosLocal)
|
||||
}
|
||||
swos := make([]interface{}, 0, size)
|
||||
for _, swosLocal := range aw.swosByKey {
|
||||
swos = append(swos, swosLocal...)
|
||||
}
|
||||
return swos
|
||||
}
|
||||
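Review note: `getScrapeWorkObjects` first sums the slice lengths and only then allocates the result, so the final slice is allocated once instead of growing repeatedly during `append`. A minimal sketch of the same two-pass flattening; `flatten` is a hypothetical helper name.

```go
package main

import "fmt"

// flatten concatenates all the slices stored in byKey into a single slice.
// The first pass computes the exact size, so the second pass never reallocates.
// The order of the result follows map iteration order and is therefore unspecified.
func flatten(byKey map[string][]interface{}) []interface{} {
	size := 0
	for _, vals := range byKey {
		size += len(vals)
	}
	out := make([]interface{}, 0, size)
	for _, vals := range byKey {
		out = append(out, vals...)
	}
	return out
}

func main() {
	byKey := map[string][]interface{}{
		"default/a": {"sw1", "sw2"},
		"default/b": {"sw3"},
	}
	out := flatten(byKey)
	fmt.Println(len(out), cap(out)) // prints "3 3"
}
```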
|
||||
// groupWatcher watches for Kubernetes objects on the given apiServer with the given namespaces,
|
||||
// selectors and authorization using the given client.
|
||||
type groupWatcher struct {
|
||||
apiServer string
|
||||
namespaces []string
|
||||
selectors []Selector
|
||||
authorization string
|
||||
client *http.Client
|
||||
|
||||
mu sync.Mutex
|
||||
m map[string]*urlWatcher
|
||||
}
|
||||
|
||||
func newGroupWatcher(apiServer string, ac *promauth.Config, namespaces []string, selectors []Selector, proxyURL *url.URL) *groupWatcher {
|
||||
var proxy func(*http.Request) (*url.URL, error)
|
||||
if proxyURL != nil {
|
||||
proxy = http.ProxyURL(proxyURL)
|
||||
}
|
||||
client := &http.Client{
|
||||
Transport: &http.Transport{
|
||||
TLSClientConfig: ac.NewTLSConfig(),
|
||||
Proxy: proxy,
|
||||
TLSHandshakeTimeout: 10 * time.Second,
|
||||
IdleConnTimeout: *apiServerTimeout,
|
||||
},
|
||||
Timeout: *apiServerTimeout,
|
||||
}
|
||||
return &groupWatcher{
|
||||
apiServer: apiServer,
|
||||
authorization: ac.Authorization,
|
||||
namespaces: namespaces,
|
||||
selectors: selectors,
|
||||
client: client,
|
||||
m: make(map[string]*urlWatcher),
|
||||
}
|
||||
}
|
||||
|
||||
func getGroupWatcher(apiServer string, ac *promauth.Config, namespaces []string, selectors []Selector, proxyURL *url.URL) *groupWatcher {
|
||||
key := fmt.Sprintf("apiServer=%s, namespaces=%s, selectors=%s, proxyURL=%v, authConfig=%s",
|
||||
apiServer, namespaces, selectorsKey(selectors), proxyURL, ac.String())
|
||||
groupWatchersLock.Lock()
|
||||
gw := groupWatchers[key]
|
||||
if gw == nil {
|
||||
gw = newGroupWatcher(apiServer, ac, namespaces, selectors, proxyURL)
|
||||
groupWatchers[key] = gw
|
||||
}
|
||||
groupWatchersLock.Unlock()
|
||||
return gw
|
||||
}
|
||||
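Review note: `getGroupWatcher` shares one watcher between all callers whose connection settings are identical by building a string key from those settings and looking it up in a mutex-protected map. A minimal sketch of this get-or-create pattern, reduced to two settings; `watcherGroup`, `getWatcherGroup` and the package-level variables are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// watcherGroup is a hypothetical, reduced version of the groupWatcher in the diff.
type watcherGroup struct {
	apiServer  string
	namespaces []string
}

var (
	groupsLock sync.Mutex
	groups     = make(map[string]*watcherGroup)
)

// getWatcherGroup returns the shared group for the given settings,
// creating it on first use. Callers with equal settings get the same pointer.
func getWatcherGroup(apiServer string, namespaces []string) *watcherGroup {
	key := fmt.Sprintf("apiServer=%s, namespaces=%s", apiServer, namespaces)
	groupsLock.Lock()
	defer groupsLock.Unlock()
	g := groups[key]
	if g == nil {
		g = &watcherGroup{apiServer: apiServer, namespaces: namespaces}
		groups[key] = g
	}
	return g
}

func main() {
	a := getWatcherGroup("http://api-server", []string{"default"})
	b := getWatcherGroup("http://api-server", []string{"default"})
	c := getWatcherGroup("http://api-server", []string{"kube-system"})
	fmt.Println(a == b, a == c) // prints "true false"
}
```

Every setting that changes how requests are made has to be part of the key; the diff includes the proxy URL, the selectors and the auth config for that reason.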
|
||||
func selectorsKey(selectors []Selector) string {
|
||||
var sb strings.Builder
|
||||
for _, s := range selectors {
|
||||
fmt.Fprintf(&sb, "{role=%q, label=%q, field=%q}", s.Role, s.Label, s.Field)
|
||||
}
|
||||
return sb.String()
|
||||
}
|
||||
|
||||
var (
|
||||
groupWatchersLock sync.Mutex
|
||||
groupWatchers = make(map[string]*groupWatcher)
|
||||
|
||||
_ = metrics.NewGauge(`vm_promscrape_discovery_kubernetes_group_watchers`, func() float64 {
|
||||
groupWatchersLock.Lock()
|
||||
n := len(groupWatchers)
|
||||
groupWatchersLock.Unlock()
|
||||
return float64(n)
|
||||
})
|
||||
)
|
||||
|
||||
// getObjectByRole returns an object with the given (namespace, name) key and the given role.
|
||||
func (gw *groupWatcher) getObjectByRole(role, namespace, name string) object {
|
||||
if gw == nil {
|
||||
// this is needed for testing
|
||||
return nil
|
||||
}
|
||||
key := namespace + "/" + name
|
||||
gw.startWatchersForRole(role, nil)
|
||||
gw.mu.Lock()
|
||||
defer gw.mu.Unlock()
|
||||
|
||||
for _, uw := range gw.m {
|
||||
if uw.role != role {
|
||||
continue
|
||||
}
|
||||
uw.mu.Lock()
|
||||
o := uw.objectsByKey[key]
|
||||
uw.mu.Unlock()
|
||||
if o != nil {
|
||||
return o
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (gw *groupWatcher) startWatchersForRole(role string, aw *apiWatcher) {
|
||||
paths := getAPIPaths(role, gw.namespaces, gw.selectors)
|
||||
for _, path := range paths {
|
||||
apiURL := gw.apiServer + path
|
||||
gw.mu.Lock()
|
||||
uw := gw.m[apiURL]
|
||||
if uw == nil {
|
||||
uw = newURLWatcher(role, apiURL, gw)
|
||||
gw.m[apiURL] = uw
|
||||
}
|
||||
gw.mu.Unlock()
|
||||
uw.subscribeAPIWatcher(aw)
|
||||
}
|
||||
}
|
||||
|
||||
func (gw *groupWatcher) reloadScrapeWorksForAPIWatchers(aws []*apiWatcher, objectsByKey map[string]object) {
|
||||
if len(aws) == 0 {
|
||||
return
|
||||
}
|
||||
swosByKey := make([]map[string][]interface{}, len(aws))
|
||||
for i := range aws {
|
||||
swosByKey[i] = make(map[string][]interface{})
|
||||
}
|
||||
for key, o := range objectsByKey {
|
||||
labels := o.getTargetLabels(gw)
|
||||
for i, aw := range aws {
|
||||
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
|
||||
if len(swos) > 0 {
|
||||
swosByKey[i][key] = swos
|
||||
}
|
||||
}
|
||||
}
|
||||
for i, aw := range aws {
|
||||
aw.reloadScrapeWorks(swosByKey[i])
|
||||
}
|
||||
}
|
||||
|
||||
// doRequest performs an HTTP GET request to the given requestURL.
|
||||
func (gw *groupWatcher) doRequest(requestURL string) (*http.Response, error) {
|
||||
req, err := http.NewRequest("GET", requestURL, nil)
|
||||
if err != nil {
|
||||
logger.Fatalf("cannot create a request for %q: %s", requestURL, err)
|
||||
}
|
||||
if gw.authorization != "" {
|
||||
req.Header.Set("Authorization", gw.authorization)
|
||||
}
|
||||
return gw.client.Do(req)
|
||||
}
|
||||
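Review note: `doRequest` builds a GET request and attaches the `Authorization` header only when one is configured. A minimal sketch of that request construction without sending anything over the network. `newAuthorizedRequest` is a hypothetical helper, and unlike the diff it returns the error instead of calling `logger.Fatalf`.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthorizedRequest builds a GET request for requestURL and sets the
// Authorization header when authorization is non-empty.
// The helper is hypothetical and returns errors instead of terminating.
func newAuthorizedRequest(requestURL, authorization string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, requestURL, nil)
	if err != nil {
		return nil, fmt.Errorf("cannot create a request for %q: %w", requestURL, err)
	}
	if authorization != "" {
		req.Header.Set("Authorization", authorization)
	}
	return req, nil
}

func main() {
	req, err := newAuthorizedRequest("http://api-server/api/v1/pods", "Bearer xyz")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Authorization"))
}
```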
|
||||
func (gw *groupWatcher) unsubscribeAPIWatcher(aw *apiWatcher) {
|
||||
gw.mu.Lock()
|
||||
for _, uw := range gw.m {
|
||||
uw.unsubscribeAPIWatcher(aw)
|
||||
}
|
||||
gw.mu.Unlock()
|
||||
}
|
||||
|
||||
// urlWatcher watches for an apiURL and updates object states in objectsByKey.
|
||||
type urlWatcher struct {
|
||||
role string
|
||||
apiURL string
|
||||
gw *groupWatcher
|
||||
|
||||
parseObject parseObjectFunc
|
||||
parseObjectList parseObjectListFunc
|
||||
|
||||
// mu protects aws, awsPending, objectsByKey and resourceVersion
|
||||
mu sync.Mutex
|
||||
|
||||
// aws contains registered apiWatcher objects
|
||||
aws map[*apiWatcher]struct{}
|
||||
|
||||
// awsPending contains pending apiWatcher objects, which must be moved to aws in a batch
|
||||
awsPending map[*apiWatcher]struct{}
|
||||
|
||||
// objectsByKey contains the latest state for objects obtained from apiURL
|
||||
objectsByKey map[string]object
|
||||
|
||||
resourceVersion string
|
||||
|
||||
objectsCount *metrics.Counter
|
||||
objectsAdded *metrics.Counter
|
||||
objectsRemoved *metrics.Counter
|
||||
objectsUpdated *metrics.Counter
|
||||
}
|
||||
|
||||
func newURLWatcher(role, apiURL string, gw *groupWatcher) *urlWatcher {
|
||||
parseObject, parseObjectList := getObjectParsersForRole(role)
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_url_watchers{role=%q}`, role)).Inc()
|
||||
uw := &urlWatcher{
|
||||
role: role,
|
||||
apiURL: apiURL,
|
||||
gw: gw,
|
||||
|
||||
parseObject: parseObject,
|
||||
parseObjectList: parseObjectList,
|
||||
|
||||
aws: make(map[*apiWatcher]struct{}),
|
||||
awsPending: make(map[*apiWatcher]struct{}),
|
||||
objectsByKey: make(map[string]object),
|
||||
|
||||
objectsCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects{role=%q}`, role)),
|
||||
objectsAdded: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_added_total{role=%q}`, role)),
|
||||
objectsRemoved: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_removed_total{role=%q}`, role)),
|
||||
objectsUpdated: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_updated_total{role=%q}`, role)),
|
||||
}
|
||||
logger.Infof("started %s watcher for %q", uw.role, uw.apiURL)
|
||||
go uw.watchForUpdates()
|
||||
go uw.processPendingSubscribers()
|
||||
return uw
|
||||
}
|
||||
|
||||
func (uw *urlWatcher) subscribeAPIWatcher(aw *apiWatcher) {
|
||||
if aw == nil {
|
||||
return
|
||||
}
|
||||
uw.mu.Lock()
|
||||
if _, ok := uw.aws[aw]; !ok {
|
||||
if _, ok := uw.awsPending[aw]; !ok {
|
||||
uw.awsPending[aw] = struct{}{}
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Inc()
|
||||
}
|
||||
}
|
||||
uw.mu.Unlock()
|
||||
}
|
||||
|
||||
func (uw *urlWatcher) unsubscribeAPIWatcher(aw *apiWatcher) {
|
||||
uw.mu.Lock()
|
||||
if _, ok := uw.aws[aw]; ok {
|
||||
delete(uw.aws, aw)
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="permanent"}`, uw.role)).Dec()
|
||||
} else if _, ok := uw.awsPending[aw]; ok {
|
||||
delete(uw.awsPending, aw)
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Dec()
|
||||
}
|
||||
uw.mu.Unlock()
|
||||
}
|
||||
|
||||
func (uw *urlWatcher) processPendingSubscribers() {
|
||||
t := time.NewTicker(time.Second)
|
||||
for range t.C {
|
||||
var awsPending []*apiWatcher
|
||||
var objectsByKey map[string]object
|
||||
|
||||
uw.mu.Lock()
|
||||
if len(uw.awsPending) > 0 {
|
||||
awsPending = getAPIWatchers(uw.awsPending)
|
||||
for _, aw := range awsPending {
|
||||
if _, ok := uw.aws[aw]; ok {
|
||||
logger.Panicf("BUG: aw=%p already exists in uw.aws", aw)
|
||||
}
|
||||
uw.aws[aw] = struct{}{}
|
||||
delete(uw.awsPending, aw)
|
||||
}
|
||||
objectsByKey = make(map[string]object, len(uw.objectsByKey))
|
||||
for key, o := range uw.objectsByKey {
|
||||
objectsByKey[key] = o
|
||||
}
|
||||
}
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="pending"}`, uw.role)).Add(-len(awsPending))
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_subscibers{role=%q,type="permanent"}`, uw.role)).Add(len(awsPending))
|
||||
uw.mu.Unlock()
|
||||
|
||||
uw.gw.reloadScrapeWorksForAPIWatchers(awsPending, objectsByKey)
|
||||
}
|
||||
}
|
||||
|
||||
func (uw *urlWatcher) setResourceVersion(resourceVersion string) {
|
||||
uw.mu.Lock()
|
||||
uw.resourceVersion = resourceVersion
|
||||
uw.mu.Unlock()
|
||||
}
|
||||
|
||||
// reloadObjects reloads objects to the latest state and returns resourceVersion for the latest state.
|
||||
func (uw *urlWatcher) reloadObjects() string {
|
||||
uw.mu.Lock()
|
||||
resourceVersion := uw.resourceVersion
|
||||
uw.mu.Unlock()
|
||||
if resourceVersion != "" {
|
||||
// Fast path - there is no need to reload the objects.
|
||||
return resourceVersion
|
||||
}
|
||||
|
||||
requestURL := uw.apiURL
|
||||
resp, err := uw.gw.doRequest(requestURL)
|
||||
if err != nil {
|
||||
logger.Errorf("cannot perform request to %q: %s", requestURL, err)
|
||||
return ""
|
||||
}
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
body, _ := ioutil.ReadAll(resp.Body)
|
||||
_ = resp.Body.Close()
|
||||
logger.Errorf("unexpected status code for request to %q: %d; want %d; response: %q", requestURL, resp.StatusCode, http.StatusOK, body)
|
||||
return ""
|
||||
}
|
||||
objectsByKey, metadata, err := uw.parseObjectList(resp.Body)
|
||||
_ = resp.Body.Close()
|
||||
if err != nil {
|
||||
logger.Errorf("cannot parse objects from %q: %s", requestURL, err)
|
||||
return ""
|
||||
}
|
||||
|
||||
uw.mu.Lock()
|
||||
var updated, removed, added int
|
||||
for key := range uw.objectsByKey {
|
||||
if o, ok := objectsByKey[key]; ok {
|
||||
uw.objectsByKey[key] = o
|
||||
updated++
|
||||
} else {
|
||||
delete(uw.objectsByKey, key)
|
||||
removed++
|
||||
}
|
||||
}
|
||||
for key, o := range objectsByKey {
|
||||
if _, ok := uw.objectsByKey[key]; !ok {
|
||||
uw.objectsByKey[key] = o
|
||||
added++
|
||||
}
|
||||
}
|
||||
uw.objectsUpdated.Add(updated)
|
||||
uw.objectsRemoved.Add(removed)
|
||||
uw.objectsAdded.Add(added)
|
||||
uw.objectsCount.Add(added - removed)
|
||||
uw.resourceVersion = metadata.ResourceVersion
|
||||
aws := getAPIWatchers(uw.aws)
|
||||
uw.mu.Unlock()
|
||||
|
||||
uw.gw.reloadScrapeWorksForAPIWatchers(aws, objectsByKey)
|
||||
logger.Infof("reloaded %d objects from %q", len(objectsByKey), requestURL)
|
||||
return metadata.ResourceVersion
|
||||
}
|
||||
|
||||
func getAPIWatchers(awsMap map[*apiWatcher]struct{}) []*apiWatcher {
|
||||
aws := make([]*apiWatcher, 0, len(awsMap))
|
||||
for aw := range awsMap {
|
||||
aws = append(aws, aw)
|
||||
}
|
||||
return aws
|
||||
}
|
||||
|
||||
// watchForUpdates watches for object updates starting from uw.resourceVersion and updates the corresponding objects to the latest state.
|
||||
//
|
||||
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes
|
||||
func (uw *urlWatcher) watchForUpdates() {
|
||||
aw := uw.aw
|
||||
backoffDelay := time.Second
|
||||
maxBackoffDelay := 30 * time.Second
|
||||
backoffSleep := func() {
|
||||
|
@ -339,25 +486,19 @@ func (uw *urlWatcher) watchForUpdates() {
|
|||
if strings.Contains(apiURL, "?") {
|
||||
delimiter = "&"
|
||||
}
|
||||
timeoutSeconds := time.Duration(0.9 * float64(aw.client.Timeout)).Seconds()
|
||||
apiURL += delimiter + "watch=1&timeoutSeconds=" + strconv.Itoa(int(timeoutSeconds))
|
||||
timeoutSeconds := time.Duration(0.9 * float64(uw.gw.client.Timeout)).Seconds()
|
||||
apiURL += delimiter + "watch=1&allowWatchBookmarks=true&timeoutSeconds=" + strconv.Itoa(int(timeoutSeconds))
|
||||
for {
|
||||
if aw.needStop() {
|
||||
return
|
||||
}
|
||||
resourceVersion := uw.reloadObjects()
|
||||
requestURL := apiURL
|
||||
if resourceVersion != "" {
|
||||
requestURL += "&resourceVersion=" + url.QueryEscape(resourceVersion)
|
||||
}
|
||||
resp, err := aw.doRequest(requestURL)
|
||||
if err != nil {
|
||||
if aw.needStop() {
|
||||
return
|
||||
}
|
||||
logger.Errorf("error when performing a request to %q: %s", requestURL, err)
|
||||
if resourceVersion == "" {
|
||||
backoffSleep()
|
||||
continue
|
||||
}
|
||||
requestURL := apiURL + "&resourceVersion=" + url.QueryEscape(resourceVersion)
|
||||
resp, err := uw.gw.doRequest(requestURL)
|
||||
if err != nil {
|
||||
logger.Errorf("cannot perform request to %q: %s", requestURL, err)
|
||||
backoffSleep()
|
||||
uw.resetResourceVersion()
|
||||
continue
|
||||
}
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
|
@ -367,24 +508,20 @@ func (uw *urlWatcher) watchForUpdates() {
|
|||
if resp.StatusCode == 410 {
|
||||
// There is no need for sleep on 410 error. See https://kubernetes.io/docs/reference/using-api/api-concepts/#410-gone-responses
|
||||
backoffDelay = time.Second
|
||||
uw.setResourceVersion("")
|
||||
} else {
|
||||
backoffSleep()
|
||||
}
|
||||
uw.resetResourceVersion()
|
||||
continue
|
||||
}
|
||||
backoffDelay = time.Second
|
||||
err = uw.readObjectUpdateStream(resp.Body)
|
||||
_ = resp.Body.Close()
|
||||
if err != nil {
|
||||
if aw.needStop() {
|
||||
return
|
||||
}
|
||||
if !errors.Is(err, io.EOF) {
|
||||
logger.Errorf("error when reading WatchEvent stream from %q: %s", requestURL, err)
|
||||
}
|
||||
backoffSleep()
|
||||
uw.resetResourceVersion()
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
@ -392,41 +529,79 @@ func (uw *urlWatcher) watchForUpdates() {
|
|||
|
||||
// readObjectUpdateStream reads Kubernetes watch events from r and updates locally cached objects according to the received events.
|
||||
func (uw *urlWatcher) readObjectUpdateStream(r io.Reader) error {
|
||||
aw := uw.aw
|
||||
d := json.NewDecoder(r)
|
||||
var we WatchEvent
|
||||
for {
|
||||
if err := d.Decode(&we); err != nil {
|
||||
return err
|
||||
}
|
||||
switch we.Type {
|
||||
case "ADDED", "MODIFIED":
|
||||
o, err := uw.parseObject(we.Object)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
key := o.key()
|
||||
switch we.Type {
|
||||
case "ADDED", "MODIFIED":
|
||||
uw.objectsByKey.update(key, o)
|
||||
labels := o.getTargetLabels(aw)
|
||||
swos := getScrapeWorkObjectsForLabels(aw.swcFunc, labels)
|
||||
uw.mu.Lock()
|
||||
if len(swos) > 0 {
|
||||
uw.swosByKey[key] = swos
|
||||
if _, ok := uw.objectsByKey[key]; !ok {
|
||||
uw.objectsCount.Inc()
|
||||
uw.objectsAdded.Inc()
|
||||
} else {
|
||||
delete(uw.swosByKey, key)
|
||||
uw.objectsUpdated.Inc()
|
||||
}
|
||||
uw.objectsByKey[key] = o
|
||||
aws := getAPIWatchers(uw.aws)
|
||||
uw.mu.Unlock()
|
||||
labels := o.getTargetLabels(uw.gw)
|
||||
for _, aw := range aws {
|
||||
aw.setScrapeWorks(key, labels)
|
||||
}
|
||||
case "DELETED":
|
||||
uw.objectsByKey.remove(key)
|
||||
o, err := uw.parseObject(we.Object)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
key := o.key()
|
||||
uw.mu.Lock()
|
||||
delete(uw.swosByKey, key)
|
||||
if _, ok := uw.objectsByKey[key]; ok {
|
||||
uw.objectsCount.Dec()
|
||||
uw.objectsRemoved.Inc()
|
||||
delete(uw.objectsByKey, key)
|
||||
}
|
||||
aws := getAPIWatchers(uw.aws)
|
||||
uw.mu.Unlock()
|
||||
for _, aw := range aws {
|
||||
aw.removeScrapeWorks(key)
|
||||
}
|
||||
case "BOOKMARK":
|
||||
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
|
||||
bm, err := parseBookmark(we.Object)
|
||||
if err != nil {
|
||||
return fmt.Errorf("cannot parse bookmark from %q: %w", we.Object, err)
|
||||
}
|
||||
uw.setResourceVersion(bm.Metadata.ResourceVersion)
|
||||
default:
|
||||
return fmt.Errorf("unexpected WatchEvent type %q for role %q", we.Type, uw.role)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Bookmark is a bookmark from Kubernetes Watch API.
|
||||
// See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
|
||||
type Bookmark struct {
|
||||
Metadata struct {
|
||||
ResourceVersion string
|
||||
}
|
||||
}
|
||||
|
||||
func parseBookmark(data []byte) (*Bookmark, error) {
|
||||
var bm Bookmark
|
||||
if err := json.Unmarshal(data, &bm); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return &bm, nil
|
||||
}
|
||||
|
||||
func getAPIPaths(role string, namespaces []string, selectors []Selector) []string {
|
||||
objectName := getObjectNameByRole(role)
|
||||
if objectName == "nodes" || len(namespaces) == 0 {
|
||||
|
@ -521,105 +696,3 @@ func getObjectParsersForRole(role string) (parseObjectFunc, parseObjectListFunc)
|
|||
return nil, nil
|
||||
}
|
||||
}
|
||||
|
||||
type objectsMap struct {
|
||||
mu sync.Mutex
|
||||
refCount int
|
||||
m map[string]object
|
||||
|
||||
objectsAdded *metrics.Counter
|
||||
objectsRemoved *metrics.Counter
|
||||
objectsCount *metrics.Counter
|
||||
}
|
||||
|
||||
func (om *objectsMap) incRef() {
|
||||
om.mu.Lock()
|
||||
om.refCount++
|
||||
om.mu.Unlock()
|
||||
}
|
||||
|
||||
func (om *objectsMap) decRef() {
|
||||
om.mu.Lock()
|
||||
om.refCount--
|
||||
if om.refCount < 0 {
|
||||
logger.Panicf("BUG: refCount cannot be smaller than 0; got %d", om.refCount)
|
||||
}
|
||||
if om.refCount == 0 {
|
||||
// Free up memory occupied by om.m
|
||||
om.objectsRemoved.Add(len(om.m))
|
||||
om.objectsCount.Add(-len(om.m))
|
||||
om.m = make(map[string]object)
|
||||
}
|
||||
om.mu.Unlock()
|
||||
}
|
||||
|
||||
func (om *objectsMap) reload(m map[string]object) {
|
||||
om.mu.Lock()
|
||||
om.objectsAdded.Add(len(m))
|
||||
om.objectsRemoved.Add(len(om.m))
|
||||
om.objectsCount.Add(len(m) - len(om.m))
|
||||
for k := range om.m {
|
||||
delete(om.m, k)
|
||||
}
|
||||
for k, o := range m {
|
||||
om.m[k] = o
|
||||
}
|
||||
om.mu.Unlock()
|
||||
}
|
||||
|
||||
func (om *objectsMap) update(key string, o object) {
|
||||
om.mu.Lock()
|
||||
if om.m[key] == nil {
|
||||
om.objectsAdded.Inc()
|
||||
om.objectsCount.Inc()
|
||||
}
|
||||
om.m[key] = o
|
||||
om.mu.Unlock()
|
||||
}
|
||||
|
||||
func (om *objectsMap) remove(key string) {
|
||||
om.mu.Lock()
|
||||
if om.m[key] != nil {
|
||||
om.objectsRemoved.Inc()
|
||||
om.objectsCount.Dec()
|
||||
delete(om.m, key)
|
||||
}
|
||||
om.mu.Unlock()
|
||||
}
|
||||
|
||||
func (om *objectsMap) get(key string) object {
|
||||
om.mu.Lock()
|
||||
o, ok := om.m[key]
|
||||
om.mu.Unlock()
|
||||
if !ok {
|
||||
return nil
|
||||
}
|
||||
return o
|
||||
}
|
||||
|
||||
type sharedObjects struct {
|
||||
mu sync.Mutex
|
||||
oms map[string]*objectsMap
|
||||
}
|
||||
|
||||
func (so *sharedObjects) getByAPIURL(role, apiURL string) *objectsMap {
|
||||
so.mu.Lock()
|
||||
om := so.oms[apiURL]
|
||||
if om == nil {
|
||||
om = &objectsMap{
|
||||
m: make(map[string]object),
|
||||
|
||||
objectsCount: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects{role=%q}`, role)),
|
||||
objectsAdded: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_added_total{role=%q}`, role)),
|
||||
objectsRemoved: metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_discovery_kubernetes_objects_removed_total{role=%q}`, role)),
|
||||
}
|
||||
so.oms[apiURL] = om
|
||||
}
|
||||
so.mu.Unlock()
|
||||
om.incRef()
|
||||
return om
|
||||
}
|
||||
|
||||
var sharedObjectsGlobal = &sharedObjects{
|
||||
oms: make(map[string]*objectsMap),
|
||||
}
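
Note on the map reconciliation in reloadObjects in this file: it walks the cached map first (updating or deleting entries), then walks the freshly parsed map to pick up new keys, and counts each kind of change for the metrics. The following is a minimal standalone sketch of that pattern, not the code from this commit. The helper name syncMaps and the use of string values instead of the object interface are assumptions made for illustration.

```go
package main

import "fmt"

// syncMaps is a hypothetical helper: it brings dst in line with src and
// reports how many keys were updated, removed and added.
func syncMaps(dst, src map[string]string) (updated, removed, added int) {
	// First pass: refresh keys present in both maps, drop keys missing from src.
	// Deleting from a map while ranging over it is allowed in Go.
	for key := range dst {
		if v, ok := src[key]; ok {
			dst[key] = v
			updated++
		} else {
			delete(dst, key)
			removed++
		}
	}
	// Second pass: copy over keys that exist only in src.
	for key, v := range src {
		if _, ok := dst[key]; !ok {
			dst[key] = v
			added++
		}
	}
	return updated, removed, added
}

func main() {
	cached := map[string]string{"a": "1", "b": "2"}
	fresh := map[string]string{"b": "3", "c": "4"}
	updated, removed, added := syncMaps(cached, fresh)
	fmt.Println(updated, removed, added, len(cached))
}
```

With these counts, a gauge-like counter can be adjusted by added minus removed, which is what the diff does with objectsCount.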
|
||||
|
|
|
@ -160,3 +160,15 @@ func TestGetAPIPaths(t *testing.T) {
|
|||
"/apis/networking.k8s.io/v1beta1/namespaces/y/ingresses?labelSelector=cde%2Cbaaa&fieldSelector=abc",
|
||||
})
|
||||
}
|
||||
|
||||
func TestParseBookmark(t *testing.T) {
|
||||
data := `{"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "12746"} }`
|
||||
bm, err := parseBookmark([]byte(data))
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %s", err)
|
||||
}
|
||||
expectedResourceVersion := "12746"
|
||||
if bm.Metadata.ResourceVersion != expectedResourceVersion {
|
||||
t.Fatalf("unexpected resourceVersion; got %q; want %q", bm.Metadata.ResourceVersion, expectedResourceVersion)
|
||||
}
|
||||
}
|
||||
|
|
|
@ -3,6 +3,7 @@ package kubernetes
|
|||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
|
||||
)
|
||||
|
@ -11,10 +12,11 @@ func (eps *Endpoints) key() string {
|
|||
return eps.Metadata.key()
|
||||
}
|
||||
|
||||
func parseEndpointsList(data []byte) (map[string]object, ListMeta, error) {
|
||||
func parseEndpointsList(r io.Reader) (map[string]object, ListMeta, error) {
|
||||
var epsl EndpointsList
|
||||
if err := json.Unmarshal(data, &epsl); err != nil {
|
||||
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointsList from %q: %w", data, err)
|
||||
d := json.NewDecoder(r)
|
||||
if err := d.Decode(&epsl); err != nil {
|
||||
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointsList: %w", err)
|
||||
}
|
||||
objectsByKey := make(map[string]object)
|
||||
for _, eps := range epsl.Items {
|
||||
|
@ -88,17 +90,17 @@ type EndpointPort struct {
|
|||
// getTargetLabels returns labels for each endpoint in eps.
|
||||
//
|
||||
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints
|
||||
func (eps *Endpoints) getTargetLabels(aw *apiWatcher) []map[string]string {
|
||||
func (eps *Endpoints) getTargetLabels(gw *groupWatcher) []map[string]string {
|
||||
var svc *Service
|
||||
if o := aw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
|
||||
if o := gw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
|
||||
svc = o.(*Service)
|
||||
}
|
||||
podPortsSeen := make(map[*Pod][]int)
|
||||
var ms []map[string]string
|
||||
for _, ess := range eps.Subsets {
|
||||
for _, epp := range ess.Ports {
|
||||
ms = appendEndpointLabelsForAddresses(ms, aw, podPortsSeen, eps, ess.Addresses, epp, svc, "true")
|
||||
ms = appendEndpointLabelsForAddresses(ms, aw, podPortsSeen, eps, ess.NotReadyAddresses, epp, svc, "false")
|
||||
ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.Addresses, epp, svc, "true")
|
||||
ms = appendEndpointLabelsForAddresses(ms, gw, podPortsSeen, eps, ess.NotReadyAddresses, epp, svc, "false")
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -133,11 +135,11 @@ func (eps *Endpoints) getTargetLabels(aw *apiWatcher) []map[string]string {
|
|||
return ms
|
||||
}
|
||||
|
||||
func appendEndpointLabelsForAddresses(ms []map[string]string, aw *apiWatcher, podPortsSeen map[*Pod][]int, eps *Endpoints,
|
||||
func appendEndpointLabelsForAddresses(ms []map[string]string, gw *groupWatcher, podPortsSeen map[*Pod][]int, eps *Endpoints,
|
||||
eas []EndpointAddress, epp EndpointPort, svc *Service, ready string) []map[string]string {
|
||||
for _, ea := range eas {
|
||||
var p *Pod
|
||||
if o := aw.getObjectByRole("pod", ea.TargetRef.Namespace, ea.TargetRef.Name); o != nil {
|
||||
if o := gw.getObjectByRole("pod", ea.TargetRef.Namespace, ea.TargetRef.Name); o != nil {
|
||||
p = o.(*Pod)
|
||||
}
|
||||
m := getEndpointLabelsForAddressAndPort(podPortsSeen, eps, ea, epp, p, svc, ready)
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
package kubernetes
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"testing"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
|
@ -10,7 +11,8 @@ import (
|
|||
func TestParseEndpointsListFailure(t *testing.T) {
|
||||
f := func(s string) {
|
||||
t.Helper()
|
||||
objectsByKey, _, err := parseEndpointsList([]byte(s))
|
||||
r := bytes.NewBufferString(s)
|
||||
objectsByKey, _, err := parseEndpointsList(r)
|
||||
if err == nil {
|
||||
t.Fatalf("expecting non-nil error")
|
||||
}
|
||||
|
@ -78,7 +80,8 @@ func TestParseEndpointsListSuccess(t *testing.T) {
|
|||
]
|
||||
}
|
||||
`
|
||||
objectsByKey, meta, err := parseEndpointsList([]byte(data))
|
||||
r := bytes.NewBufferString(data)
|
||||
objectsByKey, meta, err := parseEndpointsList(r)
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %s", err)
|
||||
}
|
||||
|
|
|
@ -3,6 +3,7 @@ package kubernetes
|
|||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"strconv"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
|
||||
|
@ -12,10 +13,11 @@ func (eps *EndpointSlice) key() string {
|
|||
return eps.Metadata.key()
|
||||
}
|
||||
|
||||
func parseEndpointSliceList(data []byte) (map[string]object, ListMeta, error) {
|
||||
func parseEndpointSliceList(r io.Reader) (map[string]object, ListMeta, error) {
|
||||
var epsl EndpointSliceList
|
||||
if err := json.Unmarshal(data, &epsl); err != nil {
|
||||
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointSliceList from %q: %w", data, err)
|
||||
d := json.NewDecoder(r)
|
||||
if err := d.Decode(&epsl); err != nil {
|
||||
return nil, epsl.Metadata, fmt.Errorf("cannot unmarshal EndpointSliceList: %w", err)
|
||||
}
|
||||
objectsByKey := make(map[string]object)
|
||||
for _, eps := range epsl.Items {
|
||||
|
@ -35,16 +37,16 @@ func parseEndpointSlice(data []byte) (object, error) {
|
|||
// getTargetLabels returns labels for eps.
|
||||
//
|
||||
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpointslices
|
||||
func (eps *EndpointSlice) getTargetLabels(aw *apiWatcher) []map[string]string {
|
||||
func (eps *EndpointSlice) getTargetLabels(gw *groupWatcher) []map[string]string {
|
||||
var svc *Service
|
||||
if o := aw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
|
||||
if o := gw.getObjectByRole("service", eps.Metadata.Namespace, eps.Metadata.Name); o != nil {
|
||||
svc = o.(*Service)
|
||||
}
|
||||
podPortsSeen := make(map[*Pod][]int)
|
||||
var ms []map[string]string
|
||||
for _, ess := range eps.Endpoints {
|
||||
var p *Pod
|
||||
if o := aw.getObjectByRole("pod", ess.TargetRef.Namespace, ess.TargetRef.Name); o != nil {
|
||||
if o := gw.getObjectByRole("pod", ess.TargetRef.Namespace, ess.TargetRef.Name); o != nil {
|
||||
p = o.(*Pod)
|
||||
}
|
||||
for _, epp := range eps.Ports {
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
package kubernetes
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"testing"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
|
@ -9,7 +10,8 @@ import (
|
|||
|
||||
func TestParseEndpointSliceListFail(t *testing.T) {
|
||||
f := func(data string) {
|
||||
objectsByKey, _, err := parseEndpointSliceList([]byte(data))
|
||||
r := bytes.NewBufferString(data)
|
||||
objectsByKey, _, err := parseEndpointSliceList(r)
|
||||
if err == nil {
|
||||
t.Errorf("unexpected result, test must fail! data: %s", data)
|
||||
}
|
||||
|
@ -175,7 +177,8 @@ func TestParseEndpointSliceListSuccess(t *testing.T) {
|
|||
}
|
||||
]
|
||||
}`
|
||||
objectsByKey, meta, err := parseEndpointSliceList([]byte(data))
|
||||
r := bytes.NewBufferString(data)
|
||||
objectsByKey, meta, err := parseEndpointSliceList(r)
|
||||
if err != nil {
|
||||
t.Errorf("cannot parse data for EndpointSliceList: %v", err)
|
||||
return
|
||||
|
|
|
@ -3,16 +3,18 @@ package kubernetes
|
|||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
)
|
||||
|
||||
func (ig *Ingress) key() string {
|
||||
return ig.Metadata.key()
|
||||
}
|
||||
|
||||
func parseIngressList(data []byte) (map[string]object, ListMeta, error) {
|
||||
func parseIngressList(r io.Reader) (map[string]object, ListMeta, error) {
|
||||
var igl IngressList
|
||||
if err := json.Unmarshal(data, &igl); err != nil {
|
||||
return nil, igl.Metadata, fmt.Errorf("cannot unmarshal IngressList from %q: %w", data, err)
|
||||
d := json.NewDecoder(r)
|
||||
if err := d.Decode(&igl); err != nil {
|
||||
return nil, igl.Metadata, fmt.Errorf("cannot unmarshal IngressList: %w", err)
|
||||
}
|
||||
objectsByKey := make(map[string]object)
|
||||
for _, ig := range igl.Items {
|
||||
|
@ -85,7 +87,7 @@ type HTTPIngressPath struct {
|
|||
// getTargetLabels returns labels for ig.
|
||||
//
|
||||
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ingress
|
||||
func (ig *Ingress) getTargetLabels(aw *apiWatcher) []map[string]string {
|
||||
func (ig *Ingress) getTargetLabels(gw *groupWatcher) []map[string]string {
|
||||
tlsHosts := make(map[string]bool)
|
||||
for _, tls := range ig.Spec.TLS {
|
||||
for _, host := range tls.Hosts {
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
package kubernetes
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"testing"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
|
@ -10,7 +11,8 @@ import (
|
|||
func TestParseIngressListFailure(t *testing.T) {
|
||||
f := func(s string) {
|
||||
t.Helper()
|
||||
objectsByKey, _, err := parseIngressList([]byte(s))
|
||||
r := bytes.NewBufferString(s)
|
||||
objectsByKey, _, err := parseIngressList(r)
|
||||
if err == nil {
|
||||
t.Fatalf("expecting non-nil error")
|
||||
}
|
||||
|
@ -70,7 +72,8 @@ func TestParseIngressListSuccess(t *testing.T) {
|
|||
}
|
||||
]
|
||||
}`
|
||||
objectsByKey, meta, err := parseIngressList([]byte(data))
|
||||
r := bytes.NewBufferString(data)
|
||||
objectsByKey, meta, err := parseIngressList(r)
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %s", err)
|
||||
}
|
||||
|
|
|
@ -48,12 +48,7 @@ func (sdc *SDConfig) GetScrapeWorkObjects(baseDir string, swcFunc ScrapeWorkCons
|
|||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot create API config: %w", err)
|
||||
}
|
||||
switch sdc.Role {
|
||||
case "node", "pod", "service", "endpoints", "endpointslices", "ingress":
|
||||
return cfg.aw.getScrapeWorkObjectsForRole(sdc.Role), nil
|
||||
default:
|
||||
return nil, fmt.Errorf("unexpected `role`: %q; must be one of `node`, `pod`, `service`, `endpoints`, `endpointslices` or `ingress`; skipping it", sdc.Role)
|
||||
}
|
||||
return cfg.aw.getScrapeWorkObjects(), nil
|
||||
}
|
||||
|
||||
// MustStop stops further usage for sdc.
|
||||
|
|
|
@ -3,6 +3,7 @@ package kubernetes
|
|||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
|
||||
)
|
||||
|
@ -12,10 +13,11 @@ func (n *Node) key() string {
|
|||
return n.Metadata.key()
|
||||
}
|
||||
|
||||
func parseNodeList(data []byte) (map[string]object, ListMeta, error) {
|
||||
func parseNodeList(r io.Reader) (map[string]object, ListMeta, error) {
|
||||
var nl NodeList
|
||||
if err := json.Unmarshal(data, &nl); err != nil {
|
||||
return nil, nl.Metadata, fmt.Errorf("cannot unmarshal NodeList from %q: %w", data, err)
|
||||
d := json.NewDecoder(r)
|
||||
if err := d.Decode(&nl); err != nil {
|
||||
return nil, nl.Metadata, fmt.Errorf("cannot unmarshal NodeList: %w", err)
|
||||
}
|
||||
objectsByKey := make(map[string]object)
|
||||
for _, n := range nl.Items {
|
||||
|
@ -74,7 +76,7 @@ type NodeDaemonEndpoints struct {
|
|||
// getTargetLabels returns labels for the given n.
|
||||
//
|
||||
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#node
|
||||
func (n *Node) getTargetLabels(aw *apiWatcher) []map[string]string {
|
||||
func (n *Node) getTargetLabels(gw *groupWatcher) []map[string]string {
|
||||
addr := getNodeAddr(n.Status.Addresses)
|
||||
if len(addr) == 0 {
|
||||
// Skip node without address
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
package kubernetes
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"reflect"
|
||||
"sort"
|
||||
"strconv"
|
||||
|
@ -13,7 +14,8 @@ import (
|
|||
func TestParseNodeListFailure(t *testing.T) {
|
||||
f := func(s string) {
|
||||
t.Helper()
|
||||
objectsByKey, _, err := parseNodeList([]byte(s))
|
||||
r := bytes.NewBufferString(s)
|
||||
objectsByKey, _, err := parseNodeList(r)
|
||||
if err == nil {
|
||||
t.Fatalf("expecting non-nil error")
|
||||
}
|
||||
|
@ -229,7 +231,8 @@ func TestParseNodeListSuccess(t *testing.T) {
|
|||
]
|
||||
}
|
||||
`
|
||||
objectsByKey, meta, err := parseNodeList([]byte(data))
|
||||
r := bytes.NewBufferString(data)
|
||||
objectsByKey, meta, err := parseNodeList(r)
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %s", err)
|
||||
}
|
||||
|
|
|
@ -3,6 +3,7 @@ package kubernetes
|
|||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"strconv"
|
||||
"strings"
|
||||
|
||||
|
@ -13,10 +14,11 @@ func (p *Pod) key() string {
|
|||
return p.Metadata.key()
|
||||
}
|
||||
|
||||
func parsePodList(data []byte) (map[string]object, ListMeta, error) {
|
||||
func parsePodList(r io.Reader) (map[string]object, ListMeta, error) {
|
||||
var pl PodList
|
||||
if err := json.Unmarshal(data, &pl); err != nil {
|
||||
return nil, pl.Metadata, fmt.Errorf("cannot unmarshal PodList from %q: %w", data, err)
|
||||
d := json.NewDecoder(r)
|
||||
if err := d.Decode(&pl); err != nil {
|
||||
return nil, pl.Metadata, fmt.Errorf("cannot unmarshal PodList: %w", err)
|
||||
}
|
||||
objectsByKey := make(map[string]object)
|
||||
for _, p := range pl.Items {
|
||||
|
@ -95,7 +97,7 @@ type PodCondition struct {
|
|||
// getTargetLabels returns labels for each port of the given p.
|
||||
//
|
||||
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod
|
||||
func (p *Pod) getTargetLabels(aw *apiWatcher) []map[string]string {
|
||||
func (p *Pod) getTargetLabels(gw *groupWatcher) []map[string]string {
|
||||
if len(p.Status.PodIP) == 0 {
|
||||
// Skip pod without IP
|
||||
return nil
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
package kubernetes
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"testing"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
|
@ -10,7 +11,8 @@ import (
|
|||
func TestParsePodListFailure(t *testing.T) {
|
||||
f := func(s string) {
|
||||
t.Helper()
|
||||
objectsByKey, _, err := parsePodList([]byte(s))
|
||||
r := bytes.NewBufferString(s)
|
||||
objectsByKey, _, err := parsePodList(r)
|
||||
if err == nil {
|
||||
t.Fatalf("expecting non-nil error")
|
||||
}
|
||||
|
@ -227,7 +229,8 @@ func TestParsePodListSuccess(t *testing.T) {
|
|||
]
|
||||
}
|
||||
`
|
||||
objectsByKey, meta, err := parsePodList([]byte(data))
|
||||
r := bytes.NewBufferString(data)
|
||||
objectsByKey, meta, err := parsePodList(r)
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %s", err)
|
||||
}
|
||||
|
|
|
@ -3,6 +3,7 @@ package kubernetes
|
|||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discoveryutils"
|
||||
)
|
||||
|
@ -11,10 +12,11 @@ func (s *Service) key() string {
|
|||
return s.Metadata.key()
|
||||
}
|
||||
|
||||
func parseServiceList(data []byte) (map[string]object, ListMeta, error) {
|
||||
func parseServiceList(r io.Reader) (map[string]object, ListMeta, error) {
|
||||
var sl ServiceList
|
||||
if err := json.Unmarshal(data, &sl); err != nil {
|
||||
return nil, sl.Metadata, fmt.Errorf("cannot unmarshal ServiceList from %q: %w", data, err)
|
||||
d := json.NewDecoder(r)
|
||||
if err := d.Decode(&sl); err != nil {
|
||||
return nil, sl.Metadata, fmt.Errorf("cannot unmarshal ServiceList: %w", err)
|
||||
}
|
||||
objectsByKey := make(map[string]object)
|
||||
for _, s := range sl.Items {
|
||||
|
@ -69,7 +71,7 @@ type ServicePort struct {
|
|||
// getTargetLabels returns labels for each port of the given s.
|
||||
//
|
||||
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#service
|
||||
func (s *Service) getTargetLabels(aw *apiWatcher) []map[string]string {
|
||||
func (s *Service) getTargetLabels(gw *groupWatcher) []map[string]string {
|
||||
host := fmt.Sprintf("%s.%s.svc", s.Metadata.Name, s.Metadata.Namespace)
|
||||
var ms []map[string]string
|
||||
for _, sp := range s.Spec.Ports {
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
package kubernetes
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"testing"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
|
@ -10,7 +11,8 @@ import (
|
|||
func TestParseServiceListFailure(t *testing.T) {
|
||||
f := func(s string) {
|
||||
t.Helper()
|
||||
objectsByKey, _, err := parseServiceList([]byte(s))
|
||||
r := bytes.NewBufferString(s)
|
||||
objectsByKey, _, err := parseServiceList(r)
|
||||
if err == nil {
|
||||
t.Fatalf("expecting non-nil error")
|
||||
}
|
||||
|
@ -88,7 +90,8 @@ func TestParseServiceListSuccess(t *testing.T) {
|
|||
]
|
||||
}
|
||||
`
|
||||
objectsByKey, meta, err := parseServiceList([]byte(data))
|
||||
r := bytes.NewBufferString(data)
|
||||
objectsByKey, meta, err := parseServiceList(r)
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %s", err)
|
||||
}
|
||||
|
|
|
@ -66,7 +66,7 @@ func NewClient(apiServer string, ac *promauth.Config, proxyURL proxy.URL) (*Clie
|
|||
|
||||
hostPort := string(u.Host())
|
||||
isTLS := string(u.Scheme()) == "https"
|
||||
if isTLS && ac != nil {
|
||||
if isTLS {
|
||||
tlsCfg = ac.NewTLSConfig()
|
||||
}
|
||||
if !strings.Contains(hostPort, ":") {
|
||||
|
@ -77,7 +77,7 @@ func NewClient(apiServer string, ac *promauth.Config, proxyURL proxy.URL) (*Clie
|
|||
hostPort = net.JoinHostPort(hostPort, port)
|
||||
}
|
||||
if dialFunc == nil {
|
||||
dialFunc, err = proxyURL.NewDialFunc(tlsCfg)
|
||||
dialFunc, err = proxyURL.NewDialFunc(ac)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
|
|
@ -21,29 +21,29 @@ var (
|
|||
fileSDCheckInterval = flag.Duration("promscrape.fileSDCheckInterval", 30*time.Second, "Interval for checking for changes in 'file_sd_config'. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details")
|
||||
kubernetesSDCheckInterval = flag.Duration("promscrape.kubernetesSDCheckInterval", 30*time.Second, "Interval for checking for changes in Kubernetes API server. "+
|
||||
"This works only if `kubernetes_sd_configs` is configured in '-promscrape.config' file. "+
|
||||
"This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details")
|
||||
openstackSDCheckInterval = flag.Duration("promscrape.openstackSDCheckInterval", 30*time.Second, "Interval for checking for changes in openstack API server. "+
|
||||
"This works only if `openstack_sd_configs` is configured in '-promscrape.config' file. "+
|
||||
"This works only if openstack_sd_configs is configured in '-promscrape.config' file. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details")
|
||||
eurekaSDCheckInterval = flag.Duration("promscrape.eurekaSDCheckInterval", 30*time.Second, "Interval for checking for changes in eureka. "+
|
||||
"This works only if `eureka_sd_configs` is configured in '-promscrape.config' file. "+
|
||||
"This works only if eureka_sd_configs is configured in '-promscrape.config' file. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details")
|
||||
dnsSDCheckInterval = flag.Duration("promscrape.dnsSDCheckInterval", 30*time.Second, "Interval for checking for changes in dns. "+
|
||||
"This works only if `dns_sd_configs` is configured in '-promscrape.config' file. "+
|
||||
"This works only if dns_sd_configs is configured in '-promscrape.config' file. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details")
|
||||
ec2SDCheckInterval = flag.Duration("promscrape.ec2SDCheckInterval", time.Minute, "Interval for checking for changes in ec2. "+
|
||||
"This works only if `ec2_sd_configs` is configured in '-promscrape.config' file. "+
|
||||
"This works only if ec2_sd_configs is configured in '-promscrape.config' file. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details")
|
||||
gceSDCheckInterval = flag.Duration("promscrape.gceSDCheckInterval", time.Minute, "Interval for checking for changes in gce. "+
|
||||
"This works only if `gce_sd_configs` is configured in '-promscrape.config' file. "+
|
||||
"This works only if gce_sd_configs is configured in '-promscrape.config' file. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details")
|
||||
dockerswarmSDCheckInterval = flag.Duration("promscrape.dockerswarmSDCheckInterval", 30*time.Second, "Interval for checking for changes in dockerswarm. "+
|
||||
"This works only if `dockerswarm_sd_configs` is configured in '-promscrape.config' file. "+
|
||||
"This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. "+
|
||||
"See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details")
|
||||
promscrapeConfigFile = flag.String("promscrape.config", "", "Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. "+
|
||||
"See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details")
|
||||
suppressDuplicateScrapeTargetErrors = flag.Bool("promscrape.suppressDuplicateScrapeTargetErrors", false, "Whether to suppress `duplicate scrape target` errors; "+
|
||||
suppressDuplicateScrapeTargetErrors = flag.Bool("promscrape.suppressDuplicateScrapeTargetErrors", false, "Whether to suppress 'duplicate scrape target' errors; "+
|
||||
"see https://victoriametrics.github.io/vmagent.html#troubleshooting for details")
|
||||
)
|
||||
|
||||
|
@ -231,11 +231,17 @@ func (scfg *scrapeConfig) run() {
|
|||
cfg := <-scfg.cfgCh
|
||||
var swsPrev []*ScrapeWork
|
||||
updateScrapeWork := func(cfg *Config) {
|
||||
for {
|
||||
startTime := time.Now()
|
||||
sws := scfg.getScrapeWork(cfg, swsPrev)
|
||||
sg.update(sws)
|
||||
retry := sg.update(sws)
|
||||
swsPrev = sws
|
||||
scfg.discoveryDuration.UpdateDuration(startTime)
|
||||
if !retry {
|
||||
return
|
||||
}
|
||||
time.Sleep(2 * time.Second)
|
||||
}
|
||||
}
|
||||
updateScrapeWork(cfg)
|
||||
atomic.AddInt32(&PendingScrapeConfigs, -1)
|
||||
|
@ -295,7 +301,7 @@ func (sg *scraperGroup) stop() {
|
|||
sg.wg.Wait()
|
||||
}
|
||||
|
||||
func (sg *scraperGroup) update(sws []*ScrapeWork) {
|
||||
func (sg *scraperGroup) update(sws []*ScrapeWork) (retry bool) {
|
||||
sg.mLock.Lock()
|
||||
defer sg.mLock.Unlock()
|
||||
|
||||
|
@ -352,6 +358,7 @@ func (sg *scraperGroup) update(sws []*ScrapeWork) {
|
|||
sg.changesCount.Add(additionsCount + deletionsCount)
|
||||
logger.Infof("%s: added targets: %d, removed targets: %d; total targets: %d", sg.name, additionsCount, deletionsCount, len(sg.m))
|
||||
}
|
||||
return deletionsCount > 0 && len(sg.m) == 0
|
||||
}
|
||||
|
||||
type scraper struct {
|
||||
|
|
|
@ -68,12 +68,15 @@ type ScrapeWork struct {
|
|||
// See also https://prometheus.io/docs/concepts/jobs_instances/
|
||||
Labels []prompbmarshal.Label
|
||||
|
||||
// Auth config
|
||||
AuthConfig *promauth.Config
|
||||
|
||||
// ProxyURL HTTP proxy url
|
||||
ProxyURL proxy.URL
|
||||
|
||||
// Auth config for ProxyURL
|
||||
ProxyAuthConfig *promauth.Config
|
||||
|
||||
// Auth config
|
||||
AuthConfig *promauth.Config
|
||||
|
||||
// Optional `metric_relabel_configs`.
|
||||
MetricRelabelConfigs *promrelabel.ParsedConfigs
|
||||
|
||||
|
@ -105,9 +108,10 @@ type ScrapeWork struct {
|
|||
func (sw *ScrapeWork) key() string {
|
||||
// Do not take into account OriginalLabels.
|
||||
key := fmt.Sprintf("ScrapeURL=%s, ScrapeInterval=%s, ScrapeTimeout=%s, HonorLabels=%v, HonorTimestamps=%v, Labels=%s, "+
|
||||
"AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v, "+
|
||||
"ProxyURL=%s, ProxyAuthConfig=%s, AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v, "+
|
||||
"ScrapeAlignInterval=%s, ScrapeOffset=%s",
|
||||
sw.ScrapeURL, sw.ScrapeInterval, sw.ScrapeTimeout, sw.HonorLabels, sw.HonorTimestamps, sw.LabelsString(),
|
||||
sw.ProxyURL.String(), sw.ProxyAuthConfig.String(),
|
||||
sw.AuthConfig.String(), sw.MetricRelabelConfigs.String(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive, sw.StreamParse,
|
||||
sw.ScrapeAlignInterval, sw.ScrapeOffset)
|
||||
return key
|
||||
|
@ -173,9 +177,9 @@ type scrapeWork struct {
|
|||
// It is used as a hint in order to reduce memory usage for body buffers.
|
||||
prevBodyLen int
|
||||
|
||||
// prevRowsLen contains the number of rows scraped during the previous scrape.
|
||||
// prevLabelsLen contains the number of labels scraped during the previous scrape.
|
||||
// It is used as a hint in order to reduce memory usage when parsing scrape responses.
|
||||
prevRowsLen int
|
||||
prevLabelsLen int
|
||||
}
|
||||
|
||||
func (sw *scrapeWork) run(stopCh <-chan struct{}) {
|
||||
|
@ -279,7 +283,7 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
|
|||
scrapeDuration.Update(duration)
|
||||
scrapeResponseSize.Update(float64(len(body.B)))
|
||||
up := 1
|
||||
wc := writeRequestCtxPool.Get(sw.prevRowsLen)
|
||||
wc := writeRequestCtxPool.Get(sw.prevLabelsLen)
|
||||
if err != nil {
|
||||
up = 0
|
||||
scrapesFailed.Inc()
|
||||
|
@ -290,27 +294,15 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
|
|||
srcRows := wc.rows.Rows
|
||||
samplesScraped := len(srcRows)
|
||||
scrapedSamples.Update(float64(samplesScraped))
|
||||
if sw.Config.SampleLimit > 0 && samplesScraped > sw.Config.SampleLimit {
|
||||
srcRows = srcRows[:0]
|
||||
for i := range srcRows {
|
||||
sw.addRowToTimeseries(wc, &srcRows[i], scrapeTimestamp, true)
|
||||
}
|
||||
samplesPostRelabeling := len(wc.writeRequest.Timeseries)
|
||||
if sw.Config.SampleLimit > 0 && samplesPostRelabeling > sw.Config.SampleLimit {
|
||||
wc.resetNoRows()
|
||||
up = 0
|
||||
scrapesSkippedBySampleLimit.Inc()
|
||||
}
|
||||
samplesPostRelabeling := 0
|
||||
for i := range srcRows {
|
||||
sw.addRowToTimeseries(wc, &srcRows[i], scrapeTimestamp, true)
|
||||
if len(wc.labels) > 40000 {
|
||||
// Limit the maximum size of wc.writeRequest.
|
||||
// This should reduce memory usage when scraping targets with millions of metrics and/or labels.
|
||||
// For example, when scraping /federate handler from Prometheus - see https://prometheus.io/docs/prometheus/latest/federation/
|
||||
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
|
||||
sw.updateSeriesAdded(wc)
|
||||
startTime := time.Now()
|
||||
sw.PushData(&wc.writeRequest)
|
||||
pushDataDuration.UpdateDuration(startTime)
|
||||
wc.resetNoRows()
|
||||
}
|
||||
}
|
||||
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
|
||||
sw.updateSeriesAdded(wc)
|
||||
seriesAdded := sw.finalizeSeriesAdded(samplesPostRelabeling)
|
||||
sw.addAutoTimeseries(wc, "up", float64(up), scrapeTimestamp)
|
||||
|
@ -321,7 +313,7 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
|
|||
startTime := time.Now()
|
||||
sw.PushData(&wc.writeRequest)
|
||||
pushDataDuration.UpdateDuration(startTime)
|
||||
sw.prevRowsLen = samplesScraped
|
||||
sw.prevLabelsLen = len(wc.labels)
|
||||
wc.reset()
|
||||
writeRequestCtxPool.Put(wc)
|
||||
// body must be released only after wc is released, since wc refers to body.
|
||||
|
@ -335,7 +327,7 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
|
|||
samplesScraped := 0
|
||||
samplesPostRelabeling := 0
|
||||
responseSize := int64(0)
|
||||
wc := writeRequestCtxPool.Get(sw.prevRowsLen)
|
||||
wc := writeRequestCtxPool.Get(sw.prevLabelsLen)
|
||||
|
||||
sr, err := sw.GetStreamReader()
|
||||
if err != nil {
|
||||
|
@ -385,7 +377,7 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
|
|||
startTime := time.Now()
|
||||
sw.PushData(&wc.writeRequest)
|
||||
pushDataDuration.UpdateDuration(startTime)
|
||||
sw.prevRowsLen = len(wc.rows.Rows)
|
||||
sw.prevLabelsLen = len(wc.labels)
|
||||
wc.reset()
|
||||
writeRequestCtxPool.Put(wc)
|
||||
tsmGlobal.Update(sw.Config, sw.ScrapeGroup, up == 1, realTimestamp, int64(duration*1000), err)
|
||||
|
@ -397,11 +389,11 @@ func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
|
|||
//
|
||||
// Its logic has been copied from leveledbytebufferpool.
|
||||
type leveledWriteRequestCtxPool struct {
|
||||
pools [30]sync.Pool
|
||||
pools [13]sync.Pool
|
||||
}
|
||||
|
||||
func (lwp *leveledWriteRequestCtxPool) Get(rowsCapacity int) *writeRequestCtx {
|
||||
id, capacityNeeded := lwp.getPoolIDAndCapacity(rowsCapacity)
|
||||
func (lwp *leveledWriteRequestCtxPool) Get(labelsCapacity int) *writeRequestCtx {
|
||||
id, capacityNeeded := lwp.getPoolIDAndCapacity(labelsCapacity)
|
||||
for i := 0; i < 2; i++ {
|
||||
if id < 0 || id >= len(lwp.pools) {
|
||||
break
|
||||
|
@ -417,11 +409,13 @@ func (lwp *leveledWriteRequestCtxPool) Get(rowsCapacity int) *writeRequestCtx {
|
|||
}
|
||||
|
||||
func (lwp *leveledWriteRequestCtxPool) Put(wc *writeRequestCtx) {
|
||||
capacity := cap(wc.rows.Rows)
|
||||
id, _ := lwp.getPoolIDAndCapacity(capacity)
|
||||
capacity := cap(wc.labels)
|
||||
id, poolCapacity := lwp.getPoolIDAndCapacity(capacity)
|
||||
if capacity <= poolCapacity {
|
||||
wc.reset()
|
||||
lwp.pools[id].Put(wc)
|
||||
}
|
||||
}
|
||||
|
||||
func (lwp *leveledWriteRequestCtxPool) getPoolIDAndCapacity(size int) (int, int) {
|
||||
size--
|
||||
|
@ -430,7 +424,7 @@ func (lwp *leveledWriteRequestCtxPool) getPoolIDAndCapacity(size int) (int, int)
|
|||
}
|
||||
size >>= 3
|
||||
id := bits.Len(uint(size))
|
||||
if id > len(lwp.pools) {
|
||||
if id >= len(lwp.pools) {
|
||||
id = len(lwp.pools) - 1
|
||||
}
|
||||
return id, (1 << (id + 3))
|
||||
|
|
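Note on the pool hunks above: the pool level is derived from the requested capacity with `bits.Len`, and the bounds check changes from `id > len(lwp.pools)` to `id >= len(lwp.pools)`. The sketch below reproduces that computation as a free function so the boundary case can be checked. The constant `poolsCount` and the clamp of negative sizes to zero are assumptions, since the lines between `size--` and `size >>= 3` are not shown in the diff.

```go
package main

import "math/bits"

// poolsCount mirrors the new array size `pools [13]sync.Pool` from the diff.
const poolsCount = 13

// getPoolIDAndCapacity returns the pool index for the given size
// and the capacity served by that pool (8, 16, 32, ...).
func getPoolIDAndCapacity(size int) (int, int) {
	size--
	if size < 0 {
		// Assumption: negative sizes are clamped; this part is elided in the diff.
		size = 0
	}
	size >>= 3
	id := bits.Len(uint(size))
	if id >= poolsCount {
		// With `id > poolsCount` the value id == poolsCount would slip through
		// and index past the end of the pools array.
		id = poolsCount - 1
	}
	return id, 1 << (id + 3)
}
```

Under these assumptions a size of 40000 computes level 13 before clamping, which is exactly the case the `>=` comparison catches.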
|
@ -332,7 +332,7 @@ func TestScrapeWorkScrapeInternalSuccess(t *testing.T) {
|
|||
up 0 123
|
||||
scrape_samples_scraped 2 123
|
||||
scrape_duration_seconds 0 123
|
||||
scrape_samples_post_metric_relabeling 0 123
|
||||
scrape_samples_post_metric_relabeling 2 123
|
||||
scrape_series_added 0 123
|
||||
`)
|
||||
}
|
||||
|
|
|
@ -2,7 +2,6 @@ package promscrape
|
|||
|
||||
import (
|
||||
"context"
|
||||
"crypto/tls"
|
||||
"fmt"
|
||||
"net"
|
||||
"sync"
|
||||
|
@ -10,6 +9,7 @@ import (
|
|||
"time"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/proxy"
|
||||
"github.com/VictoriaMetrics/fasthttp"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
|
@ -49,8 +49,8 @@ var (
|
|||
stdDialerOnce sync.Once
|
||||
)
|
||||
|
||||
func newStatDialFunc(proxyURL proxy.URL, tlsConfig *tls.Config) (fasthttp.DialFunc, error) {
|
||||
dialFunc, err := proxyURL.NewDialFunc(tlsConfig)
|
||||
func newStatDialFunc(proxyURL proxy.URL, ac *promauth.Config) (fasthttp.DialFunc, error) {
|
||||
dialFunc, err := proxyURL.NewDialFunc(ac)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
|
|
@ -18,7 +18,7 @@ import (
|
|||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
|
||||
)
|
||||
|
||||
var maxDroppedTargets = flag.Int("promscrape.maxDroppedTargets", 1000, "The maximum number of `droppedTargets` shown at /api/v1/targets page. "+
|
||||
var maxDroppedTargets = flag.Int("promscrape.maxDroppedTargets", 1000, "The maximum number of droppedTargets to show at /api/v1/targets page. "+
|
||||
"Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. "+
|
||||
"Note that the increased number of tracked dropped targets may result in increased memory usage")
|
||||
|
||||
|
|
|
@ -15,7 +15,7 @@ import (
|
|||
)
|
||||
|
||||
var maxLineLen = flagutil.NewBytes("import.maxLineLen", 100*1024*1024, "The maximum length in bytes of a single line accepted by /api/v1/import; "+
|
||||
"the line length can be limited with `max_rows_per_line` query arg passed to /api/v1/export")
|
||||
"the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export")
|
||||
|
||||
// ParseStream parses /api/v1/import lines from req and calls callback for the parsed rows.
|
||||
//
|
||||
|
|
|
@ -7,9 +7,12 @@ import (
|
|||
"fmt"
|
||||
"net"
|
||||
"net/url"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
|
||||
"github.com/VictoriaMetrics/fasthttp"
|
||||
)
|
||||
|
||||
|
@ -18,6 +21,17 @@ type URL struct {
|
|||
url *url.URL
|
||||
}
|
||||
|
||||
// MustNewURL returns new URL for the given u.
|
||||
func MustNewURL(u string) URL {
|
||||
pu, err := url.Parse(u)
|
||||
if err != nil {
|
||||
logger.Panicf("BUG: cannot parse u=%q: %s", u, err)
|
||||
}
|
||||
return URL{
|
||||
url: pu,
|
||||
}
|
||||
}
|
||||
|
||||
// URL returns the underlying url.
|
||||
func (u *URL) URL() *url.URL {
|
||||
if u == nil || u.url == nil {
|
||||
|
@ -26,6 +40,15 @@ func (u *URL) URL() *url.URL {
|
|||
return u.url
|
||||
}
|
||||
|
||||
// String returns string representation of u.
|
||||
func (u *URL) String() string {
|
||||
pu := u.URL()
|
||||
if pu == nil {
|
||||
return ""
|
||||
}
|
||||
return pu.String()
|
||||
}
|
||||
|
||||
// MarshalYAML implements yaml.Marshaler interface.
|
||||
func (u *URL) MarshalYAML() (interface{}, error) {
|
||||
if u.url == nil {
|
||||
|
@ -48,38 +71,72 @@ func (u *URL) UnmarshalYAML(unmarshal func(interface{}) error) error {
|
|||
return nil
|
||||
}
|
||||
|
||||
// NewDialFunc returns dial func for the given pu and tlsConfig.
|
||||
func (u *URL) NewDialFunc(tlsConfig *tls.Config) (fasthttp.DialFunc, error) {
|
||||
// NewDialFunc returns dial func for the given u and ac.
|
||||
func (u *URL) NewDialFunc(ac *promauth.Config) (fasthttp.DialFunc, error) {
|
||||
if u == nil || u.url == nil {
|
||||
return defaultDialFunc, nil
|
||||
}
|
||||
pu := u.url
|
||||
if pu.Scheme != "http" && pu.Scheme != "https" {
|
||||
return nil, fmt.Errorf("unknown scheme=%q for proxy_url=%q, must be http or https", pu.Scheme, pu)
|
||||
return nil, fmt.Errorf("unknown scheme=%q for proxy_url=%q, must be http or https", pu.Scheme, pu.Redacted())
|
||||
}
|
||||
isTLS := pu.Scheme == "https"
|
||||
proxyAddr := addMissingPort(pu.Host, isTLS)
|
||||
var authHeader string
|
||||
if ac != nil {
|
||||
authHeader = ac.Authorization
|
||||
}
|
||||
if pu.User != nil && len(pu.User.Username()) > 0 {
|
||||
userPasswordEncoded := base64.StdEncoding.EncodeToString([]byte(pu.User.String()))
|
||||
authHeader = "Proxy-Authorization: Basic " + userPasswordEncoded + "\r\n"
|
||||
authHeader = "Basic " + userPasswordEncoded
|
||||
}
|
||||
if authHeader != "" {
|
||||
authHeader = "Proxy-Authorization: " + authHeader + "\r\n"
|
||||
}
|
||||
var tlsCfg *tls.Config
|
||||
if isTLS {
|
||||
tlsCfg = ac.NewTLSConfig()
|
||||
if !tlsCfg.InsecureSkipVerify && tlsCfg.ServerName == "" {
|
||||
tlsCfg.ServerName = tlsServerName(proxyAddr)
|
||||
}
|
||||
}
|
||||
dialFunc := func(addr string) (net.Conn, error) {
|
||||
proxyConn, err := defaultDialFunc(pu.Host)
|
||||
proxyConn, err := defaultDialFunc(proxyAddr)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot connect to proxy %q: %w", pu, err)
|
||||
return nil, fmt.Errorf("cannot connect to proxy %q: %w", pu.Redacted(), err)
|
||||
}
|
||||
if pu.Scheme == "https" {
|
||||
proxyConn = tls.Client(proxyConn, tlsConfig)
|
||||
if isTLS {
|
||||
proxyConn = tls.Client(proxyConn, tlsCfg)
|
||||
}
|
||||
conn, err := sendConnectRequest(proxyConn, addr, authHeader)
|
||||
conn, err := sendConnectRequest(proxyConn, proxyAddr, addr, authHeader)
|
||||
if err != nil {
|
||||
_ = proxyConn.Close()
|
||||
return nil, fmt.Errorf("error when sending CONNECT request to proxy %q: %w", pu, err)
|
||||
return nil, fmt.Errorf("error when sending CONNECT request to proxy %q: %w", pu.Redacted(), err)
|
||||
}
|
||||
return conn, nil
|
||||
}
|
||||
return dialFunc, nil
|
||||
}
|
||||
|
||||
func addMissingPort(addr string, isTLS bool) string {
|
||||
if strings.IndexByte(addr, ':') >= 0 {
|
||||
return addr
|
||||
}
|
||||
port := "80"
|
||||
if isTLS {
|
||||
port = "443"
|
||||
}
|
||||
return addr + ":" + port
|
||||
}
|
||||
|
||||
func tlsServerName(addr string) string {
|
||||
host, _, err := net.SplitHostPort(addr)
|
||||
if err != nil {
|
||||
return addr
|
||||
}
|
||||
return host
|
||||
}
|
||||
|
||||
func defaultDialFunc(addr string) (net.Conn, error) {
|
||||
network := "tcp4"
|
||||
if netutil.TCP6Enabled() {
|
||||
|
@ -90,8 +147,8 @@ func defaultDialFunc(addr string) (net.Conn, error) {
|
|||
}
|
||||
|
||||
// sendConnectRequest sends CONNECT request to proxyConn for the given addr and authHeader and returns the established connection to dstAddr.
|
||||
func sendConnectRequest(proxyConn net.Conn, dstAddr, authHeader string) (net.Conn, error) {
|
||||
req := "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + dstAddr + "\r\n" + authHeader + "\r\n"
|
||||
func sendConnectRequest(proxyConn net.Conn, proxyAddr, dstAddr, authHeader string) (net.Conn, error) {
|
||||
req := "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + proxyAddr + "\r\n" + authHeader + "\r\n"
|
||||
if _, err := proxyConn.Write([]byte(req)); err != nil {
|
||||
return nil, fmt.Errorf("cannot send CONNECT request for dstAddr=%q: %w", dstAddr, err)
|
||||
}
|
||||
|
|
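Note on the proxy hunks above: the new code adds a default port to the proxy address, builds the `Proxy-Authorization` header from the URL user info when present, and sends the proxy address (not the destination) in the `Host` header of the `CONNECT` request. The sketch below assembles the same request string without opening a connection. `buildConnectRequest` is a hypothetical helper introduced for illustration; it skips the `promauth.Config` path and the TLS setup.

```go
package main

import (
	"encoding/base64"
	"net/url"
	"strings"
)

// addMissingPort follows the helper from the diff: it appends the default
// port for the scheme when addr does not contain a colon.
func addMissingPort(addr string, isTLS bool) string {
	if strings.IndexByte(addr, ':') >= 0 {
		return addr
	}
	if isTLS {
		return addr + ":443"
	}
	return addr + ":80"
}

// buildConnectRequest is a hypothetical helper (not from the commit).
// It returns the raw CONNECT request that would be written to the proxy
// connection in order to reach dstAddr through proxyURL.
func buildConnectRequest(proxyURL, dstAddr string) (string, error) {
	pu, err := url.Parse(proxyURL)
	if err != nil {
		return "", err
	}
	proxyAddr := addMissingPort(pu.Host, pu.Scheme == "https")
	authHeader := ""
	if pu.User != nil && len(pu.User.Username()) > 0 {
		encoded := base64.StdEncoding.EncodeToString([]byte(pu.User.String()))
		authHeader = "Proxy-Authorization: Basic " + encoded + "\r\n"
	}
	return "CONNECT " + dstAddr + " HTTP/1.1\r\nHost: " + proxyAddr + "\r\n" + authHeader + "\r\n", nil
}
```

One thing to keep in mind when reading `addMissingPort`: because it only looks for a colon, a bare IPv6 literal without brackets would be treated as already having a port.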
|
@ -577,9 +577,21 @@ func (db *indexDB) createTSIDByName(dst *TSID, metricName []byte) error {
|
|||
// on db.tb flush via invalidateTagCache flushCallback passed to OpenTable.
|
||||
|
||||
atomic.AddUint64(&db.newTimeseriesCreated, 1)
|
||||
if logNewSeries {
|
||||
logger.Infof("new series created: %s", mn.String())
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// SetLogNewSeries updates new series logging.
|
||||
//
|
||||
// This function must be called before calling any storage functions.
|
||||
func SetLogNewSeries(ok bool) {
|
||||
logNewSeries = ok
|
||||
}
|
||||
|
||||
var logNewSeries = false
|
||||
|
||||
func (db *indexDB) generateTSID(dst *TSID, metricName []byte, mn *MetricName) error {
|
||||
// Search the TSID in the external storage.
|
||||
// This is usually the db from the previous period.
|
||||
|
@ -2048,15 +2060,6 @@ func (is *indexSearch) getTagFilterWithMinMetricIDsCount(tfs *TagFilters, maxMet
|
|||
|
||||
metricIDs, _, err := is.getMetricIDsForTagFilter(tf, nil, maxMetrics)
|
||||
if err != nil {
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
// Skip tag filters requiring to scan for too many metrics.
|
||||
kb.B = append(kb.B[:0], uselessSingleTagFilterKeyPrefix)
|
||||
kb.B = encoding.MarshalUint64(kb.B, uint64(maxMetrics))
|
||||
kb.B = tf.Marshal(kb.B)
|
||||
is.db.uselessTagFiltersCache.Set(kb.B, uselessTagFilterCacheValue)
|
||||
uselessTagFilters++
|
||||
continue
|
||||
}
|
||||
return nil, nil, fmt.Errorf("cannot find MetricIDs for tagFilter %s: %w", tf, err)
|
||||
}
|
||||
if metricIDs.Len() >= maxMetrics {
|
||||
|
@ -2306,7 +2309,7 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
|
|||
// Fast path: found metricIDs by date range.
|
||||
return nil
|
||||
}
|
||||
if err != errFallbackToMetricNameMatch {
|
||||
if err != errFallbackToGlobalSearch {
|
||||
return err
|
||||
}
|
||||
|
||||
|
@ -2330,12 +2333,6 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
|
|||
continue
|
||||
}
|
||||
mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs)
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
// The tag filter requires too many index scans. Postpone it,
|
||||
// so tag filters with lower number of index scans may be applied.
|
||||
tfsPostponed = append(tfsPostponed, tf)
|
||||
continue
|
||||
}
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
@ -2345,11 +2342,8 @@ func (is *indexSearch) updateMetricIDsForTagFilters(metricIDs *uint64set.Set, tf
|
|||
if len(tfsPostponed) > 0 && successfulIntersects == 0 {
|
||||
return is.updateMetricIDsByMetricNameMatch(metricIDs, minMetricIDs, tfsPostponed)
|
||||
}
|
||||
for i, tf := range tfsPostponed {
|
||||
for _, tf := range tfsPostponed {
|
||||
mIDs, err := is.intersectMetricIDsWithTagFilter(tf, minMetricIDs)
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
return is.updateMetricIDsByMetricNameMatch(metricIDs, minMetricIDs, tfsPostponed[i:])
|
||||
}
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
@ -2363,7 +2357,6 @@ const (
|
|||
uselessSingleTagFilterKeyPrefix = 0
|
||||
uselessMultiTagFiltersKeyPrefix = 1
|
||||
uselessNegativeTagFilterKeyPrefix = 2
|
||||
uselessTagIntersectKeyPrefix = 3
|
||||
)
|
||||
|
||||
var uselessTagFilterCacheValue = []byte("1")
|
||||
|
@ -2375,29 +2368,28 @@ func (is *indexSearch) getMetricIDsForTagFilter(tf *tagFilter, filter *uint64set
|
|||
metricIDs := &uint64set.Set{}
|
||||
if len(tf.orSuffixes) > 0 {
|
||||
// Fast path for orSuffixes - seek for rows for each value from orSuffixes.
|
||||
loopsCount, err := is.updateMetricIDsForOrSuffixesNoFilter(tf, maxMetrics, metricIDs)
|
||||
if err != nil {
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
return nil, loopsCount, err
|
||||
var loopsCount uint64
|
||||
var err error
|
||||
if filter != nil {
|
||||
loopsCount, err = is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter)
|
||||
} else {
|
||||
loopsCount, err = is.updateMetricIDsForOrSuffixesNoFilter(tf, maxMetrics, metricIDs)
|
||||
}
|
||||
if err != nil {
|
||||
return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf)
|
||||
}
|
||||
return metricIDs, loopsCount, nil
|
||||
}
|
||||
|
||||
// Slow path - scan for all the rows with the given prefix.
|
||||
maxLoopsCount := uint64(maxMetrics) * maxIndexScanSlowLoopsPerMetric
|
||||
loopsCount, err := is.getMetricIDsForTagFilterSlow(tf, filter, maxLoopsCount, metricIDs.Add)
|
||||
loopsCount, err := is.getMetricIDsForTagFilterSlow(tf, filter, metricIDs.Add)
|
||||
if err != nil {
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
return nil, loopsCount, err
|
||||
}
|
||||
return nil, loopsCount, fmt.Errorf("error when searching for metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf)
|
||||
}
|
||||
return metricIDs, loopsCount, nil
|
||||
}
|
||||
|
||||
func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint64set.Set, maxLoopsCount uint64, f func(metricID uint64)) (uint64, error) {
|
||||
func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint64set.Set, f func(metricID uint64)) (uint64, error) {
|
||||
if len(tf.orSuffixes) > 0 {
|
||||
logger.Panicf("BUG: the getMetricIDsForTagFilterSlow must be called only for empty tf.orSuffixes; got %s", tf.orSuffixes)
|
||||
}
|
||||
|
@ -2436,9 +2428,6 @@ func (is *indexSearch) getMetricIDsForTagFilterSlow(tf *tagFilter, filter *uint6
|
|||
}
|
||||
mp.ParseMetricIDs()
|
||||
loopsCount += uint64(mp.MetricIDsLen())
|
||||
if loopsCount > maxLoopsCount {
|
||||
return loopsCount, errFallbackToMetricNameMatch
|
||||
}
|
||||
if prevMatch && string(suffix) == string(prevMatchingSuffix) {
|
||||
// Fast path: the same tag value found.
|
||||
// There is no need in checking it again with potentially
|
||||
|
@ -2522,26 +2511,28 @@ func (is *indexSearch) updateMetricIDsForOrSuffixesNoFilter(tf *tagFilter, maxMe
|
|||
return loopsCount, nil
|
||||
}
|
||||
|
||||
func (is *indexSearch) updateMetricIDsForOrSuffixesWithFilter(tf *tagFilter, metricIDs, filter *uint64set.Set) error {
|
||||
func (is *indexSearch) updateMetricIDsForOrSuffixesWithFilter(tf *tagFilter, metricIDs, filter *uint64set.Set) (uint64, error) {
|
||||
sortedFilter := filter.AppendTo(nil)
|
||||
kb := kbPool.Get()
|
||||
defer kbPool.Put(kb)
|
||||
var loopsCount uint64
|
||||
for _, orSuffix := range tf.orSuffixes {
|
||||
kb.B = append(kb.B[:0], tf.prefix...)
|
||||
kb.B = append(kb.B, orSuffix...)
|
||||
kb.B = append(kb.B, tagSeparatorChar)
|
||||
if err := is.updateMetricIDsForOrSuffixWithFilter(kb.B, metricIDs, sortedFilter, tf.isNegative); err != nil {
|
||||
return err
|
||||
lc, err := is.updateMetricIDsForOrSuffixWithFilter(kb.B, metricIDs, sortedFilter, tf.isNegative)
|
||||
if err != nil {
|
||||
return loopsCount, err
|
||||
}
|
||||
loopsCount += lc
|
||||
}
|
||||
return nil
|
||||
return loopsCount, nil
|
||||
}
|
||||
|
||||
func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetrics int, metricIDs *uint64set.Set) (uint64, error) {
|
||||
ts := &is.ts
|
||||
mp := &is.mp
|
||||
mp.Reset()
|
||||
maxLoopsCount := uint64(maxMetrics) * maxIndexScanLoopsPerMetric
|
||||
var loopsCount uint64
|
||||
loopsPaceLimiter := 0
|
||||
ts.Seek(prefix)
|
||||
|
@ -2560,9 +2551,6 @@ func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetr
|
|||
return loopsCount, err
|
||||
}
|
||||
loopsCount += uint64(mp.MetricIDsLen())
|
||||
if loopsCount > maxLoopsCount {
|
||||
return loopsCount, errFallbackToMetricNameMatch
|
||||
}
|
||||
mp.ParseMetricIDs()
|
||||
metricIDs.AddMulti(mp.MetricIDs)
|
||||
}
|
||||
|
@ -2572,16 +2560,15 @@ func (is *indexSearch) updateMetricIDsForOrSuffixNoFilter(prefix []byte, maxMetr
|
|||
return loopsCount, nil
|
||||
}
|
||||
|
||||
func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metricIDs *uint64set.Set, sortedFilter []uint64, isNegative bool) error {
|
||||
func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metricIDs *uint64set.Set, sortedFilter []uint64, isNegative bool) (uint64, error) {
|
||||
if len(sortedFilter) == 0 {
|
||||
return nil
|
||||
return 0, nil
|
||||
}
|
||||
firstFilterMetricID := sortedFilter[0]
|
||||
lastFilterMetricID := sortedFilter[len(sortedFilter)-1]
|
||||
ts := &is.ts
|
||||
mp := &is.mp
|
||||
mp.Reset()
|
||||
maxLoopsCount := uint64(len(sortedFilter)) * maxIndexScanLoopsPerMetric
|
||||
var loopsCount uint64
|
||||
loopsPaceLimiter := 0
|
||||
ts.Seek(prefix)
|
||||
|
@ -2590,17 +2577,18 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
|
|||
for ts.NextItem() {
|
||||
if loopsPaceLimiter&paceLimiterMediumIterationsMask == 0 {
|
||||
if err := checkSearchDeadlineAndPace(is.deadline); err != nil {
|
||||
return err
|
||||
return loopsCount, err
|
||||
}
|
||||
}
|
||||
loopsPaceLimiter++
|
||||
item := ts.Item
|
||||
if !bytes.HasPrefix(item, prefix) {
|
||||
return nil
|
||||
return loopsCount, nil
|
||||
}
|
||||
if err := mp.InitOnlyTail(item, item[len(prefix):]); err != nil {
|
||||
return err
|
||||
return loopsCount, err
|
||||
}
|
||||
loopsCount += uint64(mp.MetricIDsLen())
|
||||
firstMetricID, lastMetricID := mp.FirstAndLastMetricIDs()
|
||||
if lastMetricID < firstFilterMetricID {
|
||||
// Skip the item, since it contains metricIDs lower
|
||||
|
@ -2610,14 +2598,11 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
|
|||
if firstMetricID > lastFilterMetricID {
|
||||
// Stop searching, since the current item and all the subsequent items
|
||||
// contain metricIDs higher than metricIDs in sortedFilter.
|
||||
return nil
|
||||
return loopsCount, nil
|
||||
}
|
||||
sf = sortedFilter
|
||||
loopsCount += uint64(mp.MetricIDsLen())
|
||||
if loopsCount > maxLoopsCount {
|
||||
return errFallbackToMetricNameMatch
|
||||
}
|
||||
mp.ParseMetricIDs()
|
||||
matchingMetricIDs := mp.MetricIDs[:0]
|
||||
for _, metricID = range mp.MetricIDs {
|
||||
if len(sf) == 0 {
|
||||
break
|
||||
|
@ -2632,18 +2617,23 @@ func (is *indexSearch) updateMetricIDsForOrSuffixWithFilter(prefix []byte, metri
|
|||
if metricID < sf[0] {
|
||||
continue
|
||||
}
|
||||
if isNegative {
|
||||
metricIDs.Del(metricID)
|
||||
} else {
|
||||
metricIDs.Add(metricID)
|
||||
}
|
||||
matchingMetricIDs = append(matchingMetricIDs, metricID)
|
||||
sf = sf[1:]
|
||||
}
|
||||
if len(matchingMetricIDs) > 0 {
|
||||
if isNegative {
|
||||
for _, metricID := range matchingMetricIDs {
|
||||
metricIDs.Del(metricID)
|
||||
}
|
||||
} else {
|
||||
metricIDs.AddMulti(matchingMetricIDs)
|
||||
}
|
||||
}
|
||||
}
|
||||
if err := ts.Error(); err != nil {
|
||||
return fmt.Errorf("error when searching for tag filter prefix %q: %w", prefix, err)
|
||||
return loopsCount, fmt.Errorf("error when searching for tag filter prefix %q: %w", prefix, err)
|
||||
}
|
||||
return nil
|
||||
return loopsCount, nil
|
||||
}
|
||||
|
||||
func binarySearchUint64(a []uint64, v uint64) uint {
|
||||
|
@ -2660,7 +2650,7 @@ func binarySearchUint64(a []uint64, v uint64) uint {
|
|||
return i
|
||||
}
|
||||
|
||||
var errFallbackToMetricNameMatch = errors.New("fall back to updateMetricIDsByMetricNameMatch because of too many index scan loops")
|
||||
var errFallbackToGlobalSearch = errors.New("fall back from per-day index search to global index search")
|
||||
|
||||
var errMissingMetricIDsForDate = errors.New("missing metricIDs for date")
|
||||
|
||||
|
@ -2725,11 +2715,11 @@ func (is *indexSearch) tryUpdatingMetricIDsForDateRange(metricIDs *uint64set.Set
|
|||
maxDate := uint64(tr.MaxTimestamp) / msecPerDay
|
||||
if maxDate < minDate {
|
||||
// Per-day inverted index doesn't cover the selected date range.
|
||||
return errFallbackToMetricNameMatch
|
||||
return fmt.Errorf("maxDate=%d cannot be smaller than minDate=%d", maxDate, minDate)
|
||||
}
|
||||
if maxDate-minDate > maxDaysForDateMetricIDs {
|
||||
// Too many dates must be covered. Give up, since it may be slow.
|
||||
return errFallbackToMetricNameMatch
|
||||
return errFallbackToGlobalSearch
|
||||
}
|
||||
if minDate == maxDate {
|
||||
// Fast path - query only a single date.
|
||||
|
@ -2759,14 +2749,14 @@ func (is *indexSearch) tryUpdatingMetricIDsForDateRange(metricIDs *uint64set.Set
|
|||
return
|
||||
}
|
||||
if err != nil {
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
if err == errFallbackToGlobalSearch {
|
||||
// The per-date search is too expensive. Probably it is faster to perform global search
|
||||
// using metric name match.
|
||||
errGlobal = err
|
||||
return
|
||||
}
|
||||
dateStr := time.Unix(int64(date*24*3600), 0)
|
||||
errGlobal = fmt.Errorf("cannot search for metricIDs for %s: %w", dateStr, err)
|
||||
errGlobal = fmt.Errorf("cannot search for metricIDs at %s: %w", dateStr, err)
|
||||
return
|
||||
}
|
||||
if metricIDs.Len() < maxMetrics {
|
||||
|
@ -2790,7 +2780,6 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
|
|||
type tagFilterWithWeight struct {
|
||||
tf *tagFilter
|
||||
loopsCount uint64
|
||||
lastQueryTimestamp uint64
|
||||
}
|
||||
tfws := make([]tagFilterWithWeight, len(tfs.tfs))
|
||||
currentTime := fasttime.UnixTimestamp()
|
||||
|
@ -2798,26 +2787,29 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
|
|||
tf := &tfs.tfs[i]
|
||||
loopsCount, lastQueryTimestamp := is.getLoopsCountAndTimestampForDateFilter(date, tf)
|
||||
origLoopsCount := loopsCount
|
||||
if currentTime > lastQueryTimestamp+3*3600 {
|
||||
// Update stats once per 3 hours only for relatively fast tag filters.
|
||||
// There is no need in spending CPU resources on updating stats for slow tag filters.
|
||||
if loopsCount == 0 && tf.looksLikeHeavy() {
|
||||
// Set high loopsCount for heavy tag filters instead of spending CPU time on their execution.
|
||||
loopsCount = 11e6
|
||||
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
|
||||
}
|
||||
if currentTime > lastQueryTimestamp+3600 {
|
||||
// Update stats once per hour for relatively fast tag filters.
|
||||
// There is no need in spending CPU resources on updating stats for heavy tag filters.
|
||||
if loopsCount <= 10e6 {
|
||||
loopsCount = 0
|
||||
}
|
||||
}
|
||||
if loopsCount == 0 {
|
||||
// Prevent from possible thundering herd issue when heavy tf is executed from multiple concurrent queries
|
||||
// Prevent from possible thundering herd issue when potentially heavy tf is executed from multiple concurrent queries
|
||||
// by temporary persisting its position in the tag filters list.
|
||||
if origLoopsCount == 0 {
|
||||
origLoopsCount = 10e6
|
||||
origLoopsCount = 9e6
|
||||
}
|
||||
lastQueryTimestamp = 0
|
||||
is.storeLoopsCountForDateFilter(date, tf, origLoopsCount, lastQueryTimestamp)
|
||||
is.storeLoopsCountForDateFilter(date, tf, origLoopsCount)
|
||||
}
|
||||
tfws[i] = tagFilterWithWeight{
|
||||
tf: tf,
|
||||
loopsCount: loopsCount,
|
||||
lastQueryTimestamp: lastQueryTimestamp,
|
||||
}
|
||||
}
|
||||
sort.Slice(tfws, func(i, j int) bool {
|
||||
|
@ -2829,7 +2821,6 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
|
|||
})
|
||||
|
||||
// Populate metricIDs for the first non-negative filter.
|
||||
var tfsPostponed []*tagFilter
|
||||
var metricIDs *uint64set.Set
|
||||
tfwsRemaining := tfws[:0]
|
||||
maxDateMetrics := maxMetrics * 50
|
||||
|
@ -2841,13 +2832,16 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
|
|||
continue
|
||||
}
|
||||
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, nil, tfs.commonPrefix, maxDateMetrics)
|
||||
is.storeLoopsCountForDateFilter(date, tf, loopsCount, tfw.lastQueryTimestamp)
|
||||
if loopsCount > tfw.loopsCount {
|
||||
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
|
||||
}
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if m.Len() >= maxDateMetrics {
|
||||
// Too many time series found by a single tag filter. Postpone applying this filter via metricName match.
|
||||
tfsPostponed = append(tfsPostponed, tf)
|
||||
// Too many time series found by a single tag filter. Postpone applying this filter.
|
||||
tfwsRemaining = append(tfwsRemaining, tfw)
|
||||
tfw.loopsCount = loopsCount
|
||||
continue
|
||||
}
|
||||
metricIDs = m
|
||||
|
@ -2872,7 +2866,7 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
|
|||
}
|
||||
if m.Len() >= maxDateMetrics {
|
||||
// Too many time series found for the given (date). Fall back to global search.
|
||||
return nil, errFallbackToMetricNameMatch
|
||||
return nil, errFallbackToGlobalSearch
|
||||
}
|
||||
metricIDs = m
|
||||
}
|
||||
|
@ -2883,6 +2877,7 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
|
|||
// when the initial tag filters significantly reduce the number of found metricIDs,
|
||||
// so the remaining filters could be performed via much faster metricName matching instead
|
||||
// of slow selecting of matching metricIDs.
|
||||
var tfsPostponed []*tagFilter
|
||||
for i := range tfwsRemaining {
|
||||
tfw := tfwsRemaining[i]
|
||||
tf := tfw.tf
|
||||
|
@ -2891,24 +2886,26 @@ func (is *indexSearch) getMetricIDsForDateAndFilters(date uint64, tfs *TagFilter
|
|||
// Short circuit - there is no need in applying the remaining filters to an empty set.
|
||||
break
|
||||
}
|
||||
if uint64(metricIDsLen)*maxIndexScanLoopsPerMetric < tfw.loopsCount {
|
||||
if tfw.loopsCount > uint64(metricIDsLen)*loopsCountPerMetricNameMatch {
|
||||
// It should be faster performing metricName match on the remaining filters
|
||||
// instead of scanning big number of entries in the inverted index for these filters.
|
||||
for i < len(tfwsRemaining) {
|
||||
tfw := tfwsRemaining[i]
|
||||
tf := tfw.tf
|
||||
tfsPostponed = append(tfsPostponed, tf)
|
||||
// Store stats for non-executed tf, since it could be updated during protection from thundering herd.
|
||||
is.storeLoopsCountForDateFilter(date, tf, tfw.loopsCount, tfw.lastQueryTimestamp)
|
||||
continue
|
||||
is.storeLoopsCountForDateFilter(date, tf, tfw.loopsCount)
|
||||
i++
|
||||
}
|
||||
break
|
||||
}
|
||||
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, metricIDs, tfs.commonPrefix, 0)
|
||||
if loopsCount > tfw.loopsCount {
|
||||
is.storeLoopsCountForDateFilter(date, tf, loopsCount)
|
||||
}
|
||||
m, loopsCount, err := is.getMetricIDsForDateTagFilter(tf, date, metricIDs, tfs.commonPrefix, maxDateMetrics)
|
||||
is.storeLoopsCountForDateFilter(date, tf, loopsCount, tfw.lastQueryTimestamp)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
if m.Len() >= maxDateMetrics {
|
||||
// Too many time series found by a single tag filter. Postpone applying this filter via metricName match.
|
||||
tfsPostponed = append(tfsPostponed, tf)
|
||||
continue
|
||||
}
|
||||
if tf.isNegative {
|
||||
metricIDs.Subtract(m)
|
||||
} else {
|
||||
|
@ -3092,9 +3089,9 @@ func (is *indexSearch) getMetricIDsForDateTagFilter(tf *tagFilter, date uint64,
|
|||
kbPool.Put(kb)
|
||||
if err != nil {
|
||||
// Set high loopsCount for failing filter, so it is moved to the end of filter list.
|
||||
loopsCount = 1e9
|
||||
loopsCount = 20e9
|
||||
}
|
||||
if metricIDs.Len() >= maxMetrics {
|
||||
if filter == nil && metricIDs.Len() >= maxMetrics {
|
||||
// Increase loopsCount for tag filter matching too many metrics,
|
||||
// So next time it is moved to the end of filter list.
|
||||
loopsCount *= 2
|
||||
|
@ -3115,13 +3112,8 @@ func (is *indexSearch) getLoopsCountAndTimestampForDateFilter(date uint64, tf *t
|
|||
return loopsCount, timestamp
|
||||
}
|
||||
|
||||
func (is *indexSearch) storeLoopsCountForDateFilter(date uint64, tf *tagFilter, loopsCount, prevTimestamp uint64) {
|
||||
func (is *indexSearch) storeLoopsCountForDateFilter(date uint64, tf *tagFilter, loopsCount uint64) {
|
||||
currentTimestamp := fasttime.UnixTimestamp()
|
||||
if currentTimestamp < prevTimestamp+5 {
|
||||
// The cache already contains quite fresh entry for the current (date, tf).
|
||||
// Do not update it too frequently.
|
||||
return
|
||||
}
|
||||
is.kb.B = appendDateTagFilterCacheKey(is.kb.B[:0], date, tf)
|
||||
kb := kbPool.Get()
|
||||
kb.B = encoding.MarshalUint64(kb.B[:0], loopsCount)
|
||||
|
@ -3196,63 +3188,28 @@ func (is *indexSearch) updateMetricIDsForPrefix(prefix []byte, metricIDs *uint64
|
|||
return nil
|
||||
}
|
||||
|
||||
// The maximum number of index scan loops.
|
||||
// Bigger number of loops is slower than updateMetricIDsByMetricNameMatch
|
||||
// over the found metrics.
|
||||
const maxIndexScanLoopsPerMetric = 100
|
||||
|
||||
// The maximum number of slow index scan loops.
|
||||
// Bigger number of loops is slower than updateMetricIDsByMetricNameMatch
|
||||
// over the found metrics.
|
||||
const maxIndexScanSlowLoopsPerMetric = 20
|
||||
// The estimated number of index scan loops a single loop in updateMetricIDsByMetricNameMatch takes.
|
||||
const loopsCountPerMetricNameMatch = 500
|
||||
|
||||
func (is *indexSearch) intersectMetricIDsWithTagFilter(tf *tagFilter, filter *uint64set.Set) (*uint64set.Set, error) {
|
||||
if filter.Len() == 0 {
|
||||
return nil, nil
|
||||
}
|
||||
kb := &is.kb
|
||||
filterLenRounded := (uint64(filter.Len()) / 1024) * 1024
|
||||
kb.B = append(kb.B[:0], uselessTagIntersectKeyPrefix)
|
||||
kb.B = encoding.MarshalUint64(kb.B, filterLenRounded)
|
||||
kb.B = tf.Marshal(kb.B)
|
||||
if len(is.db.uselessTagFiltersCache.Get(nil, kb.B)) > 0 {
|
||||
// Skip useless work, since the intersection will return
|
||||
// errFallbackToMetricNameMatch for the given filter.
|
||||
return nil, errFallbackToMetricNameMatch
|
||||
}
|
||||
metricIDs, err := is.intersectMetricIDsWithTagFilterNocache(tf, filter)
|
||||
if err == nil {
|
||||
return metricIDs, err
|
||||
}
|
||||
if err != errFallbackToMetricNameMatch {
|
||||
return nil, err
|
||||
}
|
||||
kb.B = append(kb.B[:0], uselessTagIntersectKeyPrefix)
|
||||
kb.B = encoding.MarshalUint64(kb.B, filterLenRounded)
|
||||
kb.B = tf.Marshal(kb.B)
|
||||
is.db.uselessTagFiltersCache.Set(kb.B, uselessTagFilterCacheValue)
|
||||
return nil, errFallbackToMetricNameMatch
|
||||
}
|
||||
|
||||
func (is *indexSearch) intersectMetricIDsWithTagFilterNocache(tf *tagFilter, filter *uint64set.Set) (*uint64set.Set, error) {
|
||||
metricIDs := filter
|
||||
if !tf.isNegative {
|
||||
metricIDs = &uint64set.Set{}
|
||||
}
|
||||
if len(tf.orSuffixes) > 0 {
|
||||
// Fast path for orSuffixes - seek for rows for each value from orSuffixes.
|
||||
if err := is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter); err != nil {
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
return nil, err
|
||||
}
|
||||
_, err := is.updateMetricIDsForOrSuffixesWithFilter(tf, metricIDs, filter)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in fast path: %w; tagFilter=%s", err, tf)
|
||||
}
|
||||
return metricIDs, nil
|
||||
}
|
||||
|
||||
// Slow path - scan for all the rows with the given prefix.
|
||||
maxLoopsCount := uint64(filter.Len()) * maxIndexScanSlowLoopsPerMetric
|
||||
_, err := is.getMetricIDsForTagFilterSlow(tf, filter, maxLoopsCount, func(metricID uint64) {
|
||||
_, err := is.getMetricIDsForTagFilterSlow(tf, filter, func(metricID uint64) {
|
||||
if tf.isNegative {
|
||||
// filter must be equal to metricIDs
|
||||
metricIDs.Del(metricID)
|
||||
|
@ -3261,9 +3218,6 @@ func (is *indexSearch) intersectMetricIDsWithTagFilterNocache(tf *tagFilter, fil
|
|||
}
|
||||
})
|
||||
if err != nil {
|
||||
if err == errFallbackToMetricNameMatch {
|
||||
return nil, err
|
||||
}
|
||||
return nil, fmt.Errorf("error when intersecting metricIDs for tagFilter in slow path: %w; tagFilter=%s", err, tf)
|
||||
}
|
||||
return metricIDs, nil
|
||||
|
|
|
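The hunks above order tag filters by their recorded `loopsCount` and postpone a filter to metricName matching once `tfw.loopsCount > uint64(metricIDsLen)*loopsCountPerMetricNameMatch`. The following is a minimal sketch of that decision only, not the actual `getMetricIDsForDateAndFilters` code. The `filterStat` type and `splitFilters` helper are hypothetical names introduced for illustration, and the sketch assumes a fixed `metricIDsLen`, whereas in the diff the set shrinks as each filter is applied.

```go
package main

import (
	"fmt"
	"sort"
)

// loopsCountPerMetricNameMatch mirrors the constant introduced in the diff:
// the estimated number of index scan loops one metricName match costs.
const loopsCountPerMetricNameMatch = 500

// filterStat is a hypothetical stand-in for tagFilterWithWeight.
type filterStat struct {
	name       string
	loopsCount uint64
}

// splitFilters sorts a copy of stats by ascending loopsCount and returns the
// filters worth running via index scan plus the filters postponed to
// metricName matching. Once one filter is too expensive, all the following
// (even more expensive) filters are postponed as well.
func splitFilters(stats []filterStat, metricIDsLen int) (scan, postponed []filterStat) {
	sorted := append([]filterStat(nil), stats...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].loopsCount < sorted[j].loopsCount
	})
	for i, fs := range sorted {
		if fs.loopsCount > uint64(metricIDsLen)*loopsCountPerMetricNameMatch {
			return sorted[:i], sorted[i:]
		}
	}
	return sorted, nil
}

func main() {
	stats := []filterStat{{"a", 100}, {"b", 2000000}, {"c", 50}}
	scan, postponed := splitFilters(stats, 1000)
	fmt.Println(len(scan), len(postponed))
}
```

With 1000 already-found metricIDs the threshold is 500000 loops, so only the filter with 2000000 recorded loops is postponed in this example.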
@ -248,6 +248,10 @@ type tagFilter struct {
|
|||
graphiteReverseSuffix []byte
|
||||
}
|
||||
|
||||
func (tf *tagFilter) looksLikeHeavy() bool {
|
||||
return tf.isRegexp && len(tf.orSuffixes) == 0
|
||||
}
|
||||
|
||||
func (tf *tagFilter) isComposite() bool {
|
||||
k := tf.key
|
||||
return len(k) > 0 && k[0] == compositeTagKeyPrefix
|
||||
|
|
|
@ -141,36 +141,32 @@ func (s *Set) AddMulti(a []uint64) {
|
|||
if len(a) == 0 {
|
||||
return
|
||||
}
|
||||
slowPath := false
|
||||
hi := uint32(a[0] >> 32)
|
||||
for _, x := range a[1:] {
|
||||
if hi != uint32(x>>32) {
|
||||
slowPath = true
|
||||
break
|
||||
hiPrev := uint32(a[0] >> 32)
|
||||
i := 0
|
||||
for j, x := range a {
|
||||
hi := uint32(x >> 32)
|
||||
if hi == hiPrev {
|
||||
continue
|
||||
}
|
||||
b32 := s.getOrCreateBucket32(hiPrev)
|
||||
s.itemsCount += b32.addMulti(a[i:j])
|
||||
hiPrev = hi
|
||||
i = j
|
||||
}
|
||||
if slowPath {
|
||||
for _, x := range a {
|
||||
s.Add(x)
|
||||
b32 := s.getOrCreateBucket32(hiPrev)
|
||||
s.itemsCount += b32.addMulti(a[i:])
|
||||
}
|
||||
return
|
||||
}
|
||||
// Fast path - all the items in a have identical higher 32 bits.
|
||||
// Put them in a bulk into the corresponding bucket32.
|
||||
|
||||
func (s *Set) getOrCreateBucket32(hi uint32) *bucket32 {
|
||||
bs := s.buckets
|
||||
var b32 *bucket32
|
||||
for i := range bs {
|
||||
if bs[i].hi == hi {
|
||||
b32 = &bs[i]
|
||||
break
|
||||
return &bs[i]
|
||||
}
|
||||
}
|
||||
if b32 == nil {
|
||||
b32 = s.addBucket32()
|
||||
b32 := s.addBucket32()
|
||||
b32.hi = hi
|
||||
}
|
||||
n := b32.addMulti(a)
|
||||
s.itemsCount += n
|
||||
return b32
|
||||
}
|
||||
|
||||
func (s *Set) addBucket32() *bucket32 {
|
||||
|
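The rewritten `AddMulti` above no longer falls back to per-item `Add` when the items span several buckets. Instead it walks the slice once and hands each run of consecutive items with identical upper 32 bits to a single bucket. Below is a minimal sketch of that run-grouping step in isolation. The `groupByHigh32` helper and its callback are hypothetical and stand in for the `getOrCreateBucket32` plus `b32.addMulti` calls in the diff.

```go
package main

import "fmt"

// groupByHigh32 splits a into maximal runs of consecutive items sharing the
// same upper 32 bits and calls f once per run. Items are not sorted first,
// so the same hi value may be reported more than once for unsorted input.
func groupByHigh32(a []uint64, f func(hi uint32, run []uint64)) {
	if len(a) == 0 {
		return
	}
	hiPrev := uint32(a[0] >> 32)
	i := 0
	for j, x := range a {
		hi := uint32(x >> 32)
		if hi == hiPrev {
			continue
		}
		// The run a[i:j] ended; flush it and start a new run at j.
		f(hiPrev, a[i:j])
		hiPrev = hi
		i = j
	}
	// Flush the trailing run.
	f(hiPrev, a[i:])
}

func main() {
	a := []uint64{1, 2, 1<<32 | 5, 1<<32 | 6, 3}
	groupByHigh32(a, func(hi uint32, run []uint64) {
		fmt.Println(hi, len(run))
	})
}
```

For sorted input every bucket is visited exactly once, which is presumably the common case this change targets. Unsorted input still works, at the cost of repeated bucket lookups.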
@ -609,41 +605,32 @@ func (b *bucket32) addMulti(a []uint64) int {
|
|||
if len(a) == 0 {
|
||||
return 0
|
||||
}
|
||||
hi := uint16(a[0] >> 16)
|
||||
slowPath := false
|
||||
for _, x := range a[1:] {
|
||||
if hi != uint16(x>>16) {
|
||||
slowPath = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if slowPath {
|
||||
count := 0
|
||||
for _, x := range a {
|
||||
if b.add(uint32(x)) {
|
||||
count++
|
||||
hiPrev := uint16(a[0] >> 16)
|
||||
i := 0
|
||||
for j, x := range a {
|
||||
hi := uint16(x >> 16)
|
||||
if hi == hiPrev {
|
||||
continue
|
||||
}
|
||||
b16 := b.getOrCreateBucket16(hiPrev)
|
||||
count += b16.addMulti(a[i:j])
|
||||
hiPrev = hi
|
||||
i = j
|
||||
}
|
||||
b16 := b.getOrCreateBucket16(hiPrev)
|
||||
count += b16.addMulti(a[i:])
|
||||
return count
|
||||
}
|
||||
// Fast path - all the items in a have identical higher 32+16 bits.
|
||||
// Put them to a single bucket16 in a bulk.
|
||||
var b16 *bucket16
|
||||
|
||||
func (b *bucket32) getOrCreateBucket16(hi uint16) *bucket16 {
|
||||
his := b.b16his
|
||||
bs := b.buckets
|
||||
if n := b.getHint(); n < uint32(len(his)) && his[n] == hi {
|
||||
b16 = &bs[n]
|
||||
}
|
||||
if b16 == nil {
|
||||
n := binarySearch16(his, hi)
|
||||
if n < 0 || n >= len(his) || his[n] != hi {
|
||||
b16 = b.addBucketAtPos(hi, n)
|
||||
} else {
|
||||
b.setHint(n)
|
||||
b16 = &bs[n]
|
||||
return b.addBucketAtPos(hi, n)
|
||||
}
|
||||
}
|
||||
return b16.addMulti(a)
|
||||
return &bs[n]
|
||||
}
|
||||
|
||||
func (b *bucket32) addSlow(hi, lo uint16) bool {
|
||||
|
@ -742,8 +729,8 @@ const (
|
|||
|
||||
type bucket16 struct {
|
||||
bits *[wordsPerBucket]uint64
|
||||
smallPool *[smallPoolSize]uint16
|
||||
smallPoolLen int
|
||||
smallPool [smallPoolSize]uint16
|
||||
}
|
||||
|
||||
const smallPoolSize = 56
|
||||
|
@ -820,7 +807,14 @@ func (b *bucket16) intersect(a *bucket16) {
|
|||
}
|
||||
|
||||
func (b *bucket16) sizeBytes() uint64 {
|
||||
return uint64(unsafe.Sizeof(*b)) + uint64(unsafe.Sizeof(*b.bits))
|
||||
n := unsafe.Sizeof(*b)
|
||||
if b.bits != nil {
|
||||
n += unsafe.Sizeof(*b.bits)
|
||||
}
|
||||
if b.smallPool != nil {
|
||||
n += unsafe.Sizeof(*b.smallPool)
|
||||
}
|
||||
return uint64(n)
|
||||
}
|
||||
|
||||
func (b *bucket16) copyTo(dst *bucket16) {
|
||||
|
@ -831,23 +825,37 @@ func (b *bucket16) copyTo(dst *bucket16) {
|
|||
dst.bits = &bits
|
||||
}
|
||||
dst.smallPoolLen = b.smallPoolLen
|
||||
dst.smallPool = b.smallPool
|
||||
if b.smallPool != nil {
|
||||
sp := dst.getOrCreateSmallPool()
|
||||
*sp = *b.smallPool
|
||||
}
|
||||
}
|
||||
|
||||
func (b *bucket16) getOrCreateSmallPool() *[smallPoolSize]uint16 {
|
||||
if b.smallPool == nil {
|
||||
var sp [smallPoolSize]uint16
|
||||
b.smallPool = &sp
|
||||
}
|
||||
return b.smallPool
|
||||
}
|
||||
|
||||
func (b *bucket16) add(x uint16) bool {
|
||||
if b.bits == nil {
|
||||
bits := b.bits
|
||||
if bits == nil {
|
||||
return b.addToSmallPool(x)
|
||||
}
|
||||
wordNum, bitMask := getWordNumBitMask(x)
|
||||
word := &b.bits[wordNum]
|
||||
ok := *word&bitMask == 0
|
||||
*word |= bitMask
|
||||
ok := bits[wordNum]&bitMask == 0
|
||||
if ok {
|
||||
bits[wordNum] |= bitMask
|
||||
}
|
||||
return ok
|
||||
}
|
||||
|
||||
func (b *bucket16) addMulti(a []uint64) int {
|
||||
count := 0
|
||||
if b.bits == nil {
|
||||
bits := b.bits
|
||||
if bits == nil {
|
||||
// Slow path
|
||||
for _, x := range a {
|
||||
if b.add(uint16(x)) {
|
||||
|
@ -858,11 +866,10 @@ func (b *bucket16) addMulti(a []uint64) int {
|
|||
// Fast path
|
||||
for _, x := range a {
|
||||
wordNum, bitMask := getWordNumBitMask(uint16(x))
|
||||
word := &b.bits[wordNum]
|
||||
if *word&bitMask == 0 {
|
||||
if bits[wordNum]&bitMask == 0 {
|
||||
bits[wordNum] |= bitMask
|
||||
count++
|
||||
}
|
||||
*word |= bitMask
|
||||
}
|
||||
}
|
||||
return count
|
||||
|
@ -872,15 +879,16 @@ func (b *bucket16) addToSmallPool(x uint16) bool {
|
|||
if b.hasInSmallPool(x) {
|
||||
return false
|
||||
}
|
||||
if b.smallPoolLen < len(b.smallPool) {
|
||||
b.smallPool[b.smallPoolLen] = x
|
||||
sp := b.getOrCreateSmallPool()
|
||||
if b.smallPoolLen < len(sp) {
|
||||
sp[b.smallPoolLen] = x
|
||||
b.smallPoolLen++
|
||||
return true
|
||||
}
|
||||
b.smallPoolLen = 0
|
||||
var bits [wordsPerBucket]uint64
|
||||
b.bits = &bits
|
||||
for _, v := range b.smallPool[:] {
|
||||
for _, v := range sp[:] {
|
||||
b.add(v)
|
||||
}
|
||||
b.add(x)
|
||||
|
@ -896,7 +904,11 @@ func (b *bucket16) has(x uint16) bool {
|
|||
}
|
||||
|
||||
func (b *bucket16) hasInSmallPool(x uint16) bool {
|
||||
for _, v := range b.smallPool[:b.smallPoolLen] {
|
||||
sp := b.smallPool
|
||||
if sp == nil {
|
||||
return false
|
||||
}
|
||||
for _, v := range sp[:b.smallPoolLen] {
|
||||
if v == x {
|
||||
return true
|
||||
}
|
||||
|
@ -916,9 +928,13 @@ func (b *bucket16) del(x uint16) bool {
|
|||
}
|
||||
|
||||
func (b *bucket16) delFromSmallPool(x uint16) bool {
|
||||
for i, v := range b.smallPool[:b.smallPoolLen] {
|
||||
sp := b.smallPool
|
||||
if sp == nil {
|
||||
return false
|
||||
}
|
||||
for i, v := range sp[:b.smallPoolLen] {
|
||||
if v == x {
|
||||
copy(b.smallPool[i:], b.smallPool[i+1:])
|
||||
copy(sp[i:], sp[i+1:])
|
||||
b.smallPoolLen--
|
||||
return true
|
||||
}
|
||||
|
@ -929,11 +945,15 @@ func (b *bucket16) delFromSmallPool(x uint16) bool {
|
|||
func (b *bucket16) appendTo(dst []uint64, hi uint32, hi16 uint16) []uint64 {
|
||||
hi64 := uint64(hi)<<32 | uint64(hi16)<<16
|
||||
if b.bits == nil {
|
||||
sp := b.smallPool
|
||||
if sp == nil {
|
||||
return dst
|
||||
}
|
||||
// Use smallPoolSorter instead of sort.Slice here in order to reduce memory allocations.
|
||||
sps := smallPoolSorterPool.Get().(*smallPoolSorter)
|
||||
// Sort a copy of b.smallPool, since b must be readonly in order to prevent from data races
|
||||
// Sort a copy of sp, since b must be readonly in order to prevent from data races
|
||||
// when b.appendTo is called from concurrent goroutines.
|
||||
sps.smallPool = b.smallPool
|
||||
sps.smallPool = *sp
|
||||
sps.a = sps.smallPool[:b.smallPoolLen]
|
||||
if len(sps.a) > 1 && !sort.IsSorted(sps) {
|
||||
sort.Sort(sps)
|
||||
|
@ -996,6 +1016,10 @@ func getWordNumBitMask(x uint16) (uint16, uint64) {
|
|||
func binarySearch16(u16 []uint16, x uint16) int {
|
||||
// The code has been adapted from sort.Search.
|
||||
n := len(u16)
|
||||
if n > 0 && u16[n-1] < x {
|
||||
// Fast path for values scanned in ascending order.
|
||||
return n
|
||||
}
|
||||
i, j := 0, n
|
||||
for i < j {
|
||||
h := int(uint(i+j) >> 1)
|
||||
|
|
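The last hunk above adds a fast path to `binarySearch16` for values that arrive in ascending order. The hunk is truncated after the midpoint computation, so the sketch below completes the loop body under the assumption that it follows the usual `sort.Search` lower-bound pattern mentioned in the code comment. Treat the loop body as an illustration, not as the exact upstream code.

```go
package main

import "fmt"

// binarySearch16 returns the smallest index i in the ascending-sorted slice
// u16 such that u16[i] >= x, or len(u16) if there is no such index.
func binarySearch16(u16 []uint16, x uint16) int {
	n := len(u16)
	if n > 0 && u16[n-1] < x {
		// Fast path for values scanned in ascending order:
		// x is bigger than the last item, so it belongs at the end.
		return n
	}
	i, j := 0, n
	for i < j {
		h := int(uint(i+j) >> 1) // midpoint without overflow
		if u16[h] < x {
			i = h + 1
		} else {
			j = h
		}
	}
	return i
}

func main() {
	his := []uint16{1, 3, 5}
	fmt.Println(binarySearch16(his, 7), binarySearch16(his, 3), binarySearch16(his, 4))
}
```

The fast path turns an append-only workload from O(log n) per lookup into a single comparison, which fits the run-by-run insertion order used by the new `addMulti`.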
2
vendor/github.com/VictoriaMetrics/fasthttp/go.mod
generated
vendored
|
@ -3,7 +3,7 @@ module github.com/VictoriaMetrics/fasthttp
|
|||
go 1.13
|
||||
|
||||
require (
|
||||
github.com/klauspost/compress v1.11.3
|
||||
github.com/klauspost/compress v1.11.12
|
||||
github.com/valyala/bytebufferpool v1.0.0
|
||||
github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a
|
||||
)
|
||||
|
|
4
vendor/github.com/VictoriaMetrics/fasthttp/go.sum
generated
vendored
|
@ -1,5 +1,5 @@
|
|||
github.com/klauspost/compress v1.11.3 h1:dB4Bn0tN3wdCzQxnS8r06kV74qN/TAfaIS0bVE8h3jc=
|
||||
github.com/klauspost/compress v1.11.3/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
|
||||
github.com/klauspost/compress v1.11.12 h1:famVnQVu7QwryBN4jNseQdUKES71ZAOnB6UQQJPZvqk=
|
||||
github.com/klauspost/compress v1.11.12/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYsUV+/s2qKfXs=
|
||||
github.com/valyala/bytebufferpool v1.0.0 h1:GqA5TC/0021Y/b9FG4Oi9Mr3q7XYx6KllzawFIhcdPw=
|
||||
github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyCJ6HpOuEn7z0Csc=
|
||||
github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a h1:0R4NLDRDZX6JcmhJgXi5E4b8Wg84ihbmUKp/GvSPEzc=
|
||||
|
|
2
vendor/modules.txt
vendored
|
@ -13,7 +13,7 @@ cloud.google.com/go/storage
|
|||
# github.com/VictoriaMetrics/fastcache v1.5.8
|
||||
## explicit
|
||||
github.com/VictoriaMetrics/fastcache
|
||||
# github.com/VictoriaMetrics/fasthttp v1.0.13
|
||||
# github.com/VictoriaMetrics/fasthttp v1.0.14
|
||||
## explicit
|
||||
github.com/VictoriaMetrics/fasthttp
|
||||
github.com/VictoriaMetrics/fasthttp/fasthttputil
|
||||
|
|