Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files
.github/dependabot.yml
@@ -4,3 +4,23 @@ updates:
    directory: "/"
    schedule:
      interval: "daily"
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "daily"
  - package-ecosystem: "bundler"
    directory: "/docs"
    schedule:
      interval: "daily"
  - package-ecosystem: "gomod"
    directory: "/app/vmui/packages/vmui/web"
    schedule:
      interval: "daily"
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "daily"
  - package-ecosystem: "npm"
    directory: "/app/vmui"
    schedule:
      interval: "daily"
.github/workflows/main.yml
@@ -60,7 +60,7 @@ jobs:
          GOOS=darwin go build -mod=vendor ./app/vmctl
          CGO_ENABLED=0 GOOS=windows go build -mod=vendor ./app/vmagent
      - name: Publish coverage
        uses: codecov/codecov-action@v1.5.2
        uses: codecov/codecov-action@v2.0.2
        with:
          file: ./coverage.txt
.gitignore
@@ -16,3 +16,6 @@
/package/*.deb
/package/*.rpm
.DS_store
Gemfile.lock
/_site
_site
README.md
@@ -1264,8 +1264,11 @@ or similar auth proxy.
* There is no need for VictoriaMetrics tuning since it uses reasonable defaults for command-line flags,
  which are automatically adjusted for the available CPU and RAM resources.
* There is no need for Operating System tuning since VictoriaMetrics is optimized for default OS settings.
  The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a),
  so Prometheus instances could establish more connections to VictoriaMetrics.
  The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a).
  The recommendation is not specific for VictoriaMetrics only but also for any service which handles many HTTP connections and stores data on disk.
* VictoriaMetrics is a write-heavy application and its performance depends on disk performance. So be careful with other
  applications or utilities (like [fstrim](http://manpages.ubuntu.com/manpages/bionic/man8/fstrim.8.html))
  which could [exhaust disk resources](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1521).
* The recommended filesystem is `ext4`, the recommended persistent storage is [persistent HDD-based disk on GCP](https://cloud.google.com/compute/docs/disks/#pdspecs),
  since it is protected from hardware failures via internal replication and it can be [resized on the fly](https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
  If you plan to store more than 1TB of data on `ext4` partition or plan extending it to more than 16TB,
@@ -1593,7 +1596,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
  -enableTCP6
        Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
  -envflag.enable
        Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
        Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
  -envflag.prefix string
        Prefix for environment variables if -envflag.enable is set
  -finalMergeDelay duration
@@ -1785,6 +1788,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
        Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
  -search.maxQueueDuration duration
        The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
  -search.maxSamplesPerQuery int
        The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries (default 1000000000)
  -search.maxSamplesPerSeries int
        The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage (default 30000000)
  -search.maxStalenessInterval duration
        The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
  -search.maxStatusRequestDuration duration
@@ -1798,13 +1805,13 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
  -search.maxTagValues int
        The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
  -search.maxUniqueTimeseries int
        The maximum number of unique time series each search can scan (default 300000)
        The maximum number of unique time series each search can scan. This option allows limiting memory usage (default 300000)
  -search.minStalenessInterval duration
        The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
  -search.queryStats.lastQueriesCount int
        Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
  -search.queryStats.minQueryDuration duration
        The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
        The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats (default 1ms)
  -search.resetCacheAuthKey string
        Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
  -search.treatDotsAsIsInRegexps
@@ -11,6 +11,6 @@
"status":"success",
"data":{"resultType":"matrix",
"result":[
{"metric":{"item":"y"},"values":[["{TIME_S-1m}","0.5"]]}
{"metric":{"item":"y"},"values":[["{TIME_S-1m}","0.5"], ["{TIME_S}","0.5"]]}
]}}
}
@@ -13,6 +13,6 @@
"data":{"resultType":"matrix",
"result":[
{"metric":{"__name__":"not_nan_as_missing_data","item":"x"},"values":[["{TIME_S-2m}","2"]]},
{"metric":{"__name__":"not_nan_as_missing_data","item":"y"},"values":[["{TIME_S-2m}","4"],["{TIME_S-1m}","3"]]}
{"metric":{"__name__":"not_nan_as_missing_data","item":"y"},"values":[["{TIME_S-2m}","4"],["{TIME_S-1m}","3"],["{TIME_S}", "3"]]}
]}}
}
@@ -135,7 +135,11 @@ Also, Basic Auth can be enabled for the incoming `remote_write` requests with `-

### remote_write for clustered version

While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always performed in the Prometheus remote_write protocol. Therefore, for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), the `-remoteWrite.url` command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<customer-id>/prometheus/api/v1/write`
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always performed in the Prometheus remote_write protocol. Therefore, for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), the `-remoteWrite.url` command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<accountID>/prometheus/api/v1/write` according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format). There is also support for multitenant writes. See [these docs](#multitenancy).

## Multitenancy

By default `vmagent` collects the data without tenant identifiers and routes it to the configured `-remoteWrite.url`. But it can accept multitenant data if `-remoteWrite.multitenantURL` is set. In this case it accepts multitenant data at `http://vmagent:8429/insert/<accountID>/...` in the same way as the cluster version of VictoriaMetrics does according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format) and routes it to `<-remoteWrite.multitenantURL>/insert/<accountID>/prometheus/api/v1/write`. If multiple `-remoteWrite.multitenantURL` command-line options are set, then `vmagent` replicates the collected data across all the configured urls. This allows using a single `vmagent` instance in front of VictoriaMetrics clusters for processing the data from all the tenants.

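To make the URL mapping concrete, here is a small illustrative Go sketch. It is only an approximation of the routing described above (the tenant label from the incoming `/insert/<accountID>[:<projectID>]/...` path is appended to every configured `-remoteWrite.multitenantURL` base); the actual code lives in `app/vmagent/remotewrite` and uses the internal `auth` and `httpserver` packages.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTenantURL mimics how a multitenant write URL is derived:
// the base URL comes from -remoteWrite.multitenantURL and the tenant
// part ("<accountID>" or "<accountID>:<projectID>") comes from the
// incoming /insert/<tenant>/... path.
func buildTenantURL(baseURL, tenant string) string {
	return fmt.Sprintf("%s/insert/%s/prometheus/api/v1/write", strings.TrimSuffix(baseURL, "/"), tenant)
}

func main() {
	// Incoming request path on vmagent: /insert/42:0/prometheus/api/v1/write
	tenant := "42:0"
	for _, base := range []string{"http://vminsert-a:8480", "http://vminsert-b:8480"} {
		// With two -remoteWrite.multitenantURL flags the data is replicated to both clusters.
		fmt.Println(buildTenantURL(base, tenant))
	}
}
```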
## How to collect metrics in Prometheus format
|
||||
|
@ -528,7 +532,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
|
@ -634,7 +638,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
-promscrape.consulSDCheckInterval duration
|
||||
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
|
||||
-promscrape.digitaloceanSDCheckInterval duration
|
||||
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
|
||||
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
|
||||
-promscrape.disableCompression
|
||||
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.disableKeepAlive
|
||||
|
@ -645,6 +649,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
|
||||
-promscrape.dnsSDCheckInterval duration
|
||||
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
|
||||
-promscrape.dockerSDCheckInterval duration
|
||||
Interval for checking for changes in docker. This works only if docker_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config for details (default 30s)
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
|
@ -658,7 +664,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
-promscrape.gceSDCheckInterval duration
|
||||
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
|
||||
-promscrape.httpSDCheckInterval duration
|
||||
Interval for checking for changes in http service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
|
||||
Interval for checking for changes in http endpoint service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
|
||||
-promscrape.kubernetes.apiServerTimeout duration
|
||||
How frequently to reload the full state from Kuberntes API server (default 30m0s)
|
||||
-promscrape.kubernetesSDCheckInterval duration
|
||||
|
@@ -706,6 +712,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
        Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
  -remoteWrite.maxHourlySeries int
        The maximum number of unique series vmagent can send to remote storage systems during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -remoteWrite.maxDailySeries
  -remoteWrite.multitenantURL array
        Base path for multitenant remote storage URL to write data to. See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url
        Supports an array of values separated by comma or specified via multiple flags.
  -remoteWrite.oauth2.clientID array
        Optional OAuth2 clientID to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
        Supports an array of values separated by comma or specified via multiple flags.
@@ -725,7 +734,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
        Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
        Supports an array of values separated by comma or specified via multiple flags.
  -remoteWrite.queues int
        The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage (default 2 * numberOfAvailableCPUs)
        The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs (default 8)
  -remoteWrite.rateLimit array
        Optional rate limit in bytes per second for data sent to -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
        Supports array of values separated by comma or specified via multiple flags.
@@ -762,7 +771,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
  -remoteWrite.tmpDataPath string
        Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
  -remoteWrite.url array
        Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems
        Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
        Supports an array of values separated by comma or specified via multiple flags.
  -remoteWrite.urlRelabelConfig array
        Optional path to relabel config for the corresponding -remoteWrite.url
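Several of the flags above are documented as array flags ("Supports an array of values separated by comma or specified via multiple flags"). As a rough illustration of what that means for callers, here is a sketch of such a flag type built only on the standard `flag` package; the real implementation lives in `lib/flagutil` and may differ in detail:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// arrayFlag collects repeated and comma-separated values, e.g.
//   -remoteWrite.multitenantURL=http://a:8480,http://b:8480
//   -remoteWrite.multitenantURL=http://a:8480 -remoteWrite.multitenantURL=http://b:8480
type arrayFlag []string

func (a *arrayFlag) String() string { return strings.Join(*a, ",") }

func (a *arrayFlag) Set(value string) error {
	*a = append(*a, strings.Split(value, ",")...)
	return nil
}

func main() {
	var multitenantURLs arrayFlag
	flag.Var(&multitenantURLs, "remoteWrite.multitenantURL", "Base path for multitenant remote storage URL to write data to")
	flag.Parse()
	fmt.Println("configured multitenant URLs:", multitenantURLs)
}
```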
@@ -5,32 +5,35 @@ import (

    "github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
    "github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
    parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
    parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
    "github.com/VictoriaMetrics/metrics"
)

var (
    rowsInserted  = metrics.NewCounter(`vmagent_rows_inserted_total{type="csvimport"}`)
    rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="csvimport"}`)
    rowsInserted       = metrics.NewCounter(`vmagent_rows_inserted_total{type="csvimport"}`)
    rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="csvimport"}`)
    rowsPerInsert      = metrics.NewHistogram(`vmagent_rows_per_insert{type="csvimport"}`)
)

// InsertHandler processes csv data from req.
func InsertHandler(req *http.Request) error {
func InsertHandler(at *auth.Token, req *http.Request) error {
    extraLabels, err := parserCommon.GetExtraLabels(req)
    if err != nil {
        return err
    }
    return writeconcurrencylimiter.Do(func() error {
        return parser.ParseStream(req, func(rows []parser.Row) error {
            return insertRows(rows, extraLabels)
            return insertRows(at, rows, extraLabels)
        })
    })
}

func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
    ctx := common.GetPushCtx()
    defer common.PutPushCtx(ctx)

@@ -64,8 +67,11 @@ func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
    ctx.WriteRequest.Timeseries = tssDst
    ctx.Labels = labels
    ctx.Samples = samples
    remotewrite.Push(&ctx.WriteRequest)
    remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
    rowsInserted.Add(len(rows))
    if at != nil {
        rowsTenantInserted.Get(at).Add(len(rows))
    }
    rowsPerInsert.Update(float64(len(rows)))
    return nil
}
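The handler above (and the analogous influx, native, prometheusimport, promremotewrite and vmimport handlers in this diff) relies on `tenantmetrics.NewCounterMap`, whose implementation is not part of the change. A minimal sketch of such a tenant-keyed counter map, assuming a plain mutex-guarded map rather than whatever `lib/tenantmetrics` actually does (its `Get` takes an `*auth.Token` and returns a `*metrics.Counter`), could look like this:

```go
package tenantcounters

import (
	"sync"
	"sync/atomic"
)

// TenantID identifies a tenant, mirroring the accountID:projectID pair
// used by the cluster version of VictoriaMetrics.
type TenantID struct {
	AccountID uint32
	ProjectID uint32
}

// CounterMap lazily creates one counter per tenant.
type CounterMap struct {
	mu sync.Mutex
	m  map[TenantID]*uint64
}

func NewCounterMap() *CounterMap {
	return &CounterMap{m: make(map[TenantID]*uint64)}
}

// Add increments the counter for the given tenant, creating it on first use.
func (cm *CounterMap) Add(id TenantID, delta uint64) {
	cm.mu.Lock()
	c := cm.m[id]
	if c == nil {
		c = new(uint64)
		cm.m[id] = c
	}
	cm.mu.Unlock()
	atomic.AddUint64(c, delta)
}
```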
@ -8,12 +8,14 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
|
||||
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
|
||||
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
)
|
||||
|
@ -25,8 +27,9 @@ var (
|
|||
)
|
||||
|
||||
var (
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="influx"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="influx"}`)
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="influx"}`)
|
||||
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="influx"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="influx"}`)
|
||||
)
|
||||
|
||||
// InsertHandlerForReader processes remote write for influx line protocol.
|
||||
|
@ -35,7 +38,7 @@ var (
|
|||
func InsertHandlerForReader(r io.Reader) error {
|
||||
return writeconcurrencylimiter.Do(func() error {
|
||||
return parser.ParseStream(r, false, "", "", func(db string, rows []parser.Row) error {
|
||||
return insertRows(db, rows, nil)
|
||||
return insertRows(nil, db, rows, nil)
|
||||
})
|
||||
})
|
||||
}
|
||||
|
@ -43,7 +46,7 @@ func InsertHandlerForReader(r io.Reader) error {
|
|||
// InsertHandlerForHTTP processes remote write for influx line protocol.
|
||||
//
|
||||
// See https://github.com/influxdata/influxdb/blob/4cbdc197b8117fee648d62e2e5be75c6575352f0/tsdb/README.md
|
||||
func InsertHandlerForHTTP(req *http.Request) error {
|
||||
func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
|
||||
extraLabels, err := parserCommon.GetExtraLabels(req)
|
||||
if err != nil {
|
||||
return err
|
||||
|
@ -55,12 +58,12 @@ func InsertHandlerForHTTP(req *http.Request) error {
|
|||
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
|
||||
db := q.Get("db")
|
||||
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
|
||||
return insertRows(db, rows, extraLabels)
|
||||
return insertRows(at, db, rows, extraLabels)
|
||||
})
|
||||
})
|
||||
}
|
||||
|
||||
func insertRows(db string, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
||||
func insertRows(at *auth.Token, db string, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
||||
ctx := getPushCtx()
|
||||
defer putPushCtx(ctx)
|
||||
|
||||
|
@ -130,8 +133,11 @@ func insertRows(db string, rows []parser.Row, extraLabels []prompbmarshal.Label)
|
|||
ctx.ctx.Labels = labels
|
||||
ctx.ctx.Samples = samples
|
||||
ctx.commonLabels = commonLabels
|
||||
remotewrite.Push(&ctx.ctx.WriteRequest)
|
||||
remotewrite.PushWithAuthToken(at, &ctx.ctx.WriteRequest)
|
||||
rowsInserted.Add(rowsTotal)
|
||||
if at != nil {
|
||||
rowsTenantInserted.Get(at).Add(rowsTotal)
|
||||
}
|
||||
rowsPerInsert.Update(float64(rowsTotal))
|
||||
|
||||
return nil
|
||||
|
|
|
@@ -19,6 +19,7 @@ import (
    "github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/promremotewrite"
    "github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
    "github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/vmimport"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
@@ -159,11 +160,12 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
        })
        return true
    }

    path := strings.Replace(r.URL.Path, "//", "/", -1)
    switch path {
    case "/api/v1/write":
        prometheusWriteRequests.Inc()
        if err := promremotewrite.InsertHandler(r); err != nil {
        if err := promremotewrite.InsertHandler(nil, r); err != nil {
            prometheusWriteErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
@@ -172,7 +174,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
        return true
    case "/api/v1/import":
        vmimportRequests.Inc()
        if err := vmimport.InsertHandler(r); err != nil {
        if err := vmimport.InsertHandler(nil, r); err != nil {
            vmimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
@@ -181,7 +183,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
        return true
    case "/api/v1/import/csv":
        csvimportRequests.Inc()
        if err := csvimport.InsertHandler(r); err != nil {
        if err := csvimport.InsertHandler(nil, r); err != nil {
            csvimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
@@ -190,7 +192,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
        return true
    case "/api/v1/import/prometheus":
        prometheusimportRequests.Inc()
        if err := prometheusimport.InsertHandler(r); err != nil {
        if err := prometheusimport.InsertHandler(nil, r); err != nil {
            prometheusimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
@@ -199,7 +201,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
        return true
    case "/api/v1/import/native":
        nativeimportRequests.Inc()
        if err := native.InsertHandler(r); err != nil {
        if err := native.InsertHandler(nil, r); err != nil {
            nativeimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
@@ -208,7 +210,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
        return true
    case "/write", "/api/v2/write":
        influxWriteRequests.Inc()
        if err := influx.InsertHandlerForHTTP(r); err != nil {
        if err := influx.InsertHandlerForHTTP(nil, r); err != nil {
            influxWriteErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
@@ -245,9 +247,92 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
        }
        return true
    }
    if remotewrite.MultitenancyEnabled() {
        return processMultitenantRequest(w, r, path)
    }
    return false
}

func processMultitenantRequest(w http.ResponseWriter, r *http.Request, path string) bool {
    p, err := httpserver.ParsePath(path)
    if err != nil {
        // Cannot parse multitenant path. Skip it - probably it will be parsed later.
        return false
    }
    if p.Prefix != "insert" {
        httpserver.Errorf(w, r, `unsupported multitenant prefix: %q; expected "insert"`, p.Prefix)
        return true
    }
    at, err := auth.NewToken(p.AuthToken)
    if err != nil {
        httpserver.Errorf(w, r, "cannot obtain auth token: %s", err)
        return true
    }
    switch p.Suffix {
    case "prometheus/", "prometheus", "prometheus/api/v1/write":
        prometheusWriteRequests.Inc()
        if err := promremotewrite.InsertHandler(at, r); err != nil {
            prometheusWriteErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
        }
        w.WriteHeader(http.StatusNoContent)
        return true
    case "prometheus/api/v1/import":
        vmimportRequests.Inc()
        if err := vmimport.InsertHandler(at, r); err != nil {
            vmimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
        }
        w.WriteHeader(http.StatusNoContent)
        return true
    case "prometheus/api/v1/import/csv":
        csvimportRequests.Inc()
        if err := csvimport.InsertHandler(at, r); err != nil {
            csvimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
        }
        w.WriteHeader(http.StatusNoContent)
        return true
    case "prometheus/api/v1/import/prometheus":
        prometheusimportRequests.Inc()
        if err := prometheusimport.InsertHandler(at, r); err != nil {
            prometheusimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
        }
        w.WriteHeader(http.StatusNoContent)
        return true
    case "prometheus/api/v1/import/native":
        nativeimportRequests.Inc()
        if err := native.InsertHandler(at, r); err != nil {
            nativeimportErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
        }
        w.WriteHeader(http.StatusNoContent)
        return true
    case "influx/write", "influx/api/v2/write":
        influxWriteRequests.Inc()
        if err := influx.InsertHandlerForHTTP(at, r); err != nil {
            influxWriteErrors.Inc()
            httpserver.Errorf(w, r, "%s", err)
            return true
        }
        w.WriteHeader(http.StatusNoContent)
        return true
    case "influx/query":
        influxQueryRequests.Inc()
        influxutils.WriteDatabaseNames(w)
        return true
    default:
        httpserver.Errorf(w, r, "unsupported multitenant path suffix: %q", p.Suffix)
        return true
    }
}

var (
    prometheusWriteRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/api/v1/write", protocol="promremotewrite"}`)
    prometheusWriteErrors   = metrics.NewCounter(`vmagent_http_request_errors_total{path="/api/v1/write", protocol="promremotewrite"}`)
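`httpserver.ParsePath` is not shown in this diff; conceptually it splits a cluster-style path such as `/insert/42:7/prometheus/api/v1/write` into the `insert` prefix, the `42:7` auth token and the remaining suffix. A rough, self-contained approximation under that assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// parseInsertPath splits "/insert/<accountID>[:<projectID>]/<suffix>" into its parts.
// It only illustrates the shape of the parsing; the real helper lives in lib/httpserver.
func parseInsertPath(path string) (prefix, authToken, suffix string, err error) {
	parts := strings.SplitN(strings.TrimPrefix(path, "/"), "/", 3)
	if len(parts) < 3 {
		return "", "", "", fmt.Errorf("cannot parse multitenant path %q", path)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	prefix, token, suffix, err := parseInsertPath("/insert/42:7/prometheus/api/v1/write")
	if err != nil {
		panic(err)
	}
	// Output: insert 42:7 prometheus/api/v1/write
	fmt.Println(prefix, token, suffix)
}
```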
@ -5,36 +5,39 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
|
||||
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
)
|
||||
|
||||
var (
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="native"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="native"}`)
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="native"}`)
|
||||
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="native"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="native"}`)
|
||||
)
|
||||
|
||||
// InsertHandler processes `/api/v1/import` request.
|
||||
//
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6
|
||||
func InsertHandler(req *http.Request) error {
|
||||
func InsertHandler(at *auth.Token, req *http.Request) error {
|
||||
extraLabels, err := parserCommon.GetExtraLabels(req)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return writeconcurrencylimiter.Do(func() error {
|
||||
return parser.ParseStream(req, func(block *parser.Block) error {
|
||||
return insertRows(block, extraLabels)
|
||||
return insertRows(at, block, extraLabels)
|
||||
})
|
||||
})
|
||||
}
|
||||
|
||||
func insertRows(block *parser.Block, extraLabels []prompbmarshal.Label) error {
|
||||
func insertRows(at *auth.Token, block *parser.Block, extraLabels []prompbmarshal.Label) error {
|
||||
ctx := common.GetPushCtx()
|
||||
defer common.PutPushCtx(ctx)
|
||||
|
||||
|
@ -42,6 +45,9 @@ func insertRows(block *parser.Block, extraLabels []prompbmarshal.Label) error {
|
|||
// since relabeling can prevent from inserting the rows.
|
||||
rowsLen := len(block.Values)
|
||||
rowsInserted.Add(rowsLen)
|
||||
if at != nil {
|
||||
rowsTenantInserted.Get(at).Add(rowsLen)
|
||||
}
|
||||
rowsPerInsert.Update(float64(rowsLen))
|
||||
|
||||
tssDst := ctx.WriteRequest.Timeseries[:0]
|
||||
|
@ -80,6 +86,6 @@ func insertRows(block *parser.Block, extraLabels []prompbmarshal.Label) error {
|
|||
ctx.WriteRequest.Timeseries = tssDst
|
||||
ctx.Labels = labels
|
||||
ctx.Samples = samples
|
||||
remotewrite.Push(&ctx.WriteRequest)
|
||||
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
|
||||
return nil
|
||||
}
|
||||
|
|
|
@ -5,20 +5,23 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
|
||||
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
)
|
||||
|
||||
var (
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="prometheus"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="prometheus"}`)
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="prometheus"}`)
|
||||
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="prometheus"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="prometheus"}`)
|
||||
)
|
||||
|
||||
// InsertHandler processes `/api/v1/import/prometheus` request.
|
||||
func InsertHandler(req *http.Request) error {
|
||||
func InsertHandler(at *auth.Token, req *http.Request) error {
|
||||
extraLabels, err := parserCommon.GetExtraLabels(req)
|
||||
if err != nil {
|
||||
return err
|
||||
|
@ -30,12 +33,12 @@ func InsertHandler(req *http.Request) error {
|
|||
return writeconcurrencylimiter.Do(func() error {
|
||||
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
|
||||
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
|
||||
return insertRows(rows, extraLabels)
|
||||
return insertRows(at, rows, extraLabels)
|
||||
}, nil)
|
||||
})
|
||||
}
|
||||
|
||||
func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
||||
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
||||
ctx := common.GetPushCtx()
|
||||
defer common.PutPushCtx(ctx)
|
||||
|
||||
|
@ -69,8 +72,11 @@ func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
|||
ctx.WriteRequest.Timeseries = tssDst
|
||||
ctx.Labels = labels
|
||||
ctx.Samples = samples
|
||||
remotewrite.Push(&ctx.WriteRequest)
|
||||
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
|
||||
rowsInserted.Add(len(rows))
|
||||
if at != nil {
|
||||
rowsTenantInserted.Get(at).Add(len(rows))
|
||||
}
|
||||
rowsPerInsert.Update(float64(len(rows)))
|
||||
return nil
|
||||
}
|
||||
|
|
|
@ -5,34 +5,37 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
|
||||
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
)
|
||||
|
||||
var (
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="promremotewrite"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="promremotewrite"}`)
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="promremotewrite"}`)
|
||||
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="promremotewrite"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="promremotewrite"}`)
|
||||
)
|
||||
|
||||
// InsertHandler processes remote write for prometheus.
|
||||
func InsertHandler(req *http.Request) error {
|
||||
func InsertHandler(at *auth.Token, req *http.Request) error {
|
||||
extraLabels, err := parserCommon.GetExtraLabels(req)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return writeconcurrencylimiter.Do(func() error {
|
||||
return parser.ParseStream(req, func(tss []prompb.TimeSeries) error {
|
||||
return insertRows(tss, extraLabels)
|
||||
return insertRows(at, tss, extraLabels)
|
||||
})
|
||||
})
|
||||
}
|
||||
|
||||
func insertRows(timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Label) error {
|
||||
func insertRows(at *auth.Token, timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Label) error {
|
||||
ctx := common.GetPushCtx()
|
||||
defer common.PutPushCtx(ctx)
|
||||
|
||||
|
@ -68,8 +71,11 @@ func insertRows(timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Labe
|
|||
ctx.WriteRequest.Timeseries = tssDst
|
||||
ctx.Labels = labels
|
||||
ctx.Samples = samples
|
||||
remotewrite.Push(&ctx.WriteRequest)
|
||||
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
|
||||
rowsInserted.Add(rowsTotal)
|
||||
if at != nil {
|
||||
rowsTenantInserted.Get(at).Add(rowsTotal)
|
||||
}
|
||||
rowsPerInsert.Update(float64(rowsTotal))
|
||||
return nil
|
||||
}
|
||||
|
|
|
@@ -78,6 +78,7 @@ type client struct {
    errorsCount    *metrics.Counter
    packetsDropped *metrics.Counter
    retriesCount   *metrics.Counter
    sendDuration   *metrics.FloatCounter

    wg     sync.WaitGroup
    stopCh chan struct{}
@@ -133,6 +134,7 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
    c.errorsCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_errors_total{url=%q}`, c.sanitizedURL))
    c.packetsDropped = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_packets_dropped_total{url=%q}`, c.sanitizedURL))
    c.retriesCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_retries_count_total{url=%q}`, c.sanitizedURL))
    c.sendDuration = metrics.GetOrCreateFloatCounter(fmt.Sprintf(`vmagent_remotewrite_send_duration_seconds_total{url=%q}`, c.sanitizedURL))
    for i := 0; i < concurrency; i++ {
        c.wg.Add(1)
        go func() {
@@ -204,7 +206,9 @@ func (c *client) runWorker() {
            return
        }
        go func() {
            startTime := time.Now()
            ch <- c.sendBlock(block)
            c.sendDuration.Add(time.Since(startTime).Seconds())
        }()
        select {
        case ok := <-ch:
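The new `sendDuration` field accumulates the wall-clock seconds spent in `sendBlock`. The timing pattern itself is simple; the sketch below reproduces it with only the standard library (a plain float stands in for `metrics.FloatCounter`):

```go
package main

import (
	"fmt"
	"time"
)

// send simulates a blocking block upload, standing in for client.sendBlock.
func send() bool {
	time.Sleep(50 * time.Millisecond)
	return true
}

func main() {
	var sendDurationSeconds float64 // stand-in for vmagent_remotewrite_send_duration_seconds_total
	ch := make(chan bool, 1)
	done := make(chan struct{})
	go func() {
		// Time the blocking send and accumulate the elapsed seconds,
		// mirroring the runWorker change above.
		startTime := time.Now()
		ch <- send()
		sendDurationSeconds += time.Since(startTime).Seconds()
		close(done)
	}()
	ok := <-ch
	<-done // make sure the duration was recorded before reading it
	fmt.Printf("send ok=%v, accumulated send time: %.3fs\n", ok, sendDurationSeconds)
}
```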
@@ -42,11 +42,11 @@ func loadRelabelConfigs() (*relabelConfigs, error) {
        }
        rcs.global = global
    }
    if len(*relabelConfigPaths) > len(*remoteWriteURLs) {
        return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url args: %d",
            len(*relabelConfigPaths), len(*remoteWriteURLs))
    if len(*relabelConfigPaths) > (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)) {
        return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url or -remoteWrite.multitenantURL args: %d",
            len(*relabelConfigPaths), (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)))
    }
    rcs.perURL = make([]*promrelabel.ParsedConfigs, len(*remoteWriteURLs))
    rcs.perURL = make([]*promrelabel.ParsedConfigs, (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)))
    for i, path := range *relabelConfigPaths {
        if len(path) == 0 {
            // Skip empty relabel config.
@@ -8,6 +8,7 @@ import (
    "sync/atomic"
    "time"

    "github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/bloomfilter"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
@@ -18,6 +19,7 @@ import (
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
    "github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
    "github.com/VictoriaMetrics/metrics"
    xxhash "github.com/cespare/xxhash/v2"
)
@@ -25,11 +27,14 @@
var (
    remoteWriteURLs = flagutil.NewArray("remoteWrite.url", "Remote storage URL to write data to. It must support Prometheus remote_write API. "+
        "It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
        "Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems")
        "Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
    remoteWriteMultitenantURLs = flagutil.NewArray("remoteWrite.multitenantURL", "Base path for multitenant remote storage URL to write data to. "+
        "See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
        "Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
    tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored. "+
        "See also -remoteWrite.maxDiskUsagePerURL")
    queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
        "isn't enough for sending high volume of collected data to remote storage. Default value if 2 * numberOfAvailableCPUs")
        "isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs")
    showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
        "It is hidden by default, since it can contain sensitive info such as auth key")
    maxPendingBytesPerURL = flagutil.NewBytes("remoteWrite.maxDiskUsagePerURL", 0, "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
@@ -53,7 +58,22 @@ var (
        "Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -remoteWrite.maxHourlySeries")
)

var rwctxs []*remoteWriteCtx
var (
    // rwctxsDefault contains statically populated entries when -remoteWrite.url is specified.
    rwctxsDefault []*remoteWriteCtx

    // rwctxsMap contains dynamically populated entries when -remoteWrite.multitenantURL is specified.
    rwctxsMap     = make(map[tenantmetrics.TenantID][]*remoteWriteCtx)
    rwctxsMapLock sync.Mutex

    // Data without tenant id is written to defaultAuthToken if -remoteWrite.multitenantURL is specified.
    defaultAuthToken = &auth.Token{}
)

// MultitenancyEnabled returns true if -remoteWrite.multitenantURL is specified.
func MultitenancyEnabled() bool {
    return len(*remoteWriteMultitenantURLs) > 0
}

// Contains the current relabelConfigs.
var allRelabelConfigs atomic.Value
@@ -76,8 +96,11 @@ func InitSecretFlags() {
//
// Stop must be called for graceful shutdown.
func Init() {
    if len(*remoteWriteURLs) == 0 {
        logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
    if len(*remoteWriteURLs) == 0 && len(*remoteWriteMultitenantURLs) == 0 {
        logger.Fatalf("at least one `-remoteWrite.url` or `-remoteWrite.multitenantURL` command-line flag must be set")
    }
    if len(*remoteWriteURLs) > 0 && len(*remoteWriteMultitenantURLs) > 0 {
        logger.Fatalf("cannot set both `-remoteWrite.url` and `-remoteWrite.multitenantURL` command-line flags")
    }
    if *maxHourlySeries > 0 {
        hourlySeriesLimiter = bloomfilter.NewLimiter(*maxHourlySeries, time.Hour)
@@ -116,23 +139,8 @@ func Init() {
    }
    allRelabelConfigs.Store(rcs)

    maxInmemoryBlocks := memory.Allowed() / len(*remoteWriteURLs) / maxRowsPerBlock / 100
    if maxInmemoryBlocks > 400 {
        // There is no much sense in keeping higher number of blocks in memory,
        // since this means that the producer outperforms consumer and the queue
        // will continue growing. It is better storing the queue to file.
        maxInmemoryBlocks = 400
    }
    if maxInmemoryBlocks < 2 {
        maxInmemoryBlocks = 2
    }
    for i, remoteWriteURL := range *remoteWriteURLs {
        sanitizedURL := fmt.Sprintf("%d:secret-url", i+1)
        if *showRemoteWriteURL {
            sanitizedURL = fmt.Sprintf("%d:%s", i+1, remoteWriteURL)
        }
        rwctx := newRemoteWriteCtx(i, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
        rwctxs = append(rwctxs, rwctx)
    if len(*remoteWriteURLs) > 0 {
        rwctxsDefault = newRemoteWriteCtxs(nil, *remoteWriteURLs)
    }

    // Start config reloader.
@@ -157,6 +165,37 @@ func Init() {
    }()
}

func newRemoteWriteCtxs(at *auth.Token, urls []string) []*remoteWriteCtx {
    if len(urls) == 0 {
        logger.Panicf("BUG: urls must be non-empty")
    }

    maxInmemoryBlocks := memory.Allowed() / len(urls) / maxRowsPerBlock / 100
    if maxInmemoryBlocks > 400 {
        // There is no much sense in keeping higher number of blocks in memory,
        // since this means that the producer outperforms consumer and the queue
        // will continue growing. It is better storing the queue to file.
        maxInmemoryBlocks = 400
    }
    if maxInmemoryBlocks < 2 {
        maxInmemoryBlocks = 2
    }
    rwctxs := make([]*remoteWriteCtx, len(urls))
    for i, remoteWriteURL := range urls {
        sanitizedURL := fmt.Sprintf("%d:secret-url", i+1)
        if at != nil {
            // Construct full remote_write url for the given tenant according to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
            remoteWriteURL = fmt.Sprintf("%s/insert/%d:%d/prometheus/api/v1/write", remoteWriteURL, at.AccountID, at.ProjectID)
            sanitizedURL = fmt.Sprintf("%s:%d:%d", sanitizedURL, at.AccountID, at.ProjectID)
        }
        if *showRemoteWriteURL {
            sanitizedURL = fmt.Sprintf("%d:%s", i+1, remoteWriteURL)
        }
        rwctxs[i] = newRemoteWriteCtx(i, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
    }
    return rwctxs
}

var stopCh = make(chan struct{})
var configReloaderWG sync.WaitGroup
@@ -167,16 +206,55 @@ func Stop() {
    close(stopCh)
    configReloaderWG.Wait()

    for _, rwctx := range rwctxs {
    for _, rwctx := range rwctxsDefault {
        rwctx.MustStop()
    }
    rwctxs = nil
    rwctxsDefault = nil

    // There is no need in locking rwctxsMapLock here, since nobody should call Push during the Stop call.
    for _, rwctxs := range rwctxsMap {
        for _, rwctx := range rwctxs {
            rwctx.MustStop()
        }
    }
    rwctxsMap = nil
}

// Push sends wr to remote storage systems set via `-remoteWrite.url`.
//
// Note that wr may be modified by Push due to relabeling and rounding.
func Push(wr *prompbmarshal.WriteRequest) {
    PushWithAuthToken(nil, wr)
}

// PushWithAuthToken sends wr to remote storage systems set via `-remoteWrite.multitenantURL`.
//
// Note that wr may be modified by Push due to relabeling and rounding.
func PushWithAuthToken(at *auth.Token, wr *prompbmarshal.WriteRequest) {
    if at == nil && len(*remoteWriteMultitenantURLs) > 0 {
        // Write data to default tenant if at isn't set while -remoteWrite.multitenantURL is set.
        at = defaultAuthToken
    }
    var rwctxs []*remoteWriteCtx
    if at == nil {
        rwctxs = rwctxsDefault
    } else {
        if len(*remoteWriteMultitenantURLs) == 0 {
            logger.Panicf("BUG: remoteWriteMultitenantURLs must be non-empty for non-nil at")
        }
        rwctxsMapLock.Lock()
        tenantID := tenantmetrics.TenantID{
            AccountID: at.AccountID,
            ProjectID: at.ProjectID,
        }
        rwctxs = rwctxsMap[tenantID]
        if rwctxs == nil {
            rwctxs = newRemoteWriteCtxs(at, *remoteWriteMultitenantURLs)
            rwctxsMap[tenantID] = rwctxs
        }
        rwctxsMapLock.Unlock()
    }

    var rctx *relabelCtx
    rcs := allRelabelConfigs.Load().(*relabelConfigs)
    pcsGlobal := rcs.global
@ -5,36 +5,39 @@ import (
|
|||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
|
||||
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
)
|
||||
|
||||
var (
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="vmimport"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="vmimport"}`)
|
||||
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="vmimport"}`)
|
||||
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="vmimport"}`)
|
||||
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="vmimport"}`)
|
||||
)
|
||||
|
||||
// InsertHandler processes `/api/v1/import` request.
|
||||
//
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6
|
||||
func InsertHandler(req *http.Request) error {
|
||||
func InsertHandler(at *auth.Token, req *http.Request) error {
|
||||
extraLabels, err := parserCommon.GetExtraLabels(req)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return writeconcurrencylimiter.Do(func() error {
|
||||
return parser.ParseStream(req, func(rows []parser.Row) error {
|
||||
return insertRows(rows, extraLabels)
|
||||
return insertRows(at, rows, extraLabels)
|
||||
})
|
||||
})
|
||||
}
|
||||
|
||||
func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
||||
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
||||
ctx := common.GetPushCtx()
|
||||
defer common.PutPushCtx(ctx)
|
||||
|
||||
|
@ -74,8 +77,11 @@ func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
|
|||
ctx.WriteRequest.Timeseries = tssDst
|
||||
ctx.Labels = labels
|
||||
ctx.Samples = samples
|
||||
remotewrite.Push(&ctx.WriteRequest)
|
||||
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
|
||||
rowsInserted.Add(rowsTotal)
|
||||
if at != nil {
|
||||
rowsTenantInserted.Get(at).Add(rowsTotal)
|
||||
}
|
||||
rowsPerInsert.Update(float64(rowsTotal))
|
||||
return nil
|
||||
}
|
||||
|
|
|
@ -348,193 +348,193 @@ command-line flags with their descriptions.
|
|||
The shortlist of configuration flags is the following:
```
-datasource.appendTypePrefix
    Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
-datasource.basicAuth.password string
    Optional basic auth password for -datasource.url
-datasource.basicAuth.username string
    Optional basic auth username for -datasource.url
-datasource.lookback duration
    Lookback defines how far into the past to look when evaluating queries. For example, if datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
-datasource.maxIdleConnections int
    Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
-datasource.queryStep duration
    queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query. If queryStep isn't specified, rule's evaluationInterval will be used instead.
-datasource.roundDigits int
    Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
-datasource.tlsCAFile string
    Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
-datasource.tlsCertFile string
    Optional path to client-side TLS certificate file to use when connecting to -datasource.url
-datasource.tlsInsecureSkipVerify
    Whether to skip tls verification when connecting to -datasource.url
-datasource.tlsKeyFile string
    Optional path to client-side TLS certificate key to use when connecting to -datasource.url
-datasource.tlsServerName string
    Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
-datasource.url string
    VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
-dryRun -rule
    Whether to check only config files without running vmalert. The rules files are validated. The -rule flag must be specified.
-enableTCP6
    Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
    Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
    Prefix for environment variables if -envflag.enable is set
-evaluationInterval duration
    How often to evaluate the rules (default 1m0s)
-external.alert.source string
    External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
    eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'. If empty '/api/v1/:groupID/alertID/status' is used
-external.label array
    Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
    Supports an array of values separated by comma or specified via multiple flags.
-external.url string
    External URL is used as alert's source for sent alerts to the notifier
-fs.disableMmap
    Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
    Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
    Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
    Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
    The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
    An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
    Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
    Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
    Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
    Address to listen for http connections (default ":8880")
-loggerDisableTimestamps
    Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
    Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
    Format for logs. Possible values: default, json (default "default")
-loggerLevel string
    Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
    Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
    Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
    Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-memory.allowedBytes size
    Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
    Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
    Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
    Auth key for /metrics. It overrides httpAuth settings
-notifier.basicAuth.password array
    Optional basic auth password for -notifier.url
    Supports an array of values separated by comma or specified via multiple flags.
-notifier.basicAuth.username array
    Optional basic auth username for -notifier.url
    Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCAFile array
    Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
    Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCertFile array
    Optional path to client-side TLS certificate file to use when connecting to -notifier.url
    Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsInsecureSkipVerify array
    Whether to skip tls verification when connecting to -notifier.url
    Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsKeyFile array
    Optional path to client-side TLS certificate key to use when connecting to -notifier.url
    Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsServerName array
    Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
    Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
    Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
    Supports an array of values separated by comma or specified via multiple flags.
-pprofAuthKey string
    Auth key for /debug/pprof. It overrides httpAuth settings
-remoteRead.basicAuth.password string
    Optional basic auth password for -remoteRead.url
-remoteRead.basicAuth.username string
    Optional basic auth username for -remoteRead.url
-remoteRead.ignoreRestoreErrors
    Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
-remoteRead.lookback duration
    Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.tlsCAFile string
    Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default system CA is used
-remoteRead.tlsCertFile string
    Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url
-remoteRead.tlsInsecureSkipVerify
    Whether to skip tls verification when connecting to -remoteRead.url
-remoteRead.tlsKeyFile string
    Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url
-remoteRead.tlsServerName string
    Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
-remoteRead.url vmalert
    Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has successfully persisted its state. E.g. http://127.0.0.1:8428
-remoteWrite.basicAuth.password string
    Optional basic auth password for -remoteWrite.url
-remoteWrite.basicAuth.username string
    Optional basic auth username for -remoteWrite.url
-remoteWrite.concurrency int
    Defines number of writers for concurrent writing into remote querier (default 1)
-remoteWrite.flushInterval duration
    Defines interval of flushes to remote write endpoint (default 5s)
-remoteWrite.maxBatchSize int
    Defines max number of timeseries to be flushed at once (default 1000)
-remoteWrite.maxQueueSize int
    Defines the max number of pending datapoints to remote write endpoint (default 100000)
-remoteWrite.tlsCAFile string
    Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used
-remoteWrite.tlsCertFile string
    Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url
-remoteWrite.tlsInsecureSkipVerify
    Whether to skip tls verification when connecting to -remoteWrite.url
-remoteWrite.tlsKeyFile string
    Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url
-remoteWrite.tlsServerName string
    Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
-remoteWrite.url string
    Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
-replay.maxDatapointsPerQuery int
    Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
-replay.ruleRetryAttempts int
    Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
-replay.rulesDelay duration
    Delay between rules evaluation within the group. Could be important if there are chained rules inside of the group and processing needs to wait for previous rule results to be persisted by remote storage before evaluating the next rule. Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
-replay.timeFrom string
    The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
-replay.timeTo string
    The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
-rule array
    Path to the file with alert rules.
    Supports patterns. Flag can be specified multiple times.
    Examples:
     -rule="/path/to/file". Path to a single file with alerting rules
     -rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
     absolute path to all .yaml files in root.
    Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
    Supports an array of values separated by comma or specified via multiple flags.
-rule.configCheckInterval duration
    Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-rule.validateExpressions
    Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
    Whether to validate annotation and label templates (default true)
-tls
    Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
    Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
    Path to file with TLS key. Used only if -tls is set
-version
    Show VictoriaMetrics version
```
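
As noted for the `-rule` flag above, rule files may contain `%{ENV_VAR}` placeholders that are substituted with environment variable values before the file is parsed. The snippet below is a minimal, hypothetical Go sketch of that substitution semantics only; it is not vmalert's actual implementation, and the regexp and function names are illustrative.

```
package main

import (
	"fmt"
	"os"
	"regexp"
)

// envVarRe matches %{NAME} placeholders as described for the -rule flag.
var envVarRe = regexp.MustCompile(`%\{([^}]+)\}`)

// expandEnvPlaceholders replaces every %{NAME} occurrence with the value of
// the NAME environment variable. Unknown variables expand to an empty string.
func expandEnvPlaceholders(s string) string {
	return envVarRe.ReplaceAllStringFunc(s, func(m string) string {
		return os.Getenv(m[2 : len(m)-1])
	})
}

func main() {
	os.Setenv("TENANT", "team-a")
	rule := `- alert: HighErrorRate
  expr: rate(errors_total{tenant="%{TENANT}"}[5m]) > 0.1`
	fmt.Println(expandEnvPlaceholders(rule))
}
```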
`vmalert` supports "hot" config reload via the following methods:
@ -42,6 +42,9 @@ type AlertingRule struct {
// resets on every successful Exec
// may be used as Health state
lastExecError error
// stores the number of samples returned during
// the last evaluation
lastExecSamples int

metrics *alertingRuleMetrics
}

@ -50,6 +53,7 @@ type alertingRuleMetrics struct {
errors *gauge
pending *gauge
active *gauge
samples *gauge
}

func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule) *AlertingRule {

@ -76,8 +80,8 @@ func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
labels := fmt.Sprintf(`alertname=%q, group=%q, id="%d"`, ar.Name, group.Name, ar.ID())
ar.metrics.pending = getOrCreateGauge(fmt.Sprintf(`vmalert_alerts_pending{%s}`, labels),
func() float64 {
ar.mu.Lock()
defer ar.mu.Unlock()
ar.mu.RLock()
defer ar.mu.RUnlock()
var num int
for _, a := range ar.alerts {
if a.State == notifier.StatePending {

@ -88,8 +92,8 @@ func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
})
ar.metrics.active = getOrCreateGauge(fmt.Sprintf(`vmalert_alerts_firing{%s}`, labels),
func() float64 {
ar.mu.Lock()
defer ar.mu.Unlock()
ar.mu.RLock()
defer ar.mu.RUnlock()
var num int
for _, a := range ar.alerts {
if a.State == notifier.StateFiring {

@ -98,15 +102,21 @@ func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
}
return float64(num)
})
ar.metrics.errors = getOrCreateGauge(fmt.Sprintf(`vmalert_alerts_error{%s}`, labels),
ar.metrics.errors = getOrCreateGauge(fmt.Sprintf(`vmalert_alerting_rules_error{%s}`, labels),
func() float64 {
ar.mu.Lock()
defer ar.mu.Unlock()
ar.mu.RLock()
defer ar.mu.RUnlock()
if ar.lastExecError == nil {
return 0
}
return 1
})
ar.metrics.samples = getOrCreateGauge(fmt.Sprintf(`vmalert_alerting_rules_last_evaluation_samples{%s}`, labels),
func() float64 {
ar.mu.RLock()
defer ar.mu.RUnlock()
return float64(ar.lastExecSamples)
})
return ar
}

@ -115,6 +125,7 @@ func (ar *AlertingRule) Close() {
metrics.UnregisterMetric(ar.metrics.active.name)
metrics.UnregisterMetric(ar.metrics.pending.name)
metrics.UnregisterMetric(ar.metrics.errors.name)
metrics.UnregisterMetric(ar.metrics.samples.name)
}

// String implements Stringer interface

@ -194,6 +205,7 @@ func (ar *AlertingRule) Exec(ctx context.Context) ([]prompbmarshal.TimeSeries, e
ar.lastExecError = err
ar.lastExecTime = time.Now()
ar.lastExecSamples = len(qMetrics)
if err != nil {
return nil, fmt.Errorf("failed to execute query %q: %w", ar.Expr, err)
}

@ -384,6 +396,7 @@ func (ar *AlertingRule) RuleAPI() APIAlertingRule {
Expression: ar.Expr,
For: ar.For.String(),
LastError: lastErr,
LastSamples: ar.lastExecSamples,
LastExec: ar.lastExecTime,
Labels: ar.Labels,
Annotations: ar.Annotations,
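
The hunks above wire per-rule gauges such as `vmalert_alerting_rules_last_evaluation_samples` through a small `getOrCreateGauge` helper. For readers unfamiliar with callback-based gauges in `github.com/VictoriaMetrics/metrics`, here is a standalone, hedged sketch of the same register/expose pattern; the metric name, listen address and the `lastEvalSamples` variable are illustrative and not taken from vmalert.

```
package main

import (
	"net/http"
	"sync"

	"github.com/VictoriaMetrics/metrics"
)

func main() {
	var mu sync.RWMutex
	lastEvalSamples := 0

	// The gauge value is computed by the callback on every scrape,
	// so no explicit Set() calls are needed.
	name := `demo_rules_last_evaluation_samples{group="demo"}`
	metrics.GetOrCreateGauge(name, func() float64 {
		mu.RLock()
		defer mu.RUnlock()
		return float64(lastEvalSamples)
	})

	// Simulate a rule evaluation updating the tracked state.
	mu.Lock()
	lastEvalSamples = 42
	mu.Unlock()

	// When a rule is removed, the gauge can be dropped the same way the
	// diff does it in Close(): metrics.UnregisterMetric(name).
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		metrics.WritePrometheus(w, true)
	})
	_ = http.ListenAndServe(":8880", nil)
}
```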
@ -266,6 +266,8 @@ func configReload(ctx context.Context, m *manager, groupsCfg []config.Group) {
}
newGroupsCfg, err := config.Parse(*rulePath, *validateTemplates, *validateExpressions)
if err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)
logger.Errorf("cannot parse configuration file: %s", err)
continue
}
@ -90,13 +90,13 @@ func templateAnnotations(annotations map[string]string, data AlertTplData, funcs
eg := new(utils.ErrGroup)
r := make(map[string]string, len(annotations))
for key, text := range annotations {
r[key] = text
buf.Reset()
builder.Reset()
builder.Grow(len(tplHeader) + len(text))
builder.WriteString(tplHeader)
builder.WriteString(text)
if err := templateAnnotation(&buf, builder.String(), data, funcs); err != nil {
r[key] = text
eg.Add(fmt.Errorf("key %q, template %q: %w", key, text, err))
continue
}
@ -35,12 +35,16 @@ type RecordingRule struct {
// resets on every successful Exec
// may be used as Health state
lastExecError error
// stores the number of samples returned during
// the last evaluation
lastExecSamples int

metrics *recordingRuleMetrics
}

type recordingRuleMetrics struct {
errors *gauge
errors *gauge
samples *gauge
}

// String implements Stringer interface

@ -73,19 +77,26 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
labels := fmt.Sprintf(`recording=%q, group=%q, id="%d"`, rr.Name, group.Name, rr.ID())
rr.metrics.errors = getOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_error{%s}`, labels),
func() float64 {
rr.mu.Lock()
defer rr.mu.Unlock()
rr.mu.RLock()
defer rr.mu.RUnlock()
if rr.lastExecError == nil {
return 0
}
return 1
})
rr.metrics.samples = getOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_last_evaluation_samples{%s}`, labels),
func() float64 {
rr.mu.RLock()
defer rr.mu.RUnlock()
return float64(rr.lastExecSamples)
})
return rr
}

// Close unregisters rule metrics
func (rr *RecordingRule) Close() {
metrics.UnregisterMetric(rr.metrics.errors.name)
metrics.UnregisterMetric(rr.metrics.samples.name)
}

// ExecRange executes recording rule on the given time range similarly to Exec.

@ -118,6 +129,7 @@ func (rr *RecordingRule) Exec(ctx context.Context) ([]prompbmarshal.TimeSeries,
rr.lastExecTime = time.Now()
rr.lastExecError = err
rr.lastExecSamples = len(qMetrics)
if err != nil {
return nil, fmt.Errorf("failed to execute query %q: %w", rr.Expr, err)
}

@ -190,13 +202,14 @@ func (rr *RecordingRule) RuleAPI() APIRecordingRule {
}
return APIRecordingRule{
// encode as strings to avoid rounding
ID: fmt.Sprintf("%d", rr.ID()),
GroupID: fmt.Sprintf("%d", rr.GroupID),
Name: rr.Name,
Type: rr.Type.String(),
Expression: rr.Expr,
LastError: lastErr,
LastExec: rr.lastExecTime,
Labels: rr.Labels,
ID: fmt.Sprintf("%d", rr.ID()),
GroupID: fmt.Sprintf("%d", rr.GroupID),
Name: rr.Name,
Type: rr.Type.String(),
Expression: rr.Expr,
LastError: lastErr,
LastSamples: rr.lastExecSamples,
LastExec: rr.lastExecTime,
Labels: rr.Labels,
}
}
@ -40,6 +40,7 @@ type APIAlertingRule struct {
Expression string `json:"expression"`
For string `json:"for"`
LastError string `json:"last_error"`
LastSamples int `json:"last_samples"`
LastExec time.Time `json:"last_exec"`
Labels map[string]string `json:"labels"`
Annotations map[string]string `json:"annotations"`

@ -47,12 +48,13 @@ type APIAlertingRule struct {

// APIRecordingRule represents RecordingRule for WEB view
type APIRecordingRule struct {
ID string `json:"id"`
Name string `json:"name"`
Type string `json:"type"`
GroupID string `json:"group_id"`
Expression string `json:"expression"`
LastError string `json:"last_error"`
LastExec time.Time `json:"last_exec"`
Labels map[string]string `json:"labels"`
ID string `json:"id"`
Name string `json:"name"`
Type string `json:"type"`
GroupID string `json:"group_id"`
Expression string `json:"expression"`
LastError string `json:"last_error"`
LastSamples int `json:"last_samples"`
LastExec time.Time `json:"last_exec"`
Labels map[string]string `json:"labels"`
}
@ -207,7 +207,7 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap

@ -186,7 +186,7 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Where to put the backup on the remote storage. Example: gcs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir
-dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap

@ -86,7 +86,7 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
-customS3Endpoint string
Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap
@ -381,14 +381,14 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
return true
case "/api/v1/rules":
case "/api/v1/rules", "/rules":
// Return dumb placeholder for https://prometheus.io/docs/prometheus/latest/querying/api/#rules
rulesRequests.Inc()
w.Header().Set("Content-Type", "application/json; charset=utf-8")
fmt.Fprintf(w, "%s", `{"status":"success","data":{"groups":[]}}`)
return true
case "/api/v1/alerts":
// Return dumb placehloder for https://prometheus.io/docs/prometheus/latest/querying/api/#alerts
case "/api/v1/alerts", "/alerts":
// Return dumb placeholder for https://prometheus.io/docs/prometheus/latest/querying/api/#alerts
alertsRequests.Inc()
w.Header().Set("Content-Type", "application/json; charset=utf-8")
fmt.Fprintf(w, "%s", `{"status":"success","data":{"alerts":[]}}`)
@ -20,13 +20,16 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/metrics"
"github.com/valyala/fastrand"
)

var (
maxTagKeysPerSearch = flag.Int("search.maxTagKeys", 100e3, "The maximum number of tag keys returned from /api/v1/labels")
maxTagValuesPerSearch = flag.Int("search.maxTagValues", 100e3, "The maximum number of tag values returned from /api/v1/label/<label_name>/values")
maxTagValueSuffixesPerSearch = flag.Int("search.maxTagValueSuffixesPerSearch", 100e3, "The maximum number of tag value suffixes returned from /metrics/find")
maxMetricsPerSearch = flag.Int("search.maxUniqueTimeseries", 300e3, "The maximum number of unique time series each search can scan")
maxMetricsPerSearch = flag.Int("search.maxUniqueTimeseries", 300e3, "The maximum number of unique time series each search can scan. This option allows limiting memory usage")
maxSamplesPerSeries = flag.Int("search.maxSamplesPerSeries", 30e6, "The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage")
maxSamplesPerQuery = flag.Int("search.maxSamplesPerQuery", 1e9, "The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries")
)

// Result is a single timeseries result.

@ -80,8 +83,6 @@ func (rss *Results) mustClose() {
rss.tbf = nil
}

var timeseriesWorkCh = make(chan *timeseriesWork, gomaxprocs*16)

type timeseriesWork struct {
mustStop *uint32
rss *Results

@ -120,49 +121,77 @@ func putTimeseriesWork(tsw *timeseriesWork) {

var tswPool sync.Pool

func init() {
for i := 0; i < gomaxprocs; i++ {
go timeseriesWorker(uint(i))
func scheduleTimeseriesWork(workChs []chan *timeseriesWork, tsw *timeseriesWork) {
if len(workChs) == 1 {
// Fast path for a single worker
workChs[0] <- tsw
return
}
attempts := 0
for {
idx := fastrand.Uint32n(uint32(len(workChs)))
select {
case workChs[idx] <- tsw:
return
default:
attempts++
if attempts >= len(workChs) {
workChs[idx] <- tsw
return
}
}
}
}

func timeseriesWorker(workerID uint) {
var rs Result
var rsLastResetTime uint64
for tsw := range timeseriesWorkCh {
if atomic.LoadUint32(tsw.mustStop) != 0 {
tsw.doneCh <- nil
continue
}
rss := tsw.rss
if rss.deadline.Exceeded() {
func (tsw *timeseriesWork) do(r *Result, workerID uint) error {
if atomic.LoadUint32(tsw.mustStop) != 0 {
return nil
}
rss := tsw.rss
if rss.deadline.Exceeded() {
atomic.StoreUint32(tsw.mustStop, 1)
return fmt.Errorf("timeout exceeded during query execution: %s", rss.deadline.String())
}
if err := tsw.pts.Unpack(r, rss.tbf, rss.tr, rss.fetchData); err != nil {
atomic.StoreUint32(tsw.mustStop, 1)
return fmt.Errorf("error during time series unpacking: %w", err)
}
if len(r.Timestamps) > 0 || !rss.fetchData {
if err := tsw.f(r, workerID); err != nil {
atomic.StoreUint32(tsw.mustStop, 1)
tsw.doneCh <- fmt.Errorf("timeout exceeded during query execution: %s", rss.deadline.String())
continue
}
if err := tsw.pts.Unpack(&rs, rss.tbf, rss.tr, rss.fetchData); err != nil {
atomic.StoreUint32(tsw.mustStop, 1)
tsw.doneCh <- fmt.Errorf("error during time series unpacking: %w", err)
continue
}
if len(rs.Timestamps) > 0 || !rss.fetchData {
if err := tsw.f(&rs, workerID); err != nil {
atomic.StoreUint32(tsw.mustStop, 1)
tsw.doneCh <- err
continue
}
}
tsw.rowsProcessed = len(rs.Values)
tsw.doneCh <- nil
currentTime := fasttime.UnixTimestamp()
if cap(rs.Values) > 1024*1024 && 4*len(rs.Values) < cap(rs.Values) && currentTime-rsLastResetTime > 10 {
// Reset rs in order to preseve memory usage after processing big time series with millions of rows.
rs = Result{}
rsLastResetTime = currentTime
return err
}
}
tsw.rowsProcessed = len(r.Values)
return nil
}

func timeseriesWorker(ch <-chan *timeseriesWork, workerID uint) {
v := resultPool.Get()
if v == nil {
v = &result{}
}
r := v.(*result)
for tsw := range ch {
err := tsw.do(&r.rs, workerID)
tsw.doneCh <- err
}
currentTime := fasttime.UnixTimestamp()
if cap(r.rs.Values) > 1024*1024 && 4*len(r.rs.Values) < cap(r.rs.Values) && currentTime-r.lastResetTime > 10 {
// Reset r.rs in order to preseve memory usage after processing big time series with millions of rows.
r.rs = Result{}
r.lastResetTime = currentTime
}
resultPool.Put(r)
}

type result struct {
rs Result
lastResetTime uint64
}

var resultPool sync.Pool

// RunParallel runs f in parallel for all the results from rss.
//
// f shouldn't hold references to rs after returning.

@ -173,6 +202,30 @@ func timeseriesWorker(workerID uint) {
func (rss *Results) RunParallel(f func(rs *Result, workerID uint) error) error {
defer rss.mustClose()

// Spin up local workers.
//
// Do not use a global workChs with a global pool of workers, since it may lead to a deadlock in the following case:
// - RunParallel is called with f, which blocks without forward progress.
// - All the workers in the global pool became blocked in f.
// - workChs is filled up, so it cannot accept new work items from other RunParallel calls.
workers := len(rss.packedTimeseries)
if workers > gomaxprocs {
workers = gomaxprocs
}
if workers < 1 {
workers = 1
}
workChs := make([]chan *timeseriesWork, workers)
var workChsWG sync.WaitGroup
for i := 0; i < workers; i++ {
workChs[i] = make(chan *timeseriesWork, 16)
workChsWG.Add(1)
go func(workerID int) {
defer workChsWG.Done()
timeseriesWorker(workChs[workerID], uint(workerID))
}(i)
}

// Feed workers with work.
tsws := make([]*timeseriesWork, len(rss.packedTimeseries))
var mustStop uint32

@ -182,7 +235,7 @@ func (rss *Results) RunParallel(f func(rs *Result, workerID uint) error) error {
tsw.pts = &rss.packedTimeseries[i]
tsw.f = f
tsw.mustStop = &mustStop
timeseriesWorkCh <- tsw
scheduleTimeseriesWork(workChs, tsw)
tsws[i] = tsw
}
seriesProcessedTotal := len(rss.packedTimeseries)

@ -202,6 +255,13 @@ func (rss *Results) RunParallel(f func(rs *Result, workerID uint) error) error {

perQueryRowsProcessed.Update(float64(rowsProcessedTotal))
perQuerySeriesProcessed.Update(float64(seriesProcessedTotal))

// Shut down local workers
for _, workCh := range workChs {
close(workCh)
}
workChsWG.Wait()

return firstErr
}

@ -221,8 +281,6 @@ type promData struct {
timestamps []int64
}

var unpackWorkCh = make(chan *unpackWork, gomaxprocs*128)

type unpackWorkItem struct {
br blockRef
tr storage.TimeRange

@ -284,23 +342,49 @@ func putUnpackWork(upw *unpackWork) {

var unpackWorkPool sync.Pool

func init() {
for i := 0; i < gomaxprocs; i++ {
go unpackWorker()
func scheduleUnpackWork(workChs []chan *unpackWork, uw *unpackWork) {
if len(workChs) == 1 {
// Fast path for a single worker
workChs[0] <- uw
return
}
attempts := 0
for {
idx := fastrand.Uint32n(uint32(len(workChs)))
select {
case workChs[idx] <- uw:
return
default:
attempts++
if attempts >= len(workChs) {
workChs[idx] <- uw
return
}
}
}
}

func unpackWorker() {
var tmpBlock storage.Block
for upw := range unpackWorkCh {
upw.unpack(&tmpBlock)
func unpackWorker(ch <-chan *unpackWork) {
v := tmpBlockPool.Get()
if v == nil {
v = &storage.Block{}
}
tmpBlock := v.(*storage.Block)
for upw := range ch {
upw.unpack(tmpBlock)
}
tmpBlockPool.Put(v)
}

var tmpBlockPool sync.Pool

// unpackBatchSize is the maximum number of blocks that may be unpacked at once by a single goroutine.
//
// This batch is needed in order to reduce contention for upackWorkCh in multi-CPU system.
var unpackBatchSize = 32 * cgroup.AvailableCPUs()
// It is better to load a single goroutine for up to one second on a system with many CPU cores
// in order to reduce inter-CPU memory ping-pong.
// A single goroutine can unpack up to 40 millions of rows per second, while a single block contains up to 8K rows.
// So the batch size should be 40M / 8K = 5K.
var unpackBatchSize = 5000

// Unpack unpacks pts to dst.
func (pts *packedTimeseries) Unpack(dst *Result, tbf *tmpBlocksFile, tr storage.TimeRange, fetchData bool) error {

@ -313,14 +397,39 @@ func (pts *packedTimeseries) Unpack(dst *Result, tbf *tmpBlocksFile, tr storage.
return nil
}

// Feed workers with work
// Spin up local workers.
// Do not use global workers pool, since it increases inter-CPU memory ping-poing,
// which reduces the scalability on systems with many CPU cores.
brsLen := len(pts.brs)
workers := brsLen / unpackBatchSize
if workers > gomaxprocs {
workers = gomaxprocs
}
if workers < 1 {
workers = 1
}
workChs := make([]chan *unpackWork, workers)
var workChsWG sync.WaitGroup
for i := 0; i < workers; i++ {
// Use unbuffered channel on purpose, since there are high chances
// that only a single unpackWork is needed to unpack.
// The unbuffered channel should reduce inter-CPU ping-pong in this case,
// which should improve the performance in a system with many CPU cores.
workChs[i] = make(chan *unpackWork)
workChsWG.Add(1)
go func(workerID int) {
defer workChsWG.Done()
unpackWorker(workChs[workerID])
}(i)
}

// Feed workers with work
upws := make([]*unpackWork, 0, 1+brsLen/unpackBatchSize)
upw := getUnpackWork()
upw.tbf = tbf
for _, br := range pts.brs {
if len(upw.ws) >= unpackBatchSize {
unpackWorkCh <- upw
scheduleUnpackWork(workChs, upw)
upws = append(upws, upw)
upw = getUnpackWork()
upw.tbf = tbf

@ -330,11 +439,12 @@ func (pts *packedTimeseries) Unpack(dst *Result, tbf *tmpBlocksFile, tr storage.
tr: tr,
})
}
unpackWorkCh <- upw
scheduleUnpackWork(workChs, upw)
upws = append(upws, upw)
pts.brs = pts.brs[:0]

// Wait until work is complete
samples := 0
sbs := make([]*sortBlock, 0, brsLen)
var firstErr error
for _, upw := range upws {

@ -343,14 +453,30 @@ func (pts *packedTimeseries) Unpack(dst *Result, tbf *tmpBlocksFile, tr storage.
firstErr = err
}
if firstErr == nil {
sbs = append(sbs, upw.sbs...)
} else {
for _, sb := range upw.sbs {
samples += len(sb.Timestamps)
}
if *maxSamplesPerSeries <= 0 || samples < *maxSamplesPerSeries {
sbs = append(sbs, upw.sbs...)
} else {
firstErr = fmt.Errorf("cannot process more than %d samples per series; either increase -search.maxSamplesPerSeries "+
"or reduce time range for the query", *maxSamplesPerSeries)
}
}
if firstErr != nil {
for _, sb := range upw.sbs {
putSortBlock(sb)
}
}
putUnpackWork(upw)
}

// Shut down local workers
for _, workCh := range workChs {
close(workCh)
}
workChsWG.Wait()

if firstErr != nil {
return firstErr
}

@ -1008,6 +1134,7 @@ func ProcessSearchQuery(sq *storage.SearchQuery, fetchData bool, deadline search
m := make(map[string][]blockRef, maxSeriesCount)
orderedMetricNames := make([]string, 0, maxSeriesCount)
blocksRead := 0
samples := 0
tbf := getTmpBlocksFile()
var buf []byte
for sr.NextMetricBlock() {

@ -1017,7 +1144,14 @@ func ProcessSearchQuery(sq *storage.SearchQuery, fetchData bool, deadline search
putStorageSearch(sr)
return nil, fmt.Errorf("timeout exceeded while fetching data block #%d from storage: %s", blocksRead, deadline.String())
}
buf = sr.MetricBlockRef.BlockRef.Marshal(buf[:0])
br := sr.MetricBlockRef.BlockRef
samples += br.RowsCount()
if *maxSamplesPerQuery > 0 && samples > *maxSamplesPerQuery {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
return nil, fmt.Errorf("cannot select more than -search.maxSamplesPerQuery=%d samples; possible solutions: to increase the -search.maxSamplesPerQuery; to reduce time range for the query; to use more specific label filters in order to select lower number of series", *maxSamplesPerQuery)
}
buf = br.Marshal(buf[:0])
addr, err := tbf.WriteBlockRefData(buf)
if err != nil {
putTmpBlocksFile(tbf)

@ -1025,15 +1159,13 @@ func ProcessSearchQuery(sq *storage.SearchQuery, fetchData bool, deadline search
return nil, fmt.Errorf("cannot write %d bytes to temporary file: %w", len(buf), err)
}
metricName := sr.MetricBlockRef.MetricName
metricNameStrUnsafe := bytesutil.ToUnsafeString(metricName)
brs := m[metricNameStrUnsafe]
brs := m[string(metricName)]
brs = append(brs, blockRef{
partRef: sr.MetricBlockRef.BlockRef.PartRef(),
partRef: br.PartRef(),
addr: addr,
})
if len(brs) > 1 {
// An optimization: do not allocate a string for already existing metricName key in m
m[metricNameStrUnsafe] = brs
m[string(metricName)] = brs
} else {
// An optimization for big number of time series with long metricName values:
// use only a single copy of metricName for both orderedMetricNames and m.
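
The netstorage changes above replace the global worker pools with per-call worker goroutines plus randomized channel selection, mainly to avoid deadlocks when a caller's callback blocks and to reduce inter-CPU memory traffic. Below is a simplified, self-contained sketch of that pattern with a generic payload; all names are illustrative and not part of the VictoriaMetrics code.

```
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// runParallel processes items with up to maxWorkers goroutines that live only
// for the duration of this call, mirroring the per-call pools introduced above.
func runParallel(items []int, maxWorkers int, f func(item int)) {
	workers := len(items)
	if workers > maxWorkers {
		workers = maxWorkers
	}
	if workers < 1 {
		workers = 1
	}
	workChs := make([]chan int, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		workChs[i] = make(chan int, 16)
		wg.Add(1)
		go func(ch <-chan int) {
			defer wg.Done()
			for it := range ch {
				f(it)
			}
		}(workChs[i])
	}
	for _, it := range items {
		schedule(workChs, it)
	}
	// Shut down the local workers once all the work has been scheduled.
	for _, ch := range workChs {
		close(ch)
	}
	wg.Wait()
}

// schedule tries a few randomly chosen channels before blocking on one,
// which spreads load without a single contended queue.
func schedule(workChs []chan int, it int) {
	if len(workChs) == 1 {
		workChs[0] <- it
		return
	}
	for attempts := 0; ; attempts++ {
		idx := rand.Intn(len(workChs))
		select {
		case workChs[idx] <- it:
			return
		default:
			if attempts >= len(workChs) {
				workChs[idx] <- it
				return
			}
		}
	}
}

func main() {
	runParallel([]int{1, 2, 3, 4, 5}, 4, func(it int) { fmt.Println("processed", it) })
}
```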
@ -100,9 +100,8 @@ func newBinaryOpFunc(bf func(left, right float64, isBool bool) float64) binaryOp
dstValues[j] = bf(a, b, isBool)
}
}
// Optimization: remove time series containing only NaNs.
// This is quite common after applying filters like `q > 0`.
dst = removeNaNs(dst)
// Do not remove time series containing only NaNs, since then the `(foo op bar) default N`
// won't work as expected if `(foo op bar)` results to NaN series.
return dst, nil
}
}
@ -10,6 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"

@ -763,6 +764,10 @@ func getRollupMemoryLimiter() *memoryLimiter {
func evalRollupWithIncrementalAggregate(name string, iafc *incrementalAggrFuncContext, rss *netstorage.Results, rcs []*rollupConfig,
preFunc func(values []float64, timestamps []int64), sharedTimestamps []int64, removeMetricGroup bool) ([]*timeseries, error) {
err := rss.RunParallel(func(rs *netstorage.Result, workerID uint) error {
if name != "default_rollup" {
// Remove Prometheus staleness marks, so non-default rollup functions don't hit NaN values.
rs.Values, rs.Timestamps = dropStaleNaNs(rs.Values, rs.Timestamps)
}
preFunc(rs.Values, rs.Timestamps)
ts := getTimeseries()
defer putTimeseries(ts)

@ -796,6 +801,10 @@ func evalRollupNoIncrementalAggregate(name string, rss *netstorage.Results, rcs
tss := make([]*timeseries, 0, rss.Len()*len(rcs))
var tssLock sync.Mutex
err := rss.RunParallel(func(rs *netstorage.Result, workerID uint) error {
if name != "default_rollup" {
// Remove Prometheus staleness marks, so non-default rollup functions don't hit NaN values.
rs.Values, rs.Timestamps = dropStaleNaNs(rs.Values, rs.Timestamps)
}
preFunc(rs.Values, rs.Timestamps)
for _, rc := range rcs {
if tsm := newTimeseriesMap(name, sharedTimestamps, &rs.MetricName); tsm != nil {

@ -891,3 +900,28 @@ func toTagFilter(dst *storage.TagFilter, src *metricsql.LabelFilter) {
dst.IsRegexp = src.IsRegexp
dst.IsNegative = src.IsNegative
}

func dropStaleNaNs(values []float64, timestamps []int64) ([]float64, []int64) {
hasStaleSamples := false
for _, v := range values {
if decimal.IsStaleNaN(v) {
hasStaleSamples = true
break
}
}
if !hasStaleSamples {
// Fast path: values have no Prometheus staleness marks.
return values, timestamps
}
// Slow path: drop Prometheus staleness marks from values.
dstValues := values[:0]
dstTimestamps := timestamps[:0]
for i, v := range values {
if decimal.IsStaleNaN(v) {
continue
}
dstValues = append(dstValues, v)
dstTimestamps = append(dstTimestamps, timestamps[i])
}
return dstValues, dstTimestamps
}
@ -698,6 +698,39 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{}
f(q, resultExpected)
})
t.Run("present_over_time(time())", func(t *testing.T) {
t.Parallel()
q := `present_over_time(time())`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1, 1, 1, 1, 1, 1},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("present_over_time(time()[100:300])", func(t *testing.T) {
t.Parallel()
q := `present_over_time(time()[100:300])`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{nan, 1, nan, nan, 1, nan},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("present_over_time(time()<10m)", func(t *testing.T) {
t.Parallel()
q := `present_over_time(time()<1600)`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1, 1, 1, nan, nan, nan},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("absent(123)", func(t *testing.T) {
t.Parallel()
q := `absent(123)`

@ -1044,6 +1077,21 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("default_for_nan_series", func(t *testing.T) {
t.Parallel()
q := `label_set(0, "foo", "bar")/0 default 7`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{7, 7, 7, 7, 7, 7},
Timestamps: timestampsExpected,
}
r.MetricName.Tags = []storage.Tag{{
Key: []byte("foo"),
Value: []byte("bar"),
}}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`alias()`, func(t *testing.T) {
t.Parallel()
q := `alias(time(), "foobar")`

@ -1691,10 +1739,12 @@ func TestExecSuccess(t *testing.T) {
Values: []float64{1123.456, 1323.456, 1523.456, 1723.456, 1923.456, 2123.456},
Timestamps: timestampsExpected,
}
r2.MetricName.Tags = []storage.Tag{{
Key: []byte("foo"),
Value: []byte("123.456"),
}}
r2.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("123.456"),
},
}
resultExpected := []netstorage.Result{r1, r2}
f(q, resultExpected)
})

@ -6941,19 +6991,21 @@ func testResultsEqual(t *testing.T, result, resultExpected []netstorage.Result)
func testMetricNamesEqual(t *testing.T, mn, mnExpected *storage.MetricName, pos int) {
t.Helper()
if string(mn.MetricGroup) != string(mnExpected.MetricGroup) {
t.Fatalf(`unexpected MetricGroup at #%d; got %q; want %q`, pos, mn.MetricGroup, mnExpected.MetricGroup)
t.Fatalf(`unexpected MetricGroup at #%d; got %q; want %q; metricGot=%s, metricExpected=%s`,
pos, mn.MetricGroup, mnExpected.MetricGroup, mn.String(), mnExpected.String())
}
if len(mn.Tags) != len(mnExpected.Tags) {
t.Fatalf(`unexpected tags count at #%d; got %d; want %d`, pos, len(mn.Tags), len(mnExpected.Tags))
t.Fatalf(`unexpected tags count at #%d; got %d; want %d; metricGot=%s, metricExpected=%s`, pos, len(mn.Tags), len(mnExpected.Tags), mn.String(), mnExpected.String())
}
for i := range mn.Tags {
tag := &mn.Tags[i]
tagExpected := &mnExpected.Tags[i]
if string(tag.Key) != string(tagExpected.Key) {
t.Fatalf(`unexpected tag key at #%d,%d; got %q; want %q`, pos, i, tag.Key, tagExpected.Key)
t.Fatalf(`unexpected tag key at #%d,%d; got %q; want %q; metricGot=%s, metricExpected=%s`, pos, i, tag.Key, tagExpected.Key, mn.String(), mnExpected.String())
}
if string(tag.Value) != string(tagExpected.Value) {
t.Fatalf(`unexpected tag value for key %q at #%d,%d; got %q; want %q`, tag.Key, pos, i, tag.Value, tagExpected.Value)
t.Fatalf(`unexpected tag value for key %q at #%d,%d; got %q; want %q; metricGot=%s, metricExpected=%s`,
tag.Key, pos, i, tag.Value, tagExpected.Value, mn.String(), mnExpected.String())
}
}
}
@ -42,6 +42,7 @@ var rollupFuncs = map[string]newRollupFunc{
|
|||
"stddev_over_time": newRollupFuncOneArg(rollupStddev),
|
||||
"stdvar_over_time": newRollupFuncOneArg(rollupStdvar),
|
||||
"absent_over_time": newRollupFuncOneArg(rollupAbsent),
|
||||
"present_over_time": newRollupFuncOneArg(rollupPresent),
|
||||
|
||||
// Additional rollup funcs.
|
||||
"default_rollup": newRollupFuncOneArg(rollupDefault), // default rollup func
|
||||
|
@ -97,23 +98,24 @@ var rollupFuncs = map[string]newRollupFunc{
|
|||
// rollupAggrFuncs are functions that can be passed to `aggr_over_time()`
|
||||
var rollupAggrFuncs = map[string]rollupFunc{
|
||||
// Standard rollup funcs from PromQL.
|
||||
"changes": rollupChanges,
|
||||
"delta": rollupDelta,
|
||||
"deriv": rollupDerivSlow,
|
||||
"deriv_fast": rollupDerivFast,
|
||||
"idelta": rollupIdelta,
|
||||
"increase": rollupDelta, // + rollupFuncsRemoveCounterResets
|
||||
"irate": rollupIderiv, // + rollupFuncsRemoveCounterResets
|
||||
"rate": rollupDerivFast, // + rollupFuncsRemoveCounterResets
|
||||
"resets": rollupResets,
|
||||
"avg_over_time": rollupAvg,
|
||||
"min_over_time": rollupMin,
|
||||
"max_over_time": rollupMax,
|
||||
"sum_over_time": rollupSum,
|
||||
"count_over_time": rollupCount,
|
||||
"stddev_over_time": rollupStddev,
|
||||
"stdvar_over_time": rollupStdvar,
|
||||
"absent_over_time": rollupAbsent,
|
||||
"changes": rollupChanges,
|
||||
"delta": rollupDelta,
|
||||
"deriv": rollupDerivSlow,
|
||||
"deriv_fast": rollupDerivFast,
|
||||
"idelta": rollupIdelta,
|
||||
"increase": rollupDelta, // + rollupFuncsRemoveCounterResets
|
||||
"irate": rollupIderiv, // + rollupFuncsRemoveCounterResets
|
||||
"rate": rollupDerivFast, // + rollupFuncsRemoveCounterResets
|
||||
"resets": rollupResets,
|
||||
"avg_over_time": rollupAvg,
|
||||
"min_over_time": rollupMin,
|
||||
"max_over_time": rollupMax,
|
||||
"sum_over_time": rollupSum,
|
||||
"count_over_time": rollupCount,
|
||||
"stddev_over_time": rollupStddev,
|
||||
"stdvar_over_time": rollupStdvar,
|
||||
"absent_over_time": rollupAbsent,
|
||||
"present_over_time": rollupPresent,
|
||||
|
||||
// Additional rollup funcs.
|
||||
"range_over_time": rollupRange,
|
||||
|
@ -157,6 +159,7 @@ var rollupFuncsCannotAdjustWindow = map[string]bool{
|
|||
"stddev_over_time": true,
|
||||
"stdvar_over_time": true,
|
||||
"absent_over_time": true,
|
||||
"present_over_time": true,
|
||||
"sum2_over_time": true,
|
||||
"geomean_over_time": true,
|
||||
"distinct_over_time": true,
|
||||
|
@ -268,16 +271,16 @@ func getRollupConfigs(name string, rf rollupFunc, expr metricsql.Expr, start, en
|
|||
}
|
||||
newRollupConfig := func(rf rollupFunc, tagValue string) *rollupConfig {
|
||||
return &rollupConfig{
|
||||
TagValue: tagValue,
|
||||
Func: rf,
|
||||
Start: start,
|
||||
End: end,
|
||||
Step: step,
|
||||
Window: window,
|
||||
MayAdjustWindow: !rollupFuncsCannotAdjustWindow[name],
|
||||
CanDropLastSample: name == "default_rollup",
|
||||
LookbackDelta: lookbackDelta,
|
||||
Timestamps: sharedTimestamps,
|
||||
TagValue: tagValue,
|
||||
Func: rf,
|
||||
Start: start,
|
||||
End: end,
|
||||
Step: step,
|
||||
Window: window,
|
||||
MayAdjustWindow: !rollupFuncsCannotAdjustWindow[name],
|
||||
LookbackDelta: lookbackDelta,
|
||||
Timestamps: sharedTimestamps,
|
||||
isDefaultRollup: name == "default_rollup",
|
||||
}
|
||||
}
|
||||
appendRollupConfigs := func(dst []*rollupConfig) []*rollupConfig {
|
||||
|
@ -399,15 +402,13 @@ type rollupConfig struct {
	// when using window smaller than 2 x scrape_interval.
	MayAdjustWindow bool

	// Whether the last sample can be dropped during rollup calculations.
	// The last sample can be dropped for `default_rollup()` function only.
	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748 .
	CanDropLastSample bool

	Timestamps []int64

	// LookbackDelta is the analog to `-query.lookback-delta` from Prometheus world.
	LookbackDelta int64

	// Whether default_rollup is used.
	isDefaultRollup bool
}

var (
@ -516,7 +517,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
	window := rc.Window
	if window <= 0 {
		window = rc.Step
		if rc.CanDropLastSample && rc.LookbackDelta > 0 && window > rc.LookbackDelta {
		if rc.isDefaultRollup && rc.LookbackDelta > 0 && window > rc.LookbackDelta {
			// Implicit window exceeds -search.maxStalenessInterval, so limit it to -search.maxStalenessInterval
			// according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/784
			window = rc.LookbackDelta
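A rough sketch of the window selection performed here after the change (the function name and signature are illustrative only; the real code works on the rollupConfig fields shown above): an explicit window always wins, otherwise the step is used, and for default_rollup the implicit window is capped by the lookback delta so it never exceeds -search.maxStalenessInterval.

```go
package main

import "fmt"

// effectiveWindow sketches the window selection in doInternal: an explicit
// window wins; otherwise the step is used, capped by lookbackDelta for
// default_rollup so stale series are not stretched beyond
// -search.maxStalenessInterval.
func effectiveWindow(window, step, lookbackDelta int64, isDefaultRollup bool) int64 {
	if window > 0 {
		return window
	}
	w := step
	if isDefaultRollup && lookbackDelta > 0 && w > lookbackDelta {
		w = lookbackDelta
	}
	return w
}

func main() {
	// 5m step, 30s lookback delta: default_rollup looks back at most 30s.
	fmt.Println(effectiveWindow(0, 300000, 30000, true))  // 30000
	fmt.Println(effectiveWindow(0, 300000, 30000, false)) // 300000
}
```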
@ -534,10 +535,6 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
|
|||
j := 0
|
||||
ni := 0
|
||||
nj := 0
|
||||
stalenessInterval := int64(float64(scrapeInterval) * 0.9)
|
||||
// Do not drop trailing data points for queries, which return 2 or 1 point (aka instant queries).
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
|
||||
canDropLastSample := rc.CanDropLastSample && len(rc.Timestamps) > 2
|
||||
f := rc.Func
|
||||
for _, tEnd := range rc.Timestamps {
|
||||
tStart := tEnd - window
|
||||
|
@ -557,16 +554,6 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
|
|||
}
|
||||
rfa.values = values[i:j]
|
||||
rfa.timestamps = timestamps[i:j]
|
||||
if canDropLastSample && j == len(timestamps) && j > 0 && (tEnd-timestamps[j-1] > stalenessInterval || i == j && len(timestamps) == 1) {
|
||||
// Drop trailing data points in the following cases:
|
||||
// - if the distance between the last raw sample and tEnd exceeds stalenessInterval
|
||||
// - if time series contains only a single raw sample
|
||||
// This should prevent from double counting when a label changes in time series (for instance,
|
||||
// during new deployment in K8S). See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
|
||||
rfa.prevValue = nan
|
||||
rfa.values = nil
|
||||
rfa.timestamps = nil
|
||||
}
|
||||
if i > 0 {
|
||||
rfa.realPrevValue = values[i-1]
|
||||
} else {
|
||||
|
@ -1284,6 +1271,15 @@ func rollupAbsent(rfa *rollupFuncArg) float64 {
	return nan
}

func rollupPresent(rfa *rollupFuncArg) float64 {
	// There is no need in handling NaNs here, since they must be cleaned up
	// before calling rollup funcs.
	if len(rfa.values) > 0 {
		return 1
	}
	return nan
}

func rollupCount(rfa *rollupFuncArg) float64 {
	// There is no need in handling NaNs here, since they must be cleaned up
	// before calling rollup funcs.
@ -1814,11 +1810,20 @@ func rollupFirst(rfa *rollupFuncArg) float64 {
	return values[0]
}

var rollupDefault = rollupLast
func rollupDefault(rfa *rollupFuncArg) float64 {
	values := rfa.values
	if len(values) == 0 {
		// Do not take into account rfa.prevValue, since it may lead
		// to inconsistent results comparing to Prometheus on broken time series
		// with irregular data points.
		return nan
	}
	// Intentionally do not skip the possible last Prometheus staleness mark.
	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526 .
	return values[len(values)-1]
}

func rollupLast(rfa *rollupFuncArg) float64 {
	// There is no need in handling NaNs here, since they must be cleaned up
	// before calling rollup funcs.
	values := rfa.values
	if len(values) == 0 {
		// Do not take into account rfa.prevValue, since it may lead
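Previously default_rollup simply aliased rollupLast; the dedicated function above keeps a trailing staleness mark, because staleness NaNs are stripped only for non-default rollups (see the dropStaleNaNs call earlier in this patch). A tiny illustration of the effect, where the bit pattern 0x7ff0000000000002 is assumed to be the Prometheus staleness marker rather than taken from the VictoriaMetrics API:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// A series that ends with a staleness mark: default_rollup returns the
	// mark as-is, so the series ends in a gap instead of repeating 12.
	staleNaN := math.Float64frombits(0x7ff0000000000002) // assumed marker bits
	values := []float64{10, 12, staleNaN}
	last := values[len(values)-1] // what rollupDefault returns
	fmt.Println(math.IsNaN(last)) // true
}
```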
@ -1823,8 +1823,15 @@ func newTransformFuncSort(isDesc bool) transformFunc {
	b := rvs[j].Values
	n := len(a) - 1
	for n >= 0 {
		if !math.IsNaN(a[n]) && !math.IsNaN(b[n]) && a[n] != b[n] {
			break
		if !math.IsNaN(a[n]) {
			if math.IsNaN(b[n]) {
				return false
			}
			if a[n] != b[n] {
				break
			}
		} else if !math.IsNaN(b[n]) {
			return true
		}
		n--
	}
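The comparator above now gives series with trailing NaNs a deterministic position instead of skipping the comparison entirely. A standalone sketch of the resulting ordering rule for the ascending sort() case, assuming both series share the same timestamps (as they do after rollup) and are compared from the newest point backwards; the function name is illustrative only:

```go
package promqlsketch

import "math"

// lessSeries sketches the comparison introduced above for sort(): walking from
// the last point backwards, a NaN counts as smaller than any number, and equal
// non-NaN points defer to earlier points.
func lessSeries(a, b []float64) bool {
	for n := len(a) - 1; n >= 0; n-- {
		if !math.IsNaN(a[n]) {
			if math.IsNaN(b[n]) {
				return false // a has a value where b does not -> a sorts later
			}
			if a[n] != b[n] {
				return a[n] < b[n]
			}
		} else if !math.IsNaN(b[n]) {
			return true // a is NaN where b has a value -> a sorts earlier
		}
	}
	return false
}
```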
@ -1,3 +1,4 @@
//go:build go1.15
// +build go1.15

package promql
@ -14,8 +14,7 @@ import (
var (
	lastQueriesCount = flag.Int("search.queryStats.lastQueriesCount", 20000, "Query stats for /api/v1/status/top_queries is tracked on this number of last queries. "+
		"Zero value disables query stats tracking")
	minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", 0, "The minimum duration for queries to track in query stats at /api/v1/status/top_queries. "+
		"Queries with lower duration are ignored in query stats")
	minQueryDuration = flag.Duration("search.queryStats.minQueryDuration", time.Millisecond, "The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats")
)

var (
@ -1,17 +1,17 @@
|
|||
{
|
||||
"files": {
|
||||
"main.css": "./static/css/main.0ba440d3.chunk.css",
|
||||
"main.js": "./static/js/main.ffd27a2f.chunk.js",
|
||||
"runtime-main.js": "./static/js/runtime-main.50ad8b45.js",
|
||||
"static/js/2.3cdac8ea.chunk.js": "./static/js/2.3cdac8ea.chunk.js",
|
||||
"static/js/3.d52da3ae.chunk.js": "./static/js/3.d52da3ae.chunk.js",
|
||||
"main.css": "./static/css/main.6452b577.chunk.css",
|
||||
"main.js": "./static/js/main.e5416b79.chunk.js",
|
||||
"runtime-main.js": "./static/js/runtime-main.0270250c.js",
|
||||
"static/js/2.63374ed0.chunk.js": "./static/js/2.63374ed0.chunk.js",
|
||||
"static/js/3.a5d02d16.chunk.js": "./static/js/3.a5d02d16.chunk.js",
|
||||
"index.html": "./index.html",
|
||||
"static/js/2.3cdac8ea.chunk.js.LICENSE.txt": "./static/js/2.3cdac8ea.chunk.js.LICENSE.txt"
|
||||
"static/js/2.63374ed0.chunk.js.LICENSE.txt": "./static/js/2.63374ed0.chunk.js.LICENSE.txt"
|
||||
},
|
||||
"entrypoints": [
|
||||
"static/js/runtime-main.50ad8b45.js",
|
||||
"static/js/2.3cdac8ea.chunk.js",
|
||||
"static/css/main.0ba440d3.chunk.css",
|
||||
"static/js/main.ffd27a2f.chunk.js"
|
||||
"static/js/runtime-main.0270250c.js",
|
||||
"static/js/2.63374ed0.chunk.js",
|
||||
"static/css/main.6452b577.chunk.css",
|
||||
"static/js/main.e5416b79.chunk.js"
|
||||
]
|
||||
}
|
|
@ -1 +1 @@
|
|||
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="theme-color" content="#000000"/><meta name="description" content="VM-UI is a metric explorer for Victoria Metrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700&display=swap"/><link href="./static/css/main.0ba440d3.chunk.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div><script>!function(e){function r(r){for(var n,i,a=r[0],c=r[1],l=r[2],s=0,p=[];s<a.length;s++)i=a[s],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&p.push(o[i][0]),o[i]=0;for(n in c)Object.prototype.hasOwnProperty.call(c,n)&&(e[n]=c[n]);for(f&&f(r);p.length;)p.shift()();return u.push.apply(u,l||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,a=1;a<t.length;a++){var c=t[a];0!==o[c]&&(n=!1)}n&&(u.splice(r--,1),e=i(i.s=t[0]))}return e}var n={},o={1:0},u=[];function i(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,i),t.l=!0,t.exports}i.e=function(e){var r=[],t=o[e];if(0!==t)if(t)r.push(t[2]);else{var n=new Promise((function(r,n){t=o[e]=[r,n]}));r.push(t[2]=n);var u,a=document.createElement("script");a.charset="utf-8",a.timeout=120,i.nc&&a.setAttribute("nonce",i.nc),a.src=function(e){return i.p+"static/js/"+({}[e]||e)+"."+{3:"d52da3ae"}[e]+".chunk.js"}(e);var c=new Error;u=function(r){a.onerror=a.onload=null,clearTimeout(l);var t=o[e];if(0!==t){if(t){var n=r&&("load"===r.type?"missing":r.type),u=r&&r.target&&r.target.src;c.message="Loading chunk "+e+" failed.\n("+n+": "+u+")",c.name="ChunkLoadError",c.type=n,c.request=u,t[1](c)}o[e]=void 0}};var l=setTimeout((function(){u({type:"timeout",target:a})}),12e4);a.onerror=a.onload=u,document.head.appendChild(a)}return Promise.all(r)},i.m=e,i.c=n,i.d=function(e,r,t){i.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},i.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,r){if(1&r&&(e=i(e)),8&r)return e;if(4&r&&"object"==typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(i.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)i.d(t,n,function(r){return e[r]}.bind(null,n));return t},i.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(r,"a",r),r},i.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},i.p="./",i.oe=function(e){throw console.error(e),e};var a=this.webpackJsonpvmui=this.webpackJsonpvmui||[],c=a.push.bind(a);a.push=r,a=a.slice();for(var l=0;l<a.length;l++)r(a[l]);var f=c;t()}([])</script><script src="./static/js/2.3cdac8ea.chunk.js"></script><script src="./static/js/main.ffd27a2f.chunk.js"></script></body></html>
|
||||
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="theme-color" content="#000000"/><meta name="description" content="VM-UI is a metric explorer for Victoria Metrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700&display=swap"/><link href="./static/css/main.6452b577.chunk.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div><script>!function(e){function r(r){for(var n,i,a=r[0],c=r[1],l=r[2],s=0,p=[];s<a.length;s++)i=a[s],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&p.push(o[i][0]),o[i]=0;for(n in c)Object.prototype.hasOwnProperty.call(c,n)&&(e[n]=c[n]);for(f&&f(r);p.length;)p.shift()();return u.push.apply(u,l||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,a=1;a<t.length;a++){var c=t[a];0!==o[c]&&(n=!1)}n&&(u.splice(r--,1),e=i(i.s=t[0]))}return e}var n={},o={1:0},u=[];function i(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,i),t.l=!0,t.exports}i.e=function(e){var r=[],t=o[e];if(0!==t)if(t)r.push(t[2]);else{var n=new Promise((function(r,n){t=o[e]=[r,n]}));r.push(t[2]=n);var u,a=document.createElement("script");a.charset="utf-8",a.timeout=120,i.nc&&a.setAttribute("nonce",i.nc),a.src=function(e){return i.p+"static/js/"+({}[e]||e)+"."+{3:"a5d02d16"}[e]+".chunk.js"}(e);var c=new Error;u=function(r){a.onerror=a.onload=null,clearTimeout(l);var t=o[e];if(0!==t){if(t){var n=r&&("load"===r.type?"missing":r.type),u=r&&r.target&&r.target.src;c.message="Loading chunk "+e+" failed.\n("+n+": "+u+")",c.name="ChunkLoadError",c.type=n,c.request=u,t[1](c)}o[e]=void 0}};var l=setTimeout((function(){u({type:"timeout",target:a})}),12e4);a.onerror=a.onload=u,document.head.appendChild(a)}return Promise.all(r)},i.m=e,i.c=n,i.d=function(e,r,t){i.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},i.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,r){if(1&r&&(e=i(e)),8&r)return e;if(4&r&&"object"==typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(i.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)i.d(t,n,function(r){return e[r]}.bind(null,n));return t},i.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(r,"a",r),r},i.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},i.p="./",i.oe=function(e){throw console.error(e),e};var a=this.webpackJsonpvmui=this.webpackJsonpvmui||[],c=a.push.bind(a);a.push=r,a=a.slice();for(var l=0;l<a.length;l++)r(a[l]);var f=c;t()}([])</script><script src="./static/js/2.63374ed0.chunk.js"></script><script src="./static/js/main.e5416b79.chunk.js"></script></body></html>
|
|
@ -1 +1 @@
|
|||
body{font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","Roboto","Oxygen","Ubuntu","Cantarell","Fira Sans","Droid Sans","Helvetica Neue",sans-serif;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}code{font-family:source-code-pro,Menlo,Monaco,Consolas,"Courier New",monospace}.MuiAccordionSummary-content{margin:10px 0!important}.cm-activeLine{background-color:inherit!important}.cm-wrap{border-radius:4px;border:1px solid #b9b9b9;font-size:10px}.one-line-scroll .cm-wrap{height:24px}.line{fill:none;stroke-width:2}.overlay{fill:none;pointer-events:all}.dot{fill:#621773;stroke:#fff}
|
||||
body{font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","Roboto","Oxygen","Ubuntu","Cantarell","Fira Sans","Droid Sans","Helvetica Neue",sans-serif;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}code{font-family:source-code-pro,Menlo,Monaco,Consolas,"Courier New",monospace}.MuiAccordionSummary-content{margin:10px 0!important}.cm-activeLine{background-color:inherit!important}.cm-wrap{border-radius:4px;border:1px solid #b9b9b9;font-size:10px}.one-line-scroll .cm-wrap{height:24px}.cm-content,.cm-gutter{min-height:51px}.one-line-scroll .cm-content,.one-line-scroll .cm-gutter{min-height:auto}.line{fill:none;stroke-width:2}.overlay{fill:none;pointer-events:all}.dot{fill:#621773;stroke:#fff}
|
2
app/vmselect/vmui/static/js/2.63374ed0.chunk.js
Normal file
|
@ -1 +1 @@
|
|||
(this.webpackJsonpvmui=this.webpackJsonpvmui||[]).push([[3],{430:function(t,n,e){"use strict";e.r(n),e.d(n,"getCLS",(function(){return l})),e.d(n,"getFCP",(function(){return g})),e.d(n,"getFID",(function(){return h})),e.d(n,"getLCP",(function(){return y})),e.d(n,"getTTFB",(function(){return F}));var i,a,r=function(){return"".concat(Date.now(),"-").concat(Math.floor(8999999999999*Math.random())+1e12)},o=function(t){var n=arguments.length>1&&void 0!==arguments[1]?arguments[1]:-1;return{name:t,value:n,delta:0,entries:[],id:r(),isFinal:!1}},u=function(t,n){try{if(PerformanceObserver.supportedEntryTypes.includes(t)){var e=new PerformanceObserver((function(t){return t.getEntries().map(n)}));return e.observe({type:t,buffered:!0}),e}}catch(t){}},s=!1,c=!1,d=function(t){s=!t.persisted},f=function(){addEventListener("pagehide",d),addEventListener("beforeunload",(function(){}))},p=function(t){var n=arguments.length>1&&void 0!==arguments[1]&&arguments[1];c||(f(),c=!0),addEventListener("visibilitychange",(function(n){var e=n.timeStamp;"hidden"===document.visibilityState&&t({timeStamp:e,isUnloading:s})}),{capture:!0,once:n})},v=function(t,n,e,i){var a;return function(){e&&n.isFinal&&e.disconnect(),n.value>=0&&(i||n.isFinal||"hidden"===document.visibilityState)&&(n.delta=n.value-(a||0),(n.delta||n.isFinal||void 0===a)&&(t(n),a=n.value))}},l=function(t){var n,e=arguments.length>1&&void 0!==arguments[1]&&arguments[1],i=o("CLS",0),a=function(t){t.hadRecentInput||(i.value+=t.value,i.entries.push(t),n())},r=u("layout-shift",a);r&&(n=v(t,i,r,e),p((function(t){var e=t.isUnloading;r.takeRecords().map(a),e&&(i.isFinal=!0),n()})))},m=function(){return void 0===i&&(i="hidden"===document.visibilityState?0:1/0,p((function(t){var n=t.timeStamp;return i=n}),!0)),{get timeStamp(){return i}}},g=function(t){var n,e=o("FCP"),i=m(),a=u("paint",(function(t){"first-contentful-paint"===t.name&&t.startTime<i.timeStamp&&(e.value=t.startTime,e.isFinal=!0,e.entries.push(t),n())}));a&&(n=v(t,e,a))},h=function(t){var n=o("FID"),e=m(),i=function(t){t.startTime<e.timeStamp&&(n.value=t.processingStart-t.startTime,n.entries.push(t),n.isFinal=!0,r())},a=u("first-input",i),r=v(t,n,a);a?p((function(){a.takeRecords().map(i),a.disconnect()}),!0):window.perfMetrics&&window.perfMetrics.onFirstInputDelay&&window.perfMetrics.onFirstInputDelay((function(t,i){i.timeStamp<e.timeStamp&&(n.value=t,n.isFinal=!0,n.entries=[{entryType:"first-input",name:i.type,target:i.target,cancelable:i.cancelable,startTime:i.timeStamp,processingStart:i.timeStamp+t}],r())}))},S=function(){return a||(a=new Promise((function(t){return["scroll","keydown","pointerdown"].map((function(n){addEventListener(n,t,{once:!0,passive:!0,capture:!0})}))}))),a},y=function(t){var n,e=arguments.length>1&&void 0!==arguments[1]&&arguments[1],i=o("LCP"),a=m(),r=function(t){var e=t.startTime;e<a.timeStamp?(i.value=e,i.entries.push(t)):i.isFinal=!0,n()},s=u("largest-contentful-paint",r);if(s){n=v(t,i,s,e);var c=function(){i.isFinal||(s.takeRecords().map(r),i.isFinal=!0,n())};S().then(c),p(c,!0)}},F=function(t){var n,e=o("TTFB");n=function(){try{var n=performance.getEntriesByType("navigation")[0]||function(){var t=performance.timing,n={entryType:"navigation",startTime:0};for(var e in t)"navigationStart"!==e&&"toJSON"!==e&&(n[e]=Math.max(t[e]-t.navigationStart,0));return n}();e.value=e.delta=n.responseStart,e.entries=[n],e.isFinal=!0,t(e)}catch(t){}},"complete"===document.readyState?setTimeout(n,0):addEventListener("pageshow",n)}}}]);
|
||||
(this.webpackJsonpvmui=this.webpackJsonpvmui||[]).push([[3],{431:function(t,n,e){"use strict";e.r(n),e.d(n,"getCLS",(function(){return l})),e.d(n,"getFCP",(function(){return g})),e.d(n,"getFID",(function(){return h})),e.d(n,"getLCP",(function(){return y})),e.d(n,"getTTFB",(function(){return F}));var i,a,r=function(){return"".concat(Date.now(),"-").concat(Math.floor(8999999999999*Math.random())+1e12)},o=function(t){var n=arguments.length>1&&void 0!==arguments[1]?arguments[1]:-1;return{name:t,value:n,delta:0,entries:[],id:r(),isFinal:!1}},u=function(t,n){try{if(PerformanceObserver.supportedEntryTypes.includes(t)){var e=new PerformanceObserver((function(t){return t.getEntries().map(n)}));return e.observe({type:t,buffered:!0}),e}}catch(t){}},s=!1,c=!1,d=function(t){s=!t.persisted},f=function(){addEventListener("pagehide",d),addEventListener("beforeunload",(function(){}))},p=function(t){var n=arguments.length>1&&void 0!==arguments[1]&&arguments[1];c||(f(),c=!0),addEventListener("visibilitychange",(function(n){var e=n.timeStamp;"hidden"===document.visibilityState&&t({timeStamp:e,isUnloading:s})}),{capture:!0,once:n})},v=function(t,n,e,i){var a;return function(){e&&n.isFinal&&e.disconnect(),n.value>=0&&(i||n.isFinal||"hidden"===document.visibilityState)&&(n.delta=n.value-(a||0),(n.delta||n.isFinal||void 0===a)&&(t(n),a=n.value))}},l=function(t){var n,e=arguments.length>1&&void 0!==arguments[1]&&arguments[1],i=o("CLS",0),a=function(t){t.hadRecentInput||(i.value+=t.value,i.entries.push(t),n())},r=u("layout-shift",a);r&&(n=v(t,i,r,e),p((function(t){var e=t.isUnloading;r.takeRecords().map(a),e&&(i.isFinal=!0),n()})))},m=function(){return void 0===i&&(i="hidden"===document.visibilityState?0:1/0,p((function(t){var n=t.timeStamp;return i=n}),!0)),{get timeStamp(){return i}}},g=function(t){var n,e=o("FCP"),i=m(),a=u("paint",(function(t){"first-contentful-paint"===t.name&&t.startTime<i.timeStamp&&(e.value=t.startTime,e.isFinal=!0,e.entries.push(t),n())}));a&&(n=v(t,e,a))},h=function(t){var n=o("FID"),e=m(),i=function(t){t.startTime<e.timeStamp&&(n.value=t.processingStart-t.startTime,n.entries.push(t),n.isFinal=!0,r())},a=u("first-input",i),r=v(t,n,a);a?p((function(){a.takeRecords().map(i),a.disconnect()}),!0):window.perfMetrics&&window.perfMetrics.onFirstInputDelay&&window.perfMetrics.onFirstInputDelay((function(t,i){i.timeStamp<e.timeStamp&&(n.value=t,n.isFinal=!0,n.entries=[{entryType:"first-input",name:i.type,target:i.target,cancelable:i.cancelable,startTime:i.timeStamp,processingStart:i.timeStamp+t}],r())}))},S=function(){return a||(a=new Promise((function(t){return["scroll","keydown","pointerdown"].map((function(n){addEventListener(n,t,{once:!0,passive:!0,capture:!0})}))}))),a},y=function(t){var n,e=arguments.length>1&&void 0!==arguments[1]&&arguments[1],i=o("LCP"),a=m(),r=function(t){var e=t.startTime;e<a.timeStamp?(i.value=e,i.entries.push(t)):i.isFinal=!0,n()},s=u("largest-contentful-paint",r);if(s){n=v(t,i,s,e);var c=function(){i.isFinal||(s.takeRecords().map(r),i.isFinal=!0,n())};S().then(c),p(c,!0)}},F=function(t){var n,e=o("TTFB");n=function(){try{var n=performance.getEntriesByType("navigation")[0]||function(){var t=performance.timing,n={entryType:"navigation",startTime:0};for(var e in t)"navigationStart"!==e&&"toJSON"!==e&&(n[e]=Math.max(t[e]-t.navigationStart,0));return n}();e.value=e.delta=n.responseStart,e.entries=[n],e.isFinal=!0,t(e)}catch(t){}},"complete"===document.readyState?setTimeout(n,0):addEventListener("pageshow",n)}}}]);
|
1
app/vmselect/vmui/static/js/main.e5416b79.chunk.js
Normal file
|
@ -1 +1 @@
|
|||
!function(e){function r(r){for(var n,i,a=r[0],c=r[1],l=r[2],s=0,p=[];s<a.length;s++)i=a[s],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&p.push(o[i][0]),o[i]=0;for(n in c)Object.prototype.hasOwnProperty.call(c,n)&&(e[n]=c[n]);for(f&&f(r);p.length;)p.shift()();return u.push.apply(u,l||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,a=1;a<t.length;a++){var c=t[a];0!==o[c]&&(n=!1)}n&&(u.splice(r--,1),e=i(i.s=t[0]))}return e}var n={},o={1:0},u=[];function i(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,i),t.l=!0,t.exports}i.e=function(e){var r=[],t=o[e];if(0!==t)if(t)r.push(t[2]);else{var n=new Promise((function(r,n){t=o[e]=[r,n]}));r.push(t[2]=n);var u,a=document.createElement("script");a.charset="utf-8",a.timeout=120,i.nc&&a.setAttribute("nonce",i.nc),a.src=function(e){return i.p+"static/js/"+({}[e]||e)+"."+{3:"d52da3ae"}[e]+".chunk.js"}(e);var c=new Error;u=function(r){a.onerror=a.onload=null,clearTimeout(l);var t=o[e];if(0!==t){if(t){var n=r&&("load"===r.type?"missing":r.type),u=r&&r.target&&r.target.src;c.message="Loading chunk "+e+" failed.\n("+n+": "+u+")",c.name="ChunkLoadError",c.type=n,c.request=u,t[1](c)}o[e]=void 0}};var l=setTimeout((function(){u({type:"timeout",target:a})}),12e4);a.onerror=a.onload=u,document.head.appendChild(a)}return Promise.all(r)},i.m=e,i.c=n,i.d=function(e,r,t){i.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},i.r=function(e){"undefined"!==typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,r){if(1&r&&(e=i(e)),8&r)return e;if(4&r&&"object"===typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(i.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)i.d(t,n,function(r){return e[r]}.bind(null,n));return t},i.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(r,"a",r),r},i.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},i.p="./",i.oe=function(e){throw console.error(e),e};var a=this.webpackJsonpvmui=this.webpackJsonpvmui||[],c=a.push.bind(a);a.push=r,a=a.slice();for(var l=0;l<a.length;l++)r(a[l]);var f=c;t()}([]);
|
||||
!function(e){function r(r){for(var n,i,a=r[0],c=r[1],l=r[2],s=0,p=[];s<a.length;s++)i=a[s],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&p.push(o[i][0]),o[i]=0;for(n in c)Object.prototype.hasOwnProperty.call(c,n)&&(e[n]=c[n]);for(f&&f(r);p.length;)p.shift()();return u.push.apply(u,l||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,a=1;a<t.length;a++){var c=t[a];0!==o[c]&&(n=!1)}n&&(u.splice(r--,1),e=i(i.s=t[0]))}return e}var n={},o={1:0},u=[];function i(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,i),t.l=!0,t.exports}i.e=function(e){var r=[],t=o[e];if(0!==t)if(t)r.push(t[2]);else{var n=new Promise((function(r,n){t=o[e]=[r,n]}));r.push(t[2]=n);var u,a=document.createElement("script");a.charset="utf-8",a.timeout=120,i.nc&&a.setAttribute("nonce",i.nc),a.src=function(e){return i.p+"static/js/"+({}[e]||e)+"."+{3:"a5d02d16"}[e]+".chunk.js"}(e);var c=new Error;u=function(r){a.onerror=a.onload=null,clearTimeout(l);var t=o[e];if(0!==t){if(t){var n=r&&("load"===r.type?"missing":r.type),u=r&&r.target&&r.target.src;c.message="Loading chunk "+e+" failed.\n("+n+": "+u+")",c.name="ChunkLoadError",c.type=n,c.request=u,t[1](c)}o[e]=void 0}};var l=setTimeout((function(){u({type:"timeout",target:a})}),12e4);a.onerror=a.onload=u,document.head.appendChild(a)}return Promise.all(r)},i.m=e,i.c=n,i.d=function(e,r,t){i.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},i.r=function(e){"undefined"!==typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,r){if(1&r&&(e=i(e)),8&r)return e;if(4&r&&"object"===typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(i.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)i.d(t,n,function(r){return e[r]}.bind(null,n));return t},i.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(r,"a",r),r},i.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},i.p="./",i.oe=function(e){throw console.error(e),e};var a=this.webpackJsonpvmui=this.webpackJsonpvmui||[],c=a.push.bind(a);a.push=r,a=a.slice();for(var l=0;l<a.length;l++)r(a[l]);var f=c;t()}([]);
|
|
@ -452,12 +452,6 @@ func registerStorageMetrics() {
|
|||
metrics.NewGauge(`vm_missing_tsids_for_metric_id_total`, func() float64 {
|
||||
return float64(idbm().MissingTSIDsForMetricID)
|
||||
})
|
||||
metrics.NewGauge(`vm_date_metric_ids_search_calls_total`, func() float64 {
|
||||
return float64(idbm().DateMetricIDsSearchCalls)
|
||||
})
|
||||
metrics.NewGauge(`vm_date_metric_ids_search_hits_total`, func() float64 {
|
||||
return float64(idbm().DateMetricIDsSearchHits)
|
||||
})
|
||||
metrics.NewGauge(`vm_index_blocks_with_metric_ids_processed_total`, func() float64 {
|
||||
return float64(idbm().IndexBlocksWithMetricIDsProcessed)
|
||||
})
|
||||
|
@ -613,6 +607,9 @@ func registerStorageMetrics() {
|
|||
metrics.NewGauge(`vm_date_range_hits_total`, func() float64 {
|
||||
return float64(idbm().DateRangeSearchHits)
|
||||
})
|
||||
metrics.NewGauge(`vm_global_search_calls_total`, func() float64 {
|
||||
return float64(idbm().GlobalSearchCalls)
|
||||
})
|
||||
|
||||
metrics.NewGauge(`vm_missing_metric_names_for_metric_id_total`, func() float64 {
|
||||
return float64(idbm().MissingMetricNamesForMetricID)
|
||||
|
@ -658,9 +655,6 @@ func registerStorageMetrics() {
|
|||
metrics.NewGauge(`vm_cache_entries{type="indexdb/tagFilters"}`, func() float64 {
|
||||
return float64(idbm().TagFiltersCacheSize)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_entries{type="indexdb/uselessTagFilters"}`, func() float64 {
|
||||
return float64(idbm().UselessTagFiltersCacheSize)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_entries{type="storage/regexps"}`, func() float64 {
|
||||
return float64(storage.RegexpCacheSize())
|
||||
})
|
||||
|
@ -701,9 +695,6 @@ func registerStorageMetrics() {
|
|||
metrics.NewGauge(`vm_cache_size_bytes{type="indexdb/tagFilters"}`, func() float64 {
|
||||
return float64(idbm().TagFiltersCacheSizeBytes)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_size_bytes{type="indexdb/uselessTagFilters"}`, func() float64 {
|
||||
return float64(idbm().UselessTagFiltersCacheSizeBytes)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_size_bytes{type="storage/prefetchedMetricIDs"}`, func() float64 {
|
||||
return float64(m().PrefetchedMetricIDsSizeBytes)
|
||||
})
|
||||
|
@ -732,9 +723,6 @@ func registerStorageMetrics() {
|
|||
metrics.NewGauge(`vm_cache_requests_total{type="indexdb/tagFilters"}`, func() float64 {
|
||||
return float64(idbm().TagFiltersCacheRequests)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_requests_total{type="indexdb/uselessTagFilters"}`, func() float64 {
|
||||
return float64(idbm().UselessTagFiltersCacheRequests)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_requests_total{type="storage/regexps"}`, func() float64 {
|
||||
return float64(storage.RegexpCacheRequests())
|
||||
})
|
||||
|
@ -763,9 +751,6 @@ func registerStorageMetrics() {
|
|||
metrics.NewGauge(`vm_cache_misses_total{type="indexdb/tagFilters"}`, func() float64 {
|
||||
return float64(idbm().TagFiltersCacheMisses)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_misses_total{type="indexdb/uselessTagFilters"}`, func() float64 {
|
||||
return float64(idbm().UselessTagFiltersCacheMisses)
|
||||
})
|
||||
metrics.NewGauge(`vm_cache_misses_total{type="storage/regexps"}`, func() float64 {
|
||||
return float64(storage.RegexpCacheMisses())
|
||||
})
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
FROM node:14-alpine3.12 as build-stage
|
||||
FROM node:alpine3.14
|
||||
|
||||
RUN apk update && apk add --no-cache bash bash-doc bash-completion libtool autoconf automake nasm pkgconfig libpng gcc make g++ zlib-dev gawk
|
||||
|
|
@ -1,4 +1,4 @@
|
|||
FROM golang:1.16.2 as build-web-stage
|
||||
FROM golang:1.16.7 as build-web-stage
|
||||
COPY build /build
|
||||
|
||||
WORKDIR /build
|
||||
|
@ -6,7 +6,7 @@ COPY web/ /build/
|
|||
RUN GOOS=linux GOARCH=amd64 GO111MODULE=on CGO_ENABLED=0 go build -o web-amd64 github.com/VictoriMetrics/vmui/ && \
|
||||
GOOS=windows GOARCH=amd64 GO111MODULE=on CGO_ENABLED=0 go build -o web-windows github.com/VictoriMetrics/vmui/
|
||||
|
||||
FROM alpine:3.13.2
|
||||
FROM alpine:3.14.1
|
||||
USER root
|
||||
|
||||
COPY --from=build-web-stage /build/web-amd64 /app/web
|
|
@ -1,8 +1,7 @@
|
|||
# All these commands must run from repository root.
|
||||
|
||||
vmui-package-base-image:
|
||||
(docker image ls --format '{{.Repository}}:{{.Tag}}' | grep -q vmui-builder-image) \
|
||||
|| docker build -t vmui-builder-image -f app/vmui/packages/vmui/Docker-build ./app/vmui
|
||||
docker build -t vmui-builder-image -f app/vmui/Dockerfile-build ./app/vmui
|
||||
|
||||
vmui-build: vmui-package-base-image
|
||||
docker run --rm \
|
||||
|
@ -13,7 +12,7 @@ vmui-build: vmui-package-base-image
|
|||
vmui-builder-image -c "npm install && npm run build"
|
||||
|
||||
vmui-release: vmui-build
|
||||
docker build -t ${DOCKER_NAMESPACE}/vmui:latest -f app/vmui/packages/vmui/Dockerfile-web ./app/vmui/packages/vmui
|
||||
docker build -t ${DOCKER_NAMESPACE}/vmui:latest -f app/vmui/Dockerfile-web ./app/vmui/packages/vmui
|
||||
docker tag ${DOCKER_NAMESPACE}/vmui:latest ${DOCKER_NAMESPACE}/vmui:${PKG_TAG}
|
||||
|
||||
vmui-publish-latest: vmui-release
|
||||
|
|
|
@ -29,7 +29,7 @@ Then run the built image with:
|
|||
docker run --rm --name vmui -p 8080:8080 victoriametrics/vmui
|
||||
```
|
||||
|
||||
Then naviate to `http://localhost:8080` in order to see the web UI.
|
||||
Then navigate to `http://localhost:8080` in order to see the web UI.
|
||||
|
||||
|
||||
## Static build
|
||||
|
|
|
@ -1,19 +0,0 @@
|
|||
FROM node:14-alpine3.12 as build-stage
|
||||
|
||||
RUN apk update && apk add --no-cache bash bash-doc bash-completion libtool autoconf automake nasm pkgconfig libpng gcc make g++ zlib-dev gawk
|
||||
|
||||
RUN mkdir -p /app
|
||||
WORKDIR /app
|
||||
COPY ./package.json /app/package.json
|
||||
COPY ./package-lock.json /app/package-lock.json
|
||||
RUN cd /app && npm install
|
||||
COPY . /app
|
||||
RUN npm run build
|
||||
|
||||
FROM nginx:latest as production-stage
|
||||
|
||||
COPY --from=build-stage /app/build /usr/share/nginx/html
|
||||
COPY ./nginx/nginx.conf /etc/nginx/nginx.conf
|
||||
COPY ./nginx/default /etc/nginx/sites-enabled/default
|
||||
EXPOSE 80
|
||||
CMD ["nginx", "-g", "daemon off;"]
|
21129
app/vmui/packages/vmui/package-lock.json
generated
|
@ -1,7 +1,5 @@
|
|||
import React, {FC, useEffect, useState} from "react";
|
||||
import {Box, FormControlLabel, IconButton, Switch, Tooltip} from "@material-ui/core";
|
||||
import PlayCircleOutlineIcon from "@material-ui/icons/PlayCircleOutline";
|
||||
|
||||
import EqualizerIcon from "@material-ui/icons/Equalizer";
|
||||
import {useAppDispatch, useAppState} from "../../../state/common/StateContext";
|
||||
import CircularProgressWithLabel from "../../common/CircularProgressWithLabel";
|
||||
|
@ -70,23 +68,19 @@ export const ExecutionControls: FC = () => {
|
|||
};
|
||||
|
||||
return <Box display="flex" alignItems="center">
|
||||
<Box mr={2}>
|
||||
<Tooltip title="Execute Query">
|
||||
<IconButton onClick={()=>dispatch({type: "RUN_QUERY"})}>
|
||||
<PlayCircleOutlineIcon className={classes.colorizing} fontSize="large"/>
|
||||
</IconButton>
|
||||
</Tooltip>
|
||||
</Box>
|
||||
{<FormControlLabel
|
||||
control={<Switch size="small" className={classes.colorizing} checked={autoRefresh} onChange={handleChange} />}
|
||||
label="Auto-refresh"
|
||||
/>}
|
||||
|
||||
{autoRefresh && <>
|
||||
<CircularProgressWithLabel className={classes.colorizing} label={delay} value={progress} onClick={() => {iterateDelays();}} />
|
||||
<Box ml={1}>
|
||||
<IconButton onClick={() => {iterateDelays();}}><EqualizerIcon style={{color: "white"}} /></IconButton>
|
||||
</Box>
|
||||
<CircularProgressWithLabel className={classes.colorizing} label={delay} value={progress}
|
||||
onClick={() => {iterateDelays();}} />
|
||||
<Tooltip title="Change delay refresh">
|
||||
<Box ml={1}>
|
||||
<IconButton onClick={() => {iterateDelays();}}><EqualizerIcon style={{color: "white"}} /></IconButton>
|
||||
</Box>
|
||||
</Tooltip>
|
||||
</>}
|
||||
</Box>;
|
||||
};
|
|
@ -1,4 +1,4 @@
|
|||
import React, {FC, useState} from "react";
|
||||
import React, {FC, useRef, useState} from "react";
|
||||
import {
|
||||
Accordion,
|
||||
AccordionDetails,
|
||||
|
@ -7,7 +7,10 @@ import {
|
|||
Grid,
|
||||
IconButton,
|
||||
TextField,
|
||||
Typography
|
||||
Typography,
|
||||
FormControlLabel,
|
||||
Tooltip,
|
||||
Switch,
|
||||
} from "@material-ui/core";
|
||||
import QueryEditor from "./QueryEditor";
|
||||
import {TimeSelector} from "./TimeSelector";
|
||||
|
@ -15,14 +18,36 @@ import {useAppDispatch, useAppState} from "../../../state/common/StateContext";
|
|||
import ExpandMoreIcon from "@material-ui/icons/ExpandMore";
|
||||
import SecurityIcon from "@material-ui/icons/Security";
|
||||
import {AuthDialog} from "./AuthDialog";
|
||||
import PlayCircleOutlineIcon from "@material-ui/icons/PlayCircleOutline";
|
||||
import Portal from "@material-ui/core/Portal";
|
||||
import Popover from "@material-ui/core/Popover";
|
||||
import SettingsIcon from "@material-ui/icons/Settings";
|
||||
import {saveToStorage} from "../../../utils/storage";
|
||||
|
||||
const QueryConfigurator: FC = () => {
|
||||
|
||||
const {serverUrl, query, time: {duration}} = useAppState();
|
||||
const dispatch = useAppDispatch();
|
||||
|
||||
const {queryControls: {autocomplete}} = useAppState();
|
||||
const onChangeAutocomplete = () => {
|
||||
dispatch({type: "TOGGLE_AUTOCOMPLETE"});
|
||||
saveToStorage("AUTOCOMPLETE", !autocomplete);
|
||||
};
|
||||
|
||||
const [dialogOpen, setDialogOpen] = useState(false);
|
||||
const [expanded, setExpanded] = useState(true);
|
||||
const [popoverOpen, setPopoverOpen] = useState(false);
|
||||
const refSettings = useRef<SVGGElement | any>(null);
|
||||
|
||||
const queryContainer = useRef<HTMLDivElement>(null);
|
||||
|
||||
const onSetDuration = (dur: string) => dispatch({type: "SET_DURATION", payload: dur});
|
||||
const onRunQuery = () => dispatch({type: "RUN_QUERY"});
|
||||
const onSetQuery = (query: string) => dispatch({type: "SET_QUERY", payload: query});
|
||||
const onSetServer = ({target: {value}}: {target: {value: string}}) => {
|
||||
dispatch({type: "SET_SERVER", payload: value});
|
||||
};
|
||||
|
||||
return (
|
||||
<>
|
||||
|
@ -35,31 +60,65 @@ const QueryConfigurator: FC = () => {
|
|||
<Box mr={2}>
|
||||
<Typography variant="h6" component="h2">Query Configuration</Typography>
|
||||
</Box>
|
||||
{!expanded && <Box flexGrow={1} onClick={e => e.stopPropagation()} onFocusCapture={e => e.stopPropagation()}>
|
||||
<QueryEditor server={serverUrl} query={query} oneLiner setQuery={(query) => dispatch({type: "SET_QUERY", payload: query})}/>
|
||||
</Box>}
|
||||
<Box flexGrow={1} onClick={e => e.stopPropagation()} onFocusCapture={e => e.stopPropagation()}>
|
||||
<Portal disablePortal={!expanded} container={queryContainer.current}>
|
||||
<QueryEditor server={serverUrl} query={query} oneLiner={!expanded} autocomplete={autocomplete}
|
||||
runQuery={onRunQuery}
|
||||
setQuery={onSetQuery}/>
|
||||
</Portal>
|
||||
</Box>
|
||||
</AccordionSummary>
|
||||
<AccordionDetails>
|
||||
<Grid container spacing={2}>
|
||||
<Grid item xs={12} md={6}>
|
||||
<Box>
|
||||
<Box py={2} display="flex">
|
||||
<Box py={2} display="flex" alignItems="center">
|
||||
<TextField variant="outlined" fullWidth label="Server URL" value={serverUrl}
|
||||
inputProps={{
|
||||
style: {fontFamily: "Monospace"}
|
||||
}}
|
||||
onChange={(e) => dispatch({type: "SET_SERVER", payload: e.target.value})}/>
|
||||
<Box pl={.5} flexGrow={0}>
|
||||
<IconButton onClick={() => setDialogOpen(true)}>
|
||||
<SecurityIcon/>
|
||||
</IconButton>
|
||||
onChange={onSetServer}/>
|
||||
<Box ml={1}>
|
||||
<Tooltip title="Execute Query">
|
||||
<IconButton onClick={onRunQuery}>
|
||||
<PlayCircleOutlineIcon />
|
||||
</IconButton>
|
||||
</Tooltip>
|
||||
</Box>
|
||||
<Box>
|
||||
<Tooltip title="Request Auth Settings">
|
||||
<IconButton onClick={() => setDialogOpen(true)}>
|
||||
<SecurityIcon/>
|
||||
</IconButton>
|
||||
</Tooltip>
|
||||
</Box>
|
||||
</Box>
|
||||
<QueryEditor server={serverUrl} query={query} setQuery={(query) => dispatch({type: "SET_QUERY", payload: query})}/>
|
||||
|
||||
<Box py={2} display="flex">
|
||||
<Box flexGrow={1} mr={2}>
|
||||
{/* for portal QueryEditor */}
|
||||
<div ref={queryContainer} />
|
||||
</Box>
|
||||
<div>
|
||||
<Tooltip title="Query Editor Settings">
|
||||
<IconButton onClick={() => setPopoverOpen(!popoverOpen)}>
|
||||
<SettingsIcon ref={refSettings}/>
|
||||
</IconButton>
|
||||
</Tooltip>
|
||||
<Popover open={popoverOpen} transformOrigin={{vertical: -20, horizontal: "left"}}
|
||||
onClose={() => setPopoverOpen(false)}
|
||||
anchorEl={refSettings.current}>
|
||||
<Box p={2}>
|
||||
{<FormControlLabel
|
||||
control={<Switch size="small" checked={autocomplete} onChange={onChangeAutocomplete}/>}
|
||||
label="Autocomplete"
|
||||
/>}
|
||||
</Box>
|
||||
</Popover>
|
||||
</div>
|
||||
</Box>
|
||||
</Box>
|
||||
</Grid>
|
||||
<Grid item xs={12} md={6}>
|
||||
<Grid item xs={8} md={6} >
|
||||
<Box style={{
|
||||
borderRadius: "4px",
|
||||
borderColor: "#b9b9b9",
|
||||
|
@ -68,7 +127,7 @@ const QueryConfigurator: FC = () => {
|
|||
height: "calc(100% - 18px)",
|
||||
marginTop: "16px"
|
||||
}}>
|
||||
<TimeSelector setDuration={(dur) => dispatch({type: "SET_DURATION", payload: dur})} duration={duration}/>
|
||||
<TimeSelector setDuration={onSetDuration} duration={duration}/>
|
||||
</Box>
|
||||
</Grid>
|
||||
</Grid>
|
||||
|
|
|
@ -4,15 +4,20 @@ import {defaultKeymap} from "@codemirror/next/commands";
|
|||
import React, {FC, useEffect, useRef, useState} from "react";
|
||||
import { PromQLExtension } from "codemirror-promql";
|
||||
import { basicSetup } from "@codemirror/next/basic-setup";
|
||||
import {isMacOs} from "../../../utils/detect-os";
|
||||
|
||||
export interface QueryEditorProps {
|
||||
setQuery: (query: string) => void;
|
||||
runQuery: () => void;
|
||||
query: string;
|
||||
server: string;
|
||||
oneLiner?: boolean;
|
||||
autocomplete: boolean
|
||||
}
|
||||
|
||||
const QueryEditor: FC<QueryEditorProps> = ({query, setQuery, server, oneLiner = false}) => {
|
||||
const QueryEditor: FC<QueryEditorProps> = ({
|
||||
query, setQuery, runQuery, server, oneLiner = false, autocomplete
|
||||
}) => {
|
||||
|
||||
const ref = useRef<HTMLDivElement>(null);
|
||||
|
||||
|
@ -33,28 +38,41 @@ const QueryEditor: FC<QueryEditorProps> = ({query, setQuery, server, oneLiner =
|
|||
// update state on change of autocomplete server
|
||||
useEffect(() => {
|
||||
|
||||
const promQL = new PromQLExtension().setComplete({url: server});
|
||||
const promQL = new PromQLExtension();
|
||||
promQL.activateCompletion(autocomplete);
|
||||
promQL.setComplete({url: server});
|
||||
|
||||
const listenerExtension = EditorView.updateListener.of(editorUpdate => {
|
||||
if (editorUpdate.docChanged) {
|
||||
setQuery(
|
||||
editorUpdate.state.doc.toJSON().map(el => el.trim()).join("")
|
||||
);
|
||||
setQuery(editorUpdate.state.doc.toJSON().map(el => el.trim()).join(""));
|
||||
}
|
||||
|
||||
});
|
||||
|
||||
editorView?.setState(EditorState.create({
|
||||
doc: query,
|
||||
extensions: [basicSetup, keymap(defaultKeymap), listenerExtension, promQL.asExtension()]
|
||||
extensions: [
|
||||
basicSetup,
|
||||
keymap(defaultKeymap),
|
||||
listenerExtension,
|
||||
promQL.asExtension(),
|
||||
keymap([
|
||||
{
|
||||
key: isMacOs() ? "Cmd-Enter" : "Ctrl-Enter",
|
||||
run: (): boolean => {
|
||||
runQuery();
|
||||
return true;
|
||||
},
|
||||
},
|
||||
]),
|
||||
]
|
||||
}));
|
||||
|
||||
}, [server, editorView]);
|
||||
}, [server, editorView, autocomplete]);
|
||||
|
||||
return (
|
||||
<>
|
||||
{/*Class one-line-scroll and other codemirror stylings are declared in index.css*/}
|
||||
<div ref={ref} className={oneLiner ? "one-line-scroll" : undefined}></div>
|
||||
{/*Class one-line-scroll and other codemirror styles are declared in index.css*/}
|
||||
<div ref={ref} className={oneLiner ? "one-line-scroll" : undefined}/>
|
||||
</>
|
||||
);
|
||||
};
|
||||
|
|
|
@ -71,7 +71,7 @@ export const TimeSelector: FC<TimeSelectorProps> = ({setDuration}) => {
|
|||
aria-haspopup="true"
|
||||
style={{cursor: "pointer"}}
|
||||
onMouseEnter={handlePopoverOpen}
|
||||
onMouseLeave={handlePopoverClose}><EFBFBD>: </span>
|
||||
onMouseLeave={handlePopoverClose}>: </span>
|
||||
<Popover
|
||||
open={open}
|
||||
anchorEl={anchorEl}
|
||||
|
@ -117,4 +117,4 @@ export const TimeSelector: FC<TimeSelectorProps> = ({setDuration}) => {
|
|||
</Box>
|
||||
</Box>
|
||||
</Box>;
|
||||
};
|
||||
};
|
||||
|
|
|
@ -21,7 +21,7 @@ const HomeLayout: FC = () => {
|
|||
<>
|
||||
<AppBar position="static">
|
||||
<Toolbar>
|
||||
<Box mr={2} display="flex">
|
||||
<Box display="flex">
|
||||
<Typography variant="h5">
|
||||
<span style={{fontWeight: "bolder"}}>VM</span>
|
||||
<span style={{fontWeight: "lighter"}}>UI</span>
|
||||
|
@ -43,24 +43,25 @@ const HomeLayout: FC = () => {
|
|||
Create an issue
|
||||
</Link>
|
||||
</div>
|
||||
<Box flexGrow={1}>
|
||||
<Box ml={4} flexGrow={1}>
|
||||
<ExecutionControls/>
|
||||
</Box>
|
||||
<DisplayTypeSwitch/>
|
||||
<UrlCopy url={fetchUrl}/>
|
||||
</Toolbar>
|
||||
</AppBar>
|
||||
<Box display="flex" flexDirection="column" style={{height: "calc(100vh - 64px)"}}>
|
||||
<Box display="flex" flexDirection="column" style={{minHeight: "calc(100vh - 64px)"}}>
|
||||
<Box m={2}>
|
||||
<QueryConfigurator/>
|
||||
</Box>
|
||||
<Box flexShrink={1} style={{overflowY: "scroll"}}>
|
||||
<Box flexShrink={1}>
|
||||
{isLoading && <Fade in={isLoading} style={{
|
||||
transitionDelay: isLoading ? "300ms" : "0ms",
|
||||
}}>
|
||||
<Box alignItems="center" flexDirection="column" display="flex"
|
||||
style={{
|
||||
width: "100%",
|
||||
maxWidth: "calc(100vh - 32px)",
|
||||
position: "absolute",
|
||||
height: "150px",
|
||||
background: "linear-gradient(rgba(255,255,255,.7), rgba(255,255,255,.7), rgba(255,255,255,0))"
|
||||
|
|
|
@ -37,7 +37,7 @@ export const ChartTooltip: React.FC<ChartTooltipProps> = ({data, time}) => {
|
|||
<Box>
|
||||
<Typography variant="body2">
|
||||
{data.metrics.map(({key, value}) =>
|
||||
<Box mb={.25} key={key} display="flex" flexDirection="row" alignItems="center">
|
||||
<Box component="span" mb={.25} key={key} display="flex" flexDirection="row" alignItems="center">
|
||||
<span>{key}: </span>
|
||||
<span style={{fontWeight: "bold"}}>{value}</span>
|
||||
</Box>)}
|
||||
|
|
|
@ -31,3 +31,9 @@ code {
|
|||
.one-line-scroll .cm-wrap {
|
||||
height: 24px;
|
||||
}
|
||||
.cm-content, .cm-gutter { min-height: 51px; }
|
||||
|
||||
.one-line-scroll .cm-content,
|
||||
.one-line-scroll .cm-gutter {
|
||||
min-height: auto;
|
||||
}
|
||||
|
|
|
@ -2,6 +2,7 @@ import {DisplayType} from "../../components/Home/Configurator/DisplayTypeSwitch"
|
|||
import {TimeParams, TimePeriod} from "../../types";
|
||||
import {dateFromSeconds, getDurationFromPeriod, getTimeperiodForDuration} from "../../utils/time";
|
||||
import {getFromStorage} from "../../utils/storage";
|
||||
import {getDefaultServer} from "../../utils/default-server-url";
|
||||
|
||||
export interface TimeState {
|
||||
duration: string;
|
||||
|
@ -15,6 +16,7 @@ export interface AppState {
|
|||
time: TimeState;
|
||||
queryControls: {
|
||||
autoRefresh: boolean;
|
||||
autocomplete: boolean
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -28,9 +30,11 @@ export type Action =
|
|||
| { type: "RUN_QUERY"}
|
||||
| { type: "RUN_QUERY_TO_NOW"}
|
||||
| { type: "TOGGLE_AUTOREFRESH"}
|
||||
| { type: "TOGGLE_AUTOCOMPLETE"}
|
||||
|
||||
|
||||
export const initialState: AppState = {
|
||||
serverUrl: getFromStorage("PREFERRED_URL") as string || "https://", // https://demo.promlabs.com or https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus",
|
||||
serverUrl: getFromStorage("PREFERRED_URL") as string || getDefaultServer(), // https://demo.promlabs.com or https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus",
|
||||
displayType: "chart",
|
||||
query: getFromStorage("LAST_QUERY") as string || "\n", // demo_memory_usage_bytes
|
||||
time: {
|
||||
|
@ -38,7 +42,8 @@ export const initialState: AppState = {
|
|||
period: getTimeperiodForDuration("1h")
|
||||
},
|
||||
queryControls: {
|
||||
autoRefresh: false
|
||||
autoRefresh: false,
|
||||
autocomplete: getFromStorage("AUTOCOMPLETE") as boolean || false
|
||||
}
|
||||
};
|
||||
|
||||
|
@ -99,6 +104,14 @@ export function reducer(state: AppState, action: Action): AppState {
|
|||
autoRefresh: !state.queryControls.autoRefresh
|
||||
}
|
||||
};
|
||||
case "TOGGLE_AUTOCOMPLETE":
|
||||
return {
|
||||
...state,
|
||||
queryControls: {
|
||||
...state.queryControls,
|
||||
autocomplete: !state.queryControls.autocomplete
|
||||
}
|
||||
};
|
||||
case "RUN_QUERY":
|
||||
return {
|
||||
...state,
|
||||
|
|
6
app/vmui/packages/vmui/src/utils/default-server-url.ts
Normal file
|
@ -0,0 +1,6 @@
|
|||
export const getDefaultServer = (): string => {
|
||||
const {href} = window.location;
|
||||
const regexp = /^http.+\/vmui/;
|
||||
const [result] = href.match(regexp) || ["https://"];
|
||||
return result.replace("vmui", "prometheus");
|
||||
};
|
13
app/vmui/packages/vmui/src/utils/detect-os.ts
Normal file
|
@ -0,0 +1,13 @@
|
|||
const desktopOs = {
|
||||
windows: "Windows",
|
||||
mac: "Mac OS",
|
||||
linux: "Linux"
|
||||
};
|
||||
|
||||
export const getOs = () : string => {
|
||||
return Object.values(desktopOs).find(os => navigator.userAgent.indexOf(os) >= 0) || "unknown";
|
||||
};
|
||||
|
||||
export const isMacOs = (): boolean => {
|
||||
return getOs() === desktopOs.mac;
|
||||
};
|
|
@ -1,4 +1,9 @@
|
|||
export type StorageKeys = "PREFERRED_URL" | "LAST_QUERY" | "BASIC_AUTH_DATA" | "BEARER_AUTH_DATA" | "AUTH_TYPE";
|
||||
export type StorageKeys = "PREFERRED_URL"
|
||||
| "LAST_QUERY"
|
||||
| "BASIC_AUTH_DATA"
|
||||
| "BEARER_AUTH_DATA"
|
||||
| "AUTH_TYPE"
|
||||
| "AUTOCOMPLETE";
|
||||
|
||||
export const saveToStorage = (key: StorageKeys, value: string | boolean | Record<string, unknown>): void => {
|
||||
if (value) {
|
||||
|
|
|
@ -4,7 +4,7 @@ DOCKER_NAMESPACE := victoriametrics
|
|||
|
||||
ROOT_IMAGE ?= alpine:3.14.0
|
||||
CERTS_IMAGE := alpine:3.14.0
|
||||
GO_BUILDER_IMAGE := golang:1.16.6
|
||||
GO_BUILDER_IMAGE := golang:1.16.7
|
||||
BUILDER_IMAGE := local/builder:2.0.0-$(shell echo $(GO_BUILDER_IMAGE) | tr : _)
|
||||
BASE_IMAGE := local/base:1.1.3-$(shell echo $(ROOT_IMAGE) | tr : _)-$(shell echo $(CERTS_IMAGE) | tr : _)
|
||||
|
||||
|
|
|
@ -19,11 +19,31 @@ groups:
|
|||
expr: up{job=~"victoriametrics|vmagent|vmalert"} == 0
|
||||
for: 2m
|
||||
labels:
|
||||
severity: "critical"
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Service {{ $labels.job }} is down on {{ $labels.instance }}"
|
||||
description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes."
|
||||
|
||||
- alert: ProcessNearFDLimits
|
||||
expr: (process_max_fds - process_open_fds) < 100
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Number of free file descriptors is less than 100 for \"{{ $labels.job }}\"(\"{{ $labels.instance }}\") for the last 5m"
|
||||
description: "Exhausting OS file descriptors limit can cause severe degradation of the process.
|
||||
Consider to increase the limit as fast as possible."
|
||||
|
||||
- alert: TooHighMemoryUsage
|
||||
expr: (process_resident_memory_anon_bytes / vm_available_memory_bytes) > 0.9
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "It is more than 90% of memory used by \"{{ $labels.job }}\"(\"{{ $labels.instance }}\") during the last 5m"
|
||||
description: "Too high memory usage may result into multiple issues such as OOMs or degraded performance.
|
||||
Consider to either increase available memory or decrease the load on the process."
|
||||
|
||||
# Alerts group for VM single assumes that Grafana dashboard
|
||||
# https://grafana.com/grafana/dashboards/10229 is installed.
|
||||
# Pls update the `dashboard` annotation according to your setup.
|
||||
|
@ -166,17 +186,6 @@ groups:
|
|||
description: "High rate of slow inserts on \"{{ $labels.instance }}\" may be a sign of resource exhaustion
|
||||
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series."
|
||||
|
||||
- alert: ProcessNearFDLimits
|
||||
expr: (process_max_fds - process_open_fds) < 100
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=75&var-instance={{ $labels.instance }}"
|
||||
summary: "Number of free file descriptors is less than 100 for \"{{ $labels.job }}\"(\"{{ $labels.instance }}\") for the last 5m"
|
||||
description: "Exhausting OS file descriptors limit can cause severe degradation of the process.
|
||||
Consider to increase the limit as fast as possible."
|
||||
|
||||
- alert: LabelsLimitExceededOnIngestion
|
||||
expr: sum(increase(vm_metrics_with_dropped_labels_total[5m])) by (instance) > 0
|
||||
for: 15m
|
||||
|
|
|
@ -6,18 +6,20 @@ sort: 16
|
|||
|
||||
## Third-party articles and slides about VictoriaMetrics
|
||||
|
||||
* [Scaling to trillions of metric data points](https://engineering.razorpay.com/scaling-to-trillions-of-metric-data-points-f569a5b654f2)
|
||||
* [VictoriaMetrics vs. OpenTSDB](https://blg.robot-house.us/posts/tsdbs-grow/)
|
||||
* [Monitoring of multiple OpenShift clusters with VictoriaMetrics](https://medium.com/ibm-garage/monitoring-of-multiple-openshift-clusters-with-victoriametrics-d4f0979e2544)
|
||||
* [Fly's Prometheus Metrics](https://fly.io/blog/measuring-fly/)
|
||||
* [Infrastructure monitoring with Prometheus at Zerodha](https://zerodha.tech/blog/infra-monitoring-at-zerodha/)
|
||||
* [Sismology: Iguana Solutions’ Monitoring System](https://medium.com/nerd-for-tech/sismology-iguana-solutions-monitoring-system-f46e4170447f)
|
||||
* [Prometheus High Availability and Fault Tolerance strategy, long term storage with VictoriaMetrics](https://medium.com/miro-engineering/prometheus-high-availability-and-fault-tolerance-strategy-long-term-storage-with-victoriametrics-82f6f3f0409e)
|
||||
* [How we improved our Kubernetes monitoring at Smarkets, and how you could too](https://smarketshq.com/monitoring-kubernetes-clusters-41a4b24c19e3)
|
||||
* [Foiled by the Firewall: A Tale of Transition From Prometheus to VictoriaMetrics](https://www.percona.com/blog/2020/12/01/foiled-by-the-firewall-a-tale-of-transition-from-prometheus-to-victoriametrics/)
|
||||
* [Observations on Better Resource Usage with Percona Monitoring and Management v2.12.0](https://www.percona.com/blog/2020/12/23/observations-on-better-resource-usage-with-percona-monitoring-and-management-v2-12-0/)
|
||||
* [Better Prometheus rate() function with VictoriaMetrics](https://www.percona.com/blog/2020/02/28/better-prometheus-rate-function-with-victoriametrics/)
|
||||
* [Percona monitoring and management migration from Prometheus to VictoriaMetrics FAQ](https://www.percona.com/blog/2020/12/16/percona-monitoring-and-management-migration-from-prometheus-to-victoriametrics-faq/)
|
||||
* [Compiling a Percona Monitoring and Management v2 Client in ARM: Raspberry Pi 3 Reprise](https://www.percona.com/blog/2021/05/26/compiling-a-percona-monitoring-and-management-v2-client-in-arm-raspberry-pi-3/)
|
||||
* [Making peace with Prometheus rate()](https://blog.doit-intl.com/making-peace-with-prometheus-rate-43a3ea75c4cf)
|
||||
* [Infrastructure monitoring with Prometheus at Zerodha](https://zerodha.tech/blog/infra-monitoring-at-zerodha/)
|
||||
* [Sismology: Iguana Solutions’ Monitoring System](https://medium.com/@IG1.com/sismology-iguana-solutions-monitoring-system-f46e4170447f)
|
||||
* [Prometheus High Availability and Fault Tolerance strategy, long term storage with VictoriaMetrics](https://medium.com/miro-engineering/prometheus-high-availability-and-fault-tolerance-strategy-long-term-storage-with-victoriametrics-82f6f3f0409e)
|
||||
* [How we improved our Kubernetes monitoring at Smarkets, and how you could too](https://smarketshq.com/monitoring-kubernetes-clusters-41a4b24c19e3)
|
||||
* [Monitoring K8S with VictoriaMetrics](https://docs.google.com/presentation/d/1g7yUyVEaAp4tPuRy-MZbPXKqJ1z78_5VKuV841aQfsg/edit)
|
||||
* [CMS monitoring R&D: Real-time monitoring and alerts](https://indico.cern.ch/event/877333/contributions/3696707/attachments/1972189/3281133/CMS_mon_RD_for_opInt.pdf)
|
||||
* [The CMS monitoring infrastructure and applications](https://arxiv.org/pdf/2007.03630.pdf)
|
||||
|
@ -38,54 +40,56 @@ sort: 16
|
|||
* [Solving Metrics at scale with VictoriaMetrics](https://www.youtube.com/watch?v=QgLMztnj7-8)
|
||||
* [Monitoring Kubernetes clusters with VictoriaMetrics and Grafana](https://blog.cybozu.io/entry/2021/03/18/115743)
|
||||
* [Multi-tenancy monitoring system for Kubernetes cluster using VictoriaMetrics and operators](https://blog.kintone.io/entry/2021/03/31/175256)
|
||||
* [Monitoring as Code на базе VictoriaMetrics и Grafana](https://habr.com/ru/post/568090/)
|
||||
* [Push Prometheus metrics to VictoriaMetrics or other exporters](https://pythonawesome.com/push-prometheus-metrics-to-victoriametrics-or-other-exporters/)
|
||||
|
||||
|
||||
## Our articles
|
||||
|
||||
### Announcements
|
||||
|
||||
* [Open-sourcing VictoriaMetrics](https://medium.com/@valyala/open-sourcing-victoriametrics-f31e34485c2b)
|
||||
* [How we created VictoriaMetrics](https://medium.com/devopslinks/victoriametrics-creating-the-best-remote-storage-for-prometheus-5d92d66787ac)
|
||||
* [Anomaly Detection in VictoriaMetrics](https://medium.com/@VictoriaMetrics/anomaly-detection-in-victoriametrics-9528538786a7)
|
||||
* [Open-sourcing VictoriaMetrics](https://blog.usejournal.com/open-sourcing-victoriametrics-f31e34485c2b)
|
||||
* [VictoriaMetrics — creating the best remote storage for Prometheus](https://faun.pub/victoriametrics-creating-the-best-remote-storage-for-prometheus-5d92d66787ac)
|
||||
* [Anomaly Detection in VictoriaMetrics](https://victoriametrics.medium.com/anomaly-detection-in-victoriametrics-9528538786a7)
|
||||
|
||||
|
||||
### Benchmarks
|
||||
|
||||
* [VictoriaMetrics vs TimescaleDB vs InfluxDB benchmarks on 40K unique time series](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4)
|
||||
* [VictoriaMetrics vs TimescaleDB vs InfluxDB benchmarks on 400K, 4M and 40M unique time series](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b)
|
||||
* [Insert benchmarks for VictoriaMetrics vs InfluxDB on high-cardinality data](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893)
|
||||
* [Measuring vertical scalability for time series databases in Google Cloud](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae)
|
||||
* [Billy: how VictoriaMetrics deals with more than 500 billion rows](https://medium.com/@valyala/billy-how-victoriametrics-deals-with-more-than-500-billion-rows-e82ff8f725da)
|
||||
* [First look at performance comparison between InfluxDB IOx and VictoriaMetrics](https://medium.com/@VictoriaMetrics/first-look-at-perfomance-comparassion-between-influxdb-iox-and-victoriametrics-e590f847935b)
|
||||
* [When size matters — benchmarking VictoriaMetrics vs Timescale and InfluxDB](https://valyala.medium.com/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4)
|
||||
* [High-cardinality TSDB benchmarks: VictoriaMetrics vs TimescaleDB vs InfluxDB](https://valyala.medium.com/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b)
|
||||
* [Insert benchmarks with inch: InfluxDB vs VictoriaMetrics](https://valyala.medium.com/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893)
|
||||
* [Measuring vertical scalability for time series databases in Google Cloud](https://valyala.medium.com/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae)
|
||||
* [Billy: how VictoriaMetrics deals with more than 500 billion rows](https://valyala.medium.com/billy-how-victoriametrics-deals-with-more-than-500-billion-rows-e82ff8f725da)
|
||||
* [First look at performance comparison between InfluxDB IOx and VictoriaMetrics](https://victoriametrics.medium.com/first-look-at-perfomance-comparassion-between-influxdb-iox-and-victoriametrics-e590f847935b)
|
||||
* [Prometheus vs VictoriaMetrics benchmark on node-exporter metrics](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f)
|
||||
* [Promscale vs VictoriaMetrics: resource usage on production workload](https://valyala.medium.com/promscale-vs-victoriametrics-resource-usage-on-production-workload-91c8e3786c03)
|
||||
|
||||
|
||||
### Technical articles
|
||||
|
||||
* [How VictoriaMetrics creates instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
|
||||
* [WAL Usage Looks Broken in Modern TSDBs](https://medium.com/@valyala/wal-usage-looks-broken-in-modern-time-series-databases-b62a627ab704)
|
||||
* [Why mmap'ed files in Go may hurt performance](https://medium.com/@valyala/mmap-in-go-considered-harmful-d92a25cb161d)
|
||||
* [Achieving better compression for time series data than Gorilla](https://medium.com/@valyala/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932)
|
||||
* [Stripping dependency bloat in VictoriaMetrics Docker image](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d)
|
||||
* [Speeding up backups for big time series databases](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883)
|
||||
* [Improving histogram usability for Prometheus and Grafana](https://medium.com/@valyala/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
|
||||
* [Why irate from Prometheus doesn't capture spikes](https://medium.com/@valyala/why-irate-from-prometheus-doesnt-capture-spikes-45f9896d7832)
|
||||
* [How VictoriaMetrics makes instant snapshots for multi-terabyte time series data](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
|
||||
* [WAL Usage Looks Broken in Modern TSDBs](https://valyala.medium.com/wal-usage-looks-broken-in-modern-time-series-databases-b62a627ab704)
|
||||
* [mmap may slow down your Go app](https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d)
|
||||
* [VictoriaMetrics: achieving better compression than Gorilla for time series data](https://faun.pub/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932)
|
||||
* [Stripping dependency bloat in VictoriaMetrics Docker image](https://valyala.medium.com/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d)
|
||||
* [Speeding up backups for big time series databases](https://valyala.medium.com/speeding-up-backups-for-big-time-series-databases-533c1a927883)
|
||||
* [Improving histogram usability for Prometheus and Grafana](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
|
||||
* [Why irate from Prometheus doesn't capture spikes](https://valyala.medium.com/why-irate-from-prometheus-doesnt-capture-spikes-45f9896d7832)
|
||||
|
||||
|
||||
### Tutorials, guides and how-to articles
|
||||
|
||||
* [PromQL tutorial for beginners](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085)
|
||||
* [Analyzing Prometheus data with external tools](https://medium.com/@valyala/analyzing-prometheus-data-with-external-tools-5f3e5e147639)
|
||||
* [Prometheus Subqueries in VictoriaMetrics](https://medium.com/@valyala/prometheus-subqueries-in-victoriametrics-9b1492b720b3)
|
||||
* [PromQL tutorial for beginners and humans](https://valyala.medium.com/promql-tutorial-for-beginners-9ab455142085)
|
||||
* [Analyzing Prometheus data with external tools](https://valyala.medium.com/analyzing-prometheus-data-with-external-tools-5f3e5e147639)
|
||||
* [Prometheus Subqueries in VictoriaMetrics](https://valyala.medium.com/prometheus-subqueries-in-victoriametrics-9b1492b720b3)
|
||||
* [How to migrate data from Prometheus to VictoriaMetrics](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043)
|
||||
* [Filtering and modifying time series during import to VictoriaMetrics](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-filtering-and-modifying-time-series-6d40cea4bf21)
|
||||
* [VictoriaMetrics: how to migrate data from Prometheus. Filtering and modifying time series.](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-filtering-and-modifying-time-series-6d40cea4bf21)
|
||||
* [How to use relabeling in Prometheus and VictoriaMetrics](https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2)
|
||||
* [How to monitor Go applications with VictoriaMetrics](https://victoriametrics.medium.com/how-to-monitor-go-applications-with-victoriametrics-c04703110870)
|
||||
* [Prometheus storage: tech terms for humans](https://medium.com/@valyala/prometheus-storage-technical-terms-for-humans-4ab4de6c3d48)
|
||||
* [Prometheus storage: tech terms for humans](https://valyala.medium.com/prometheus-storage-technical-terms-for-humans-4ab4de6c3d48)
|
||||
|
||||
|
||||
### Other articles
|
||||
|
||||
* [Comparing Thanos to VictoriaMetrics cluster](https://medium.com/@valyala/comparing-thanos-to-victoriametrics-cluster-b193bea1683)
|
||||
* [Evaluation performance and correctness: VictoriaMetrics response](https://medium.com/@valyala/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87)
|
||||
* [Comparing Thanos to VictoriaMetrics cluster](https://faun.pub/comparing-thanos-to-victoriametrics-cluster-b193bea1683)
|
||||
* [Evaluating performance and correctness: VictoriaMetrics response](https://valyala.medium.com/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87)
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
sort: 20
|
||||
sort: 19
|
||||
---
|
||||
|
||||
# VM best practices
|
||||
|
|
|
@ -6,6 +6,28 @@ sort: 15
|
|||
|
||||
## tip
|
||||
|
||||
* FEATURE: add support for Prometheus staleness markers. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526).
|
||||
* FEATURE: vmagent: automatically generate Prometheus staleness markers for the scraped metrics when scrape targets disappear in the same way as Prometheus does. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526).
|
||||
* FEATURE: add `present_over_time(m[d])` function, which returns 1 if `m` has at least a single sample over the previous duration `d`. This function has been added also to [Prometheus 2.29](https://github.com/prometheus/prometheus/releases/tag/v2.29.0-rc.0).
|
||||
* FEATURE: vmagent: support multitenant writes according to [these docs](https://docs.victoriametrics.com/vmagent.html#multitenancy). This allows using a single `vmagent` instance in front of VictoriaMetrics cluster for all the tenants. Thanks to @omarghader for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1505). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491).
|
||||
* FEATURE: vmagent: add `__meta_ec2_availability_zone_id` label to discovered Amazon EC2 targets. This label is available in Prometheus [starting from v2.29](https://github.com/prometheus/prometheus/releases/tag/v2.29.0-rc.0).
|
||||
* FEATURE: vmagent: add `__meta_gce_interface_ipv4_<name>` labels to discovered GCE targets. These labels are available in Prometheus [starting from v2.29](https://github.com/prometheus/prometheus/releases/tag/v2.29.0-rc.0).
|
||||
* FEATURE: add `-search.maxSamplesPerSeries` command-line flag for limiting the number of raw samples a single query can process per each time series. This option can protect against out-of-memory errors when a query processes tens of millions of raw samples per series. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1067).
|
||||
* FEATURE: add `-search.maxSamplesPerQuery` command-line flag for limiting the number of raw samples a single query can process across all the time series. This option can protect against heavy queries, which select too big a number of raw samples. Thanks to @jiangxinlingdu for [the initial pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1478).
|
||||
* FEATURE: improve performance for queries that process a big number of time series and/or samples on systems with a big number of CPU cores.
|
||||
* FEATURE: vmalert: expose `vmalert_alerting_rules_last_evaluation_samples` and `vmalert_recording_rules_last_evaluation_samples` metrics. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494).
|
||||
* FEATURE: vminsert: expose `vm_rpc_send_duration_seconds_total` counter, which can be used for determining high saturation of every `vminsert -> vmstorage` link with an alerting query `rate(vm_rpc_send_duration_seconds_total) > 0.9s`. This query triggers when the link is saturated by more than 90%. This usually means that more `vminsert` or `vmstorage` nodes must be added to the cluster in order to increase the total number of `vminsert -> vmstorage` links.
|
||||
* FEATURE: vmagent: expose `vmagent_remotewrite_send_duration_seconds_total` counter, which can be used for determining high saturation of every connection to remote storage with an alerting query `rate(vmagent_remotewrite_send_duration_seconds_total) > 0.9s`. This query triggers when a connection is saturated by more than 90%. This usually means that the `-remoteWrite.queues` command-line flag must be increased in order to increase the number of connections per each remote storage.
|
||||
* FEATURE: vmui: automatically fill Server URL field. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1506).
|
||||
|
||||
* BUGFIX: fix corner cases for queries on time ranges exceeding 40 days. Previously some series could be missing from query results. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486).
|
||||
* BUGFIX: vmselect: return dummy response at `/rules` page in the same way as for `/api/v1/rules` page. The `/rules` page is requested by Grafana 8. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1493) for details.
|
||||
* BUGFIX: vmbackup: automatically set default `us-east-1` S3 region if it is missing. This should simplify using S3-compatible services such as MinIO for backups. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1449).
|
||||
* BUGFIX: vmselect: prevent a possible deadlock when multiple `target` query args are passed to [Graphite Render API](https://docs.victoriametrics.com/#graphite-render-api-usage).
|
||||
* BUGFIX: return series with `a op b` labels and `N` values for `(a op b) default N` if `(a op b)` returns series with all NaN values. Previously such series were removed.
|
||||
* BUGFIX: vmui: fix layout when the query selects more than 27 time series. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1497).
|
||||
* BUGFIX: vmagent: restore highlighting in red for DOWN targets at `/targets` page. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1461).
|
||||
|
||||
|
||||
## [v1.63.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.63.0)
|
||||
|
||||
|
@ -14,7 +36,7 @@ sort: 15
|
|||
* `/vmui/` for a single-node VictoriaMetrics
|
||||
* `/select/<accountID>/vmui/` for `vmselect` at cluster version of VictoriaMetrics
|
||||
* FEATURE: support durations anywhere in [MetricsQL queries](https://docs.victoriametrics.com/MetricsQL.html). For example, `sum_over_time(m[1h]) / 1h` is a valid query, which is equivalent to `sum_over_time(m[1h]) / 3600`.
|
||||
* FEATURE: support durations without suffxies in [MetricsQL queries](https://docs.victoriametrics.com/MetricsQL.html). For example, `rate(m[3600])` is a valid query, which is equivalent to `rate(m[1h])`.
|
||||
* FEATURE: support durations without suffixes in [MetricsQL queries](https://docs.victoriametrics.com/MetricsQL.html). For example, `rate(m[3600])` is a valid query, which is equivalent to `rate(m[1h])`.
|
||||
* FEATURE: export `vmselect_request_duration_seconds` and `vminsert_request_duration_seconds` [VictoriaMetrics histograms](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) at `/metrics` page. These histograms can be used for determining latency distribution and SLI/SLO for the served requests. For example, the following query would return the percent of queries that took less than 500ms during the last hour: `histogram_share(500ms, sum(rate(vmselect_request_duration_seconds_bucket[1h])) by (vmrange))`.
|
||||
* FEATURE: vmagent: dynamically reload client TLS certificates from disk on every [mTLS connection](https://developers.cloudflare.com/cloudflare-one/identity/devices/mutual-tls-authentication). This should allow using `vmagent` with [Istio service mesh](https://istio.io/latest/about/service-mesh/). See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420).
|
||||
* FEATURE: log http request path plus all the query args on errors during request processing. Previously only http request path was logged without query args, so it could be hard debugging such errors.
|
||||
|
|
|
@ -108,7 +108,7 @@ After using Cacti, Graphite and StatsD for years, we wanted to upgrade our monit
|
|||
- is scalable
|
||||
- has a simple client that allows for provisioning and discovery with Puppet
|
||||
|
||||
We hed been running Prometheus for about a year in a test environment and it was working well but there was a need/wish for a few more years of retention than the old system provided. We tested Thanos which was a bit resource hungry but worked great for about half a year.
|
||||
We had been running Prometheus for about a year in a test environment and it was working well but there was a need/wish for a few more years of retention than the old system provided. We tested Thanos which was a bit resource hungry but worked great for about half a year.
|
||||
Then we discovered VictoriaMetrics. Our scale isn't that big so we don't have on-prem S3 and no Kubernetes. VM's single node instance provided
|
||||
the same result with far less maintenance overhead and lower hardware requirements.
|
||||
|
||||
|
|
|
@ -332,13 +332,15 @@ If you need multi-AZ setup, then it is recommended running independed clusters i
|
|||
[vmagent](https://docs.victoriametrics.com/vmagent.html) in front of these clusters, so it could replicate incoming data
|
||||
into all the clusters. Then [promxy](https://github.com/jacksontj/promxy) could be used for querying the data from multiple clusters.
|
||||
|
||||
Another solution is to use multi-level cluster setup, where the top level of `vminsert` nodes [replicate](#replication-and-data-safety) data among the lower level of `vminsert` nodes located at different availability zones. These `vminsert` nodes then spread the data among `vmstorage` nodes in each AZ. See [these docs](#multi-level-cluster-setup) for more details.
|
||||
Another solution is to use [multi-level cluster setup](#multi-level-cluster-setup).
|
||||
|
||||
|
||||
## Multi-level cluster setup
|
||||
|
||||
`vminsert` nodes can accept data from other `vminsert` nodes starting from [v1.60.0](https://docs.victoriametrics.com/CHANGELOG.html#v1600) if the `-clusternativeListenAddr` command-line flag is set. For example, if `vminsert` is started with the `-clusternativeListenAddr=:8400` command-line flag, then it can accept data from other `vminsert` nodes at TCP port 8400 in the same way as `vmstorage` nodes do. This allows chaining `vminsert` nodes and building multi-level cluster topologies with flexible configs. For example, the top level of `vminsert` nodes can replicate data among the second level of `vminsert` nodes located in distinct availability zones (AZ), while the second-level `vminsert` nodes can spread the data among `vmstorage` nodes located in the same AZ. Such a setup keeps the cluster available if some AZ becomes unavailable. The data from all the `vmstorage` nodes in all the AZs can be read via `vmselect` nodes, which are configured to query all the `vmstorage` nodes in all the availability zones (e.g. all the `vmstorage` addresses are passed via the `-storageNode` command-line flag to `vmselect` nodes). Additionally, `-replicationFactor=k+1` must be passed to `vmselect` nodes, where `k` is the lowest number of `vmstorage` nodes in a single AZ. See [replication docs](#replication-and-data-safety) for more details.
|
||||
|
||||
Another option is to set up [vmagent](https://docs.victoriametrics.com/vmagent.html) for replicating the data among multiple VictoriaMetrics clusters. See [these docs](https://docs.victoriametrics.com/vmagent.html#multitenancy) for details.
|
||||
|
||||
|
||||
## Helm
|
||||
|
||||
|
@ -471,7 +473,7 @@ Below is the output for `/path/to/vminsert -help`:
|
|||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
|
@ -587,7 +589,7 @@ Below is the output for `/path/to/vmselect -help`:
|
|||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
|
@ -654,6 +656,10 @@ Below is the output for `/path/to/vmselect -help`:
|
|||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
|
||||
-search.maxQueueDuration duration
|
||||
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
|
||||
-search.maxSamplesPerQuery int
|
||||
The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries (default 1000000000)
|
||||
-search.maxSamplesPerSeries int
|
||||
The maximum number of raw samples a single query can scan per each time series. See also -search.maxSamplesPerQuery (default 30000000)
|
||||
-search.maxStalenessInterval duration
|
||||
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
|
||||
-search.maxStatusRequestDuration duration
|
||||
|
@ -665,7 +671,7 @@ Below is the output for `/path/to/vmselect -help`:
|
|||
-search.queryStats.lastQueriesCount int
|
||||
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
|
||||
-search.queryStats.minQueryDuration duration
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats (default 1ms)
|
||||
-search.resetCacheAuthKey string
|
||||
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
|
||||
-search.treatDotsAsIsInRegexps
|
||||
|
@ -700,7 +706,7 @@ Below is the output for `/path/to/vmstorage -help`:
|
|||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-finalMergeDelay duration
|
||||
|
@ -760,7 +766,7 @@ Below is the output for `/path/to/vmstorage -help`:
|
|||
-search.maxTagValues int
|
||||
The maximum number of tag values returned per search (default 100000)
|
||||
-search.maxUniqueTimeseries int
|
||||
The maximum number of unique time series each search can scan (default 300000)
|
||||
The maximum number of unique time series a single query can process. This allows protecting against heavy queries, which select unexpectedly high number of series. See also -search.maxSamplesPerQuery and -search.maxSamplesPerSeries (default 300000)
|
||||
-smallMergeConcurrency int
|
||||
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
|
||||
-snapshotAuthKey string
|
||||
|
|
3
docs/Gemfile
Normal file
|
@ -0,0 +1,3 @@
|
|||
source "https://rubygems.org"
|
||||
gem "jekyll-rtd-theme", "~> 2.0.6"
|
||||
gem "github-pages", group: :jekyll_plugins
|
|
@ -64,7 +64,7 @@ This functionality can be tried at [an editable Grafana dashboard](http://play-g
|
|||
- Functions for label manipulation:
|
||||
- `alias(q, name)` for setting metric name across all the time series `q`. For example, `alias(foo, "bar")` would give `bar` name to all the `foo` series.
|
||||
- `label_set(q, label1, value1, ... labelN, valueN)` for setting the given values for the given labels on `q`. For example, `label_set(foo, "bar", "baz")` would add `{bar="baz"}` label to all the `foo` series.
|
||||
- `label_map(q, label, srcValue1, dstValue1, ... srcValueN, dstValueN)` for mapping `label` values from `src*` to `dst*`. For example, `label_map(foo, "instance", "127.0.0.1", "locahost")` would rename `foo{instance="127.0.0.1"}` to `foo{instance="localhost"}`.
|
||||
- `label_map(q, label, srcValue1, dstValue1, ... srcValueN, dstValueN)` for mapping `label` values from `src*` to `dst*`. For example, `label_map(foo, "instance", "127.0.0.1", "localhost")` would rename `foo{instance="127.0.0.1"}` to `foo{instance="localhost"}`.
|
||||
- `label_uppercase(q, label1, ... labelN)` for uppercasing values for the given labels. For example, `label_uppercase(foo, "instance")` would transform `foo{instance="bar"}` to `foo{instance="BAR"}`.
|
||||
- `label_lowercase(q, label1, ... labelN)` for lowercasing values for the given labels. For example, `label_lowercase(foo, "instance")` would transform `foo{instance="BAR"}` to `foo{instance="bar"}`.
|
||||
- `label_del(q, label1, ... labelN)` for deleting the given labels from `q`. For example, `label_del(foo, "bar")` would delete `bar` label from all the `foo` series.
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
sort: 19
|
||||
sort: 18
|
||||
---
|
||||
|
||||
# VictoriaMetrics Cluster Per Tenant Statistic
|
||||
|
|
|
@ -1264,8 +1264,11 @@ or similar auth proxy.
|
|||
* There is no need for VictoriaMetrics tuning since it uses reasonable defaults for command-line flags,
|
||||
which are automatically adjusted for the available CPU and RAM resources.
|
||||
* There is no need for Operating System tuning since VictoriaMetrics is optimized for default OS settings.
|
||||
The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a),
|
||||
so Prometheus instances could establish more connections to VictoriaMetrics.
|
||||
The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a).
|
||||
The recommendation is not specific for VictoriaMetrics only but also for any service which handles many HTTP connections and stores data on disk.
|
||||
* VictoriaMetrics is a write-heavy application and its performance depends on disk performance. So be careful with other
|
||||
applications or utilities (like [fstrim](http://manpages.ubuntu.com/manpages/bionic/man8/fstrim.8.html))
|
||||
which could [exhaust disk resources](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1521).
|
||||
* The recommended filesystem is `ext4`, the recommended persistent storage is [persistent HDD-based disk on GCP](https://cloud.google.com/compute/docs/disks/#pdspecs),
|
||||
since it is protected from hardware failures via internal replication and it can be [resized on the fly](https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
|
||||
If you plan to store more than 1TB of data on `ext4` partition or plan extending it to more than 16TB,
|
||||
|
@ -1593,7 +1596,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
|
|||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-finalMergeDelay duration
|
||||
|
@ -1785,6 +1788,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
|
|||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
|
||||
-search.maxQueueDuration duration
|
||||
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
|
||||
-search.maxSamplesPerQuery int
|
||||
The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries (default 1000000000)
|
||||
-search.maxSamplesPerSeries int
|
||||
The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage (default 30000000)
|
||||
-search.maxStalenessInterval duration
|
||||
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
|
||||
-search.maxStatusRequestDuration duration
|
||||
|
@ -1798,13 +1805,13 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
|
|||
-search.maxTagValues int
|
||||
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
|
||||
-search.maxUniqueTimeseries int
|
||||
The maximum number of unique time series each search can scan (default 300000)
|
||||
The maximum number of unique time series each search can scan. This option allows limiting memory usage (default 300000)
|
||||
-search.minStalenessInterval duration
|
||||
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
|
||||
-search.queryStats.lastQueriesCount int
|
||||
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
|
||||
-search.queryStats.minQueryDuration duration
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats (default 1ms)
|
||||
-search.resetCacheAuthKey string
|
||||
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
|
||||
-search.treatDotsAsIsInRegexps
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
sort: 18
|
||||
sort: 17
|
||||
---
|
||||
|
||||
# Release process guidance
|
||||
|
|
|
@ -1,79 +0,0 @@
|
|||
---
|
||||
sort: 17
|
||||
---
|
||||
|
||||
# Sample size calculations
|
||||
|
||||
These calculations are for the “Lowest sample size” graph at https://victoriametrics.com/ .
|
||||
|
||||
How many metrics can be stored in 2tb disk for 2 years?
|
||||
|
||||
Seconds in 2 years:
|
||||
2 years * 365 days * 24 hours * 60 minutes * 60 seconds = 63072000 seconds
|
||||
|
||||
Resolution = 1 point per 10 second
|
||||
|
||||
That means each metric will contain 6307200 points.
|
||||
|
||||
2tb disk contains
|
||||
2 (tb) * 1024 (gb) * 1024 (mb) * 1024 (kb) * 1024 (b) = 2199023255552 bytes
|
||||
|
||||
## VictoriaMetrics
|
||||
Based on production data from our customers, sample size is 0.4 byte
|
||||
That means one metric with 10 seconds resolution will need
|
||||
6307200 points * 0.4 bytes/point = 2522880 bytes or 2.4 megabytes.
|
||||
Calculation for number of metrics can be stored in 2 tb disk:
|
||||
2199023255552 (disk size) / 2522880 (one metric for 2 year) = 871632 metrics
|
||||
So in 2tb we can store 871 632 metrics
|
||||
|
||||
## Graphite
|
||||
Based on https://m30m.github.io/whisper-calculator/ sample size of graphite metrics is 12b + 28b for each metric
|
||||
That means, one metric with 10 second resolution will need 75686428 bytes or 72.18 megabytes
|
||||
Calculation for number of metrics can be stored in 2 tb disk:
|
||||
2199023255552 / 75686428 = 29 054 metrics
|
||||
|
||||
## OpenTSDB
|
||||
Let's check official openTSDB site
|
||||
http://opentsdb.net/faq.html
|
||||
16 bytes of HBase overhead, 3 bytes for the metric, 4 bytes for the timestamp, 6 bytes per tag, 2 bytes of OpenTSDB overhead, up to 8 bytes for the value. Integers are stored with variable length encoding and can consume 1, 2, 4 or 8 bytes.
|
||||
That means, one metric with 10 second resolution will need
|
||||
6307200 * (1 + 4) + 3 + 16 + 2 = 31536021 bytes or 30 megabytes in the best scenario and
|
||||
6307200 * (8 + 4) + 3 + 16 + 2 = 75686421 bytes or 72 megabytes in the worst scenario.
|
||||
|
||||
Calculation for number of metrics can be stored in 2 tb disk:
|
||||
|
||||
2199023255552 / 31536021 = 69 730 metrics for best scenario
|
||||
2199023255552 / 75686421 = 29 054 metrics for worst scenario
|
||||
|
||||
Also, openTSDB allows to use compression
|
||||
" LZO is able to achieve a compression factor of 4.2x "
|
||||
So, let's multiply numbers on 4.2
|
||||
69 730 * 4,2 = 292 866 metrics for best scenario
|
||||
29 054 * 4,2 = 122 026 metrics for worst scenario
|
||||
|
||||
## M3DB
|
||||
Let's look at official m3db site https://m3db.github.io/m3/m3db/architecture/engine/
|
||||
They can achieve a sample size of 1.45 bytes/datapoint
|
||||
That means, one metric with 10 second resolution will need 9145440 bytes or 8,72177124 megabytes
|
||||
Calculation for number of metrics can be stored in 2 tb disk:
|
||||
2199023255552 / 9145440 = 240 450 metrics
|
||||
|
||||
## InfluxDB
|
||||
Based on official influxDB site https://docs.influxdata.com/influxdb/v1.8/guides/hardware_sizing/#bytes-and-compression
|
||||
"Non-string values require approximately three bytes". That means, one metric with 10 second resolution will need
|
||||
6307200 * 3 = 18921600 bytes or 18 megabytes
|
||||
Calculation for number of metrics can be stored in 2 tb disk:
|
||||
|
||||
2199023255552 / 18921600 = 116 217 metrics
|
||||
|
||||
## Prometheus
|
||||
Let's check official site: https://prometheus.io/docs/prometheus/latest/storage/
|
||||
"On average, Prometheus uses only around 1-2 bytes per sample."
|
||||
That means, one metric with 10 second resolution will need
|
||||
6307200 * 1 = 6307200 bytes in best scenario
|
||||
6307200 * 2 = 12614400 bytes in worst scenario.
|
||||
|
||||
Calculation for number of metrics can be stored in 2 tb disk:
|
||||
|
||||
2199023255552 / 6307200 = 348 652 metrics for the best case
|
||||
2199023255552 / 12614400 = 174 326 metrics for the worst cases
|
|
@ -1268,8 +1268,11 @@ or similar auth proxy.
|
|||
* There is no need for VictoriaMetrics tuning since it uses reasonable defaults for command-line flags,
|
||||
which are automatically adjusted for the available CPU and RAM resources.
|
||||
* There is no need for Operating System tuning since VictoriaMetrics is optimized for default OS settings.
|
||||
The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a),
|
||||
so Prometheus instances could establish more connections to VictoriaMetrics.
|
||||
The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a).
|
||||
The recommendation is not specific for VictoriaMetrics only but also for any service which handles many HTTP connections and stores data on disk.
|
||||
* VictoriaMetrics is a write-heavy application and its performance depends on disk performance. So be careful with other
|
||||
applications or utilities (like [fstrim](http://manpages.ubuntu.com/manpages/bionic/man8/fstrim.8.html))
|
||||
which could [exhaust disk resources](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1521).
|
||||
* The recommended filesystem is `ext4`, the recommended persistent storage is [persistent HDD-based disk on GCP](https://cloud.google.com/compute/docs/disks/#pdspecs),
|
||||
since it is protected from hardware failures via internal replication and it can be [resized on the fly](https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
|
||||
If you plan to store more than 1TB of data on `ext4` partition or plan extending it to more than 16TB,
|
||||
|
@ -1597,7 +1600,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
|
|||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-finalMergeDelay duration
|
||||
|
@ -1789,6 +1792,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
|
|||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
|
||||
-search.maxQueueDuration duration
|
||||
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
|
||||
-search.maxSamplesPerQuery int
|
||||
The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries (default 1000000000)
|
||||
-search.maxSamplesPerSeries int
|
||||
The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage (default 30000000)
|
||||
-search.maxStalenessInterval duration
|
||||
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
|
||||
-search.maxStatusRequestDuration duration
|
||||
|
@ -1802,13 +1809,13 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
|
|||
-search.maxTagValues int
|
||||
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
|
||||
-search.maxUniqueTimeseries int
|
||||
The maximum number of unique time series each search can scan (default 300000)
|
||||
The maximum number of unique time series each search can scan. This option allows limiting memory usage (default 300000)
|
||||
-search.minStalenessInterval duration
|
||||
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
|
||||
-search.queryStats.lastQueriesCount int
|
||||
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
|
||||
-search.queryStats.minQueryDuration duration
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats (default 1ms)
|
||||
-search.resetCacheAuthKey string
|
||||
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
|
||||
-search.treatDotsAsIsInRegexps
|
||||
|
|
Before Width: | Height: | Size: 43 KiB |
|
@ -4,4 +4,6 @@ sort: 21
|
|||
|
||||
# Guides
|
||||
|
||||
1. [Kubernetes monitoring with VictoriaMetrics Single](k8s-monitoring-with-vm-single.html)
|
||||
1. [K8s monitoring via VM Single](k8s-monitoring-via-vm-single.html)
|
||||
2. [K8s monitoring via VM Cluster](k8s-monitoring-via-vm-cluster.html)
|
||||
3. [HA monitoring setup in K8s via VM Cluster](k8s-ha-monitoring-via-vm-cluster.html)
|
Before Width: | Height: | Size: 51 KiB After Width: | Height: | Size: 51 KiB |
Before Width: | Height: | Size: 113 KiB After Width: | Height: | Size: 113 KiB |
Before Width: | Height: | Size: 149 KiB After Width: | Height: | Size: 149 KiB |
BIN
docs/guides/guide-vmcluster-k8s-ha-explore-count-up-graph.png
Normal file
After Width: | Height: | Size: 57 KiB |
BIN
docs/guides/guide-vmcluster-k8s-ha-explore-count-up-graph2.png
Normal file
After Width: | Height: | Size: 59 KiB |
BIN
docs/guides/guide-vmcluster-k8s-ha-explore-count-up.png
Normal file
After Width: | Height: | Size: 44 KiB |
BIN
docs/guides/guide-vmcluster-k8s-ha-explore.png
Normal file
After Width: | Height: | Size: 46 KiB |
BIN
docs/guides/guide-vmcluster-k8s-scheme.png
Normal file
After Width: | Height: | Size: 127 KiB |
Before Width: | Height: | Size: 104 KiB After Width: | Height: | Size: 104 KiB |
222
docs/guides/guide-vmcluster-vmagent-values.yaml
Normal file
|
@ -0,0 +1,222 @@
|
|||
remoteWriteUrls:
|
||||
- http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 10s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: vmagent
|
||||
static_configs:
|
||||
- targets: ["localhost:8429"]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
- job_name: "kubernetes-service-endpoints"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-service-endpoints-slow"
|
||||
scrape_interval: 5m
|
||||
scrape_timeout: 30s
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-services"
|
||||
metrics_path: /probe
|
||||
params:
|
||||
module: [http_2xx]
|
||||
kubernetes_sd_configs:
|
||||
- role: service
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_probe]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
- target_label: __address__
|
||||
replacement: blackbox
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
target_label: kubernetes_name
|
||||
- job_name: "kubernetes-pods"
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||||
action: replace
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
target_label: __address__
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_pod_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_pod_name]
|
||||
action: replace
|
||||
target_label: kubernetes_pod_name
|
Before Width: | Height: | Size: 50 KiB After Width: | Height: | Size: 50 KiB |
Before Width: | Height: | Size: 171 KiB After Width: | Height: | Size: 171 KiB |
Before Width: | Height: | Size: 151 KiB After Width: | Height: | Size: 151 KiB |
BIN
docs/guides/guide-vmsingle-k8s-scheme.png
Normal file
After Width: | Height: | Size: 43 KiB |
81
docs/guides/guide-vmsingle-values.yaml
Normal file
|
@ -0,0 +1,81 @@
|
|||
server:
|
||||
scrape:
|
||||
enabled: true
|
||||
configMap: ""
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
scrape_configs:
|
||||
- job_name: victoriametrics
|
||||
static_configs:
|
||||
- targets: [ "localhost:8428" ]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
434
docs/guides/k8s-ha-monitoring-via-vm-cluster.md
Normal file
|
@ -0,0 +1,434 @@
|
|||
# HA monitoring setup in Kubernetes via VictoriaMetrics Cluster
|
||||
|
||||
|
||||
**The guide covers:**
|
||||
|
||||
* High availability monitoring via [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) in [Kubernetes](https://kubernetes.io/) with Helm charts
|
||||
* How to store metrics
|
||||
* How to scrape metrics from k8s components using service discovery
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com)
|
||||
|
||||
**Preconditions**
|
||||
|
||||
* [Kubernetes cluster 1.19.12-gke.2100](https://cloud.google.com/kubernetes-engine). We use GKE cluster from [GCP](https://cloud.google.com/) but this guide also applies to any Kubernetes cluster. For example, [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3 ](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.21](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
* [jq](https://stedolan.github.io/jq/download/) tool
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
Please see the relevant [VictoriaMetrics Helm repository](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster.html#1-victoriametrics-helm-repository) section in previous guides.
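If you skipped those guides, the repository can be added and refreshed with the following commands (the same ones used in the linked section):

```bash
# Add the VictoriaMetrics Helm repository and refresh the local chart index
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update
```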
|
||||
|
||||
|
||||
## 2. Install VictoriaMetrics Cluster from the Helm chart
|
||||
|
||||
Execute the following command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install vmcluster vm/victoria-metrics-cluster -f -
|
||||
vmselect:
|
||||
extraArgs:
|
||||
dedup.minScrapeInterval: 1ms
|
||||
replicationFactor: 2
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8481"
|
||||
replicaCount: 3
|
||||
|
||||
vminsert:
|
||||
extraArgs:
|
||||
replicationFactor: 2
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8480"
|
||||
replicaCount: 3
|
||||
|
||||
vmstorage:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8482"
|
||||
replicaCount: 3
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
* The `helm install vmcluster vm/victoria-metrics-cluster` command installs [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) to the default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
|
||||
* `dedup.minScrapeInterval: 1ms` configures [de-duplication](https://docs.victoriametrics.com/#deduplication) for the cluster: data points in the same time series are de-duplicated if they fall within the same discrete 1ms bucket. The earliest data point will be kept. In the case of equal timestamps, an arbitrary data point will be kept.
|
||||
* `replicationFactor: 2` sets the replication factor for the ingested data, i.e. how many copies are made among distinct `-storageNode` instances. If the replication factor is greater than one, deduplication must be enabled on the remote storage side, which is why `dedup.minScrapeInterval` is set above.
|
||||
* `podAnnotations: prometheus.io/scrape: "true"` enables the scraping of metrics from the vmselect, vminsert and vmstorage pods.
|
||||
* `podAnnotations:prometheus.io/port: "some_port" ` enables the scraping of metrics from the vmselect, vminsert and vmstorage pods from corresponding ports.
|
||||
* `replicaCount: 3` creates three replicas of vmselect, vminsert and vmstorage.
|
||||
|
||||
|
||||
The expected result of the command execution is the following:
|
||||
|
||||
```bash
|
||||
NAME: vmcluster
|
||||
LAST DEPLOYED: Thu Jul 29 13:33:51 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
Write API:
|
||||
|
||||
The VictoriaMetrics write api can be accessed via port 8480 via the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local
|
||||
|
||||
Get the VictoriaMetrics insert service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vminsert" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8480
|
||||
|
||||
You need to update your Prometheus configuration file and add the following lines to it:
|
||||
|
||||
prometheus.yml
|
||||
|
||||
remote_write:
|
||||
- url: "http://<insert-service>/insert/0/prometheus/"
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
remote_write:
|
||||
- url: "http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/"
|
||||
Read API:
|
||||
|
||||
The VictoriaMetrics read api can be accessed via port 8481 with the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local
|
||||
|
||||
Get the VictoriaMetrics select service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vmselect" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8481
|
||||
|
||||
You need to specify select service URL into your Grafana:
|
||||
NOTE: you need to use the Prometheus Data Source
|
||||
|
||||
Input this URL field into Grafana
|
||||
|
||||
http://<select-service>/select/0/prometheus/
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local:8481/select/0/prometheus/"
|
||||
|
||||
```
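You can also ask Helm to show the values that were applied to the release; this is a quick sanity check that the overrides above (deduplication, replication factor, pod annotations and replica counts) were picked up:

```bash
# Print the user-supplied values for the vmcluster release
helm get values vmcluster
```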
|
||||
|
||||
Verify that the VictoriaMetrics cluster pods are up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmcluster
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-4mh9d 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-4ppl7 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-782qk 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-4v4ws 1/1 Running 0 2m27s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-kwc7q 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-v7pmk 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 2m27s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 2m3s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-2 1/1 Running 0 99s
|
||||
```
|
||||
|
||||
## 3. Install vmagent from the Helm chart
|
||||
|
||||
To scrape metrics from Kubernetes with a VictoriaMetrics Cluster we will need to install [vmagent](https://docs.victoriametrics.com/vmagent.html) with some additional configuration. To do so, please run the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
helm install vmagent vm/victoria-metrics-agent -f https://docs.victoriametrics.com/guides/guide-vmcluster-vmagent-values.yaml
|
||||
```
|
||||
</div>
|
||||
|
||||
Here is the full content of the `guide-vmcluster-vmagent-values.yaml` file:
|
||||
|
||||
```yaml
|
||||
remoteWriteUrls:
|
||||
- http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
|
||||
scrape_configs:
|
||||
- job_name: vmagent
|
||||
static_configs:
|
||||
- targets: ["localhost:8429"]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
```
|
||||
|
||||
* `remoteWriteUrls: - http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/` configures `vmagent` to write scraped metrics to the `vminsert` service.
|
||||
* The `metric_relabel_configs` section allows you to process Kubernetes metrics for the Grafana dashboard.
|
||||
|
||||
|
||||
Verify that `vmagent`'s pod is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmagent
|
||||
```
|
||||
</div>
|
||||
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmagent-victoria-metrics-agent-57ddbdc55d-h4ljb 1/1 Running 0 13s
|
||||
```
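You can additionally check that vmagent discovered the expected Kubernetes scrape targets. A minimal sketch, assuming the chart created a deployment named `vmagent-victoria-metrics-agent` (as suggested by the pod name above) and that vmagent listens on its default port `8429`:

```bash
# Forward the vmagent HTTP port to the local machine
kubectl port-forward deployment/vmagent-victoria-metrics-agent 8429

# In another terminal: list the scrape targets discovered by vmagent
curl http://127.0.0.1:8429/targets
```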
|
||||
|
||||
## 4. Verifying HA of VictoriaMetrics Cluster
|
||||
|
||||
Run the following command to check that VictoriaMetrics services are up and running:
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep victoria-metrics
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmagent-victoria-metrics-agent-57ddbdc55d-h4ljb 1/1 Running 0 75s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-s8v7x 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-xlm9d 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-xqxrh 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-7dg95 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-ck7qb 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-jjqsl 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 63s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-2 1/1 Running 0 34s
|
||||
```
|
||||
|
||||
To verify that metrics are present in VictoriaMetrics, send a curl request to the `vmselect` service from inside Kubernetes, or set up Grafana and check the data via its web interface.
|
||||
|
||||
Run the following command to see the list of services:
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get svc | grep vmselect
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
|
||||
```bash
|
||||
vmcluster-victoria-metrics-cluster-vmselect ClusterIP 10.88.2.69 <none> 8481/TCP 1m
|
||||
```
|
||||
|
||||
Run the following command to make `vmselect`'s port accessible from the local machine:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl port-forward svc/vmcluster-victoria-metrics-cluster-vmselect 8481:8481
|
||||
```
|
||||
</div>
|
||||
|
||||
Execute the following command to get metrics via `curl`:
|
||||
|
||||
```bash
|
||||
curl -sg 'http://127.0.0.1:8481/select/0/prometheus/api/v1/query_range?query=count(up{kubernetes_pod_name=~".*vmselect.*"})&start=-10m&step=1m' | jq
|
||||
```
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
{
|
||||
"status": "success",
|
||||
"isPartial": false,
|
||||
"data": {
|
||||
"resultType": "matrix",
|
||||
"result": [
|
||||
{
|
||||
"metric": {},
|
||||
"values": [
|
||||
[
|
||||
1628065480.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065540.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065600.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065660.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065720.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065780.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065840.657,
|
||||
"3"
|
||||
]
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
* Query `http://127.0.0.1:8481/select/0/prometheus/api/v1/query_range` uses [VictoriaMetrics querying API](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format) to fetch previously stored data points;
|
||||
* Argument `query=count(up{kubernetes_pod_name=~".*vmselect.*"})` specifies the query we want to execute. Specifically, we calculate the number of `vmselect` pods.
|
||||
* The additional arguments `start=-10m&step=1m` set the requested time range from 10 minutes ago to now (the default value if the `end` argument is omitted) and a step (the distance between returned data points) of 1 minute;
|
||||
* By adding `| jq` we pipe the output through the `jq` utility, which pretty-prints the response in JSON format.
|
||||
|
||||
The expected result of the query `count(up{kubernetes_pod_name=~".*vmselect.*"})` should be equal to `3` - the number of replicas we set via `replicaCount` parameter.
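The same check can be performed with an instant query instead of a range query. A sketch, assuming the port-forward to `vmselect` from above is still active:

```bash
# Instant query: how many vmselect pods are currently up
curl -sg 'http://127.0.0.1:8481/select/0/prometheus/api/v1/query?query=count(up{kubernetes_pod_name=~".*vmselect.*"})' | jq
```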
|
||||
|
||||
|
||||
To test via Grafana, we need to install it first. [Install and connect Grafana to VictoriaMetrics](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster.html#4-install-and-connect-grafana-to-victoriametrics-with-helm), log in to Grafana and open the metrics [Explore](http://127.0.0.1:3000/explore) page.
|
||||
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore.png" width="800" alt="grafana explore">
|
||||
</p>
|
||||
|
||||
Choose `victoriametrics` from the list of datasources and enter `count(up{kubernetes_pod_name=~".*vmselect.*"})` in the **Metric browser** field as shown on the screenshot, then press the **Run query** button:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
The expected output is:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up-graph.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
## 5. High Availability
|
||||
|
||||
To test that High Availability works, we need to shut down one of the `vmstorage` nodes. To do this, run the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl scale sts vmcluster-victoria-metrics-cluster-vmstorage --replicas=2
|
||||
```
|
||||
</div>
|
||||
|
||||
Verify that there are now two running `vmstorage` pods in the cluster by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmstorage
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
```bash
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 44m
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 43m
|
||||
```
|
||||
|
||||
Return to Grafana Explore and press the **Run query** button again.
|
||||
|
||||
The expected output is:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up-graph.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
As you can see, after scaling the number of `vmstorage` replicas down from three to two pods, metrics are still available and correct. The response is still not partial, just as it was before scaling. We also see that the query `count(up{kubernetes_pod_name=~".*vmselect.*"})` returns the same value as before.
|
||||
|
||||
To confirm that the number of `vmstorage` pods is now two, execute the following request in Grafana Explore:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up-graph2.png" width="800" alt="">
|
||||
</p>
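If you prefer the HTTP API for this check, the same count can be requested with `curl` (assuming the `vmselect` port-forward from the previous section is still running); the expected value is `2`:

```bash
# Count vmstorage pods reported as up after scaling down
curl -sg 'http://127.0.0.1:8481/select/0/prometheus/api/v1/query?query=count(up{kubernetes_pod_name=~".*vmstorage.*"})' | jq
```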
|
||||
|
||||
|
||||
## 6. Final thoughts
|
||||
|
||||
* We set up VictoriaMetrics for a Kubernetes cluster with HA.
|
||||
* We collected metrics from running services and stored them in the VictoriaMetrics database.
|
||||
* We configured `dedup.minScrapeInterval` and `replicationFactor: 2` for VictoriaMetrics cluster for high availability purposes.
|
||||
* We tested and made sure that metrics are available even after one of the `vmstorage` nodes was turned off.
|
551
docs/guides/k8s-monitoring-via-vm-cluster.md
Normal file
|
@ -0,0 +1,551 @@
|
|||
# Kubernetes monitoring with VictoriaMetrics Cluster
|
||||
|
||||
|
||||
**This guide covers:**
|
||||
|
||||
* The setup of a [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) in [Kubernetes](https://kubernetes.io/) via Helm charts
|
||||
* How to scrape metrics from k8s components using service discovery
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com) tsdb
|
||||
|
||||
**Precondition**
|
||||
|
||||
We will use:
|
||||
* [Kubernetes cluster 1.19.9-gke.1900](https://cloud.google.com/kubernetes-engine)
|
||||
> We use a GKE cluster from [GCP](https://cloud.google.com/), but this guide also applies to any Kubernetes cluster. For example, [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3 ](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.21](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-scheme.png" width="800" alt="VictoriaMetrics Cluster on Kubernetes cluster">
|
||||
</p>
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
> For this guide we will use Helm 3 but if you already use Helm 2 please see this [https://github.com/VictoriaMetrics/helm-charts#for-helm-v2](https://github.com/VictoriaMetrics/helm-charts#for-helm-v2)
|
||||
|
||||
You need to add the VictoriaMetrics Helm repository to install VictoriaMetrics components. We’re going to use [VictoriaMetrics Cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html). You can do this by running the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add vm https://victoriametrics.github.io/helm-charts/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Update Helm repositories:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
To verify that everything is set up correctly you may run this command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm search repo vm/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
vm/victoria-metrics-agent 0.7.20 v1.62.0 Victoria Metrics Agent - collects metrics from ...
|
||||
vm/victoria-metrics-alert 0.3.34 v1.62.0 Victoria Metrics Alert - executes a list of giv...
|
||||
vm/victoria-metrics-auth 0.2.23 1.62.0 Victoria Metrics Auth - is a simple auth proxy ...
|
||||
vm/victoria-metrics-cluster 0.8.32 1.62.0 Victoria Metrics Cluster version - high-perform...
|
||||
vm/victoria-metrics-k8s-stack 0.2.9 1.16.0 Kubernetes monitoring on VictoriaMetrics stack....
|
||||
vm/victoria-metrics-operator 0.1.17 0.16.0 Victoria Metrics Operator
|
||||
vm/victoria-metrics-single 0.7.5 1.62.0 Victoria Metrics Single version - high-performa...
|
||||
```
|
||||
|
||||
## 2. Install VictoriaMetrics Cluster from the Helm chart
|
||||
|
||||
Run this command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install vmcluster vm/victoria-metrics-cluster -f -
|
||||
vmselect:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8481"
|
||||
|
||||
vminsert:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8480"
|
||||
|
||||
vmstorage:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8482"
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
* By running `helm install vmcluster vm/victoria-metrics-cluster` we install [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) to the default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) inside your cluster.
|
||||
* By adding `podAnnotations: prometheus.io/scrape: "true"` we enable the scraping of metrics from the vmselect, vminsert and vmstorage pods.
|
||||
* By adding `podAnnotations:prometheus.io/port: "some_port" ` we enable the scraping of metrics from the vmselect, vminsert and vmstorage pods from their ports as well.
|
||||
|
||||
|
||||
As a result of this command you will see the following output:
|
||||
|
||||
```bash
|
||||
NAME: vmcluster
|
||||
LAST DEPLOYED: Thu Jul 1 09:41:57 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
Write API:
|
||||
|
||||
The Victoria Metrics write api can be accessed via port 8480 with the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local
|
||||
|
||||
Get the Victoria Metrics insert service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vminsert" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8480
|
||||
|
||||
You need to update your Prometheus configuration file and add the following lines to it:
|
||||
|
||||
prometheus.yml
|
||||
|
||||
remote_write:
|
||||
- url: "http://<insert-service>/insert/0/prometheus/"
|
||||
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
remote_write:
|
||||
- url: "http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/"
|
||||
Read API:
|
||||
|
||||
The VictoriaMetrics read api can be accessed via port 8481 with the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local
|
||||
|
||||
Get the VictoriaMetrics select service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vmselect" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8481
|
||||
|
||||
You will need to specify select service URL in your Grafana:
|
||||
NOTE: you need to use Prometheus Data Source
|
||||
|
||||
Input this URL field in Grafana
|
||||
|
||||
http://<select-service>/select/0/prometheus/
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local:8481/select/0/prometheus/"
|
||||
|
||||
```
|
||||
|
||||
Make sure to remember the datasource URL from the output (copy the relevant lines); we will need it when connecting Grafana.
|
||||
|
||||
Verify that [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) pods are up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
vmcluster-victoria-metrics-cluster-vminsert-689cbc8f55-95szg 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vminsert-689cbc8f55-f852l 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmselect-977d74cdf-bbgp5 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmselect-977d74cdf-vzp6z 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 16m
|
||||
```
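The `prometheus.io/*` annotations from the values above can also be verified on the running pods. A sketch using the `app=vmselect` label from the chart notes:

```bash
# Show the annotations of the first vmselect pod
kubectl get pods -l app=vmselect \
  -o jsonpath='{.items[0].metadata.annotations}{"\n"}'
```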
|
||||
|
||||
## 3. Install vmagent from the Helm chart
|
||||
|
||||
To scrape metrics from Kubernetes with a [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) we need to install [vmagent](https://docs.victoriametrics.com/vmagent.html) with additional configuration. To do so, please run the following command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
helm install vmagent vm/victoria-metrics-agent -f https://docs.victoriametrics.com/guides/guide-vmcluster-vmagent-values.yaml
|
||||
```
|
||||
</div>
|
||||
|
||||
Here is the full content of the `guide-vmcluster-vmagent-values.yaml` file:
|
||||
|
||||
```yaml
|
||||
remoteWriteUrls:
|
||||
- http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 10s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: vmagent
|
||||
static_configs:
|
||||
- targets: ["localhost:8429"]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
- job_name: "kubernetes-service-endpoints"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-service-endpoints-slow"
|
||||
scrape_interval: 5m
|
||||
scrape_timeout: 30s
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-services"
|
||||
metrics_path: /probe
|
||||
params:
|
||||
module: [http_2xx]
|
||||
kubernetes_sd_configs:
|
||||
- role: service
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_probe]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
- target_label: __address__
|
||||
replacement: blackbox
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
target_label: kubernetes_name
|
||||
- job_name: "kubernetes-pods"
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||||
action: replace
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
target_label: __address__
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_pod_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_pod_name]
|
||||
action: replace
|
||||
target_label: kubernetes_pod_name
|
||||
```
|
||||
|
||||
* By adding `remoteWriteUrls: - http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/` we configure [vmagent](https://docs.victoriametrics.com/vmagent.html) to write scraped metrics to the `vminsert` service.
|
||||
* The second part of this YAML file adds the `metric_relabel_configs` section that helps us show Kubernetes metrics on the Grafana dashboard.
|
||||
|
||||
|
||||
Verify that `vmagent`'s pod is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmagent
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmagent-victoria-metrics-agent-69974b95b4-mhjph 1/1 Running 0 11m
|
||||
```
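It can also be useful to glance at the vmagent logs to make sure scraping and remote writing run without errors. A sketch, assuming the deployment name `vmagent-victoria-metrics-agent` created by the `vmagent` release:

```bash
# Show the most recent vmagent log lines
kubectl logs deployment/vmagent-victoria-metrics-agent --tail=20
```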
|
||||
|
||||
|
||||
## 4. Install and connect Grafana to VictoriaMetrics with Helm
|
||||
|
||||
Add the Grafana Helm repository.
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add grafana https://grafana.github.io/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
</div>
|
||||
|
||||
See more information on Grafana ArtifactHUB [https://artifacthub.io/packages/helm/grafana/grafana](https://artifacthub.io/packages/helm/grafana/grafana)
|
||||
|
||||
To install the chart with the release name `my-grafana` and add the VictoriaMetrics datasource together with the official dashboards and the Kubernetes dashboard, run:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
datasources:
|
||||
- name: victoriametrics
|
||||
type: prometheus
|
||||
orgId: 1
|
||||
url: http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local:8481/select/0/prometheus/
|
||||
access: proxy
|
||||
isDefault: true
|
||||
updateIntervalSeconds: 10
|
||||
editable: true
|
||||
|
||||
dashboardProviders:
|
||||
dashboardproviders.yaml:
|
||||
apiVersion: 1
|
||||
providers:
|
||||
- name: 'default'
|
||||
orgId: 1
|
||||
folder: ''
|
||||
type: file
|
||||
disableDeletion: true
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/default
|
||||
|
||||
dashboards:
|
||||
default:
|
||||
victoriametrics:
|
||||
gnetId: 11176
|
||||
revision: 15
|
||||
datasource: victoriametrics
|
||||
vmagent:
|
||||
gnetId: 12683
|
||||
revision: 5
|
||||
datasource: victoriametrics
|
||||
kubernetes:
|
||||
gnetId: 14205
|
||||
revision: 1
|
||||
datasource: victoriametrics
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
By running this command we:
|
||||
* Install Grafana from the Helm repository.
|
||||
* Provision a VictoriaMetrics datasource with the URL we saved from the output above.
|
||||
* Add this [https://grafana.com/grafana/dashboards/11176](https://grafana.com/grafana/dashboards/11176) dashboard for [VictoriaMetrics Cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
|
||||
* Add this [https://grafana.com/grafana/dashboards/12683](https://grafana.com/grafana/dashboards/12683) dashboard for [VictoriaMetrics Agent](https://docs.victoriametrics.com/vmagent.html).
|
||||
* Add this [https://grafana.com/grafana/dashboards/14205](https://grafana.com/grafana/dashboards/14205) dashboard to see Kubernetes cluster metrics.
|
||||
|
||||
|
||||
Please see the output log in your terminal. Copy, paste and run these commands.
|
||||
The first one will show the password for the Grafana `admin` user.
|
||||
The second and the third will forward Grafana to `127.0.0.1:3000`:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
|
||||
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
|
||||
kubectl --namespace default port-forward $POD_NAME 3000
|
||||
```
|
||||
</div>
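Once the port-forward is active, the provisioned datasource can optionally be verified through the Grafana HTTP API. A sketch, where `<admin-password>` stands for the value printed by the first command above:

```bash
# List provisioned datasource names; expect "victoriametrics" in the output
curl -s -u admin:<admin-password> http://127.0.0.1:3000/api/datasources | jq '.[].name'
```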
|
||||
|
||||
## 5. Check the result you obtained in your browser
|
||||
|
||||
To check that [VictoriaMetrics](https://victoriametrics.com) collects metrics from the k8s cluster, open [http://127.0.0.1:3000/dashboards](http://127.0.0.1:3000/dashboards) in your browser and choose the `Kubernetes Cluster Monitoring (via Prometheus)` dashboard. Use `admin` as the login and the password that you previously got from kubectl.
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-dashes-agent.png" width="800" alt="grafana dashboards">
|
||||
</p>
|
||||
|
||||
You will see something like this:
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-dashboard.png" width="800" alt="Kubernetes metrics provided by vmcluster">
|
||||
</p>
|
||||
|
||||
The VictoriaMetrics dashboard is also available to use:
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-grafana-dash.png" width="800" alt="VictoriaMetrics cluster dashboard">
|
||||
</p>
|
||||
|
||||
vmagent has its own dashboard:
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-vmagent-grafana-dash.png" width="800" alt="vmagent dashboard">
|
||||
</p>
|
||||
|
||||
## 6. Final thoughts
|
||||
|
||||
* We set up a time series database for your Kubernetes cluster.
|
||||
* We collected metrics from all running pods, nodes, … and stored them in a VictoriaMetrics database.
|
||||
* We visualized resources used in the Kubernetes cluster by using Grafana dashboards.
|
351
docs/guides/k8s-monitoring-via-vm-single.md
Normal file
|
@ -0,0 +1,351 @@
|
|||
# Kubernetes monitoring via VictoriaMetrics Single
|
||||
|
||||
|
||||
**This guide covers:**
|
||||
|
||||
* The setup of a [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) in [Kubernetes](https://kubernetes.io/) via Helm charts
|
||||
* How to scrape metrics from k8s components using service discovery
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com) tsdb
|
||||
|
||||
**Precondition**
|
||||
|
||||
We will use:
|
||||
* [Kubernetes cluster 1.19.9-gke.1900](https://cloud.google.com/kubernetes-engine)
|
||||
> We use a GKE cluster from [GCP](https://cloud.google.com/), but this guide also applies to any Kubernetes cluster. For example, [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3 ](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.21](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-k8s-scheme.png" width="800" alt="VictoriaMetrics Single on Kubernetes cluster">
|
||||
</p>
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
> For this guide we will use Helm 3 but if you already use Helm 2 please see this [https://github.com/VictoriaMetrics/helm-charts#for-helm-v2](https://github.com/VictoriaMetrics/helm-charts#for-helm-v2)
|
||||
|
||||
You need to add the VictoriaMetrics Helm repository to install VictoriaMetrics components. We’re going to use [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html). You can do this by running the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add vm https://victoriametrics.github.io/helm-charts/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Update Helm repositories:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
To verify that everything is set up correctly you may run this command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm search repo vm/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
vm/victoria-metrics-agent 0.7.20 v1.62.0 Victoria Metrics Agent - collects metrics from ...
|
||||
vm/victoria-metrics-alert 0.3.34 v1.62.0 Victoria Metrics Alert - executes a list of giv...
|
||||
vm/victoria-metrics-auth 0.2.23 1.62.0 Victoria Metrics Auth - is a simple auth proxy ...
|
||||
vm/victoria-metrics-cluster 0.8.32 1.62.0 Victoria Metrics Cluster version - high-perform...
|
||||
vm/victoria-metrics-k8s-stack 0.2.9 1.16.0 Kubernetes monitoring on VictoriaMetrics stack....
|
||||
vm/victoria-metrics-operator 0.1.17 0.16.0 Victoria Metrics Operator
|
||||
vm/victoria-metrics-single 0.7.5 1.62.0 Victoria Metrics Single version - high-performa...
|
||||
```
|
||||
|
||||
|
||||
## 2. Install [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) from the Helm chart
|
||||
|
||||
Run this command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
helm install vmsingle vm/victoria-metrics-single -f https://docs.victoriametrics.com/guides/guide-vmsingle-values.yaml
|
||||
```
|
||||
|
||||
Here is the full content of the `guide-vmsingle-values.yaml` file:
|
||||
|
||||
```yaml
|
||||
server:
|
||||
scrape:
|
||||
enabled: true
|
||||
configMap: ""
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
scrape_configs:
|
||||
- job_name: victoriametrics
|
||||
static_configs:
|
||||
- targets: [ "localhost:8428" ]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
|
||||
```
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
* By running `helm install vmsingle vm/victoria-metrics-single` we install [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) to the default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) inside your cluster.
|
||||
* By adding `scrape: enabled: true` we add and enable autodiscovery scraping from the Kubernetes cluster to [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html).
|
||||
* In [https://docs.victoriametrics.com/guides/guide-vmsingle-values.yaml](https://docs.victoriametrics.com/guides/guide-vmsingle-values.yaml) we added the `metric_relabel_configs` section that will help us show Kubernetes metrics on the Grafana dashboard.
|
||||
|
||||
|
||||
As a result of the command you will see the following output:
|
||||
|
||||
```bash
|
||||
NAME: victoria-metrics
|
||||
LAST DEPLOYED: Fri Jun 25 12:06:13 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
The VictoriaMetrics write api can be accessed via port 8428 on the following DNS name from within your cluster:
|
||||
vmsingle-victoria-metrics-single-server.default.svc.cluster.local
|
||||
|
||||
|
||||
Metrics Ingestion:
|
||||
Get the Victoria Metrics service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=server" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8428
|
||||
|
||||
Write url inside the kubernetes cluster:
|
||||
http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428/api/v1/write
|
||||
|
||||
Metrics Scrape:
|
||||
Pull-based scrapes are enabled
|
||||
Scrape config can be displayed by running this command::
|
||||
kubectl get cm vmsingle-victoria-metrics-single-server-scrapeconfig -n default
|
||||
|
||||
The target’s information is accessible via api:
|
||||
Inside cluster:
|
||||
http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428/targets
|
||||
Outside cluster:
|
||||
You need to port-forward service (see instructions above) and call
|
||||
http://<service-host-port>/targets
|
||||
|
||||
Read Data:
|
||||
The following url can be used as the datasource url in Grafana::
|
||||
http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428
|
||||
|
||||
```
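As the notes above point out, the scrape configuration generated from `guide-vmsingle-values.yaml` is stored in a ConfigMap. You can dump it to confirm that the jobs and the `metric_relabel_configs` section made it into the running setup:

```bash
# Show the generated scrape configuration
kubectl get cm vmsingle-victoria-metrics-single-server-scrapeconfig -n default -o yaml
```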
|
||||
|
||||
Make sure to remember the datasource URL from the output (copy the relevant lines); we will need it when connecting Grafana.
|
||||
|
||||
Verify that the VictoriaMetrics pod is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
vmsingle-victoria-metrics-single-server-0 1/1 Running 0 68s
|
||||
```
|
||||
|
||||
|
||||
## 3. Install and connect Grafana to VictoriaMetrics with Helm
|
||||
|
||||
Add the Grafana Helm repository.
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add grafana https://grafana.github.io/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
To install the chart with the release name `my-grafana` and add the VictoriaMetrics datasource together with the official dashboard and the Kubernetes dashboard, run:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
datasources:
|
||||
- name: victoriametrics
|
||||
type: prometheus
|
||||
orgId: 1
|
||||
url: http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428
|
||||
access: proxy
|
||||
isDefault: true
|
||||
updateIntervalSeconds: 10
|
||||
editable: true
|
||||
|
||||
dashboardProviders:
|
||||
dashboardproviders.yaml:
|
||||
apiVersion: 1
|
||||
providers:
|
||||
- name: 'default'
|
||||
orgId: 1
|
||||
folder: ''
|
||||
type: file
|
||||
disableDeletion: true
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/default
|
||||
|
||||
dashboards:
|
||||
default:
|
||||
victoriametrics:
|
||||
gnetId: 10229
|
||||
revision: 20
|
||||
datasource: victoriametrics
|
||||
kubernetes:
|
||||
gnetId: 14205
|
||||
revision: 1
|
||||
datasource: victoriametrics
|
||||
EOF
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
By running this command we:
|
||||
* Install Grafana from Helm repository.
|
||||
* Provision the VictoriaMetrics datasource with the URL we copied from the output above.
|
||||
* Add this [https://grafana.com/grafana/dashboards/10229](https://grafana.com/grafana/dashboards/10229) dashboard for VictoriaMetrics.
|
||||
* Add this [https://grafana.com/grafana/dashboards/14205](https://grafana.com/grafana/dashboards/14205) dashboard to see Kubernetes cluster metrics.
|
||||
|
||||
|
||||
Check the output log in your terminal.
|
||||
To see the password for the Grafana `admin` user, use the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Expose Grafana service on `127.0.0.1:3000`:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
|
||||
kubectl --namespace default port-forward $POD_NAME 3000
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Now Grafana should be accessible at [http://127.0.0.1:3000](http://127.0.0.1:3000).
|
||||
|
||||
|
||||
## 4. Check the obtained result in your browser
|
||||
|
||||
To check that VictoriaMetrics collects metrics from the k8s cluster, open [http://127.0.0.1:3000/dashboards](http://127.0.0.1:3000/dashboards) in your browser and choose the `Kubernetes Cluster Monitoring (via Prometheus)` dashboard. Use `admin` as the login and the password that you previously obtained from kubectl.
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-grafana-dashboards.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
You will see something like this:
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-grafana-k8s-dashboard.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
The VictoriaMetrics dashboard is also available to use:
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-grafana.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
## 5. Final thoughts
|
||||
|
||||
* We have set up a time series database for your k8s cluster.
|
||||
* Collected metrics from all running pods, nodes, … and stored them in the VictoriaMetrics database.
|
||||
* Visualized the resources used in the Kubernetes cluster with Grafana dashboards.
|
|
@ -1,269 +0,0 @@
|
|||
# Kubernetes monitoring with VictoriaMetrics Single
|
||||
|
||||
|
||||
**This guide covers:**
|
||||
|
||||
* The setup of VictoriaMetrics single node in Kubernetes via helm charts
|
||||
* How to store metrics
|
||||
* How to scrape metrics from k8s components using service discovery
|
||||
* How to Visualize stored data
|
||||
|
||||
|
||||
**Precondition**
|
||||
|
||||
|
||||
We will use:
|
||||
* [Kubernetes cluster 1.19.10-gke.1600](https://cloud.google.com/kubernetes-engine)
|
||||
> We use GKE cluster from GCP but feel free to use any kubernetes setup eg [Amazon EKS](https://aws.amazon.com/ru/eks/)
|
||||
* [helm 3 ](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.21](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
|
||||
<p align="center">
|
||||
<img src="../assets/k8s/k8s-vm-single-scheme.png">
|
||||
</p>
|
||||
|
||||
**1. VictoriaMetrics helm repository**
|
||||
|
||||
> For this guide we will use helm 3 but if you already use helm 2 please see this [https://github.com/VictoriaMetrics/helm-charts#for-helm-v2](https://github.com/VictoriaMetrics/helm-charts#for-helm-v2)
|
||||
|
||||
You need to add the VictoriaMetrics helm repository to install VictoriaMetrics components. We’re going to use VictoriaMetrics single-node. You can do this by running the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add vm https://victoriametrics.github.io/helm-charts/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Update helm repositories:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
To verify that everything is set up correctly you may run this command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm search repo vm/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
vm/victoria-metrics-agent 0.7.20 v1.62.0 Victoria Metrics Agent - collects metrics from ...
|
||||
vm/victoria-metrics-alert 0.3.34 v1.62.0 Victoria Metrics Alert - executes a list of giv...
|
||||
vm/victoria-metrics-auth 0.2.23 1.62.0 Victoria Metrics Auth - is a simple auth proxy ...
|
||||
vm/victoria-metrics-cluster 0.8.30 1.62.0 Victoria Metrics Cluster version - high-perform...
|
||||
vm/victoria-metrics-k8s-stack 0.2.8 1.16.0 Kubernetes monitoring on VictoriaMetrics stack....
|
||||
vm/victoria-metrics-operator 0.1.15 0.15.1 Victoria Metrics Operator
|
||||
vm/victoria-metrics-single 0.7.4 1.62.0 Victoria Metrics Single version - high-performa...
|
||||
```
|
||||
|
||||
|
||||
**2. Install VictoriaMetrics Single from helm Chart**
|
||||
|
||||
Run this command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install victoria-metrics vm/victoria-metrics-single -f -
|
||||
server:
|
||||
scrape:
|
||||
enabled: true
|
||||
EOF
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
* By running `helm install victoria-metrics vm/victoria-metrics-single` we will install `VictoriaMetrics Single` to default namespace inside your cluster
|
||||
* By adding `scrape: enabled: true` we add and enable autodiscovery scraping from the kubernetes cluster to `VictoriaMetrics Single`
|
||||
|
||||
|
||||
As a result of the command you will see the following output:
|
||||
|
||||
```bash
|
||||
NAME: victoria-metrics
|
||||
LAST DEPLOYED: Fri Jun 25 12:06:13 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
The VictoriaMetrics write api can be accessed via port 8428 on the following DNS name from within your cluster:
|
||||
victoria-metrics-victoria-metrics-single-server.default.svc.cluster.local
|
||||
|
||||
|
||||
Metrics Ingestion:
|
||||
Get the Victoria Metrics service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=server" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8428
|
||||
|
||||
Write url inside the kubernetes cluster:
|
||||
http://victoria-metrics-victoria-metrics-single-server.default.svc.cluster.local:8428/api/v1/write
|
||||
|
||||
Metrics Scrape:
|
||||
Pull-based scrapes are enabled
|
||||
Scrape config can be displayed by running this command:
|
||||
kubectl get cm victoria-metrics-victoria-metrics-single-server-scrapeconfig -n default
|
||||
|
||||
The target’s information is accessible via api:
|
||||
Inside cluster:
|
||||
http://victoria-metrics-victoria-metrics-single-server.default.svc.cluster.local:8428/atargets
|
||||
Outside cluster:
|
||||
You need to port-forward service (see instructions above) and call
|
||||
http://<service-host-port>/targets
|
||||
|
||||
Read Data:
|
||||
The following url can be used as the datasource url in Grafana:
|
||||
http://victoria-metrics-victoria-metrics-single-server.default.svc.cluster.local:8428
|
||||
|
||||
```
|
||||
|
||||
For us it’s important to remember the url for the datasource (copy lines from output).
|
||||
|
||||
Verify that VictoriaMetrics pod is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
victoria-metrics-victoria-metrics-single-server-0 1/1 Running 0 22s
|
||||
```
|
||||
|
||||
|
||||
**3. Install and connect Grafana to VictoriaMetrics with helm**
|
||||
|
||||
Add the Grafana helm repository.
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add grafana https://grafana.github.io/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
See more info on Grafana ArtifactHUB [https://artifacthub.io/packages/helm/grafana/grafana](https://artifacthub.io/packages/helm/grafana/grafana)
|
||||
By installing the Chart with the release name `my-grafana`, you add the VictoriaMetrics datasource with official dashboard and kubernetes dashboard:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
datasources:
|
||||
- name: victoriametrics
|
||||
type: prometheus
|
||||
orgId: 1
|
||||
url: http://victoria-metrics-victoria-metrics-single-server.default.svc.cluster.local:8428
|
||||
access: proxy
|
||||
isDefault: true
|
||||
updateIntervalSeconds: 10
|
||||
editable: true
|
||||
|
||||
dashboardProviders:
|
||||
dashboardproviders.yaml:
|
||||
apiVersion: 1
|
||||
providers:
|
||||
- name: 'default'
|
||||
orgId: 1
|
||||
folder: ''
|
||||
type: file
|
||||
disableDeletion: true
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/default
|
||||
|
||||
dashboards:
|
||||
default:
|
||||
victoriametrics:
|
||||
gnetId: 10229
|
||||
revision: 19
|
||||
datasource: victoriametrics
|
||||
kubernetes:
|
||||
gnetId: 14205
|
||||
revision: 1
|
||||
datasource: victoriametrics
|
||||
EOF
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
By running this command we:
|
||||
* Install Grafana from helm repository.
|
||||
* Provision VictoriaMetrics datasource with the url from the output above which we copied before.
|
||||
* Add this [https://grafana.com/grafana/dashboards/10229](https://grafana.com/grafana/dashboards/10229) dashboard for VictoriaMetrics.
|
||||
* Add this [https://grafana.com/grafana/dashboards/14205](https://grafana.com/grafana/dashboards/14205) dashboard to see Kubernetes cluster metrics.
|
||||
|
||||
|
||||
Check the output log in your terminal.
|
||||
To see the password for Grafana `admin` user use the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Expose Grafana service on `127.0.0.1:3000`:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
|
||||
kubectl --namespace default port-forward $POD_NAME 3000
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Now Grafana should be accessible on the [http://127.0.0.1:3000](http://127.0.0.1:3000) address.
|
||||
|
||||
|
||||
**4. Check the obtained result in your browser**
|
||||
|
||||
To check that VictoriaMetrics has collected metrics from the k8s cluster open in browser [http://127.0.0.1:3000/dashboards](http://127.0.0.1:3000/dashboards) and choose `Kubernetes Cluster Monitoring (via Prometheus)` dashboard. Use `admin` for login and `password` that you previously obtained from kubectl.
|
||||
|
||||
<p align="center">
|
||||
<img src="../assets/k8s/grafana-dashboards.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
You will see something like this:
|
||||
<p align="center">
|
||||
<img src="../assets/k8s/vmsingle-grafana-k8s-dashboard.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
VictoriaMetrics dashboard also available to use:
|
||||
<p align="center">
|
||||
<img src="../assets/k8s/vmsingle-grafana.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
**5. Final thoughts**
|
||||
|
||||
* We have set up TimeSeries Database for your k8s cluster.
|
||||
* We collected metrics from all running pods, nodes, … and stored them in VictoriaMetrics database.
|
||||
* We can visualize the resources used in your Kubernetes cluster by using Grafana dashboards.
|
|
@ -139,7 +139,11 @@ Also, Basic Auth can be enabled for the incoming `remote_write` requests with `-
|
|||
|
||||
### remote_write for clustered version
|
||||
|
||||
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always performed in the Prometheus remote_write protocol. Therefore, for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), the `-remoteWrite.url` command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<customer-id>/prometheus/api/v1/write`
|
||||
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always performed in the Prometheus remote_write protocol. Therefore, for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), the `-remoteWrite.url` command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<accountID>/prometheus/api/v1/write` according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format). There is also support for multitenant writes. See [these docs](#multitenancy).
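
As an illustration only (the hostname and the accountID below are placeholders, not values from this documentation), a `vmagent` instance writing everything into tenant `42` of a cluster might be started like this:

```bash
# Scrape targets from prometheus.yml and write all samples to tenant 42
# of a VictoriaMetrics cluster. "vminsert-host" is a placeholder.
/path/to/vmagent \
  -promscrape.config=/path/to/prometheus.yml \
  -remoteWrite.url=http://vminsert-host:8480/insert/42/prometheus/api/v1/write
```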
|
||||
|
||||
## Multitenancy
|
||||
|
||||
By default, `vmagent` collects the data without tenant identifiers and routes it to the configured `-remoteWrite.url`. But it can accept multitenant data if `-remoteWrite.multitenantURL` is set. In this case it accepts multitenant data at `http://vmagent:8429/insert/<accountID>/...` in the same way as the cluster version of VictoriaMetrics does according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format) and routes it to `<-remoteWrite.multitenantURL>/insert/<accountID>/prometheus/api/v1/write`. If multiple `-remoteWrite.multitenantURL` command-line options are set, then `vmagent` replicates the collected data across all the configured URLs. This allows using a single `vmagent` instance in front of VictoriaMetrics clusters for processing the data from all the tenants.
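
For illustration, a sketch with placeholder hostnames (and assuming the ingestion endpoints mirror the cluster URL format described above): run `vmagent` with two multitenant URLs, then push a sample for `accountID=7` to its own `insert` path:

```bash
# Replicate multitenant data to two VictoriaMetrics clusters.
/path/to/vmagent \
  -remoteWrite.multitenantURL=http://vminsert-cluster-a:8480 \
  -remoteWrite.multitenantURL=http://vminsert-cluster-b:8480

# Push a sample in Prometheus text exposition format for tenant 7;
# vmagent routes it to .../insert/7/prometheus/api/v1/write on both clusters.
curl -d 'foo{bar="baz"} 123' http://vmagent:8429/insert/7/prometheus/api/v1/import/prometheus
```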
|
||||
|
||||
|
||||
## How to collect metrics in Prometheus format
|
||||
|
@ -532,7 +536,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
|
@ -638,7 +642,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
-promscrape.consulSDCheckInterval duration
|
||||
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
|
||||
-promscrape.digitaloceanSDCheckInterval duration
|
||||
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
|
||||
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
|
||||
-promscrape.disableCompression
|
||||
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.disableKeepAlive
|
||||
|
@ -649,6 +653,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
|
||||
-promscrape.dnsSDCheckInterval duration
|
||||
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
|
||||
-promscrape.dockerSDCheckInterval duration
|
||||
Interval for checking for changes in docker. This works only if docker_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config for details (default 30s)
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
|
@ -662,7 +668,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
-promscrape.gceSDCheckInterval duration
|
||||
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
|
||||
-promscrape.httpSDCheckInterval duration
|
||||
Interval for checking for changes in http service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
|
||||
Interval for checking for changes in http endpoint service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
|
||||
-promscrape.kubernetes.apiServerTimeout duration
|
||||
How frequently to reload the full state from Kubernetes API server (default 30m0s)
|
||||
-promscrape.kubernetesSDCheckInterval duration
|
||||
|
@ -710,6 +716,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-remoteWrite.maxHourlySeries int
|
||||
The maximum number of unique series vmagent can send to remote storage systems during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -remoteWrite.maxDailySeries
|
||||
-remoteWrite.multitenantURL array
|
||||
Base path for multitenant remote storage URL to write data to. See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.oauth2.clientID array
|
||||
Optional OAuth2 clientID to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
|
@ -729,7 +738,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.queues int
|
||||
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage (default 2 * numberOfAvailableCPUs)
|
||||
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs (default 8)
|
||||
-remoteWrite.rateLimit array
|
||||
Optional rate limit in bytes per second for data sent to -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
|
@ -766,7 +775,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
|
|||
-remoteWrite.tmpDataPath string
|
||||
Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
|
||||
-remoteWrite.url array
|
||||
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems
|
||||
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.urlRelabelConfig array
|
||||
Optional path to relabel config for the corresponding -remoteWrite.url
|
||||
|
|
206
docs/vmalert.md
|
@ -352,193 +352,193 @@ command-line flags with their descriptions.
|
|||
The shortlist of configuration flags is the following:
|
||||
```
|
||||
-datasource.appendTypePrefix
|
||||
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
|
||||
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
|
||||
-datasource.basicAuth.password string
|
||||
Optional basic auth password for -datasource.url
|
||||
Optional basic auth password for -datasource.url
|
||||
-datasource.basicAuth.username string
|
||||
Optional basic auth username for -datasource.url
|
||||
Optional basic auth username for -datasource.url
|
||||
-datasource.lookback duration
|
||||
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
|
||||
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
|
||||
-datasource.maxIdleConnections int
|
||||
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
|
||||
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
|
||||
-datasource.queryStep duration
|
||||
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query. If queryStep isn't specified, rule's evaluationInterval will be used instead.
|
||||
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query. If queryStep isn't specified, rule's evaluationInterval will be used instead.
|
||||
-datasource.roundDigits int
|
||||
Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
|
||||
Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
|
||||
-datasource.tlsCAFile string
|
||||
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
|
||||
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
|
||||
-datasource.tlsCertFile string
|
||||
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
|
||||
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
|
||||
-datasource.tlsInsecureSkipVerify
|
||||
Whether to skip tls verification when connecting to -datasource.url
|
||||
Whether to skip tls verification when connecting to -datasource.url
|
||||
-datasource.tlsKeyFile string
|
||||
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
|
||||
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
|
||||
-datasource.tlsServerName string
|
||||
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
|
||||
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
|
||||
-datasource.url string
|
||||
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
|
||||
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
|
||||
-dryRun -rule
|
||||
Whether to check only config files without running vmalert. The rule files are validated. The -rule flag must be specified.
|
||||
Whether to check only config files without running vmalert. The rule files are validated. The -rule flag must be specified.
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-evaluationInterval duration
|
||||
How often to evaluate the rules (default 1m0s)
|
||||
How often to evaluate the rules (default 1m0s)
|
||||
-external.alert.source string
|
||||
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
|
||||
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
|
||||
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
|
||||
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
|
||||
-external.label array
|
||||
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-external.url string
|
||||
External URL is used as alert's source for sent alerts to the notifier
|
||||
External URL is used as alert's source for sent alerts to the notifier
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
Address to listen for http connections (default ":8880")
|
||||
Address to listen for http connections (default ":8880")
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It overrides httpAuth settings
|
||||
Auth key for /metrics. It overrides httpAuth settings
|
||||
-notifier.basicAuth.password array
|
||||
Optional basic auth password for -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Optional basic auth password for -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.basicAuth.username array
|
||||
Optional basic auth username for -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Optional basic auth username for -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsCAFile array
|
||||
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsCertFile array
|
||||
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsInsecureSkipVerify array
|
||||
Whether to skip tls verification when connecting to -notifier.url
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
Whether to skip tls verification when connecting to -notifier.url
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsKeyFile array
|
||||
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsServerName array
|
||||
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.url array
|
||||
Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It overrides httpAuth settings
|
||||
Auth key for /debug/pprof. It overrides httpAuth settings
|
||||
-remoteRead.basicAuth.password string
|
||||
Optional basic auth password for -remoteRead.url
|
||||
Optional basic auth password for -remoteRead.url
|
||||
-remoteRead.basicAuth.username string
|
||||
Optional basic auth username for -remoteRead.url
|
||||
Optional basic auth username for -remoteRead.url
|
||||
-remoteRead.ignoreRestoreErrors
|
||||
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
|
||||
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
|
||||
-remoteRead.lookback duration
|
||||
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
|
||||
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
|
||||
-remoteRead.tlsCAFile string
|
||||
Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default system CA is used
|
||||
Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default system CA is used
|
||||
-remoteRead.tlsCertFile string
|
||||
Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url
|
||||
Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url
|
||||
-remoteRead.tlsInsecureSkipVerify
|
||||
Whether to skip tls verification when connecting to -remoteRead.url
|
||||
Whether to skip tls verification when connecting to -remoteRead.url
|
||||
-remoteRead.tlsKeyFile string
|
||||
Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url
|
||||
Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url
|
||||
-remoteRead.tlsServerName string
|
||||
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
|
||||
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
|
||||
-remoteRead.url vmalert
|
||||
Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has successfully persisted its state. E.g. http://127.0.0.1:8428
|
||||
Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has successfully persisted its state. E.g. http://127.0.0.1:8428
|
||||
-remoteWrite.basicAuth.password string
|
||||
Optional basic auth password for -remoteWrite.url
|
||||
Optional basic auth password for -remoteWrite.url
|
||||
-remoteWrite.basicAuth.username string
|
||||
Optional basic auth username for -remoteWrite.url
|
||||
Optional basic auth username for -remoteWrite.url
|
||||
-remoteWrite.concurrency int
|
||||
Defines number of writers for concurrent writing into remote querier (default 1)
|
||||
Defines number of writers for concurrent writing into remote querier (default 1)
|
||||
-remoteWrite.flushInterval duration
|
||||
Defines interval of flushes to remote write endpoint (default 5s)
|
||||
Defines interval of flushes to remote write endpoint (default 5s)
|
||||
-remoteWrite.maxBatchSize int
|
||||
Defines max number of timeseries to be flushed at once (default 1000)
|
||||
Defines max number of timeseries to be flushed at once (default 1000)
|
||||
-remoteWrite.maxQueueSize int
|
||||
Defines the max number of pending datapoints to remote write endpoint (default 100000)
|
||||
Defines the max number of pending datapoints to remote write endpoint (default 100000)
|
||||
-remoteWrite.tlsCAFile string
|
||||
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used
|
||||
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used
|
||||
-remoteWrite.tlsCertFile string
|
||||
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url
|
||||
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url
|
||||
-remoteWrite.tlsInsecureSkipVerify
|
||||
Whether to skip tls verification when connecting to -remoteWrite.url
|
||||
Whether to skip tls verification when connecting to -remoteWrite.url
|
||||
-remoteWrite.tlsKeyFile string
|
||||
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url
|
||||
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url
|
||||
-remoteWrite.tlsServerName string
|
||||
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
|
||||
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
|
||||
-remoteWrite.url string
|
||||
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
|
||||
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
|
||||
-replay.maxDatapointsPerQuery int
|
||||
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
|
||||
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
|
||||
-replay.ruleRetryAttempts int
|
||||
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
|
||||
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
|
||||
-replay.rulesDelay duration
|
||||
Delay between rules evaluation within the group. Could be important if there are chained rules inside of the group and processing needs to wait for previous rule results to be persisted by remote storage before evaluating the next rule. Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
|
||||
Delay between rules evaluation within the group. Could be important if there are chained rules inside of the group and processing needs to wait for previous rule results to be persisted by remote storage before evaluating the next rule. Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
|
||||
-replay.timeFrom string
|
||||
The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
|
||||
The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
|
||||
-replay.timeTo string
|
||||
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
|
||||
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
|
||||
-rule array
|
||||
Path to the file with alert rules.
|
||||
Supports patterns. Flag can be specified multiple times.
|
||||
Examples:
|
||||
-rule="/path/to/file". Path to a single file with alerting rules
|
||||
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
|
||||
absolute path to all .yaml files in root.
|
||||
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
Path to the file with alert rules.
|
||||
Supports patterns. Flag can be specified multiple times.
|
||||
Examples:
|
||||
-rule="/path/to/file". Path to a single file with alerting rules
|
||||
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
|
||||
absolute path to all .yaml files in root.
|
||||
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-rule.configCheckInterval duration
|
||||
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
-rule.validateExpressions
|
||||
Whether to validate rules expressions via MetricsQL engine (default true)
|
||||
Whether to validate rules expressions via MetricsQL engine (default true)
|
||||
-rule.validateTemplates
|
||||
Whether to validate annotation and label templates (default true)
|
||||
Whether to validate annotation and label templates (default true)
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
Show VictoriaMetrics version
|
||||
```
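
For orientation, a minimal invocation built only from the flags listed above might look like the following sketch (all URLs and file paths are placeholders):

```bash
# Evaluate alerting rules from alerts.yml against a single-node VictoriaMetrics
# and send firing alerts to a local Alertmanager.
/path/to/vmalert \
  -rule=/path/to/alerts.yml \
  -datasource.url=http://victoriametrics:8428 \
  -notifier.url=http://alertmanager:9093
```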
|
||||
|
||||
`vmalert` supports "hot" config reload via the following methods:
|
||||
|
|