Mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git
Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files

Commit 1b112405a8: 65 changed files with 1467 additions and 408 deletions
CHANGELOG.md (20 changes)
@@ -1,15 +1,35 @@

# tip

* FEATURE: allow setting `-retentionPeriod` smaller than one month. For example, `-retentionPeriod=3d` and `-retentionPeriod=2w` are supported now.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173
* FEATURE: optimize more cases according to https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusLabelNonOptimization . Now the following cases are optimized too:
  * `rollup_func(foo{filters}[d]) op bar` -> `rollup_func(foo{filters}[d]) op bar{filters}`
  * `transform_func(foo{filters}) op bar` -> `transform_func(foo{filters}) op bar{filters}`
  * `num_or_scalar op foo{filters} op bar` -> `num_or_scalar op foo{filters} op bar{filters}`
* FEATURE: improve time series search for queries with multiple label filters, i.e. `foo{label1="value", label2=~"regexp"}`.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/781
* FEATURE: vmagent: add `stream parse` mode. This mode allows reducing memory usage when individual scrape targets expose tens of millions of metrics,
  for example when scraping Prometheus in [federation](https://prometheus.io/docs/prometheus/latest/federation/) mode.
  See the `-promscrape.streamParse` command-line option and the `stream_parse: true` config option for the `scrape_config` section in `-promscrape.config`.
  See also [troubleshooting docs for vmagent](https://victoriametrics.github.io/vmagent.html#troubleshooting).
* FEATURE: vmalert: add `-dryRun` command-line option for validating the provided config files without the need to start the `vmalert` service.
* FEATURE: accept an optional third argument of string type at `topk_*` and `bottomk_*` functions. This is the label name for an additional time series to return with the sum of the time series outside top/bottom K. See [MetricsQL docs](https://victoriametrics.github.io/MetricsQL.html) for more details.
* FEATURE: vmagent: expose `/api/v1/targets` page according to [the corresponding Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/643

* BUGFIX: vmagent: properly handle OpenStack endpoints ending with `v3.0`, such as `https://ostack.example.com:5000/v3.0`,
  in the same way as Prometheus does. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/728#issuecomment-709914803
* BUGFIX: drop trailing data points for time series with a single raw sample. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
* BUGFIX: do not drop trailing data points for instant queries to `/api/v1/query`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
* BUGFIX: vmbackup: fix panic when `-origin` isn't specified. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/856
* BUGFIX: vmalert: skip automatically added labels on alerts restore. The `alertgroup` label was introduced in [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611)
  and automatically added to generated time series. By mistake, this new label wasn't purged on the restore event and affected the alert's ID uniqueness.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
* BUGFIX: vmagent: fix panic at scrape error body formatting. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/864
* BUGFIX: vmagent: add a missing leading slash to the metrics path like Prometheus does. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/835
* BUGFIX: vmagent: drop a packet if the remote storage returns a 4xx status code. This makes the behaviour consistent with Prometheus.
  See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873


# [v1.44.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.44.0)

@@ -164,7 +164,7 @@ or [docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) wi

The following command-line flags are used the most:

* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention period in months for stored data. Older data is automatically deleted. Default period is 1 month.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.

Other flags have good enough default values, so set them only if you really need to change them. Pass `-help` to see all the available flags with descriptions and default values.

@@ -495,6 +495,7 @@ VictoriaMetrics supports the following handlers from [Prometheus querying API](h

* [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names)
* [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values)
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats)
* [/api/v1/targets](https://prometheus.io/docs/prometheus/latest/querying/api/#targets) - see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter) for more details.

These handlers can be queried from Prometheus-compatible clients such as Grafana or curl.

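Since these handlers are Prometheus-compatible, they can be queried from any HTTP client, not only Grafana or curl. Below is a minimal Go sketch that calls `/api/v1/query` with the new `topk_max(..., "remaining_sum")` form described in this commit. The address `localhost:8428` and the query expression are assumptions chosen for the example, and the struct decodes only a small subset of the standard Prometheus query response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

// queryResponse mirrors a subset of the Prometheus /api/v1/query envelope.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  []interface{}     `json:"value"` // [unixTimestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

func main() {
	// Assumed address of a local single-node VictoriaMetrics instance.
	base := "http://localhost:8428/api/v1/query"
	// Example MetricsQL query using the optional third topk_max argument
	// added in this commit (label name for the "remaining sum" series).
	q := `topk_max(3, process_resident_memory_bytes, "remaining_sum")`

	resp, err := http.Get(base + "?query=" + url.QueryEscape(q))
	if err != nil {
		log.Fatalf("query failed: %s", err)
	}
	defer resp.Body.Close()

	var qr queryResponse
	if err := json.NewDecoder(resp.Body).Decode(&qr); err != nil {
		log.Fatalf("cannot decode response: %s", err)
	}
	for _, r := range qr.Data.Result {
		fmt.Printf("%v -> %v\n", r.Metric, r.Value)
	}
}
```
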
@@ -1048,6 +1049,7 @@ The de-duplication reduces disk space usage if multiple identically configured P
write data to the same VictoriaMetrics instance. Note that these Prometheus instances must have identical
`external_labels` sections in their configs, so they write data to the same time series.


### Retention

Retention is configured with the `-retentionPeriod` command-line flag. For instance, `-retentionPeriod=3` means
@@ -1059,6 +1061,10 @@ For example if `-retentionPeriod` is set to 1, data for January is deleted on Ma
It is safe to extend `-retentionPeriod` on existing data. If `-retentionPeriod` is set to a lower
value than before, then data outside the configured period will eventually be deleted.

VictoriaMetrics supports retention smaller than 1 month. For example, `-retentionPeriod=5d` would set data retention to 5 days.
Older data is eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).

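To make the retention arithmetic concrete, here is a small Go sketch that converts a `-retentionPeriod` value such as `5d`, `2w` or a plain number of months into milliseconds and derives the oldest timestamp that is still kept. It is an illustrative approximation (a month is treated as 30 days, matching the arithmetic used by the retention check elsewhere in this commit), not the exact parser used by VictoriaMetrics.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// retentionToMsecs converts a -retentionPeriod value like "5d", "2w" or "3"
// (months) into milliseconds. A month is approximated as 30 days here; the
// real flag parser may differ in details.
func retentionToMsecs(s string) (int64, error) {
	unit := int64(30 * 24 * time.Hour / time.Millisecond) // default unit: months
	num := s
	switch {
	case strings.HasSuffix(s, "d"):
		unit = int64(24 * time.Hour / time.Millisecond)
		num = strings.TrimSuffix(s, "d")
	case strings.HasSuffix(s, "w"):
		unit = int64(7 * 24 * time.Hour / time.Millisecond)
		num = strings.TrimSuffix(s, "w")
	}
	n, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot parse retention %q: %w", s, err)
	}
	return int64(n * float64(unit)), nil
}

func main() {
	for _, s := range []string{"5d", "2w", "3"} {
		msecs, err := retentionToMsecs(s)
		if err != nil {
			panic(err)
		}
		// Oldest timestamp still covered by the configured retention.
		minAllowedTimestamp := time.Now().UnixMilli() - msecs
		fmt.Printf("-retentionPeriod=%s keeps data newer than %s\n",
			s, time.Unix(minAllowedTimestamp/1000, 0).UTC().Format(time.RFC3339))
	}
}
```
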
### Multiple retentions

Just start multiple VictoriaMetrics instances with distinct values for the following flags:
@@ -211,9 +211,13 @@ either via `vmagent` itself or via Prometheus, so the exported metrics could be
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for a `vmagent` state overview.
If you have suggestions, improvements or found a bug - feel free to open an issue on GitHub or add a review to the dashboard.

`vmagent` also exports target statuses at `http://vmagent-host:8429/targets` page in plaintext format.
`/targets` handler accepts optional `show_original_labels=1` query arg, which shows the original labels per each target
before applying relabeling. This information may be useful for debugging target relabeling.
`vmagent` also exports target statuses at the following handlers:

* `http://vmagent-host:8429/targets`. This handler returns human-readable plaintext status for every active target.
  This page is convenient to query from the command line with `wget`, `curl` or similar tools.
  It accepts an optional `show_original_labels=1` query arg, which shows the original labels per each target before applying relabeling.
  This information may be useful for debugging target relabeling.
* `http://vmagent-host:8429/api/v1/targets`. This handler returns data compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets). A small client sketch follows this list.

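A minimal Go sketch for consuming the second handler programmatically is shown below. It assumes a `vmagent` instance listening on `localhost:8429` and decodes only a small subset of the Prometheus-compatible payload (`activeTargets` with `labels`, `scrapeUrl`, `health` and `lastError`); treat it as a sketch rather than a complete client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// targetsResponse decodes a subset of the /api/v1/targets payload.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			Labels    map[string]string `json:"labels"`
			ScrapeURL string            `json:"scrapeUrl"`
			Health    string            `json:"health"`
			LastError string            `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

func main() {
	// The optional state=active arg mirrors the Prometheus API and the
	// r.FormValue("state") handling added in this commit.
	resp, err := http.Get("http://localhost:8429/api/v1/targets?state=active")
	if err != nil {
		log.Fatalf("cannot fetch targets: %s", err)
	}
	defer resp.Body.Close()

	var tr targetsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		log.Fatalf("cannot decode targets response: %s", err)
	}
	for _, t := range tr.Data.ActiveTargets {
		fmt.Printf("%s job=%s health=%s lastError=%q\n",
			t.ScrapeURL, t.Labels["job"], t.Health, t.LastError)
	}
}
```
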
### Troubleshooting
@@ -224,7 +228,26 @@ before applying relabeling. This information may be useful for debugging target
since `vmagent` establishes at least a single TCP connection per each target.

* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
  by passing the `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`.
  by passing the `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
  and `http://vmagent-host:8429/api/v1/targets`.

* If `vmagent` scrapes targets with millions of metrics per each target (for instance, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
  then it is recommended to enable `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all the scrape targets
  by passing the `-promscrape.streamParse` command-line flag, or on a per-scrape-target basis with the `stream_parse: true` option. For example:

```yml
scrape_configs:
- job_name: 'big-federate'
  stream_parse: true
  static_configs:
  - targets:
    - big-prometeus1
    - big-prometeus2
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]': ['{__name__!=""}']
```

* It is recommended to increase `-remoteWrite.queues` if the `vmagent_remotewrite_pending_data_bytes` metric exported at the `http://vmagent-host:8429/metrics` page constantly grows.

@@ -211,6 +211,12 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
		showOriginalLabels, _ := strconv.ParseBool(r.FormValue("show_original_labels"))
		promscrape.WriteHumanReadableTargetsStatus(w, showOriginalLabels)
		return true
	case "/api/v1/targets":
		promscrapeAPIV1TargetsRequests.Inc()
		w.Header().Set("Content-Type", "application/json")
		state := r.FormValue("state")
		promscrape.WriteAPIV1Targets(w, state)
		return true
	case "/-/reload":
		promscrapeConfigReloadRequests.Inc()
		procutil.SelfSIGHUP()
@@ -241,7 +247,8 @@ var (

	influxQueryRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/query", protocol="influx"}`)

	promscrapeTargetsRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/targets"}`)
	promscrapeTargetsRequests      = metrics.NewCounter(`vmagent_http_requests_total{path="/targets"}`)
	promscrapeAPIV1TargetsRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/api/v1/targets"}`)

	promscrapeConfigReloadRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/-/reload"}`)
)
@@ -53,6 +53,7 @@ type client struct {
	requestDuration *metrics.Histogram
	requestsOKCount *metrics.Counter
	errorsCount     *metrics.Counter
	packetsDropped  *metrics.Counter
	retriesCount    *metrics.Counter

	wg sync.WaitGroup
@@ -114,6 +115,7 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
	c.requestDuration = metrics.GetOrCreateHistogram(fmt.Sprintf(`vmagent_remotewrite_duration_seconds{url=%q}`, c.sanitizedURL))
	c.requestsOKCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="2XX"}`, c.sanitizedURL))
	c.errorsCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_errors_total{url=%q}`, c.sanitizedURL))
	c.packetsDropped = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_packets_dropped_total{url=%q}`, c.sanitizedURL))
	c.retriesCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_retries_count_total{url=%q}`, c.sanitizedURL))
	for i := 0; i < concurrency; i++ {
		c.wg.Add(1)
@@ -228,10 +230,20 @@ again:
		c.requestsOKCount.Inc()
		return
	}
	metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
	if statusCode/100 == 4 {
		// Just drop block on 4xx status code like Prometheus does.
		// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
		body, _ := ioutil.ReadAll(resp.Body)
		_ = resp.Body.Close()
		logger.Errorf("unexpected status code received when sending a block with size %d bytes to %q: #%d; dropping the block for 4XX status code like Prometheus does; "+
			"response body=%q", len(block), c.sanitizedURL, statusCode, body)
		c.packetsDropped.Inc()
		return
	}

	// Unexpected status code returned
	retriesCount++
	metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
	retryDuration *= 2
	if retryDuration > time.Minute {
		retryDuration = time.Minute
@@ -403,7 +403,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
		labelsFilter += fmt.Sprintf(",%s=%q", k, v)
	}

	// Get the last datapoint in range via MetricsQL `last_over_time`.
	// Get the last data point in range via MetricsQL `last_over_time`.
	// We don't use plain PromQL since Prometheus doesn't support
	// remote write protocol which is used for state persistence in vmalert.
	expr := fmt.Sprintf("last_over_time(%s{alertname=%q%s}[%ds])",
@@ -417,11 +417,14 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
		labels := m.Labels
		m.Labels = make([]datasource.Label, 0)
		// drop all extra labels, so hash key will
		// be identical to timeseries received in Exec
		// be identical to time series received in Exec
		for _, l := range labels {
			if l.Name == alertNameLabel {
				continue
			}
			if l.Name == alertGroupNameLabel {
				continue
			}
			// drop all overridden labels
			if _, ok := ar.Labels[l.Name]; ok {
				continue
@@ -436,7 +439,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
		a.ID = hash(m)
		a.State = notifier.StatePending
		ar.alerts[a.ID] = a
		logger.Infof("alert %q(%d) restored to state at %v", a.Name, a.ID, a.Start)
		logger.Infof("alert %q (%d) restored to state at %v", a.Name, a.ID, a.Start)
	}
	return nil
}
@@ -355,6 +355,7 @@ func TestAlertingRule_Restore(t *testing.T) {
			metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
				"__name__", alertForStateMetricName,
				alertNameLabel, "",
				alertGroupNameLabel, "groupID",
				"foo", "bar",
				"namespace", "baz",
			),
@@ -11,6 +11,7 @@ import (
	"time"

	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
	"github.com/VictoriaMetrics/metricsql"
@@ -193,25 +194,32 @@ func Parse(pathPatterns []string, validateAnnotations, validateExpressions bool)
		}
		fp = append(fp, matches...)
	}
	errGroup := new(utils.ErrGroup)
	var groups []Group
	for _, file := range fp {
		uniqueGroups := map[string]struct{}{}
		gr, err := parseFile(file)
		if err != nil {
			return nil, fmt.Errorf("failed to parse file %q: %w", file, err)
			errGroup.Add(fmt.Errorf("failed to parse file %q: %w", file, err))
			continue
		}
		for _, g := range gr {
			if err := g.Validate(validateAnnotations, validateExpressions); err != nil {
				return nil, fmt.Errorf("invalid group %q in file %q: %w", g.Name, file, err)
				errGroup.Add(fmt.Errorf("invalid group %q in file %q: %w", g.Name, file, err))
				continue
			}
			if _, ok := uniqueGroups[g.Name]; ok {
				return nil, fmt.Errorf("group name %q duplicate in file %q", g.Name, file)
				errGroup.Add(fmt.Errorf("group name %q duplicate in file %q", g.Name, file))
				continue
			}
			uniqueGroups[g.Name] = struct{}{}
			g.File = file
			groups = append(groups, g)
		}
	}
	if err := errGroup.Err(); err != nil {
		return nil, err
	}
	if len(groups) < 1 {
		logger.Warnf("no groups found in %s", strings.Join(pathPatterns, ";"))
	}
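The hunk above switches `Parse()` from returning on the first problem to collecting every parse and validation error in a `utils.ErrGroup` and reporting them together, which pairs naturally with the new `-dryRun` mode. The sketch below shows one way such an error group can work; it is an assumption about the helper's semantics (`Add` records, `Err` combines), not the repository's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// errGroup is a minimal, assumed sketch of an error-collecting helper:
// Add records an error, Err returns nil when nothing was recorded or a
// single combined error otherwise. The real utils.ErrGroup may differ.
type errGroup struct {
	errs []error
}

func (eg *errGroup) Add(err error) {
	eg.errs = append(eg.errs, err)
}

func (eg *errGroup) Err() error {
	if len(eg.errs) == 0 {
		return nil
	}
	msgs := make([]string, 0, len(eg.errs))
	for _, err := range eg.errs {
		msgs = append(msgs, err.Error())
	}
	return fmt.Errorf("%d errors: %s", len(msgs), strings.Join(msgs, "; "))
}

func main() {
	eg := new(errGroup)
	eg.Add(fmt.Errorf("invalid group %q in file %q", "alerts", "a.yml"))
	eg.Add(fmt.Errorf("group name %q duplicate in file %q", "alerts", "b.yml"))
	if err := eg.Err(); err != nil {
		fmt.Println(err) // both problems are reported at once
	}
}
```
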
@@ -10,6 +10,7 @@ import (
	"strings"
	"time"

	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/remoteread"
@@ -47,6 +48,8 @@ eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{

	remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
		" For example, if lookback=1h then range from now() to now()-1h will be scanned.")

	dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmalert. The rules file are validated. The `-rule` flag must be specified.")
)

func main() {
@@ -58,6 +61,18 @@ func main() {
	logger.Init()
	cgroup.UpdateGOMAXPROCSToCPUQuota()

	if *dryRun {
		u, _ := url.Parse("https://victoriametrics.com/")
		notifier.InitTemplateFunc(u)
		groups, err := config.Parse(*rulePath, true, true)
		if err != nil {
			logger.Fatalf(err.Error())
		}
		if len(groups) == 0 {
			logger.Fatalf("No rules for validation. Please specify path to file(s) with alerting and/or recording rules using `-rule` flag")
		}
		return
	}
	ctx, cancel := context.WithCancel(context.Background())
	manager, err := newManager(ctx)
	if err != nil {
@@ -10,6 +10,7 @@ import (
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/actions"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/common"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/fslocal"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/fsnil"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
@@ -146,9 +147,9 @@ func newDstFS() (common.RemoteFS, error) {
	return fs, nil
}

func newOriginFS() (common.RemoteFS, error) {
func newOriginFS() (common.OriginFS, error) {
	if len(*origin) == 0 {
		return nil, nil
		return &fsnil.FS{}, nil
	}
	fs, err := actions.NewRemoteFS(*origin)
	if err != nil {
@@ -159,6 +159,12 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
		showOriginalLabels, _ := strconv.ParseBool(r.FormValue("show_original_labels"))
		promscrape.WriteHumanReadableTargetsStatus(w, showOriginalLabels)
		return true
	case "/api/v1/targets":
		promscrapeAPIV1TargetsRequests.Inc()
		w.Header().Set("Content-Type", "application/json")
		state := r.FormValue("state")
		promscrape.WriteAPIV1Targets(w, state)
		return true
	case "/-/reload":
		promscrapeConfigReloadRequests.Inc()
		procutil.SelfSIGHUP()
@@ -191,7 +197,8 @@ var (

	influxQueryRequests = metrics.NewCounter(`vm_http_requests_total{path="/query", protocol="influx"}`)

	promscrapeTargetsRequests = metrics.NewCounter(`vm_http_requests_total{path="/targets"}`)
	promscrapeTargetsRequests      = metrics.NewCounter(`vm_http_requests_total{path="/targets"}`)
	promscrapeAPIV1TargetsRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/targets"}`)

	promscrapeConfigReloadRequests = metrics.NewCounter(`vm_http_requests_total{path="/-/reload"}`)

@@ -3,7 +3,7 @@
`vmrestore` restores data from backups created by [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md).
VictoriaMetrics `v1.29.0` and newer versions must be used for working with the restored data.

Restore process can be interrupted at any time. It is automatically resumed from the inerruption point
Restore process can be interrupted at any time. It is automatically resumed from the interruption point
when restarting `vmrestore` with the same args.

@ -69,7 +69,9 @@ func newAggrFunc(afe func(tss []*timeseries) []*timeseries) aggrFunc {
|
|||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return aggrFuncExt(afe, tss, &afa.ae.Modifier, afa.ae.Limit, false)
|
||||
return aggrFuncExt(func(tss []*timeseries, modififer *metricsql.ModifierExpr) []*timeseries {
|
||||
return afe(tss)
|
||||
}, tss, &afa.ae.Modifier, afa.ae.Limit, false)
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -98,7 +100,8 @@ func removeGroupTags(metricName *storage.MetricName, modifier *metricsql.Modifie
|
|||
}
|
||||
}
|
||||
|
||||
func aggrFuncExt(afe func(tss []*timeseries) []*timeseries, argOrig []*timeseries, modifier *metricsql.ModifierExpr, maxSeries int, keepOriginal bool) ([]*timeseries, error) {
|
||||
func aggrFuncExt(afe func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries, argOrig []*timeseries,
|
||||
modifier *metricsql.ModifierExpr, maxSeries int, keepOriginal bool) ([]*timeseries, error) {
|
||||
arg := copyTimeseriesMetricNames(argOrig, keepOriginal)
|
||||
|
||||
// Perform grouping.
|
||||
|
@ -124,7 +127,7 @@ func aggrFuncExt(afe func(tss []*timeseries) []*timeseries, argOrig []*timeserie
|
|||
dstTssCount := 0
|
||||
rvs := make([]*timeseries, 0, len(m))
|
||||
for _, tss := range m {
|
||||
rv := afe(tss)
|
||||
rv := afe(tss, modifier)
|
||||
rvs = append(rvs, rv...)
|
||||
srcTssCount += len(tss)
|
||||
dstTssCount += len(rv)
|
||||
|
@ -141,7 +144,7 @@ func aggrFuncAny(afa *aggrFuncArg) ([]*timeseries, error) {
|
|||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
afe := func(tss []*timeseries) []*timeseries {
|
||||
afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
|
||||
return tss[:1]
|
||||
}
|
||||
limit := afa.ae.Limit
|
||||
|
@ -178,10 +181,11 @@ func aggrFuncSum(tss []*timeseries) []*timeseries {
|
|||
sum := float64(0)
|
||||
count := 0
|
||||
for _, ts := range tss {
|
||||
if math.IsNaN(ts.Values[i]) {
|
||||
v := ts.Values[i]
|
||||
if math.IsNaN(v) {
|
||||
continue
|
||||
}
|
||||
sum += ts.Values[i]
|
||||
sum += v
|
||||
count++
|
||||
}
|
||||
if count == 0 {
|
||||
|
@ -449,7 +453,7 @@ func aggrFuncZScore(afa *aggrFuncArg) ([]*timeseries, error) {
|
|||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
afe := func(tss []*timeseries) []*timeseries {
|
||||
afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
|
||||
for i := range tss[0].Values {
|
||||
// Calculate avg and stddev for tss points at position i.
|
||||
// See `Rapid calculation methods` at https://en.wikipedia.org/wiki/Standard_deviation
|
||||
|
@ -550,7 +554,7 @@ func aggrFuncCountValues(afa *aggrFuncArg) ([]*timeseries, error) {
|
|||
// Do nothing
|
||||
}
|
||||
|
||||
afe := func(tss []*timeseries) []*timeseries {
|
||||
afe := func(tss []*timeseries, modififer *metricsql.ModifierExpr) []*timeseries {
|
||||
m := make(map[float64]bool)
|
||||
for _, ts := range tss {
|
||||
for _, v := range ts.Values {
|
||||
|
@ -602,7 +606,7 @@ func newAggrFuncTopK(isReverse bool) aggrFunc {
|
|||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
afe := func(tss []*timeseries) []*timeseries {
|
||||
afe := func(tss []*timeseries, modififer *metricsql.ModifierExpr) []*timeseries {
|
||||
for n := range tss[0].Values {
|
||||
sort.Slice(tss, func(i, j int) bool {
|
||||
a := tss[i].Values[n]
|
||||
|
@ -623,21 +627,32 @@ func newAggrFuncTopK(isReverse bool) aggrFunc {
|
|||
func newAggrFuncRangeTopK(f func(values []float64) float64, isReverse bool) aggrFunc {
|
||||
return func(afa *aggrFuncArg) ([]*timeseries, error) {
|
||||
args := afa.args
|
||||
if err := expectTransformArgsNum(args, 2); err != nil {
|
||||
return nil, err
|
||||
if len(args) < 2 {
|
||||
return nil, fmt.Errorf(`unexpected number of args; got %d; want at least %d`, len(args), 2)
|
||||
}
|
||||
if len(args) > 3 {
|
||||
return nil, fmt.Errorf(`unexpected number of args; got %d; want no more than %d`, len(args), 3)
|
||||
}
|
||||
ks, err := getScalar(args[0], 0)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
afe := func(tss []*timeseries) []*timeseries {
|
||||
return getRangeTopKTimeseries(tss, ks, f, isReverse)
|
||||
remainingSumTagName := ""
|
||||
if len(args) == 3 {
|
||||
remainingSumTagName, err = getString(args[2], 2)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
}
|
||||
afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
|
||||
return getRangeTopKTimeseries(tss, modifier, ks, remainingSumTagName, f, isReverse)
|
||||
}
|
||||
return aggrFuncExt(afe, args[1], &afa.ae.Modifier, afa.ae.Limit, true)
|
||||
}
|
||||
}
|
||||
|
||||
func getRangeTopKTimeseries(tss []*timeseries, ks []float64, f func(values []float64) float64, isReverse bool) []*timeseries {
|
||||
func getRangeTopKTimeseries(tss []*timeseries, modifier *metricsql.ModifierExpr, ks []float64, remainingSumTagName string,
|
||||
f func(values []float64) float64, isReverse bool) []*timeseries {
|
||||
type tsWithValue struct {
|
||||
ts *timeseries
|
||||
value float64
|
||||
|
@ -661,28 +676,66 @@ func getRangeTopKTimeseries(tss []*timeseries, ks []float64, f func(values []flo
|
|||
for i := range maxs {
|
||||
tss[i] = maxs[i].ts
|
||||
}
|
||||
remainingSumTS := getRemainingSumTimeseries(tss, modifier, ks, remainingSumTagName)
|
||||
for i, k := range ks {
|
||||
fillNaNsAtIdx(i, k, tss)
|
||||
}
|
||||
if remainingSumTS != nil {
|
||||
tss = append(tss, remainingSumTS)
|
||||
}
|
||||
return removeNaNs(tss)
|
||||
}
|
||||
|
||||
func getRemainingSumTimeseries(tss []*timeseries, modifier *metricsql.ModifierExpr, ks []float64, remainingSumTagName string) *timeseries {
|
||||
if len(remainingSumTagName) == 0 || len(tss) == 0 {
|
||||
return nil
|
||||
}
|
||||
var dst timeseries
|
||||
dst.CopyFromShallowTimestamps(tss[0])
|
||||
removeGroupTags(&dst.MetricName, modifier)
|
||||
dst.MetricName.RemoveTag(remainingSumTagName)
|
||||
dst.MetricName.AddTag(remainingSumTagName, remainingSumTagName)
|
||||
for i, k := range ks {
|
||||
kn := getIntK(k, len(tss))
|
||||
var sum float64
|
||||
count := 0
|
||||
for _, ts := range tss[:len(tss)-kn] {
|
||||
v := ts.Values[i]
|
||||
if math.IsNaN(v) {
|
||||
continue
|
||||
}
|
||||
sum += v
|
||||
count++
|
||||
}
|
||||
if count == 0 {
|
||||
sum = nan
|
||||
}
|
||||
dst.Values[i] = sum
|
||||
}
|
||||
return &dst
|
||||
}
|
||||
|
||||
func fillNaNsAtIdx(idx int, k float64, tss []*timeseries) {
|
||||
if math.IsNaN(k) {
|
||||
k = 0
|
||||
}
|
||||
kn := int(k)
|
||||
if kn < 0 {
|
||||
kn = 0
|
||||
}
|
||||
if kn > len(tss) {
|
||||
kn = len(tss)
|
||||
}
|
||||
kn := getIntK(k, len(tss))
|
||||
for _, ts := range tss[:len(tss)-kn] {
|
||||
ts.Values[idx] = nan
|
||||
}
|
||||
}
|
||||
|
||||
func getIntK(k float64, kMax int) int {
|
||||
if math.IsNaN(k) {
|
||||
return 0
|
||||
}
|
||||
kn := int(k)
|
||||
if kn < 0 {
|
||||
return 0
|
||||
}
|
||||
if kn > kMax {
|
||||
return kMax
|
||||
}
|
||||
return kn
|
||||
}
|
||||
|
||||
func minValue(values []float64) float64 {
|
||||
if len(values) == 0 {
|
||||
return nan
|
||||
|
@ -746,7 +799,7 @@ func aggrFuncOutliersK(afa *aggrFuncArg) ([]*timeseries, error) {
|
|||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
afe := func(tss []*timeseries) []*timeseries {
|
||||
afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
|
||||
// Calculate medians for each point across tss.
|
||||
medians := make([]float64, len(ks))
|
||||
h := histogram.GetFast()
|
||||
|
@ -771,7 +824,7 @@ func aggrFuncOutliersK(afa *aggrFuncArg) ([]*timeseries, error) {
|
|||
}
|
||||
return sum2
|
||||
}
|
||||
return getRangeTopKTimeseries(tss, ks, f, false)
|
||||
return getRangeTopKTimeseries(tss, &afa.ae.Modifier, ks, "", f, false)
|
||||
}
|
||||
return aggrFuncExt(afe, args[1], &afa.ae.Modifier, afa.ae.Limit, true)
|
||||
}
|
||||
|
@ -792,7 +845,7 @@ func aggrFuncLimitK(afa *aggrFuncArg) ([]*timeseries, error) {
|
|||
maxK = k
|
||||
}
|
||||
}
|
||||
afe := func(tss []*timeseries) []*timeseries {
|
||||
afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
|
||||
if len(tss) > maxK {
|
||||
tss = tss[:maxK]
|
||||
}
|
||||
|
@ -833,8 +886,8 @@ func aggrFuncMedian(afa *aggrFuncArg) ([]*timeseries, error) {
|
|||
return aggrFuncExt(afe, tss, &afa.ae.Modifier, afa.ae.Limit, false)
|
||||
}
|
||||
|
||||
func newAggrQuantileFunc(phis []float64) func(tss []*timeseries) []*timeseries {
|
||||
return func(tss []*timeseries) []*timeseries {
|
||||
func newAggrQuantileFunc(phis []float64) func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
|
||||
return func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
|
||||
dst := tss[0]
|
||||
h := histogram.GetFast()
|
||||
defer histogram.PutFast(h)
|
||||
|
|
|
@ -4193,7 +4193,7 @@ func TestExecSuccess(t *testing.T) {
|
|||
})
|
||||
t.Run(`topk_max(1)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(topk_max(1, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss")))`
|
||||
q := `topk_max(1, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"))`
|
||||
r1 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
|
||||
|
@ -4206,6 +4206,84 @@ func TestExecSuccess(t *testing.T) {
|
|||
resultExpected := []netstorage.Result{r1}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`topk_max(1, remaining_sum)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort_desc(topk_max(1, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"), "remaining_sum"))`
|
||||
r1 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r1.MetricName.Tags = []storage.Tag{{
|
||||
Key: []byte("baz"),
|
||||
Value: []byte("sss"),
|
||||
}}
|
||||
r2 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{10, 10, 10, 10, 10, 10},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r2.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("remaining_sum"),
|
||||
Value: []byte("remaining_sum"),
|
||||
},
|
||||
}
|
||||
resultExpected := []netstorage.Result{r1, r2}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`topk_max(2, remaining_sum)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort_desc(topk_max(2, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"), "remaining_sum"))`
|
||||
r1 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r1.MetricName.Tags = []storage.Tag{{
|
||||
Key: []byte("baz"),
|
||||
Value: []byte("sss"),
|
||||
}}
|
||||
r2 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{10, 10, 10, 10, 10, 10},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r2.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
}
|
||||
resultExpected := []netstorage.Result{r1, r2}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`topk_max(3, remaining_sum)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort_desc(topk_max(3, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"), "remaining_sum"))`
|
||||
r1 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r1.MetricName.Tags = []storage.Tag{{
|
||||
Key: []byte("baz"),
|
||||
Value: []byte("sss"),
|
||||
}}
|
||||
r2 := netstorage.Result{
|
||||
MetricName: metricNameExpected,
|
||||
Values: []float64{10, 10, 10, 10, 10, 10},
|
||||
Timestamps: timestampsExpected,
|
||||
}
|
||||
r2.MetricName.Tags = []storage.Tag{
|
||||
{
|
||||
Key: []byte("foo"),
|
||||
Value: []byte("bar"),
|
||||
},
|
||||
}
|
||||
resultExpected := []netstorage.Result{r1, r2}
|
||||
f(q, resultExpected)
|
||||
})
|
||||
t.Run(`bottomk_max(1)`, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
q := `sort(bottomk_max(1, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss")))`
|
||||
|
|
|
@@ -519,12 +519,13 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
		}
		rfa.values = values[i:j]
		rfa.timestamps = timestamps[i:j]
		if j == len(timestamps) && j > 0 && (tEnd-timestamps[j-1] > stalenessInterval || i == j && len(timestamps) == 1) {
		if j == len(timestamps) && j > 0 && (tEnd-timestamps[j-1] > stalenessInterval || i == j && len(timestamps) == 1) && rc.End-tEnd >= 2*rc.Step {
			// Drop trailing data points in the following cases:
			// - if the distance between the last raw sample and tEnd exceeds stalenessInterval
			// - if time series contains only a single raw sample
			// This should prevent from double counting when a label changes in time series (for instance,
			// during new deployment in K8S). See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
			// Do not drop trailing data points for instant queries. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
			rfa.prevValue = nan
			rfa.values = nil
			rfa.timestamps = nil
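The new `rc.End-tEnd >= 2*rc.Step` guard is what keeps instant queries intact: for an instant query the evaluated point is the last one, so the difference is zero and trailing points are preserved, while far-from-the-end points in range queries can still be dropped by the staleness rules. A tiny Go sketch of just this guard, with assumed timestamps, illustrates both cases.

```go
package main

import "fmt"

// mayDropTrailing sketches only the new "rc.End-tEnd >= 2*rc.Step" guard
// from the hunk above (all timestamps in milliseconds). The other staleness
// conditions are omitted for brevity.
func mayDropTrailing(end, tEnd, step int64) bool {
	return end-tEnd >= 2*step
}

func main() {
	const step = int64(30_000) // 30s resolution

	// Instant query: the evaluated point is at the query end, so tEnd == end
	// and trailing points are kept (issue #845).
	fmt.Println(mayDropTrailing(1_600_000_000_000, 1_600_000_000_000, step)) // false

	// Range query: a point two or more steps before the range end may still
	// be dropped by the staleness rules (issue #748).
	fmt.Println(mayDropTrailing(1_600_000_000_000, 1_600_000_000_000-2*step, step)) // true
}
```
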
@@ -11,6 +11,7 @@ import (
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/promdb"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -20,7 +21,7 @@ import (
)

var (
	retentionPeriod   = flag.Int("retentionPeriod", 1, "Retention period in months")
	retentionPeriod   = flagutil.NewDuration("retentionPeriod", 1, "Data with timestamps outside the retentionPeriod is automatically deleted")
	snapshotAuthKey   = flag.String("snapshotAuthKey", "", "authKey, which must be passed in query string to /snapshot* pages")
	forceMergeAuthKey = flag.String("forceMergeAuthKey", "", "authKey, which must be passed in query string to /internal/force_merge pages")

@@ -45,12 +46,12 @@ func CheckTimeRange(tr storage.TimeRange) error {
	if !*denyQueriesOutsideRetention {
		return nil
	}
	minAllowedTimestamp := (int64(fasttime.UnixTimestamp()) - int64(*retentionPeriod)*3600*24*30) * 1000
	minAllowedTimestamp := int64(fasttime.UnixTimestamp()*1000) - retentionPeriod.Msecs
	if tr.MinTimestamp > minAllowedTimestamp {
		return nil
	}
	return &httpserver.ErrorWithStatusCode{
		Err:        fmt.Errorf("the given time range %s is outside the allowed retention of %d months according to -denyQueriesOutsideRetention", &tr, *retentionPeriod),
		Err:        fmt.Errorf("the given time range %s is outside the allowed -retentionPeriod=%s according to -denyQueriesOutsideRetention", &tr, retentionPeriod),
		StatusCode: http.StatusServiceUnavailable,
	}
}
@@ -73,12 +74,12 @@ func InitWithoutMetrics() {
	storage.SetBigMergeWorkersCount(*bigMergeConcurrency)
	storage.SetSmallMergeWorkersCount(*smallMergeConcurrency)

	logger.Infof("opening storage at %q with retention period %d months", *DataPath, *retentionPeriod)
	logger.Infof("opening storage at %q with -retentionPeriod=%s", *DataPath, retentionPeriod)
	startTime := time.Now()
	WG = syncwg.WaitGroup{}
	strg, err := storage.OpenStorage(*DataPath, *retentionPeriod)
	strg, err := storage.OpenStorage(*DataPath, retentionPeriod.Msecs)
	if err != nil {
		logger.Fatalf("cannot open a storage at %s with retention period %d months: %s", *DataPath, *retentionPeriod, err)
		logger.Fatalf("cannot open a storage at %s with -retentionPeriod=%s: %s", *DataPath, retentionPeriod, err)
	}
	Storage = strg

@ -56,7 +56,7 @@
|
|||
"gnetId": 10229,
|
||||
"graphTooltip": 0,
|
||||
"id": null,
|
||||
"iteration": 1599034965731,
|
||||
"iteration": 1603307754894,
|
||||
"links": [
|
||||
{
|
||||
"icon": "doc",
|
||||
|
@ -925,7 +925,7 @@
|
|||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "Shows how many ongoing insertions are taking place.\n* `max` - equal to number of CPU * 2\n* `current` - current number of goroutines busy with inserting rows into storage\n\nWhen `current` hits `max` constantly, it means storage is overloaded and require more CPU.",
|
||||
"description": "Shows how many ongoing insertions (not API /write calls) on disk are taking place, where:\n* `max` - equal to number of CPUs;\n* `current` - current number of goroutines busy with inserting rows into underlying storage.\n\nEvery successful API /write call results into flush on disk. However, these two actions are separated and controlled via different concurrency limiters. The `max` on this panel can't be changed and always equal to number of CPUs. \n\nWhen `current` hits `max` constantly, it means storage is overloaded and requires more CPU.\n\n",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
|
@ -979,6 +979,7 @@
|
|||
{
|
||||
"expr": "sum(vm_concurrent_addrows_capacity{job=\"$job\", instance=\"$instance\"})",
|
||||
"format": "time_series",
|
||||
"interval": "",
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "max",
|
||||
"refId": "A"
|
||||
|
@ -995,7 +996,7 @@
|
|||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "Concurrent inserts ($instance)",
|
||||
"title": "Concurrent flushes on disk ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 2,
|
||||
|
@ -1164,7 +1165,7 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 36
|
||||
"y": 3
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 10,
|
||||
|
@ -1250,7 +1251,7 @@
|
|||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "How many datapoints are in RAM queue waiting to be written into storage. The number of pending data points should be in the range from 0 to `2*<ingestion_rate>`, since VictoriaMetrics pushes pending data to persistent storage every second.",
|
||||
"description": "Shows the time needed to reach the 100% of disk capacity based on the following params:\n* free disk space;\n* rows ingestion rate;\n* compression.\n\nUse this panel for capacity planning in order to estimate the time remaining for running out of the disk space.\n\n",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
|
@ -1264,63 +1265,53 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 36
|
||||
"y": 3
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 34,
|
||||
"id": 73,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"alignAsTable": true,
|
||||
"avg": true,
|
||||
"current": true,
|
||||
"hideZero": true,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
"values": true
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"nullPointMode": "null as zero",
|
||||
"percentage": false,
|
||||
"pluginVersion": "7.1.1",
|
||||
"pointradius": 2,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "pending index entries",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"storage\"}",
|
||||
"expr": "vm_free_disk_space_bytes{job=\"$job\", instance=\"$instance\"} / (sum(rate(vm_rows_added_to_storage_total{job=\"$job\", instance=\"$instance\"}[1d])) * (sum(vm_data_size_bytes{job=\"$job\", instance=\"$instance\", type!=\"indexdb\"}) / sum(vm_rows{job=\"$job\", instance=\"$instance\", type!=\"indexdb\"})))",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"interval": "",
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "pending datapoints",
|
||||
"legendFormat": "",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"indexdb\"}",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "pending index entries",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "Pending datapoints ($instance)",
|
||||
"title": "Storage full ETA ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"sort": 2,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
|
@ -1333,7 +1324,8 @@
|
|||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"decimals": null,
|
||||
"format": "s",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
|
@ -1341,8 +1333,7 @@
|
|||
"show": true
|
||||
},
|
||||
{
|
||||
"decimals": 3,
|
||||
"format": "none",
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
|
@ -1375,7 +1366,7 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 44
|
||||
"y": 11
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 30,
|
||||
|
@ -1472,7 +1463,7 @@
|
|||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "Data parts of LSM tree.\nHigh number of parts could be an evidence of slow merge performance - check the resource utilization.\n* `indexdb` - inverted index\n* `storage/small` - recently added parts of data ingested into storage(hot data)\n* `storage/big` - small parts gradually merged into big parts (cold data)",
|
||||
"description": "How many datapoints are in RAM queue waiting to be written into storage. The number of pending data points should be in the range from 0 to `2*<ingestion_rate>`, since VictoriaMetrics pushes pending data to persistent storage every second.",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
|
@ -1486,16 +1477,16 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 44
|
||||
"y": 11
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 36,
|
||||
"id": 34,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": true,
|
||||
"show": false,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
|
@ -1508,27 +1499,41 @@
|
|||
"pointradius": 2,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"seriesOverrides": [
|
||||
{
|
||||
"alias": "pending index entries",
|
||||
"yaxis": 2
|
||||
}
|
||||
],
|
||||
"spaceLength": 10,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(vm_parts{job=\"$job\", instance=\"$instance\"}) by (type)",
|
||||
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"storage\"}",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "{{type}}",
|
||||
"legendFormat": "pending datapoints",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"indexdb\"}",
|
||||
"format": "time_series",
|
||||
"hide": false,
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "pending index entries",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "LSM parts ($instance)",
|
||||
"title": "Pending datapoints ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 2,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
|
@ -1549,7 +1554,8 @@
|
|||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"decimals": 3,
|
||||
"format": "none",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
|
@ -1582,7 +1588,7 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 52
|
||||
"y": 19
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 53,
|
||||
|
@ -1669,6 +1675,196 @@
|
|||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "Data parts of LSM tree.\nHigh number of parts could be an evidence of slow merge performance - check the resource utilization.\n* `indexdb` - inverted index\n* `storage/small` - recently added parts of data ingested into storage(hot data)\n* `storage/big` - small parts gradually merged into big parts (cold data)",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
"links": []
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"fill": 1,
|
||||
"fillGradient": 0,
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 19
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 36,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"links": [],
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pluginVersion": "7.1.1",
|
||||
"pointradius": 2,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(vm_parts{job=\"$job\", instance=\"$instance\"}) by (type)",
|
||||
"format": "time_series",
|
||||
"intervalFactor": 1,
|
||||
"legendFormat": "{{type}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "LSM parts ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 2,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
}
|
||||
],
|
||||
"yaxis": {
|
||||
"align": false,
|
||||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "The number of on-going merges in storage nodes. It is expected to have high numbers for `storage/small` metric.",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
"links": []
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"fill": 1,
|
||||
"fillGradient": 0,
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 27
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 62,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pluginVersion": "7.1.1",
|
||||
"pointradius": 2,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(vm_active_merges{job=\"$job\", instance=\"$instance\"}) by(type)",
|
||||
"legendFormat": "{{type}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "Active merges ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"decimals": 0,
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
}
|
||||
],
|
||||
"yaxis": {
|
||||
"align": false,
|
||||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
|
@ -1689,7 +1885,7 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 52
|
||||
"y": 27
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 55,
|
||||
|
@ -1764,194 +1960,6 @@
|
|||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "The number of on-going merges in storage nodes. It is expected to have high numbers for `storage/small` metric.",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
"links": []
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"fill": 1,
|
||||
"fillGradient": 0,
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 60
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 62,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pluginVersion": "7.1.1",
|
||||
"pointradius": 2,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(vm_active_merges{job=\"$job\", instance=\"$instance\"}) by(type)",
|
||||
"legendFormat": "{{type}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "Active merges ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"decimals": 0,
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
}
|
||||
],
|
||||
"yaxis": {
|
||||
"align": false,
|
||||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "The number of rows merged per second by storage nodes.",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
"links": []
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"fill": 1,
|
||||
"fillGradient": 0,
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 60
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 64,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pluginVersion": "7.1.1",
|
||||
"pointradius": 2,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(vm_rows_merged_total{job=\"$job\", instance=\"$instance\"}[5m])) by(type)",
|
||||
"legendFormat": "{{type}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "Merge speed ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"decimals": 0,
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
}
|
||||
],
|
||||
"yaxis": {
|
||||
"align": false,
|
||||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
|
@ -1972,7 +1980,7 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 68
|
||||
"y": 35
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 58,
|
||||
|
@ -2050,6 +2058,100 @@
|
|||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": "$ds",
|
||||
"description": "The number of rows merged per second by storage nodes.",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {},
|
||||
"links": []
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"fill": 1,
|
||||
"fillGradient": 0,
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 35
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 64,
|
||||
"legend": {
|
||||
"avg": false,
|
||||
"current": false,
|
||||
"max": false,
|
||||
"min": false,
|
||||
"show": true,
|
||||
"total": false,
|
||||
"values": false
|
||||
},
|
||||
"lines": true,
|
||||
"linewidth": 1,
|
||||
"nullPointMode": "null",
|
||||
"percentage": false,
|
||||
"pluginVersion": "7.1.1",
|
||||
"pointradius": 2,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
"seriesOverrides": [],
|
||||
"spaceLength": 10,
|
||||
"stack": false,
|
||||
"steppedLine": false,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(rate(vm_rows_merged_total{job=\"$job\", instance=\"$instance\"}[5m])) by(type)",
|
||||
"legendFormat": "{{type}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"thresholds": [],
|
||||
"timeFrom": null,
|
||||
"timeRegions": [],
|
||||
"timeShift": null,
|
||||
"title": "Merge speed ($instance)",
|
||||
"tooltip": {
|
||||
"shared": true,
|
||||
"sort": 0,
|
||||
"value_type": "individual"
|
||||
},
|
||||
"type": "graph",
|
||||
"xaxis": {
|
||||
"buckets": null,
|
||||
"mode": "time",
|
||||
"name": null,
|
||||
"show": true,
|
||||
"values": []
|
||||
},
|
||||
"yaxes": [
|
||||
{
|
||||
"decimals": 0,
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
},
|
||||
{
|
||||
"format": "short",
|
||||
"label": null,
|
||||
"logBase": 1,
|
||||
"max": null,
|
||||
"min": "0",
|
||||
"show": true
|
||||
}
|
||||
],
|
||||
"yaxis": {
|
||||
"align": false,
|
||||
"alignLevel": null
|
||||
}
|
||||
},
|
||||
{
|
||||
"aliasColors": {},
|
||||
"bars": false,
|
||||
|
@ -2070,7 +2172,7 @@
|
|||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 68
|
||||
"y": 43
|
||||
},
|
||||
"hiddenSeries": false,
|
||||
"id": 67,
|
||||
|
@ -3293,4 +3395,4 @@
|
|||
"title": "VictoriaMetrics",
|
||||
"uid": "wNf0q_kZk",
|
||||
"version": 1
|
||||
}
|
||||
}
|
|
@ -21,7 +21,10 @@ services:
|
|||
image: victoriametrics/victoria-metrics
|
||||
ports:
|
||||
- 8428:8428
|
||||
- 8089:8089
|
||||
- 8089:8089/udp
|
||||
- 2003:2003
|
||||
- 2003:2003/udp
|
||||
- 4242:4242
|
||||
volumes:
|
||||
- vmdata:/storage
|
||||
|
@ -30,6 +33,7 @@ services:
|
|||
- '--graphiteListenAddr=:2003'
|
||||
- '--opentsdbListenAddr=:4242'
|
||||
- '--httpListenAddr=:8428'
|
||||
- '--influxListenAddr=:8089'
|
||||
networks:
|
||||
- vm_net
|
||||
restart: always
|
||||
|
|
|
@ -113,6 +113,7 @@ This functionality can be tried at [an editable Grafana dashboard](http://play-g
|
|||
- `bottomk_max(k, q)` - returns bottom K time series with the min maximums on the given time range
|
||||
- `bottomk_avg(k, q)` - returns bottom K time series with the min averages on the given time range
|
||||
- `bottomk_median(k, q)` - returns bottom K time series with the min medians on the given time range
|
||||
All the `topk_*` and `bottomk_*` functions accept an optional third argument - the label name for the sum of the remaining time series outside the top K or bottom K time series. For example, `topk_max(3, process_resident_memory_bytes, "remaining_sum")` would return up to 3 time series with the maximum value for `process_resident_memory_bytes` plus a fourth time series with the sum of the remaining time series, if any. The fourth time series will contain an additional `remaining_sum="remaining_sum"` label.
|
||||
- `share_le_over_time(m[d], le)` - returns share (in the range 0..1) of values in `m` over `d`, which are smaller or equal to `le`. Useful for calculating SLI and SLO.
|
||||
Example: `share_le_over_time(memory_usage_bytes[24h], 100*1024*1024)` returns the share of time series values for the last 24 hours when memory usage was below or equal to 100MB.
|
||||
- `share_gt_over_time(m[d], gt)` - returns share (in the range 0..1) of values in `m` over `d`, which are bigger than `gt`. Useful for calculating SLI and SLO.
|
||||
|
|
|
@ -8,7 +8,7 @@
|
|||
and their default values. Default flag values should fit the majority of cases. The minimum required flags to configure are:
|
||||
|
||||
* `-storageDataPath` - path to directory where VictoriaMetrics stores all the data.
|
||||
* `-retentionPeriod` - data retention in months.
|
||||
* `-retentionPeriod` - data retention.
|
||||
|
||||
For instance:
|
||||
|
||||
|
|
|
@ -164,7 +164,7 @@ or [docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) wi
|
|||
The following command-line flags are used the most:
|
||||
|
||||
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
|
||||
* `-retentionPeriod` - retention period in months for stored data. Older data is automatically deleted. Default period is 1 month.
|
||||
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
|
||||
|
||||
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values.
|
||||
|
||||
|
@ -495,6 +495,7 @@ VictoriaMetrics supports the following handlers from [Prometheus querying API](h
|
|||
* [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names)
|
||||
* [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values)
|
||||
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats)
|
||||
* [/api/v1/targets](https://prometheus.io/docs/prometheus/latest/querying/api/#targets) - see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter) for more details.
|
||||
|
||||
These handlers can be queried from Prometheus-compatible clients such as Grafana or curl.
|
||||
|
||||
|
@ -1048,6 +1049,7 @@ The de-duplication reduces disk space usage if multiple identically configured P
|
|||
write data to the same VictoriaMetrics instance. Note that these Prometheus instances must have identical
|
||||
`external_labels` section in their configs, so they write data to the same time series.
|
||||
|
||||
|
||||
### Retention
|
||||
|
||||
Retention is configured with `-retentionPeriod` command-line flag. For instance, `-retentionPeriod=3` means
|
||||
|
@ -1059,6 +1061,10 @@ For example if `-retentionPeriod` is set to 1, data for January is deleted on Ma
|
|||
It is safe to extend `-retentionPeriod` on existing data. If `-retentionPeriod` is set to a lower
|
||||
value than before, then data outside the configured period will eventually be deleted.
|
||||
|
||||
VictoriaMetrics supports retention smaller than 1 month. For example, `-retentionPeriod=5d` would set data retention to 5 days.
|
||||
Older data is eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
|
||||
|
||||
|
||||
### Multiple retentions
|
||||
|
||||
Just start multiple VictoriaMetrics instances with distinct values for the following flags:
|
||||
|
|
|
@ -211,9 +211,13 @@ either via `vmagent` itself or via Prometheus, so the exported metrics could be
|
|||
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for `vmagent` state overview.
|
||||
If you have suggestions or improvements, or have found a bug, feel free to open an issue on GitHub or add a review to the dashboard.
|
||||
|
||||
`vmagent` also exports target statuses at `http://vmagent-host:8429/targets` page in plaintext format.
|
||||
`/targets` handler accepts optional `show_original_labels=1` query arg, which shows the original labels per each target
|
||||
before applying relabeling. This information may be useful for debugging target relabeling.
|
||||
`vmagent` also exports target statuses at the following handlers:
|
||||
|
||||
* `http://vmagent-host:8429/targets`. This handler returns human-readable plaintext status for every active target.
|
||||
This page is convenient to query from command line with `wget`, `curl` or similar tools.
|
||||
It accepts optional `show_original_labels=1` query arg, which shows the original labels per each target before applying relabeling.
|
||||
This information may be useful for debugging target relabeling.
|
||||
* `http://vmagent-host:8429/api/v1/targets`. This handler returns data compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
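For illustration only (this snippet is not part of the change set), the JSON returned by `/api/v1/targets` can also be consumed programmatically. The sketch below decodes just a few of the fields emitted by this endpoint (`activeTargets`, `scrapeUrl`, `health`, `lastError`); the struct here is a minimal assumption for the example and mirrors the output written by `WriteActiveTargetsJSON` further down in this diff, not a type defined in the repository:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// targetsResponse models only the fields used below. The field names follow the
// Prometheus-compatible /api/v1/targets format referenced in the docs above.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			ScrapeURL string `json:"scrapeUrl"`
			Health    string `json:"health"`
			LastError string `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

func main() {
	// vmagent-host:8429 is the example address used in the docs above.
	resp, err := http.Get("http://vmagent-host:8429/api/v1/targets")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var tr targetsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		panic(err)
	}
	for _, t := range tr.Data.ActiveTargets {
		fmt.Printf("%s health=%s lastError=%q\n", t.ScrapeURL, t.Health, t.LastError)
	}
}
```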
|
||||
|
||||
|
||||
### Troubleshooting
|
||||
|
@ -224,7 +228,26 @@ before applying relabeling. This information may be useful for debugging target
|
|||
since `vmagent` establishes at least a single TCP connection per each target.
|
||||
|
||||
* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
|
||||
by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`.
|
||||
by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
|
||||
and `http://vmagent-host:8429/api/v1/targets`.
|
||||
|
||||
* If `vmagent` scrapes targets with millions of metrics per each target (for instance, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
|
||||
then it is recommended to enable `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all the scrape targets
|
||||
by passing the `-promscrape.streamParse` command-line flag, or on a per-target basis with the `stream_parse: true` option. For example:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: 'big-federate'
|
||||
stream_parse: true
|
||||
static_configs:
|
||||
- targets:
|
||||
- big-prometeus1
|
||||
- big-prometeus2
|
||||
honor_labels: true
|
||||
metrics_path: /federate
|
||||
params:
|
||||
'match[]': ['{__name__!=""}']
|
||||
```
|
||||
|
||||
* It is recommended to increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page constantly grows.
|
||||
|
||||
|
|
|
@ -3,7 +3,7 @@
|
|||
`vmrestore` restores data from backups created by [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md).
|
||||
VictoriaMetrics `v1.29.0` and newer versions must be used for working with the restored data.
|
||||
|
||||
Restore process can be interrupted at any time. It is automatically resumed from the inerruption point
|
||||
Restore process can be interrupted at any time. It is automatically resumed from the interruption point
|
||||
when restarting `vmrestore` with the same args.
|
||||
|
||||
|
||||
|
|
69
lib/flagutil/duration.go
Normal file
|
@ -0,0 +1,69 @@
|
|||
package flagutil
|
||||
|
||||
import (
|
||||
"flag"
|
||||
"fmt"
|
||||
"strconv"
|
||||
"strings"
|
||||
|
||||
"github.com/VictoriaMetrics/metricsql"
|
||||
)
|
||||
|
||||
// NewDuration returns new `duration` flag with the given name, defaultValue and description.
|
||||
//
|
||||
// DefaultValue is in months.
|
||||
func NewDuration(name string, defaultValue float64, description string) *Duration {
|
||||
description += "\nThe following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months"
|
||||
d := Duration{
|
||||
Msecs: int64(defaultValue * msecsPerMonth),
|
||||
valueString: fmt.Sprintf("%g", defaultValue),
|
||||
}
|
||||
flag.Var(&d, name, description)
|
||||
return &d
|
||||
}
|
||||
|
||||
// Duration is a flag for holding duration.
|
||||
type Duration struct {
|
||||
// Msecs contains parsed duration in milliseconds.
|
||||
Msecs int64
|
||||
|
||||
valueString string
|
||||
}
|
||||
|
||||
// String implements flag.Value interface
|
||||
func (d *Duration) String() string {
|
||||
return d.valueString
|
||||
}
|
||||
|
||||
// Set implements flag.Value interface
|
||||
func (d *Duration) Set(value string) error {
|
||||
// An attempt to parse value in months.
|
||||
months, err := strconv.ParseFloat(value, 64)
|
||||
if err == nil {
|
||||
if months > maxMonths {
|
||||
return fmt.Errorf("duration months must be smaller than %d; got %g", maxMonths, months)
|
||||
}
|
||||
if months < 0 {
|
||||
return fmt.Errorf("duration months cannot be negative; got %g", months)
|
||||
}
|
||||
d.Msecs = int64(months * msecsPerMonth)
|
||||
d.valueString = value
|
||||
return nil
|
||||
}
|
||||
// Parse duration.
|
||||
value = strings.ToLower(value)
|
||||
if strings.HasSuffix(value, "m") {
|
||||
return fmt.Errorf("duration in months must be set without `m` suffix due to ambiguity with duration in minutes; got %s", value)
|
||||
}
|
||||
msecs, err := metricsql.PositiveDurationValue(value, 0)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
d.Msecs = msecs
|
||||
d.valueString = value
|
||||
return nil
|
||||
}
|
||||
|
||||
const maxMonths = 12 * 100
|
||||
|
||||
const msecsPerMonth = 31 * 24 * 3600 * 1000
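As a rough usage sketch (assumed for illustration, not taken from this diff): a caller registers the flag via `flagutil.NewDuration` and reads the parsed value from the exported `Msecs` field after `flag.Parse()`. The flag name and description below are illustrative.

```go
package main

import (
	"flag"
	"fmt"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
)

// retentionPeriod is an illustrative flag. The default value 1 is interpreted as one month,
// while suffixed values such as `-retentionPeriod=5d` or `-retentionPeriod=2w` are parsed
// as durations inside Duration.Set.
var retentionPeriod = flagutil.NewDuration("retentionPeriod", 1, "Data retention period")

func main() {
	flag.Parse()
	// Msecs holds the parsed duration in milliseconds.
	fmt.Printf("retention: %d ms\n", retentionPeriod.Msecs)
}
```

Whether a bare number is treated as months or a suffixed string is treated as a duration is decided inside `Duration.Set`, as shown in the file above.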
|
57
lib/flagutil/duration_test.go
Normal file
|
@ -0,0 +1,57 @@
|
|||
package flagutil
|
||||
|
||||
import (
|
||||
"strings"
|
||||
"testing"
|
||||
)
|
||||
|
||||
func TestDurationSetFailure(t *testing.T) {
|
||||
f := func(value string) {
|
||||
t.Helper()
|
||||
var d Duration
|
||||
if err := d.Set(value); err == nil {
|
||||
t.Fatalf("expecting non-nil error in d.Set(%q)", value)
|
||||
}
|
||||
}
|
||||
f("")
|
||||
f("foobar")
|
||||
f("5foobar")
|
||||
f("ah")
|
||||
f("134xd")
|
||||
f("2.43sdfw")
|
||||
|
||||
// Too big value in months
|
||||
f("12345")
|
||||
|
||||
// Negative duration
|
||||
f("-1")
|
||||
f("-34h")
|
||||
|
||||
// Duration in minutes is confused with duration in months
|
||||
f("1m")
|
||||
}
|
||||
|
||||
func TestDurationSetSuccess(t *testing.T) {
|
||||
f := func(value string, expectedMsecs int64) {
|
||||
t.Helper()
|
||||
var d Duration
|
||||
if err := d.Set(value); err != nil {
|
||||
t.Fatalf("unexpected error in d.Set(%q): %s", value, err)
|
||||
}
|
||||
if d.Msecs != expectedMsecs {
|
||||
t.Fatalf("unexpected result; got %d; want %d", d.Msecs, expectedMsecs)
|
||||
}
|
||||
valueString := d.String()
|
||||
valueExpected := strings.ToLower(value)
|
||||
if valueString != valueExpected {
|
||||
t.Fatalf("unexpected valueString; got %q; want %q", valueString, valueExpected)
|
||||
}
|
||||
}
|
||||
f("0", 0)
|
||||
f("1", msecsPerMonth)
|
||||
f("123.456", 123.456*msecsPerMonth)
|
||||
f("1h", 3600*1000)
|
||||
f("1.5d", 1.5*24*3600*1000)
|
||||
f("2.3W", 2.3*7*24*3600*1000)
|
||||
f("0.25y", 0.25*365*24*3600*1000)
|
||||
}
|
|
@ -306,6 +306,9 @@ func maybeGzipResponseWriter(w http.ResponseWriter, r *http.Request) http.Respon
|
|||
if *disableResponseCompression {
|
||||
return w
|
||||
}
|
||||
if r.Header.Get("Connection") == "Upgrade" {
|
||||
return w
|
||||
}
|
||||
ae := r.Header.Get("Accept-Encoding")
|
||||
if ae == "" {
|
||||
return w
|
||||
|
|
|
@ -35,12 +35,12 @@ func initOnce() {
|
|||
mem := sysTotalMemory()
|
||||
if allowedBytes.N <= 0 {
|
||||
if *allowedPercent < 1 || *allowedPercent > 200 {
|
||||
logger.Panicf("FATAL: -memory.allowedPercent must be in the range [1...200]; got %f", *allowedPercent)
|
||||
logger.Panicf("FATAL: -memory.allowedPercent must be in the range [1...200]; got %g", *allowedPercent)
|
||||
}
|
||||
percent := *allowedPercent / 100
|
||||
allowedMemory = int(float64(mem) * percent)
|
||||
remainingMemory = mem - allowedMemory
|
||||
logger.Infof("limiting caches to %d bytes, leaving %d bytes to the OS according to -memory.allowedPercent=%f", allowedMemory, remainingMemory, *allowedPercent)
|
||||
logger.Infof("limiting caches to %d bytes, leaving %d bytes to the OS according to -memory.allowedPercent=%g", allowedMemory, remainingMemory, *allowedPercent)
|
||||
} else {
|
||||
allowedMemory = allowedBytes.N
|
||||
remainingMemory = mem - allowedMemory
|
||||
|
|
|
@ -1,13 +1,18 @@
|
|||
package promscrape
|
||||
|
||||
import (
|
||||
"context"
|
||||
"crypto/tls"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"io/ioutil"
|
||||
"net/http"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
|
||||
"github.com/VictoriaMetrics/fasthttp"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
)
|
||||
|
@ -22,11 +27,19 @@ var (
|
|||
"This may be useful when targets has no support for HTTP keep-alive connection. "+
|
||||
"It is possible to set `disable_keepalive: true` individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. "+
|
||||
"Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets")
|
||||
streamParse = flag.Bool("promscrape.streamParse", false, "Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful "+
|
||||
"for reducing memory usage when millions of metrics are exposed per each scrape target. "+
|
||||
"It is posible to set `stream_parse: true` individually per each `scrape_config` section in `-promscrape.config` for fine grained control")
|
||||
)
|
||||
|
||||
type client struct {
|
||||
// hc is the default client optimized for the common case of scraping targets with a moderate number of metrics.
|
||||
hc *fasthttp.HostClient
|
||||
|
||||
// sc (aka `stream client`) is used instead of hc if ScrapeWork.ParseStream is set.
|
||||
// It may be useful for scraping targets with millions of metrics per target.
|
||||
sc *http.Client
|
||||
|
||||
scrapeURL string
|
||||
host string
|
||||
requestURI string
|
||||
|
@ -64,8 +77,23 @@ func newClient(sw *ScrapeWork) *client {
|
|||
MaxResponseBodySize: maxScrapeSize.N,
|
||||
MaxIdempotentRequestAttempts: 1,
|
||||
}
|
||||
var sc *http.Client
|
||||
if *streamParse || sw.StreamParse {
|
||||
sc = &http.Client{
|
||||
Transport: &http.Transport{
|
||||
TLSClientConfig: tlsCfg,
|
||||
TLSHandshakeTimeout: 10 * time.Second,
|
||||
IdleConnTimeout: 2 * sw.ScrapeInterval,
|
||||
DisableCompression: *disableCompression || sw.DisableCompression,
|
||||
DisableKeepAlives: *disableKeepAlive || sw.DisableKeepAlive,
|
||||
DialContext: statStdDial,
|
||||
},
|
||||
Timeout: sw.ScrapeTimeout,
|
||||
}
|
||||
}
|
||||
return &client{
|
||||
hc: hc,
|
||||
sc: sc,
|
||||
|
||||
scrapeURL: sw.ScrapeURL,
|
||||
host: host,
|
||||
|
@ -76,6 +104,43 @@ func newClient(sw *ScrapeWork) *client {
|
|||
}
|
||||
}
|
||||
|
||||
func (c *client) GetStreamReader() (*streamReader, error) {
|
||||
deadline := time.Now().Add(c.hc.ReadTimeout)
|
||||
ctx, cancel := context.WithDeadline(context.Background(), deadline)
|
||||
req, err := http.NewRequestWithContext(ctx, "GET", c.scrapeURL, nil)
|
||||
if err != nil {
|
||||
cancel()
|
||||
return nil, fmt.Errorf("cannot create request for %q: %w", c.scrapeURL, err)
|
||||
}
|
||||
// The following `Accept` header has been copied from Prometheus sources.
|
||||
// See https://github.com/prometheus/prometheus/blob/f9d21f10ecd2a343a381044f131ea4e46381ce09/scrape/scrape.go#L532 .
|
||||
// This is needed as a workaround for scraping stupid Java-based servers such as Spring Boot.
|
||||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/608 for details.
|
||||
// Do not bloat the `Accept` header with OpenMetrics shit, since it looks like dead standard now.
|
||||
req.Header.Set("Accept", "text/plain;version=0.0.4;q=1,*/*;q=0.1")
|
||||
if c.authHeader != "" {
|
||||
req.Header.Set("Authorization", c.authHeader)
|
||||
}
|
||||
resp, err := c.sc.Do(req)
|
||||
if err != nil {
|
||||
cancel()
|
||||
return nil, fmt.Errorf("cannot scrape %q: %w", c.scrapeURL, err)
|
||||
}
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_scrapes_total{status_code="%d"}`, resp.StatusCode)).Inc()
|
||||
respBody, _ := ioutil.ReadAll(resp.Body)
|
||||
_ = resp.Body.Close()
|
||||
cancel()
|
||||
return nil, fmt.Errorf("unexpected status code returned when scraping %q: %d; expecting %d; response body: %q",
|
||||
c.scrapeURL, resp.StatusCode, http.StatusOK, respBody)
|
||||
}
|
||||
scrapesOK.Inc()
|
||||
return &streamReader{
|
||||
r: resp.Body,
|
||||
cancel: cancel,
|
||||
}, nil
|
||||
}
|
||||
|
||||
func (c *client) ReadData(dst []byte) ([]byte, error) {
|
||||
deadline := time.Now().Add(c.hc.ReadTimeout)
|
||||
req := fasthttp.AcquireRequest()
|
||||
|
@ -87,7 +152,7 @@ func (c *client) ReadData(dst []byte) ([]byte, error) {
|
|||
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/608 for details.
|
||||
// Do not bloat the `Accept` header with OpenMetrics shit, since it looks like dead standard now.
|
||||
req.Header.Set("Accept", "text/plain;version=0.0.4;q=1,*/*;q=0.1")
|
||||
if !*disableCompression || c.disableCompression {
|
||||
if !*disableCompression && !c.disableCompression {
|
||||
req.Header.Set("Accept-Encoding", "gzip")
|
||||
}
|
||||
if *disableKeepAlive || c.disableKeepAlive {
|
||||
|
@ -131,7 +196,6 @@ func (c *client) ReadData(dst []byte) ([]byte, error) {
|
|||
}
|
||||
return dst, fmt.Errorf("error when scraping %q: %w", c.scrapeURL, err)
|
||||
}
|
||||
dstLen := len(dst)
|
||||
if ce := resp.Header.Peek("Content-Encoding"); string(ce) == "gzip" {
|
||||
var err error
|
||||
var src []byte
|
||||
|
@ -154,7 +218,7 @@ func (c *client) ReadData(dst []byte) ([]byte, error) {
|
|||
if statusCode != fasthttp.StatusOK {
|
||||
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_scrapes_total{status_code="%d"}`, statusCode)).Inc()
|
||||
return dst, fmt.Errorf("unexpected status code returned when scraping %q: %d; expecting %d; response body: %q",
|
||||
c.scrapeURL, statusCode, fasthttp.StatusOK, dst[dstLen:])
|
||||
c.scrapeURL, statusCode, fasthttp.StatusOK, dst)
|
||||
}
|
||||
scrapesOK.Inc()
|
||||
fasthttp.ReleaseResponse(resp)
|
||||
|
@ -185,3 +249,22 @@ func doRequestWithPossibleRetry(hc *fasthttp.HostClient, req *fasthttp.Request,
|
|||
}
|
||||
}
|
||||
}
|
||||
|
||||
type streamReader struct {
|
||||
r io.ReadCloser
|
||||
cancel context.CancelFunc
|
||||
bytesRead int64
|
||||
}
|
||||
|
||||
func (sr *streamReader) Read(p []byte) (int, error) {
|
||||
n, err := sr.r.Read(p)
|
||||
sr.bytesRead += int64(n)
|
||||
return n, err
|
||||
}
|
||||
|
||||
func (sr *streamReader) MustClose() {
|
||||
sr.cancel()
|
||||
if err := sr.r.Close(); err != nil {
|
||||
logger.Errorf("cannot close reader: %s", err)
|
||||
}
|
||||
}
|
||||
|
|
|
@ -84,6 +84,7 @@ type ScrapeConfig struct {
|
|||
// These options are supported only by lib/promscrape.
|
||||
DisableCompression bool `yaml:"disable_compression"`
|
||||
DisableKeepAlive bool `yaml:"disable_keepalive"`
|
||||
StreamParse bool `yaml:"stream_parse"`
|
||||
|
||||
// This is set in loadConfig
|
||||
swc *scrapeWorkConfig
|
||||
|
@ -473,6 +474,7 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
|
|||
sampleLimit: sc.SampleLimit,
|
||||
disableCompression: sc.DisableCompression,
|
||||
disableKeepAlive: sc.DisableKeepAlive,
|
||||
streamParse: sc.StreamParse,
|
||||
}
|
||||
return swc, nil
|
||||
}
|
||||
|
@ -493,6 +495,7 @@ type scrapeWorkConfig struct {
|
|||
sampleLimit int
|
||||
disableCompression bool
|
||||
disableKeepAlive bool
|
||||
streamParse bool
|
||||
}
|
||||
|
||||
func appendKubernetesScrapeWork(dst []ScrapeWork, sdc *kubernetes.SDConfig, baseDir string, swc *scrapeWorkConfig) ([]ScrapeWork, bool) {
|
||||
|
@ -642,6 +645,7 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
|
|||
labels = promrelabel.RemoveMetaLabels(labels[:0], labels)
|
||||
if len(labels) == 0 {
|
||||
// Drop target without labels.
|
||||
droppedTargetsMap.Register(originalLabels)
|
||||
return dst, nil
|
||||
}
|
||||
// See https://www.robustperception.io/life-of-a-label
|
||||
|
@ -652,10 +656,12 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
|
|||
addressRelabeled := promrelabel.GetLabelValueByName(labels, "__address__")
|
||||
if len(addressRelabeled) == 0 {
|
||||
// Drop target without scrape address.
|
||||
droppedTargetsMap.Register(originalLabels)
|
||||
return dst, nil
|
||||
}
|
||||
if strings.Contains(addressRelabeled, "/") {
|
||||
// Drop target with '/'
|
||||
droppedTargetsMap.Register(originalLabels)
|
||||
return dst, nil
|
||||
}
|
||||
addressRelabeled = addMissingPort(schemeRelabeled, addressRelabeled)
|
||||
|
@ -663,6 +669,9 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
|
|||
if metricsPathRelabeled == "" {
|
||||
metricsPathRelabeled = "/metrics"
|
||||
}
|
||||
if !strings.HasPrefix(metricsPathRelabeled, "/") {
|
||||
metricsPathRelabeled = "/" + metricsPathRelabeled
|
||||
}
|
||||
paramsRelabeled := getParamsFromLabels(labels, swc.params)
|
||||
optionalQuestion := "?"
|
||||
if len(paramsRelabeled) == 0 || strings.Contains(metricsPathRelabeled, "?") {
|
||||
|
@ -696,6 +705,7 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
|
|||
SampleLimit: swc.sampleLimit,
|
||||
DisableCompression: swc.disableCompression,
|
||||
DisableKeepAlive: swc.disableKeepAlive,
|
||||
StreamParse: swc.streamParse,
|
||||
|
||||
jobNameOriginal: swc.jobName,
|
||||
})
|
||||
|
|
|
@ -1276,6 +1276,7 @@ scrape_configs:
|
|||
sample_limit: 100
|
||||
disable_keepalive: true
|
||||
disable_compression: true
|
||||
stream_parse: true
|
||||
static_configs:
|
||||
- targets:
|
||||
- 192.168.1.2 # SNMP device.
|
||||
|
@ -1328,9 +1329,49 @@ scrape_configs:
|
|||
SampleLimit: 100,
|
||||
DisableKeepAlive: true,
|
||||
DisableCompression: true,
|
||||
StreamParse: true,
|
||||
jobNameOriginal: "snmp",
|
||||
},
|
||||
})
|
||||
f(`
|
||||
scrape_configs:
|
||||
- job_name: path wo slash
|
||||
static_configs:
|
||||
- targets: ["foo.bar:1234"]
|
||||
relabel_configs:
|
||||
- replacement: metricspath
|
||||
target_label: __metrics_path__
|
||||
`, []ScrapeWork{
|
||||
{
|
||||
ScrapeURL: "http://foo.bar:1234/metricspath",
|
||||
ScrapeInterval: defaultScrapeInterval,
|
||||
ScrapeTimeout: defaultScrapeTimeout,
|
||||
Labels: []prompbmarshal.Label{
|
||||
{
|
||||
Name: "__address__",
|
||||
Value: "foo.bar:1234",
|
||||
},
|
||||
{
|
||||
Name: "__metrics_path__",
|
||||
Value: "metricspath",
|
||||
},
|
||||
{
|
||||
Name: "__scheme__",
|
||||
Value: "http",
|
||||
},
|
||||
{
|
||||
Name: "instance",
|
||||
Value: "foo.bar:1234",
|
||||
},
|
||||
{
|
||||
Name: "job",
|
||||
Value: "path wo slash",
|
||||
},
|
||||
},
|
||||
jobNameOriginal: "path wo slash",
|
||||
AuthConfig: &promauth.Config{},
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
var defaultRegexForRelabelConfig = regexp.MustCompile("^(.*)$")
|
||||
|
|
|
@ -284,6 +284,7 @@ func (sg *scraperGroup) update(sws []ScrapeWork) {
|
|||
"original labels for target1: %s; original labels for target2: %s",
|
||||
sw.ScrapeURL, sw.LabelsString(), promLabelsString(originalLabels), promLabelsString(sw.OriginalLabels))
|
||||
}
|
||||
droppedTargetsMap.Register(sw.OriginalLabels)
|
||||
continue
|
||||
}
|
||||
swsMap[key] = sw.OriginalLabels
|
||||
|
@ -333,6 +334,7 @@ func newScraper(sw *ScrapeWork, group string, pushData func(wr *prompbmarshal.Wr
|
|||
sc.sw.Config = *sw
|
||||
sc.sw.ScrapeGroup = group
|
||||
sc.sw.ReadData = c.ReadData
|
||||
sc.sw.GetStreamReader = c.GetStreamReader
|
||||
sc.sw.PushData = pushData
|
||||
return sc
|
||||
}
|
||||
|
|
|
@ -82,19 +82,22 @@ type ScrapeWork struct {
|
|||
// Whether to disable HTTP keep-alive when querying ScrapeURL.
|
||||
DisableKeepAlive bool
|
||||
|
||||
// Whether to parse target responses in a streaming manner.
|
||||
StreamParse bool
|
||||
|
||||
// The original 'job_name'
|
||||
jobNameOriginal string
|
||||
}
|
||||
|
||||
// key returns unique identifier for the given sw.
|
||||
//
|
||||
// it can be used for comparing for equality two ScrapeWork objects.
|
||||
// it can be used for comparing two ScrapeWork objects for equality.
|
||||
func (sw *ScrapeWork) key() string {
|
||||
// Do not take into account OriginalLabels.
|
||||
key := fmt.Sprintf("ScrapeURL=%s, ScrapeInterval=%s, ScrapeTimeout=%s, HonorLabels=%v, HonorTimestamps=%v, Labels=%s, "+
|
||||
"AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v",
|
||||
"AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v",
|
||||
sw.ScrapeURL, sw.ScrapeInterval, sw.ScrapeTimeout, sw.HonorLabels, sw.HonorTimestamps, sw.LabelsString(),
|
||||
sw.AuthConfig.String(), sw.metricRelabelConfigsString(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive)
|
||||
sw.AuthConfig.String(), sw.metricRelabelConfigsString(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive, sw.StreamParse)
|
||||
return key
|
||||
}
|
||||
|
||||
|
@ -132,6 +135,9 @@ type scrapeWork struct {
|
|||
// ReadData is called for reading the data.
|
||||
ReadData func(dst []byte) ([]byte, error)
|
||||
|
||||
// GetStreamReader is called if Config.StreamParse is set.
|
||||
GetStreamReader func() (*streamReader, error)
|
||||
|
||||
// PushData is called for pushing collected data.
|
||||
PushData func(wr *prompbmarshal.WriteRequest)
|
||||
|
||||
|
@ -221,6 +227,15 @@ var (
|
|||
)
|
||||
|
||||
func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error {
|
||||
if *streamParse || sw.Config.StreamParse {
|
||||
// Read data from scrape targets in streaming manner.
|
||||
// This case is optimized for targets exposing millions of metrics or more per target.
|
||||
return sw.scrapeStream(scrapeTimestamp, realTimestamp)
|
||||
}
|
||||
|
||||
// Common case: read all the data from scrape target to memory (body) and then process it.
|
||||
// This case should work better than the stream parse code above for the common case when a scrape target exposes
|
||||
// up to a few thousand metrics.
|
||||
body := leveledbytebufferpool.Get(sw.prevBodyLen)
|
||||
var err error
|
||||
body.B, err = sw.ReadData(body.B[:0])
|
||||
|
@ -281,6 +296,66 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
|
|||
return err
|
||||
}
|
||||
|
||||
func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
|
||||
sr, err := sw.GetStreamReader()
|
||||
if err != nil {
|
||||
return fmt.Errorf("cannot read data: %s", err)
|
||||
}
|
||||
samplesScraped := 0
|
||||
samplesPostRelabeling := 0
|
||||
wc := writeRequestCtxPool.Get(sw.prevRowsLen)
|
||||
var mu sync.Mutex
|
||||
err = parser.ParseStream(sr, scrapeTimestamp, false, func(rows []parser.Row) error {
|
||||
mu.Lock()
|
||||
defer mu.Unlock()
|
||||
samplesScraped += len(rows)
|
||||
for i := range rows {
|
||||
sw.addRowToTimeseries(wc, &rows[i], scrapeTimestamp, true)
|
||||
if len(wc.labels) > 40000 {
|
||||
// Limit the maximum size of wc.writeRequest.
|
||||
// This should reduce memory usage when scraping targets with millions of metrics and/or labels.
|
||||
// For example, when scraping /federate handler from Prometheus - see https://prometheus.io/docs/prometheus/latest/federation/
|
||||
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
|
||||
sw.updateSeriesAdded(wc)
|
||||
startTime := time.Now()
|
||||
sw.PushData(&wc.writeRequest)
|
||||
pushDataDuration.UpdateDuration(startTime)
|
||||
wc.resetNoRows()
|
||||
}
|
||||
}
|
||||
return nil
|
||||
})
|
||||
scrapedSamples.Update(float64(samplesScraped))
|
||||
endTimestamp := time.Now().UnixNano() / 1e6
|
||||
duration := float64(endTimestamp-realTimestamp) / 1e3
|
||||
scrapeDuration.Update(duration)
|
||||
scrapeResponseSize.Update(float64(sr.bytesRead))
|
||||
sr.MustClose()
|
||||
up := 1
|
||||
if err != nil {
|
||||
if samplesScraped == 0 {
|
||||
up = 0
|
||||
}
|
||||
scrapesFailed.Inc()
|
||||
}
|
||||
samplesPostRelabeling += len(wc.writeRequest.Timeseries)
|
||||
sw.updateSeriesAdded(wc)
|
||||
seriesAdded := sw.finalizeSeriesAdded(samplesPostRelabeling)
|
||||
sw.addAutoTimeseries(wc, "up", float64(up), scrapeTimestamp)
|
||||
sw.addAutoTimeseries(wc, "scrape_duration_seconds", duration, scrapeTimestamp)
|
||||
sw.addAutoTimeseries(wc, "scrape_samples_scraped", float64(samplesScraped), scrapeTimestamp)
|
||||
sw.addAutoTimeseries(wc, "scrape_samples_post_metric_relabeling", float64(samplesPostRelabeling), scrapeTimestamp)
|
||||
sw.addAutoTimeseries(wc, "scrape_series_added", float64(seriesAdded), scrapeTimestamp)
|
||||
startTime := time.Now()
|
||||
sw.PushData(&wc.writeRequest)
|
||||
pushDataDuration.UpdateDuration(startTime)
|
||||
sw.prevRowsLen = len(wc.rows.Rows)
|
||||
wc.reset()
|
||||
writeRequestCtxPool.Put(wc)
|
||||
tsmGlobal.Update(&sw.Config, sw.ScrapeGroup, up == 1, realTimestamp, int64(duration*1000), err)
|
||||
return nil
|
||||
}
|
||||
|
||||
// leveledWriteRequestCtxPool allows reducing memory usage when writeRequesCtx
|
||||
// structs contain mixed number of labels.
|
||||
//
|
||||
|
|
|
@ -1,14 +1,48 @@
|
|||
package promscrape
|
||||
|
||||
import (
|
||||
"context"
|
||||
"net"
|
||||
"sync"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
|
||||
"github.com/VictoriaMetrics/fasthttp"
|
||||
"github.com/VictoriaMetrics/metrics"
|
||||
)
|
||||
|
||||
func statStdDial(ctx context.Context, network, addr string) (net.Conn, error) {
|
||||
d := getStdDialer()
|
||||
conn, err := d.DialContext(ctx, network, addr)
|
||||
dialsTotal.Inc()
|
||||
if err != nil {
|
||||
dialErrors.Inc()
|
||||
return nil, err
|
||||
}
|
||||
conns.Inc()
|
||||
sc := &statConn{
|
||||
Conn: conn,
|
||||
}
|
||||
return sc, nil
|
||||
}
|
||||
|
||||
func getStdDialer() *net.Dialer {
|
||||
stdDialerOnce.Do(func() {
|
||||
stdDialer = &net.Dialer{
|
||||
Timeout: 30 * time.Second,
|
||||
KeepAlive: 30 * time.Second,
|
||||
DualStack: netutil.TCP6Enabled(),
|
||||
}
|
||||
})
|
||||
return stdDialer
|
||||
}
|
||||
|
||||
var (
|
||||
stdDialer *net.Dialer
|
||||
stdDialerOnce sync.Once
|
||||
)
|
||||
|
||||
func statDial(addr string) (conn net.Conn, err error) {
|
||||
if netutil.TCP6Enabled() {
|
||||
conn, err = fasthttp.DialDualStack(addr)
|
||||
|
|
|
@ -6,6 +6,10 @@ import (
|
|||
"sort"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
|
||||
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
|
||||
)
|
||||
|
||||
var tsmGlobal = newTargetStatusMap()
|
||||
|
@ -15,6 +19,26 @@ func WriteHumanReadableTargetsStatus(w io.Writer, showOriginalLabels bool) {
|
|||
tsmGlobal.WriteHumanReadable(w, showOriginalLabels)
|
||||
}
|
||||
|
||||
// WriteAPIV1Targets writes /api/v1/targets to w according to https://prometheus.io/docs/prometheus/latest/querying/api/#targets
|
||||
func WriteAPIV1Targets(w io.Writer, state string) {
|
||||
if state == "" {
|
||||
state = "any"
|
||||
}
|
||||
fmt.Fprintf(w, `{"status":"success","data":{"activeTargets":`)
|
||||
if state == "active" || state == "any" {
|
||||
tsmGlobal.WriteActiveTargetsJSON(w)
|
||||
} else {
|
||||
fmt.Fprintf(w, `[]`)
|
||||
}
|
||||
fmt.Fprintf(w, `,"droppedTargets":`)
|
||||
if state == "dropped" || state == "any" {
|
||||
droppedTargetsMap.WriteDroppedTargetsJSON(w)
|
||||
} else {
|
||||
fmt.Fprintf(w, `[]`)
|
||||
}
|
||||
fmt.Fprintf(w, `}}`)
|
||||
}
|
||||
|
||||
type targetStatusMap struct {
|
||||
mu sync.Mutex
|
||||
m map[uint64]targetStatus
|
||||
|
@ -73,6 +97,66 @@ func (tsm *targetStatusMap) StatusByGroup(group string, up bool) int {
|
|||
return count
|
||||
}
|
||||
|
||||
// WriteActiveTargetsJSON writes `activeTargets` contents to w according to https://prometheus.io/docs/prometheus/latest/querying/api/#targets
|
||||
func (tsm *targetStatusMap) WriteActiveTargetsJSON(w io.Writer) {
|
||||
tsm.mu.Lock()
|
||||
type keyStatus struct {
|
||||
key string
|
||||
st targetStatus
|
||||
}
|
||||
kss := make([]keyStatus, 0, len(tsm.m))
|
||||
for _, st := range tsm.m {
|
||||
key := promLabelsString(st.sw.OriginalLabels)
|
||||
kss = append(kss, keyStatus{
|
||||
key: key,
|
||||
st: st,
|
||||
})
|
||||
}
|
||||
tsm.mu.Unlock()
|
||||
|
||||
sort.Slice(kss, func(i, j int) bool {
|
||||
return kss[i].key < kss[j].key
|
||||
})
|
||||
fmt.Fprintf(w, `[`)
|
||||
for i, ks := range kss {
|
||||
st := ks.st
|
||||
fmt.Fprintf(w, `{"discoveredLabels":`)
|
||||
writeLabelsJSON(w, st.sw.OriginalLabels)
|
||||
fmt.Fprintf(w, `,"labels":`)
|
||||
labelsFinalized := promrelabel.FinalizeLabels(nil, st.sw.Labels)
|
||||
writeLabelsJSON(w, labelsFinalized)
|
||||
fmt.Fprintf(w, `,"scrapePool":%q`, st.sw.Job())
|
||||
fmt.Fprintf(w, `,"scrapeUrl":%q`, st.sw.ScrapeURL)
|
||||
errMsg := ""
|
||||
if st.err != nil {
|
||||
errMsg = st.err.Error()
|
||||
}
|
||||
fmt.Fprintf(w, `,"lastError":%q`, errMsg)
|
||||
fmt.Fprintf(w, `,"lastScrape":%q`, time.Unix(st.scrapeTime/1000, (st.scrapeTime%1000)*1e6).Format(time.RFC3339Nano))
|
||||
fmt.Fprintf(w, `,"lastScrapeDuration":%g`, (time.Millisecond * time.Duration(st.scrapeDuration)).Seconds())
|
||||
state := "up"
|
||||
if !st.up {
|
||||
state = "down"
|
||||
}
|
||||
fmt.Fprintf(w, `,"health":%q}`, state)
|
||||
if i+1 < len(kss) {
|
||||
fmt.Fprintf(w, `,`)
|
||||
}
|
||||
}
|
||||
fmt.Fprintf(w, `]`)
|
||||
}
|
||||
|
||||
func writeLabelsJSON(w io.Writer, labels []prompbmarshal.Label) {
|
||||
fmt.Fprintf(w, `{`)
|
||||
for i, label := range labels {
|
||||
fmt.Fprintf(w, "%q:%q", label.Name, label.Value)
|
||||
if i+1 < len(labels) {
|
||||
fmt.Fprintf(w, `,`)
|
||||
}
|
||||
}
|
||||
fmt.Fprintf(w, `}`)
|
||||
}
|
||||
|
||||
func (tsm *targetStatusMap) WriteHumanReadable(w io.Writer, showOriginalLabels bool) {
|
||||
byJob := make(map[string][]targetStatus)
|
||||
tsm.mu.Lock()
|
||||
|
@ -143,3 +227,69 @@ type targetStatus struct {
|
|||
func (st *targetStatus) getDurationFromLastScrape() time.Duration {
|
||||
return time.Since(time.Unix(st.scrapeTime/1000, (st.scrapeTime%1000)*1e6))
|
||||
}
|
||||
|
||||
type droppedTargets struct {
|
||||
mu sync.Mutex
|
||||
m map[string]droppedTarget
|
||||
lastCleanupTime uint64
|
||||
}
|
||||
|
||||
type droppedTarget struct {
|
||||
originalLabels []prompbmarshal.Label
|
||||
deadline uint64
|
||||
}
|
||||
|
||||
func (dt *droppedTargets) Register(originalLabels []prompbmarshal.Label) {
|
||||
key := promLabelsString(originalLabels)
|
||||
currentTime := fasttime.UnixTimestamp()
|
||||
dt.mu.Lock()
|
||||
dt.m[key] = droppedTarget{
|
||||
originalLabels: originalLabels,
|
||||
deadline: currentTime + 10*60,
|
||||
}
|
||||
if currentTime-dt.lastCleanupTime > 60 {
|
||||
for k, v := range dt.m {
|
||||
if currentTime > v.deadline {
|
||||
delete(dt.m, k)
|
||||
}
|
||||
}
|
||||
dt.lastCleanupTime = currentTime
|
||||
}
|
||||
dt.mu.Unlock()
|
||||
}
|
||||
|
||||
// WriteDroppedTargetsJSON writes `droppedTargets` contents to w according to https://prometheus.io/docs/prometheus/latest/querying/api/#targets
|
||||
func (dt *droppedTargets) WriteDroppedTargetsJSON(w io.Writer) {
|
||||
dt.mu.Lock()
|
||||
type keyStatus struct {
|
||||
key string
|
||||
originalLabels []prompbmarshal.Label
|
||||
}
|
||||
kss := make([]keyStatus, 0, len(dt.m))
|
||||
for _, v := range dt.m {
|
||||
key := promLabelsString(v.originalLabels)
|
||||
kss = append(kss, keyStatus{
|
||||
key: key,
|
||||
originalLabels: v.originalLabels,
|
||||
})
|
||||
}
|
||||
dt.mu.Unlock()
|
||||
|
||||
sort.Slice(kss, func(i, j int) bool {
|
||||
return kss[i].key < kss[j].key
|
||||
})
|
||||
fmt.Fprintf(w, `[`)
|
||||
for i, ks := range kss {
|
||||
fmt.Fprintf(w, `{"discoveredLabels":`)
|
||||
writeLabelsJSON(w, ks.originalLabels)
|
||||
fmt.Fprintf(w, `}`)
|
||||
if i+1 < len(kss) {
|
||||
fmt.Fprintf(w, `,`)
|
||||
}
|
||||
}
|
||||
fmt.Fprintf(w, `]`)
|
||||
}
|
||||
|
||||
var droppedTargetsMap = &droppedTargets{
|
||||
m: make(map[string]droppedTarget),
|
||||
}
|
||||
|
|
|
@ -23,7 +23,8 @@ var (
|
|||
|
||||
// ParseStream parses csv from req and calls callback for the parsed rows.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from req.
|
||||
// The callback can be called concurrently multiple times for streamed data from req.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold rows after returning.
|
||||
func ParseStream(req *http.Request, callback func(rows []Row) error) error {
|
||||
|
|
|
@ -23,7 +23,8 @@ var (
|
|||
|
||||
// ParseStream parses Graphite lines from r and calls callback for the parsed rows.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from r.
|
||||
// The callback can be called concurrently multiple times for streamed data from r.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold rows after returning.
|
||||
func ParseStream(r io.Reader, callback func(rows []Row) error) error {
|
||||
|
|
|
@ -24,7 +24,8 @@ var (
|
|||
|
||||
// ParseStream parses r with the given args and calls callback for the parsed rows.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from r.
|
||||
// The callback can be called concurrently multiple times for streamed data from r.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold rows after returning.
|
||||
func ParseStream(r io.Reader, isGzipped bool, precision, db string, callback func(db string, rows []Row) error) error {
|
||||
|
|
|
@ -17,7 +17,8 @@ import (
|
|||
|
||||
// ParseStream parses /api/v1/import/native lines from req and calls callback for parsed blocks.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from req.
|
||||
// The callback can be called concurrently multiple times for streamed data from req.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold block after returning.
|
||||
// callback can be called in parallel from multiple concurrent goroutines.
|
||||
|
|
|
@ -23,7 +23,8 @@ var (
|
|||
|
||||
// ParseStream parses OpenTSDB lines from r and calls callback for the parsed rows.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from r.
|
||||
// The callback can be called concurrently multiple times for streamed data from r.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold rows after returning.
|
||||
func ParseStream(r io.Reader, callback func(rows []Row) error) error {
|
||||
|
|
|
@ -26,7 +26,8 @@ var (
|
|||
|
||||
// ParseStream parses OpenTSDB http lines from req and calls callback for the parsed rows.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from req.
|
||||
// The callback can be called concurrently multiple times for streamed data from req.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold rows after returning.
|
||||
func ParseStream(req *http.Request, callback func(rows []Row) error) error {
|
||||
|
|
|
@ -16,7 +16,8 @@ import (
|
|||
|
||||
// ParseStream parses lines with Prometheus exposition format from r and calls callback for the parsed rows.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from r.
|
||||
// The callback can be called concurrently multiple times for streamed data from r.
|
||||
// It is guaranteed that the callback isn't called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold rows after returning.
|
||||
func ParseStream(r io.Reader, defaultTimestamp int64, isGzipped bool, callback func(rows []Row) error) error {
|
||||
|
@ -32,11 +33,17 @@ func ParseStream(r io.Reader, defaultTimestamp int64, isGzipped bool, callback f
|
|||
defer putStreamContext(ctx)
|
||||
for ctx.Read() {
|
||||
uw := getUnmarshalWork()
|
||||
uw.callback = callback
|
||||
uw.callback = func(rows []Row) error {
|
||||
err := callback(rows)
|
||||
ctx.wg.Done()
|
||||
return err
|
||||
}
|
||||
uw.defaultTimestamp = defaultTimestamp
|
||||
uw.reqBuf, ctx.reqBuf = ctx.reqBuf, uw.reqBuf
|
||||
ctx.wg.Add(1)
|
||||
common.ScheduleUnmarshalWork(uw)
|
||||
}
|
||||
ctx.wg.Wait() // wait for all the outstanding callback calls before returning
|
||||
return ctx.Error()
|
||||
}
|
||||
|
||||
|
@ -61,6 +68,8 @@ type streamContext struct {
|
|||
reqBuf []byte
|
||||
tailBuf []byte
|
||||
err error
|
||||
|
||||
wg sync.WaitGroup
|
||||
}
|
||||
|
||||
func (ctx *streamContext) Error() error {
|
||||
|
|
|
@ -21,6 +21,9 @@ var maxInsertRequestSize = flagutil.NewBytes("maxInsertRequestSize", 32*1024*102
|
|||
|
||||
// ParseStream parses Prometheus remote_write message req and calls callback for the parsed timeseries.
|
||||
//
|
||||
// The callback can be called concurrently multiple times for streamed data from req.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold tss after returning.
|
||||
func ParseStream(req *http.Request, callback func(tss []prompb.TimeSeries) error) error {
|
||||
ctx := getPushCtx(req.Body)
|
||||
|
|
|
@ -20,10 +20,10 @@ var maxLineLen = flagutil.NewBytes("import.maxLineLen", 100*1024*1024, "The maxi
|
|||
|
||||
// ParseStream parses /api/v1/import lines from req and calls callback for the parsed rows.
|
||||
//
|
||||
// The callback can be called multiple times for streamed data from req.
|
||||
// The callback can be called concurrently multiple times for streamed data from req.
|
||||
// The callback can be called after ParseStream returns.
|
||||
//
|
||||
// callback shouldn't hold rows after returning.
|
||||
// callback is called from multiple concurrent goroutines.
|
||||
func ParseStream(req *http.Request, callback func(rows []Row) error) error {
|
||||
r := req.Body
|
||||
if req.Header.Get("Content-Encoding") == "gzip" {
|
||||
|
|
|
@ -23,7 +23,7 @@ const (
|
|||
type Block struct {
|
||||
bh blockHeader
|
||||
|
||||
// nextIdx is the next row index for timestamps and values.
|
||||
// nextIdx is the next index for reading timestamps and values.
|
||||
nextIdx int
|
||||
|
||||
timestamps []int64
|
||||
|
|
|
@ -15,12 +15,12 @@ import (
|
|||
//
|
||||
// rowsMerged is atomically updated with the number of merged rows during the merge.
|
||||
func mergeBlockStreams(ph *partHeader, bsw *blockStreamWriter, bsrs []*blockStreamReader, stopCh <-chan struct{},
|
||||
dmis *uint64set.Set, rowsMerged, rowsDeleted *uint64) error {
|
||||
dmis *uint64set.Set, retentionDeadline int64, rowsMerged, rowsDeleted *uint64) error {
|
||||
ph.Reset()
|
||||
|
||||
bsm := bsmPool.Get().(*blockStreamMerger)
|
||||
bsm.Init(bsrs)
|
||||
err := mergeBlockStreamsInternal(ph, bsw, bsm, stopCh, dmis, rowsMerged, rowsDeleted)
|
||||
err := mergeBlockStreamsInternal(ph, bsw, bsm, stopCh, dmis, retentionDeadline, rowsMerged, rowsDeleted)
|
||||
bsm.reset()
|
||||
bsmPool.Put(bsm)
|
||||
bsw.MustClose()
|
||||
|
@ -39,29 +39,10 @@ var bsmPool = &sync.Pool{
|
|||
var errForciblyStopped = fmt.Errorf("forcibly stopped")
|
||||
|
||||
func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *blockStreamMerger, stopCh <-chan struct{},
|
||||
dmis *uint64set.Set, rowsMerged, rowsDeleted *uint64) error {
|
||||
// Search for the first block to merge
|
||||
var pendingBlock *Block
|
||||
for bsm.NextBlock() {
|
||||
select {
|
||||
case <-stopCh:
|
||||
return errForciblyStopped
|
||||
default:
|
||||
}
|
||||
if dmis.Has(bsm.Block.bh.TSID.MetricID) {
|
||||
// Skip blocks for deleted metrics.
|
||||
*rowsDeleted += uint64(bsm.Block.bh.RowsCount)
|
||||
continue
|
||||
}
|
||||
pendingBlock = getBlock()
|
||||
pendingBlock.CopyFrom(bsm.Block)
|
||||
break
|
||||
}
|
||||
if pendingBlock != nil {
|
||||
defer putBlock(pendingBlock)
|
||||
}
|
||||
|
||||
// Merge blocks.
|
||||
dmis *uint64set.Set, retentionDeadline int64, rowsMerged, rowsDeleted *uint64) error {
|
||||
pendingBlockIsEmpty := true
|
||||
pendingBlock := getBlock()
|
||||
defer putBlock(pendingBlock)
|
||||
tmpBlock := getBlock()
|
||||
defer putBlock(tmpBlock)
|
||||
for bsm.NextBlock() {
|
||||
|
@ -75,6 +56,17 @@ func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *bloc
|
|||
*rowsDeleted += uint64(bsm.Block.bh.RowsCount)
|
||||
continue
|
||||
}
|
||||
if bsm.Block.bh.MaxTimestamp < retentionDeadline {
|
||||
// Skip blocks out of the given retention.
|
||||
*rowsDeleted += uint64(bsm.Block.bh.RowsCount)
|
||||
continue
|
||||
}
|
||||
if pendingBlockIsEmpty {
|
||||
// Load the next block if pendingBlock is empty.
|
||||
pendingBlock.CopyFrom(bsm.Block)
|
||||
pendingBlockIsEmpty = false
|
||||
continue
|
||||
}
|
||||
|
||||
// Verify whether pendingBlock may be merged with bsm.Block (the current block).
|
||||
if pendingBlock.bh.TSID.MetricID != bsm.Block.bh.TSID.MetricID {
|
||||
|
@ -104,16 +96,20 @@ func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *bloc
|
|||
tmpBlock.bh.TSID = bsm.Block.bh.TSID
|
||||
tmpBlock.bh.Scale = bsm.Block.bh.Scale
|
||||
tmpBlock.bh.PrecisionBits = minUint8(pendingBlock.bh.PrecisionBits, bsm.Block.bh.PrecisionBits)
|
||||
mergeBlocks(tmpBlock, pendingBlock, bsm.Block)
|
||||
mergeBlocks(tmpBlock, pendingBlock, bsm.Block, retentionDeadline, rowsDeleted)
|
||||
if len(tmpBlock.timestamps) <= maxRowsPerBlock {
|
||||
// More entries may be added to tmpBlock. Swap it with pendingBlock,
|
||||
// so more entries may be added to pendingBlock on the next iteration.
|
||||
tmpBlock.fixupTimestamps()
|
||||
if len(tmpBlock.timestamps) > 0 {
|
||||
tmpBlock.fixupTimestamps()
|
||||
} else {
|
||||
pendingBlockIsEmpty = true
|
||||
}
|
||||
pendingBlock, tmpBlock = tmpBlock, pendingBlock
|
||||
continue
|
||||
}
|
||||
|
||||
// Write the first len(maxRowsPerBlock) of tmpBlock.timestamps to bsw,
|
||||
// Write the first maxRowsPerBlock of tmpBlock.timestamps to bsw,
|
||||
// leave the rest in pendingBlock.
|
||||
tmpBlock.nextIdx = maxRowsPerBlock
|
||||
pendingBlock.CopyFrom(tmpBlock)
|
||||
|
@ -127,18 +123,21 @@ func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *bloc
|
|||
if err := bsm.Error(); err != nil {
|
||||
return fmt.Errorf("cannot read block to be merged: %w", err)
|
||||
}
|
||||
if pendingBlock != nil {
|
||||
if !pendingBlockIsEmpty {
|
||||
bsw.WriteExternalBlock(pendingBlock, ph, rowsMerged)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// mergeBlocks merges ib1 and ib2 to ob.
|
||||
func mergeBlocks(ob, ib1, ib2 *Block) {
|
||||
func mergeBlocks(ob, ib1, ib2 *Block, retentionDeadline int64, rowsDeleted *uint64) {
|
||||
ib1.assertMergeable(ib2)
|
||||
ib1.assertUnmarshaled()
|
||||
ib2.assertUnmarshaled()
|
||||
|
||||
skipSamplesOutsideRetention(ib1, retentionDeadline, rowsDeleted)
|
||||
skipSamplesOutsideRetention(ib2, retentionDeadline, rowsDeleted)
|
||||
|
||||
if ib1.bh.MaxTimestamp < ib2.bh.MinTimestamp {
|
||||
// Fast path - ib1 values have smaller timestamps than ib2 values.
|
||||
appendRows(ob, ib1)
|
||||
|
@ -176,6 +175,16 @@ func mergeBlocks(ob, ib1, ib2 *Block) {
|
|||
}
|
||||
}
|
||||
|
||||
func skipSamplesOutsideRetention(b *Block, retentionDeadline int64, rowsDeleted *uint64) {
|
||||
timestamps := b.timestamps
|
||||
nextIdx := b.nextIdx
|
||||
for nextIdx < len(timestamps) && timestamps[nextIdx] < retentionDeadline {
|
||||
nextIdx++
|
||||
}
|
||||
*rowsDeleted += uint64(nextIdx - b.nextIdx)
|
||||
b.nextIdx = nextIdx
|
||||
}
|
||||
|
||||
func appendRows(ob, ib *Block) {
|
||||
ob.timestamps = append(ob.timestamps, ib.timestamps[ib.nextIdx:]...)
|
||||
ob.values = append(ob.values, ib.values[ib.nextIdx:]...)
|
||||
|
@ -189,7 +198,7 @@ func unmarshalAndCalibrateScale(b1, b2 *Block) error {
|
|||
return err
|
||||
}
|
||||
|
||||
scale := decimal.CalibrateScale(b1.values, b1.bh.Scale, b2.values, b2.bh.Scale)
|
||||
scale := decimal.CalibrateScale(b1.values[b1.nextIdx:], b1.bh.Scale, b2.values[b2.nextIdx:], b2.bh.Scale)
|
||||
b1.bh.Scale = scale
|
||||
b2.bh.Scale = scale
|
||||
return nil
|
||||
|
|
|
@ -365,7 +365,7 @@ func TestMergeForciblyStop(t *testing.T) {
|
|||
ch := make(chan struct{})
|
||||
var rowsMerged, rowsDeleted uint64
|
||||
close(ch)
|
||||
if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, ch, nil, &rowsMerged, &rowsDeleted); !errors.Is(err, errForciblyStopped) {
|
||||
if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, ch, nil, 0, &rowsMerged, &rowsDeleted); !errors.Is(err, errForciblyStopped) {
|
||||
t.Fatalf("unexpected error in mergeBlockStreams: got %v; want %v", err, errForciblyStopped)
|
||||
}
|
||||
if rowsMerged != 0 {
|
||||
|
@ -385,7 +385,7 @@ func testMergeBlockStreams(t *testing.T, bsrs []*blockStreamReader, expectedBloc
|
|||
bsw.InitFromInmemoryPart(&mp)
|
||||
|
||||
var rowsMerged, rowsDeleted uint64
|
||||
if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, nil, nil, &rowsMerged, &rowsDeleted); err != nil {
|
||||
if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, nil, nil, 0, &rowsMerged, &rowsDeleted); err != nil {
|
||||
t.Fatalf("unexpected error in mergeBlockStreams: %s", err)
|
||||
}
|
||||
|
||||
|
|
|
@ -41,7 +41,7 @@ func benchmarkMergeBlockStreams(b *testing.B, mps []*inmemoryPart, rowsPerLoop i
|
|||
}
|
||||
mpOut.Reset()
|
||||
bsw.InitFromInmemoryPart(&mpOut)
|
||||
if err := mergeBlockStreams(&mpOut.ph, &bsw, bsrs, nil, nil, &rowsMerged, &rowsDeleted); err != nil {
|
||||
if err := mergeBlockStreams(&mpOut.ph, &bsw, bsrs, nil, nil, 0, &rowsMerged, &rowsDeleted); err != nil {
|
||||
panic(fmt.Errorf("cannot merge block streams: %w", err))
|
||||
}
|
||||
}
|
||||
|
|
|
@ -196,6 +196,10 @@ func (mn *MetricName) CopyFrom(src *MetricName) {
|
|||
|
||||
// AddTag adds new tag to mn with the given key and value.
|
||||
func (mn *MetricName) AddTag(key, value string) {
|
||||
if key == string(metricGroupTagKey) {
|
||||
mn.MetricGroup = append(mn.MetricGroup, value...)
|
||||
return
|
||||
}
|
||||
tag := mn.addNextTag()
|
||||
tag.Key = append(tag.Key[:0], key...)
|
||||
tag.Value = append(tag.Value[:0], value...)
|
||||
|
@ -203,6 +207,10 @@ func (mn *MetricName) AddTag(key, value string) {
|
|||
|
||||
// AddTagBytes adds new tag to mn with the given key and value.
|
||||
func (mn *MetricName) AddTagBytes(key, value []byte) {
|
||||
if string(key) == string(metricGroupTagKey) {
|
||||
mn.MetricGroup = append(mn.MetricGroup, value...)
|
||||
return
|
||||
}
|
||||
tag := mn.addNextTag()
|
||||
tag.Key = append(tag.Key[:0], key...)
|
||||
tag.Value = append(tag.Value[:0], value...)
|
||||
|
|
|
@ -134,6 +134,10 @@ type partition struct {
|
|||
// The callback that returns deleted metric ids which must be skipped during merge.
|
||||
getDeletedMetricIDs func() *uint64set.Set
|
||||
|
||||
// data retention in milliseconds.
|
||||
// Used for deleting data outside the retention during background merge.
|
||||
retentionMsecs int64
|
||||
|
||||
// Name is the name of the partition in the form YYYY_MM.
|
||||
name string
|
||||
|
||||
|
@ -206,7 +210,7 @@ func (pw *partWrapper) decRef() {
|
|||
|
||||
// createPartition creates new partition for the given timestamp and the given paths
|
||||
// to small and big partitions.
|
||||
func createPartition(timestamp int64, smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set) (*partition, error) {
|
||||
func createPartition(timestamp int64, smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) (*partition, error) {
|
||||
name := timestampToPartitionName(timestamp)
|
||||
smallPartsPath := filepath.Clean(smallPartitionsPath) + "/" + name
|
||||
bigPartsPath := filepath.Clean(bigPartitionsPath) + "/" + name
|
||||
|
@ -219,7 +223,7 @@ func createPartition(timestamp int64, smallPartitionsPath, bigPartitionsPath str
|
|||
return nil, fmt.Errorf("cannot create directories for big parts %q: %w", bigPartsPath, err)
|
||||
}
|
||||
|
||||
pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs)
|
||||
pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs, retentionMsecs)
|
||||
pt.tr.fromPartitionTimestamp(timestamp)
|
||||
pt.startMergeWorkers()
|
||||
pt.startRawRowsFlusher()
|
||||
|
@ -241,7 +245,7 @@ func (pt *partition) Drop() {
|
|||
}
|
||||
|
||||
// openPartition opens the existing partition from the given paths.
|
||||
func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set) (*partition, error) {
|
||||
func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) (*partition, error) {
|
||||
smallPartsPath = filepath.Clean(smallPartsPath)
|
||||
bigPartsPath = filepath.Clean(bigPartsPath)
|
||||
|
||||
|
@ -265,7 +269,7 @@ func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func
|
|||
return nil, fmt.Errorf("cannot open big parts from %q: %w", bigPartsPath, err)
|
||||
}
|
||||
|
||||
pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs)
|
||||
pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs, retentionMsecs)
|
||||
pt.smallParts = smallParts
|
||||
pt.bigParts = bigParts
|
||||
if err := pt.tr.fromPartitionName(name); err != nil {
|
||||
|
@ -278,13 +282,14 @@ func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func
|
|||
return pt, nil
|
||||
}
|
||||
|
||||
func newPartition(name, smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set) *partition {
|
||||
func newPartition(name, smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) *partition {
|
||||
p := &partition{
|
||||
name: name,
|
||||
smallPartsPath: smallPartsPath,
|
||||
bigPartsPath: bigPartsPath,
|
||||
|
||||
getDeletedMetricIDs: getDeletedMetricIDs,
|
||||
retentionMsecs: retentionMsecs,
|
||||
|
||||
mergeIdx: uint64(time.Now().UnixNano()),
|
||||
stopCh: make(chan struct{}),
|
||||
|
@ -1129,7 +1134,8 @@ func (pt *partition) mergeParts(pws []*partWrapper, stopCh <-chan struct{}) erro
|
|||
atomic.AddUint64(&pt.smallMergesCount, 1)
|
||||
atomic.AddUint64(&pt.activeSmallMerges, 1)
|
||||
}
|
||||
err := mergeBlockStreams(&ph, bsw, bsrs, stopCh, dmis, rowsMerged, rowsDeleted)
|
||||
retentionDeadline := timestampFromTime(startTime) - pt.retentionMsecs
|
||||
err := mergeBlockStreams(&ph, bsw, bsrs, stopCh, dmis, retentionDeadline, rowsMerged, rowsDeleted)
|
||||
if isBigPart {
|
||||
atomic.AddUint64(&pt.activeBigMerges, ^uint64(0))
|
||||
} else {
|
||||
|
|
|
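The mergeParts hunk is where the partition-level retentionMsecs field earns its keep: a retentionDeadline is derived from the merge start time and passed to mergeBlockStreams, so samples older than the deadline can be dropped while parts are merged in the background. A rough sketch of the idea; dropExpiredTimestamps is a hypothetical stand-in for the filtering performed inside mergeBlockStreams, not the actual implementation:

```go
package main

import (
	"fmt"
	"time"
)

// timestampFromTime converts a time.Time to the millisecond timestamps
// used by the storage code.
func timestampFromTime(t time.Time) int64 {
	return t.UnixNano() / int64(time.Millisecond)
}

// dropExpiredTimestamps keeps only the samples that are not older than
// retentionDeadline, illustrating what a retention-aware merge can do.
func dropExpiredTimestamps(timestamps []int64, retentionDeadline int64) []int64 {
	kept := timestamps[:0]
	for _, ts := range timestamps {
		if ts >= retentionDeadline {
			kept = append(kept, ts)
		}
	}
	return kept
}

func main() {
	retentionMsecs := int64(3 * 24 * 3600 * 1000) // e.g. -retentionPeriod=3d
	now := timestampFromTime(time.Now())
	retentionDeadline := now - retentionMsecs
	samples := []int64{now - 5*24*3600*1000, now - 3600*1000, now}
	kept := dropExpiredTimestamps(samples, retentionDeadline)
	fmt.Printf("%d of 3 samples kept\n", len(kept))
}
```
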
@ -167,7 +167,8 @@ func testPartitionSearchEx(t *testing.T, ptt int64, tr TimeRange, partsCount, ma
})

// Create partition from rowss and test search on it.
pt, err := createPartition(ptt, "./small-table", "./big-table", nilGetDeletedMetricIDs)
retentionMsecs := timestampFromTime(time.Now()) - ptr.MinTimestamp + 3600*1000
pt, err := createPartition(ptt, "./small-table", "./big-table", nilGetDeletedMetricIDs, retentionMsecs)
if err != nil {
t.Fatalf("cannot create partition: %s", err)
}

@ -191,7 +192,7 @@ func testPartitionSearchEx(t *testing.T, ptt int64, tr TimeRange, partsCount, ma
pt.MustClose()

// Open the created partition and test search on it.
pt, err = openPartition(smallPartsPath, bigPartsPath, nilGetDeletedMetricIDs)
pt, err = openPartition(smallPartsPath, bigPartsPath, nilGetDeletedMetricIDs, retentionMsecs)
if err != nil {
t.Fatalf("cannot open partition: %s", err)
}

@ -27,7 +27,10 @@ import (
"github.com/VictoriaMetrics/fastcache"
)

const maxRetentionMonths = 12 * 100
const (
msecsPerMonth = 31 * 24 * 3600 * 1000
maxRetentionMsecs = 100 * 12 * msecsPerMonth
)

// Storage represents TSDB storage.
type Storage struct {

@ -47,9 +50,9 @@ type Storage struct {
slowPerDayIndexInserts uint64
slowMetricNameLoads uint64

path string
cachePath string
retentionMonths int
path string
cachePath string
retentionMsecs int64

// lock file for exclusive access to the storage on the given path.
flockF *os.File

@ -106,23 +109,19 @@ type Storage struct {
snapshotLock sync.Mutex
}

// OpenStorage opens storage on the given path with the given number of retention months.
func OpenStorage(path string, retentionMonths int) (*Storage, error) {
if retentionMonths > maxRetentionMonths {
return nil, fmt.Errorf("too big retentionMonths=%d; cannot exceed %d", retentionMonths, maxRetentionMonths)
}
if retentionMonths <= 0 {
retentionMonths = maxRetentionMonths
}
// OpenStorage opens storage on the given path with the given retentionMsecs.
func OpenStorage(path string, retentionMsecs int64) (*Storage, error) {
path, err := filepath.Abs(path)
if err != nil {
return nil, fmt.Errorf("cannot determine absolute path for %q: %w", path, err)
}

if retentionMsecs <= 0 {
retentionMsecs = maxRetentionMsecs
}
s := &Storage{
path: path,
cachePath: path + "/cache",
retentionMonths: retentionMonths,
path: path,
cachePath: path + "/cache",
retentionMsecs: retentionMsecs,

stop: make(chan struct{}),
}

@ -178,7 +177,7 @@ func OpenStorage(path string, retentionMonths int) (*Storage, error) {

// Load data
tablePath := path + "/data"
tb, err := openTable(tablePath, retentionMonths, s.getDeletedMetricIDs)
tb, err := openTable(tablePath, s.getDeletedMetricIDs, retentionMsecs)
if err != nil {
s.idb().MustClose()
return nil, fmt.Errorf("cannot open table at %q: %w", tablePath, err)

@ -473,8 +472,9 @@ func (s *Storage) startRetentionWatcher() {
}

func (s *Storage) retentionWatcher() {
retentionMonths := int((s.retentionMsecs + (msecsPerMonth - 1)) / msecsPerMonth)
for {
d := nextRetentionDuration(s.retentionMonths)
d := nextRetentionDuration(retentionMonths)
select {
case <-s.stop:
return

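Two details stand out in the storage hunks above: retention is now validated in milliseconds against maxRetentionMsecs (which is what makes sub-month values such as `-retentionPeriod=3d` possible), and the retention watcher rounds the millisecond value up to whole 31-day months before calling nextRetentionDuration. A small sketch of that ceiling division; the constants mirror the ones in the diff, while the sample values are only illustrative:

```go
package main

import "fmt"

const (
	msecsPerDay   = 24 * 3600 * 1000
	msecsPerMonth = 31 * msecsPerDay
)

// monthsForRetention mirrors the rounding used by the retention watcher:
// any partial month is rounded up, so even a 3-day retention maps to one
// whole monthly cycle.
func monthsForRetention(retentionMsecs int64) int {
	return int((retentionMsecs + (msecsPerMonth - 1)) / msecsPerMonth)
}

func main() {
	for _, days := range []int64{3, 14, 31, 62, 400} {
		fmt.Printf("%3dd -> %d month(s)\n", days, monthsForRetention(days*msecsPerDay))
	}
}
```
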
@ -353,8 +353,8 @@ func TestStorageOpenMultipleTimes(t *testing.T) {

func TestStorageRandTimestamps(t *testing.T) {
path := "TestStorageRandTimestamps"
retentionMonths := 60
s, err := OpenStorage(path, retentionMonths)
retentionMsecs := int64(60 * msecsPerMonth)
s, err := OpenStorage(path, retentionMsecs)
if err != nil {
t.Fatalf("cannot open storage: %s", err)
}

@ -364,7 +364,7 @@ func TestStorageRandTimestamps(t *testing.T) {
t.Fatal(err)
}
s.MustClose()
s, err = OpenStorage(path, retentionMonths)
s, err = OpenStorage(path, retentionMsecs)
}
})
t.Run("concurrent", func(t *testing.T) {

@ -22,6 +22,7 @@ type table struct {
bigPartitionsPath string

getDeletedMetricIDs func() *uint64set.Set
retentionMsecs int64

ptws []*partitionWrapper
ptwsLock sync.Mutex

@ -30,8 +31,7 @@ type table struct {

stop chan struct{}

retentionMilliseconds int64
retentionWatcherWG sync.WaitGroup
retentionWatcherWG sync.WaitGroup
}

// partitionWrapper provides refcounting mechanism for the partition.

@ -77,12 +77,12 @@ func (ptw *partitionWrapper) scheduleToDrop() {
atomic.AddUint64(&ptw.mustDrop, 1)
}

// openTable opens a table on the given path with the given retentionMonths.
// openTable opens a table on the given path with the given retentionMsecs.
//
// The table is created if it doesn't exist.
//
// Data older than the retentionMonths may be dropped at any time.
func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uint64set.Set) (*table, error) {
// Data older than the retentionMsecs may be dropped at any time.
func openTable(path string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) (*table, error) {
path = filepath.Clean(path)

// Create a directory for the table if it doesn't exist yet.

@ -115,7 +115,7 @@ func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uin
}

// Open partitions.
pts, err := openPartitions(smallPartitionsPath, bigPartitionsPath, getDeletedMetricIDs)
pts, err := openPartitions(smallPartitionsPath, bigPartitionsPath, getDeletedMetricIDs, retentionMsecs)
if err != nil {
return nil, fmt.Errorf("cannot open partitions in the table %q: %w", path, err)
}

@ -125,6 +125,7 @@ func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uin
smallPartitionsPath: smallPartitionsPath,
bigPartitionsPath: bigPartitionsPath,
getDeletedMetricIDs: getDeletedMetricIDs,
retentionMsecs: retentionMsecs,

flockF: flockF,

@ -133,11 +134,6 @@ func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uin
for _, pt := range pts {
tb.addPartitionNolock(pt)
}
if retentionMonths <= 0 || retentionMonths > maxRetentionMonths {
retentionMonths = maxRetentionMonths
}
tb.retentionMilliseconds = int64(retentionMonths) * 31 * 24 * 3600 * 1e3

tb.startRetentionWatcher()
return tb, nil
}

@ -357,7 +353,7 @@ func (tb *table) AddRows(rows []rawRow) error {
continue
}

pt, err := createPartition(r.Timestamp, tb.smallPartitionsPath, tb.bigPartitionsPath, tb.getDeletedMetricIDs)
pt, err := createPartition(r.Timestamp, tb.smallPartitionsPath, tb.bigPartitionsPath, tb.getDeletedMetricIDs, tb.retentionMsecs)
if err != nil {
errors = append(errors, err)
continue

@ -376,7 +372,7 @@ func (tb *table) AddRows(rows []rawRow) error {

func (tb *table) getMinMaxTimestamps() (int64, int64) {
now := int64(fasttime.UnixTimestamp() * 1000)
minTimestamp := now - tb.retentionMilliseconds
minTimestamp := now - tb.retentionMsecs
maxTimestamp := now + 2*24*3600*1000 // allow max +2 days from now due to timezones shit :)
if minTimestamp < 0 {
// Negative timestamps aren't supported by the storage.

@ -406,7 +402,7 @@ func (tb *table) retentionWatcher() {
case <-ticker.C:
}

minTimestamp := int64(fasttime.UnixTimestamp()*1000) - tb.retentionMilliseconds
minTimestamp := int64(fasttime.UnixTimestamp()*1000) - tb.retentionMsecs
var ptwsDrop []*partitionWrapper
tb.ptwsLock.Lock()
dst := tb.ptws[:0]

@ -457,7 +453,7 @@ func (tb *table) PutPartitions(ptws []*partitionWrapper) {
}
}

func openPartitions(smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set) ([]*partition, error) {
func openPartitions(smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) ([]*partition, error) {
// Certain partition directories in either `big` or `small` dir may be missing
// after restoring from backup. So populate partition names from both dirs.
ptNames := make(map[string]bool)

@ -471,7 +467,7 @@ func openPartitions(smallPartitionsPath, bigPartitionsPath string, getDeletedMet
for ptName := range ptNames {
smallPartsPath := smallPartitionsPath + "/" + ptName
bigPartsPath := bigPartitionsPath + "/" + ptName
pt, err := openPartition(smallPartsPath, bigPartsPath, getDeletedMetricIDs)
pt, err := openPartition(smallPartsPath, bigPartsPath, getDeletedMetricIDs, retentionMsecs)
if err != nil {
mustClosePartitions(pts)
return nil, fmt.Errorf("cannot open partition %q: %w", ptName, err)

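At the table level the change is mostly a rename (retentionMilliseconds becomes retentionMsecs and now arrives pre-computed from the caller), but these hunks mark where retention is enforced at partition granularity: the watcher drops any partition whose newest sample is older than now - retentionMsecs. A rough sketch of that selection, with a simplified partition type standing in for the real one:

```go
package main

import (
	"fmt"
	"time"
)

// partition is a simplified stand-in; only the fields needed here are kept.
type partition struct {
	name         string
	maxTimestamp int64 // newest sample in the partition, in milliseconds
}

// partitionsToDrop mirrors the retention watcher's check: a monthly
// partition is dropped once even its newest sample falls outside the
// retention window.
func partitionsToDrop(pts []partition, retentionMsecs int64) []partition {
	minTimestamp := time.Now().UnixNano()/int64(time.Millisecond) - retentionMsecs
	var drop []partition
	for _, pt := range pts {
		if pt.maxTimestamp < minTimestamp {
			drop = append(drop, pt)
		}
	}
	return drop
}

func main() {
	now := time.Now().UnixNano() / int64(time.Millisecond)
	pts := []partition{
		{name: "2020_09", maxTimestamp: now - 80*24*3600*1000},
		{name: "2020_11", maxTimestamp: now},
	}
	for _, pt := range partitionsToDrop(pts, 60*24*3600*1000) { // ~2 months retention
		fmt.Println("dropping partition", pt.name)
	}
}
```
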
@ -66,7 +66,7 @@ func (ts *tableSearch) Init(tb *table, tsids []TSID, tr TimeRange) {
// Adjust tr.MinTimestamp, so it doesn't obtain data older
// than the tb retention.
now := int64(fasttime.UnixTimestamp() * 1000)
minTimestamp := now - tb.retentionMilliseconds
minTimestamp := now - tb.retentionMsecs
if tr.MinTimestamp < minTimestamp {
tr.MinTimestamp = minTimestamp
}

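The tableSearch.Init hunk shows the read path honoring the same boundary: a query's MinTimestamp is silently raised to the retention limit so searches never scan expired data. A minimal sketch of that clamping, using illustrative values only:

```go
package main

import (
	"fmt"
	"time"
)

// clampMinTimestamp mirrors the adjustment in tableSearch.Init: the query's
// lower bound is raised to the retention boundary.
func clampMinTimestamp(trMinTimestamp, retentionMsecs int64) int64 {
	minTimestamp := time.Now().UnixNano()/int64(time.Millisecond) - retentionMsecs
	if trMinTimestamp < minTimestamp {
		return minTimestamp
	}
	return trMinTimestamp
}

func main() {
	weekAgo := time.Now().Add(-7*24*time.Hour).UnixNano() / int64(time.Millisecond)
	retentionMsecs := int64(3 * 24 * 3600 * 1000) // -retentionPeriod=3d
	fmt.Println("clamped min timestamp:", clampMinTimestamp(weekAgo, retentionMsecs))
}
```
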
@ -181,7 +181,7 @@ func testTableSearchEx(t *testing.T, trData, trSearch TimeRange, partitionsCount
})

// Create a table from rowss and test search on it.
tb, err := openTable("./test-table", -1, nilGetDeletedMetricIDs)
tb, err := openTable("./test-table", nilGetDeletedMetricIDs, maxRetentionMsecs)
if err != nil {
t.Fatalf("cannot create table: %s", err)
}

@ -202,7 +202,7 @@ func testTableSearchEx(t *testing.T, trData, trSearch TimeRange, partitionsCount
tb.MustClose()

// Open the created table and test search on it.
tb, err = openTable("./test-table", -1, nilGetDeletedMetricIDs)
tb, err = openTable("./test-table", nilGetDeletedMetricIDs, maxRetentionMsecs)
if err != nil {
t.Fatalf("cannot open table: %s", err)
}

@ -47,7 +47,7 @@ func openBenchTable(b *testing.B, startTimestamp int64, rowsPerInsert, rowsCount
createBenchTable(b, path, startTimestamp, rowsPerInsert, rowsCount, tsidsCount)
createdBenchTables[path] = true
}
tb, err := openTable(path, -1, nilGetDeletedMetricIDs)
tb, err := openTable(path, nilGetDeletedMetricIDs, maxRetentionMsecs)
if err != nil {
b.Fatalf("cnanot open table %q: %s", path, err)
}

@ -70,7 +70,7 @@ var createdBenchTables = make(map[string]bool)
func createBenchTable(b *testing.B, path string, startTimestamp int64, rowsPerInsert, rowsCount, tsidsCount int) {
b.Helper()

tb, err := openTable(path, -1, nilGetDeletedMetricIDs)
tb, err := openTable(path, nilGetDeletedMetricIDs, maxRetentionMsecs)
if err != nil {
b.Fatalf("cannot open table %q: %s", path, err)
}

@ -7,7 +7,7 @@ import (

func TestTableOpenClose(t *testing.T) {
const path = "TestTableOpenClose"
const retentionMonths = 123
const retentionMsecs = 123 * msecsPerMonth

if err := os.RemoveAll(path); err != nil {
t.Fatalf("cannot remove %q: %s", path, err)

@ -17,7 +17,7 @@ func TestTableOpenClose(t *testing.T) {
}()

// Create a new table
tb, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
tb, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
if err != nil {
t.Fatalf("cannot create new table: %s", err)
}

@ -27,7 +27,7 @@ func TestTableOpenClose(t *testing.T) {

// Re-open created table multiple times.
for i := 0; i < 10; i++ {
tb, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
tb, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
if err != nil {
t.Fatalf("cannot open created table: %s", err)
}

@ -37,20 +37,20 @@ func TestTableOpenClose(t *testing.T) {

func TestTableOpenMultipleTimes(t *testing.T) {
const path = "TestTableOpenMultipleTimes"
const retentionMonths = 123
const retentionMsecs = 123 * msecsPerMonth

defer func() {
_ = os.RemoveAll(path)
}()

tb1, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
tb1, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
if err != nil {
t.Fatalf("cannot open table the first time: %s", err)
}
defer tb1.MustClose()

for i := 0; i < 10; i++ {
tb2, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
tb2, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
if err == nil {
tb2.MustClose()
t.Fatalf("expecting non-nil error when opening already opened table")

@ -45,7 +45,7 @@ func benchmarkTableAddRows(b *testing.B, rowsPerInsert, tsidsCount int) {
b.SetBytes(int64(rowsCountExpected))
tablePath := "./benchmarkTableAddRows"
for i := 0; i < b.N; i++ {
tb, err := openTable(tablePath, -1, nilGetDeletedMetricIDs)
tb, err := openTable(tablePath, nilGetDeletedMetricIDs, maxRetentionMsecs)
if err != nil {
b.Fatalf("cannot open table %q: %s", tablePath, err)
}

@ -93,7 +93,7 @@ func benchmarkTableAddRows(b *testing.B, rowsPerInsert, tsidsCount int) {
tb.MustClose()

// Open the table from files and verify the rows count on it
tb, err = openTable(tablePath, -1, nilGetDeletedMetricIDs)
tb, err = openTable(tablePath, nilGetDeletedMetricIDs, maxRetentionMsecs)
if err != nil {
b.Fatalf("cannot open table %q: %s", tablePath, err)
}

11 ports/OpenBSD/README.md Normal file

@ -0,0 +1,11 @@
# OpenBSD ports

Tested with Release 6.7

The VictoriaMetrics DB must be placed in the `/usr/ports/sysutils` directory,
and the file `/usr/ports/infrastructure/db/user.list`
should be modified with a new line:
```
855 _vmetrics _vmetrics sysutils/VictoriaMetrics
```

38 ports/OpenBSD/VictoriaMetrics/Makefile Normal file

@ -0,0 +1,38 @@
# $OpenBSD$

COMMENT = fast, cost-effective and scalable time series database

GH_ACCOUNT = VictoriaMetrics
GH_PROJECT = VictoriaMetrics
GH_TAGNAME = v1.44.0

CATEGORIES = sysutils

HOMEPAGE = https://victoriametrics.com/

MAINTAINER = VictoriaMetrics <info@victoriametrics.com>

# Apache License 2.0
PERMIT_PACKAGE = Yes

WANTLIB = c pthread

USE_GMAKE = Yes

MODULES= lang/go
MODGO_GOPATH= ${MODGO_WORKSPACE}

do-build:
	cd ${WRKSRC} && GOOS=openbsd ${MAKE_ENV} ${MAKE_PROGRAM} victoria-metrics-pure
	cd ${WRKSRC} && GOOS=openbsd ${MAKE_ENV} ${MAKE_PROGRAM} vmbackup

do-install:
	${INSTALL_PROGRAM} ./pkg/vmlogger.pl ${PREFIX}/bin/vmetricslogger.pl
	${INSTALL_PROGRAM} ${WRKSRC}/bin/victoria-metrics-pure ${PREFIX}/bin/vmetrics
	${INSTALL_PROGRAM} ${WRKSRC}/bin/vmbackup ${PREFIX}/bin/vmetricsbackup
	${INSTALL_DATA_DIR} ${PREFIX}/share/doc/vmetrics/
	${INSTALL_DATA} ${WRKSRC}/README.md ${PREFIX}/share/doc/vmetrics/
	${INSTALL_DATA} ${WRKSRC}/LICENSE ${PREFIX}/share/doc/vmetrics/
	${INSTALL_DATA} ${WRKSRC}/docs/* ${PREFIX}/share/doc/vmetrics/

.include <bsd.port.mk>

2 ports/OpenBSD/VictoriaMetrics/distinfo Normal file

@ -0,0 +1,2 @@
SHA256 (VictoriaMetrics-1.44.0.tar.gz) = OIXIyqiijWvAPDgq5wMoDpv1rENcIOWIcXmz4T5v1lU=
SIZE (VictoriaMetrics-1.44.0.tar.gz) = 8898365

3 ports/OpenBSD/VictoriaMetrics/pkg/DESCR Normal file

@ -0,0 +1,3 @@
VictoriaMetrics is a fast,
cost-effective and scalable time-series database.

34 ports/OpenBSD/VictoriaMetrics/pkg/PLIST Normal file

@ -0,0 +1,34 @@
@comment $OpenBSD$
@newgroup _vmetrics:855
@newuser _vmetrics:855:_vmetrics:daemon:VictoriaMetrics:${VARBASE}/db/vmetrics:/sbin/nologin
@sample ${SYSCONFDIR}/prometheus/
@rcscript ${RCDIR}/vmetrics
@bin bin/vmetricslogger.pl
@bin bin/vmetrics
@bin bin/vmetricsbackup
share/doc/vmetrics/
share/doc/vmetrics/Articles.md
share/doc/vmetrics/CaseStudies.md
share/doc/vmetrics/Cluster-VictoriaMetrics.md
share/doc/vmetrics/ExtendedPromQL.md
share/doc/vmetrics/FAQ.md
share/doc/vmetrics/Home.md
share/doc/vmetrics/LICENSE
share/doc/vmetrics/MetricsQL.md
share/doc/vmetrics/Quick-Start.md
share/doc/vmetrics/README.md
share/doc/vmetrics/Release-Guide.md
share/doc/vmetrics/SampleSizeCalculations.md
share/doc/vmetrics/Single-server-VictoriaMetrics.md
share/doc/vmetrics/logo.png
share/doc/vmetrics/robots.txt
share/doc/vmetrics/vmagent.md
share/doc/vmetrics/vmagent.png
share/doc/vmetrics/vmalert.md
share/doc/vmetrics/vmauth.md
share/doc/vmetrics/vmbackup.md
share/doc/vmetrics/vmrestore.md
@mode 0755
@owner _vmetrics
@group _vmetrics
@sample ${VARBASE}/db/vmetrics

19 ports/OpenBSD/VictoriaMetrics/pkg/vmetrics.rc Normal file

@ -0,0 +1,19 @@
#!/bin/sh
#
# $OpenBSD$

daemon="${TRUEPREFIX}/bin/vmetrics"
daemon_flags="-storageDataPath=/var/db/vmetrics/ ${daemon_flags}"
daemon_user=_vmetrics

. /etc/rc.d/rc.subr

pexp="${daemon}.*"
rc_bg=YES
rc_reload=NO

rc_start() {
	${rcexec} "${daemon} -loggerDisableTimestamps ${daemon_flags} < /dev/null 2>&1 | ${TRUEPREFIX}/bin/vmetricslogger.pl"
}

rc_cmd $1

18 ports/OpenBSD/VictoriaMetrics/pkg/vmlogger.pl Normal file

@ -0,0 +1,18 @@
#!/usr/bin/perl
use Sys::Syslog qw(:standard :macros);

openlog("victoria-metrics", "pid", "daemon");

while (my $l = <>) {
	my @d = split /\t/, $l;
	# go level : "INFO", "WARN", "ERROR", "FATAL", "PANIC":
	my $lvl = $d[0];
	$lvl = LOG_EMERG if ($lvl eq 'panic');
	$lvl = 'crit' if ($lvl eq 'fatal');
	$lvl = 'err' if ($lvl eq 'error');
	$lvl = 'warning' if ($lvl eq 'warn');
	chomp $d[2];
	syslog( $lvl, $d[2] );
}

closelog();