Merge branch 'public-single-node' into pmm-6401-read-prometheus-data-files

This commit is contained in:
Aliaksandr Valialkin 2020-11-02 00:47:34 +02:00
commit 1b112405a8
65 changed files with 1467 additions and 408 deletions


@@ -1,15 +1,35 @@
# tip

* FEATURE: allow setting `-retentionPeriod` smaller than one month. For example, `-retentionPeriod=3d`, `-retentionPeriod=2w`, etc. are supported now.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173
* FEATURE: optimize more cases according to https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusLabelNonOptimization . Now the following cases are optimized too:
  * `rollup_func(foo{filters}[d]) op bar` -> `rollup_func(foo{filters}[d]) op bar{filters}`
  * `transform_func(foo{filters}) op bar` -> `transform_func(foo{filters}) op bar{filters}`
  * `num_or_scalar op foo{filters} op bar` -> `num_or_scalar op foo{filters} op bar{filters}`
* FEATURE: improve time series search for queries with multiple label filters, e.g. `foo{label1="value", label2=~"regexp"}`.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/781
* FEATURE: vmagent: add `stream parse` mode. This mode allows reducing memory usage when individual scrape targets expose tens of millions of metrics,
for example, when scraping Prometheus in [federation](https://prometheus.io/docs/prometheus/latest/federation/) mode.
See the `-promscrape.streamParse` command-line option and the `stream_parse: true` config option for the `scrape_config` section in `-promscrape.config`.
See also [troubleshooting docs for vmagent](https://victoriametrics.github.io/vmagent.html#troubleshooting).
* FEATURE: vmalert: add the `-dryRun` command-line option for validating the provided config files without the need to start the `vmalert` service.
* FEATURE: accept an optional third argument of string type in `topk_*` and `bottomk_*` functions. This is the label name for an additional time series returned with the sum of the time series outside the top/bottom K; see the example after this list. See [MetricsQL docs](https://victoriametrics.github.io/MetricsQL.html) for more details.
* FEATURE: vmagent: expose the `/api/v1/targets` page according to [the corresponding Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/643
* BUGFIX: vmagent: properly handle OpenStack endpoint ending with `v3.0` such as `https://ostack.example.com:5000/v3.0`
in the same way as Prometheus does. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/728#issuecomment-709914803
* BUGFIX: drop trailing data points for time series with a single raw sample. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
* BUGFIX: do not drop trailing data points for instant queries to `/api/v1/query`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
* BUGFIX: vmbackup: fix panic when `-origin` isn't specified. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/856
* BUGFIX: vmalert: skip automatically added labels on alerts restore. The `alertgroup` label was introduced in [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611)
and automatically added to generated time series. By mistake, this new label wasn't correctly purged on the restore event, which affected the alert's ID uniqueness.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
* BUGFIX: vmagent: fix panic at scrape error body formatting. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/864
* BUGFIX: vmagent: add missing leading slash to the metrics path like Prometheus does. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/835
* BUGFIX: vmagent: drop the packet if remote storage returns a 4xx status code. This makes the behaviour consistent with Prometheus.
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
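For illustration, the new optional third argument to `topk_*`/`bottomk_*` might be used as below (the metric name and the `other` label are hypothetical examples; per the implementation in this commit, the extra series carries the given label name as both its key and its value):

```
topk_max(3, process_resident_memory_bytes, "other")
```

This would return the 3 series with the maximum values, plus one extra series labeled `other="other"` holding the sum of all remaining series.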
# [v1.44.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.44.0)


@@ -164,7 +164,7 @@ or [docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) wi
The following command-line flags are used the most:

* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.

Other flags have good enough default values, so set them only if you really need to change them. Pass `-help` to see all the available flags with descriptions and default values.
@@ -495,6 +495,7 @@ VictoriaMetrics supports the following handlers from [Prometheus querying API](h
* [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names)
* [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values)
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats)
* [/api/v1/targets](https://prometheus.io/docs/prometheus/latest/querying/api/#targets) - see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter) for more details.

These handlers can be queried from Prometheus-compatible clients such as Grafana or curl.
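For instance, assuming a single-node instance listening on the default port 8428, the handlers above can be probed from the command line:

```console
curl http://victoria-metrics:8428/api/v1/labels
curl http://victoria-metrics:8428/api/v1/status/tsdb
```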
@@ -1048,6 +1049,7 @@ The de-duplication reduces disk space usage if multiple identically configured P
write data to the same VictoriaMetrics instance. Note that these Prometheus instances must have identical
`external_labels` section in their configs, so they write data to the same time series.

### Retention

Retention is configured with the `-retentionPeriod` command-line flag. For instance, `-retentionPeriod=3` means
@@ -1059,6 +1061,10 @@ For example if `-retentionPeriod` is set to 1, data for January is deleted on Ma
It is safe to extend `-retentionPeriod` on existing data. If `-retentionPeriod` is set to a lower
value than before, then data outside the configured period will be eventually deleted.

VictoriaMetrics supports retention smaller than 1 month. For example, `-retentionPeriod=5d` would set data retention for 5 days.
Older data is eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
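As a quick sketch, a single-node instance with 5-day retention could be started like this (the data path is just an example):

```console
./victoria-metrics -storageDataPath=victoria-metrics-data -retentionPeriod=5d
```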
### Multiple retentions

Just start multiple VictoriaMetrics instances with distinct values for the following flags:


@@ -211,9 +211,13 @@ either via `vmagent` itself or via Prometheus, so the exported metrics could be
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for a `vmagent` state overview.
If you have suggestions, improvements or found a bug - feel free to open an issue on GitHub or add a review to the dashboard.

`vmagent` also exports target statuses at the following handlers:

* `http://vmagent-host:8429/targets`. This handler returns human-readable plaintext status for every active target.
This page is convenient to query from the command line with `wget`, `curl` or similar tools.
It accepts an optional `show_original_labels=1` query arg, which shows the original labels for each target before relabeling is applied.
This information may be useful for debugging target relabeling.
* `http://vmagent-host:8429/api/v1/targets`. This handler returns data compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
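For example, both handlers can be inspected from the command line (host and port follow the defaults used above; the `state` query arg mirrors the Prometheus API):

```console
# Human-readable status, with original labels shown before relabeling:
curl 'http://vmagent-host:8429/targets?show_original_labels=1'

# Prometheus-compatible JSON:
curl 'http://vmagent-host:8429/api/v1/targets?state=active'
```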
### Troubleshooting
@@ -224,7 +228,26 @@ before applying relabeling. This information may be useful for debugging target
since `vmagent` establishes at least a single TCP connection per each target.

* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
by passing the `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
and `http://vmagent-host:8429/api/v1/targets`.

* If `vmagent` scrapes targets with millions of metrics per each target (for instance, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
then it is recommended to enable `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all the scrape targets
by passing the `-promscrape.streamParse` command-line flag or on a per-scrape-target basis with the `stream_parse: true` option. For example:
```yml
scrape_configs:
- job_name: 'big-federate'
  stream_parse: true
  static_configs:
  - targets:
    - big-prometheus1
    - big-prometheus2
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]': ['{__name__!=""}']
```
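Alternatively, a minimal sketch of enabling stream parsing globally (the config path and remote write URL are placeholders):

```console
./vmagent -promscrape.streamParse -promscrape.config=prometheus.yml -remoteWrite.url=http://victoria-metrics:8428/api/v1/write
```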
* It is recommended to increase `-remoteWrite.queues` if the `vmagent_remotewrite_pending_data_bytes` metric exported at the `http://vmagent-host:8429/metrics` page constantly grows.


@@ -211,6 +211,12 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
		showOriginalLabels, _ := strconv.ParseBool(r.FormValue("show_original_labels"))
		promscrape.WriteHumanReadableTargetsStatus(w, showOriginalLabels)
		return true
	case "/api/v1/targets":
		promscrapeAPIV1TargetsRequests.Inc()
		w.Header().Set("Content-Type", "application/json")
		state := r.FormValue("state")
		promscrape.WriteAPIV1Targets(w, state)
		return true
	case "/-/reload":
		promscrapeConfigReloadRequests.Inc()
		procutil.SelfSIGHUP()
@@ -241,7 +247,8 @@ var (
	influxQueryRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/query", protocol="influx"}`)
	promscrapeTargetsRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/targets"}`)
	promscrapeAPIV1TargetsRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/api/v1/targets"}`)
	promscrapeConfigReloadRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/-/reload"}`)
)
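For reference, the response from the new handler should follow the shape of the Prometheus targets API; an abridged, hypothetical example (all field values are illustrative):

```json
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveredLabels": {"__address__": "host123:9100", "job": "node"},
        "labels": {"instance": "host123:9100", "job": "node"},
        "scrapeUrl": "http://host123:9100/metrics",
        "lastError": "",
        "health": "up"
      }
    ],
    "droppedTargets": []
  }
}
```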


@@ -53,6 +53,7 @@ type client struct {
	requestDuration *metrics.Histogram
	requestsOKCount *metrics.Counter
	errorsCount *metrics.Counter
	packetsDropped *metrics.Counter
	retriesCount *metrics.Counter
	wg sync.WaitGroup
@@ -114,6 +115,7 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
	c.requestDuration = metrics.GetOrCreateHistogram(fmt.Sprintf(`vmagent_remotewrite_duration_seconds{url=%q}`, c.sanitizedURL))
	c.requestsOKCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="2XX"}`, c.sanitizedURL))
	c.errorsCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_errors_total{url=%q}`, c.sanitizedURL))
	c.packetsDropped = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_packets_dropped_total{url=%q}`, c.sanitizedURL))
	c.retriesCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_retries_count_total{url=%q}`, c.sanitizedURL))
	for i := 0; i < concurrency; i++ {
		c.wg.Add(1)
@@ -228,10 +230,20 @@ again:
		c.requestsOKCount.Inc()
		return
	}
	metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
	if statusCode/100 == 4 {
		// Just drop block on 4xx status code like Prometheus does.
		// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
		body, _ := ioutil.ReadAll(resp.Body)
		_ = resp.Body.Close()
		logger.Errorf("unexpected status code received when sending a block with size %d bytes to %q: #%d; dropping the block for 4XX status code like Prometheus does; "+
			"response body=%q", len(block), c.sanitizedURL, statusCode, body)
		c.packetsDropped.Inc()
		return
	}
	// Unexpected status code returned
	retriesCount++
	retryDuration *= 2
	if retryDuration > time.Minute {
		retryDuration = time.Minute


@@ -403,7 +403,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
		labelsFilter += fmt.Sprintf(",%s=%q", k, v)
	}
	// Get the last data point in range via MetricsQL `last_over_time`.
	// We don't use plain PromQL since Prometheus doesn't support
	// remote write protocol which is used for state persistence in vmalert.
	expr := fmt.Sprintf("last_over_time(%s{alertname=%q%s}[%ds])",
@@ -417,11 +417,14 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
		labels := m.Labels
		m.Labels = make([]datasource.Label, 0)
		// drop all extra labels, so hash key will
		// be identical to time series received in Exec
		for _, l := range labels {
			if l.Name == alertNameLabel {
				continue
			}
			if l.Name == alertGroupNameLabel {
				continue
			}
			// drop all overridden labels
			if _, ok := ar.Labels[l.Name]; ok {
				continue
@@ -436,7 +439,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
		a.ID = hash(m)
		a.State = notifier.StatePending
		ar.alerts[a.ID] = a
		logger.Infof("alert %q (%d) restored to state at %v", a.Name, a.ID, a.Start)
	}
	return nil
}
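For intuition, the restore query built by this code looks roughly like the following, assuming the state metric is named `ALERTS_FOR_STATE` and a one-hour lookback (both are illustrative assumptions here):

```
last_over_time(ALERTS_FOR_STATE{alertname="HighErrorRate",job="api"}[3600s])
```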


@@ -355,6 +355,7 @@ func TestAlertingRule_Restore(t *testing.T) {
			metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
				"__name__", alertForStateMetricName,
				alertNameLabel, "",
				alertGroupNameLabel, "groupID",
				"foo", "bar",
				"namespace", "baz",
			),


@@ -11,6 +11,7 @@ import (
	"time"

	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
	"github.com/VictoriaMetrics/metricsql"
@@ -193,25 +194,32 @@ func Parse(pathPatterns []string, validateAnnotations, validateExpressions bool)
		}
		fp = append(fp, matches...)
	}
	errGroup := new(utils.ErrGroup)
	var groups []Group
	for _, file := range fp {
		uniqueGroups := map[string]struct{}{}
		gr, err := parseFile(file)
		if err != nil {
			errGroup.Add(fmt.Errorf("failed to parse file %q: %w", file, err))
			continue
		}
		for _, g := range gr {
			if err := g.Validate(validateAnnotations, validateExpressions); err != nil {
				errGroup.Add(fmt.Errorf("invalid group %q in file %q: %w", g.Name, file, err))
				continue
			}
			if _, ok := uniqueGroups[g.Name]; ok {
				errGroup.Add(fmt.Errorf("group name %q duplicate in file %q", g.Name, file))
				continue
			}
			uniqueGroups[g.Name] = struct{}{}
			g.File = file
			groups = append(groups, g)
		}
	}
	if err := errGroup.Err(); err != nil {
		return nil, err
	}
	if len(groups) < 1 {
		logger.Warnf("no groups found in %s", strings.Join(pathPatterns, ";"))
	}
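The diff only shows `Add` and `Err` being called on `utils.ErrGroup`; a minimal sketch of such an error collector might look like this (an assumption for illustration, not the actual `utils` package):

```go
package utils

import (
	"fmt"
	"strings"
)

// ErrGroup accumulates errors so that all validation problems
// can be reported at once instead of stopping at the first one.
type ErrGroup struct {
	errs []error
}

// Add appends err to the group.
func (eg *ErrGroup) Add(err error) {
	eg.errs = append(eg.errs, err)
}

// Err returns nil if no errors were added, or a single error
// combining all accumulated messages otherwise.
func (eg *ErrGroup) Err() error {
	if eg == nil || len(eg.errs) == 0 {
		return nil
	}
	msgs := make([]string, len(eg.errs))
	for i, err := range eg.errs {
		msgs[i] = err.Error()
	}
	return fmt.Errorf("errors: %s", strings.Join(msgs, "; "))
}
```

Collecting all errors instead of returning on the first one pairs well with the new `-dryRun` mode: a single run can report every invalid group across all rule files.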


@@ -10,6 +10,7 @@ import (
	"strings"
	"time"

	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/remoteread"
@@ -47,6 +48,8 @@ eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{
	remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
		" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
	dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmalert. The rule files are validated. The `-rule` flag must be specified.")
)

func main() {
@@ -58,6 +61,18 @@ func main() {
	logger.Init()
	cgroup.UpdateGOMAXPROCSToCPUQuota()

	if *dryRun {
		u, _ := url.Parse("https://victoriametrics.com/")
		notifier.InitTemplateFunc(u)
		groups, err := config.Parse(*rulePath, true, true)
		if err != nil {
			logger.Fatalf(err.Error())
		}
		if len(groups) == 0 {
			logger.Fatalf("No rules for validation. Please specify path to file(s) with alerting and/or recording rules using `-rule` flag")
		}
		return
	}
	ctx, cancel := context.WithCancel(context.Background())
	manager, err := newManager(ctx)
	if err != nil {
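As a usage sketch, the new flag could be exercised like this (the rules path is a placeholder):

```console
./vmalert -dryRun -rule=/etc/vmalert/rules/*.yml
```

The process exits after validation instead of starting the service, which makes it suitable for CI checks of rule files.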


@@ -10,6 +10,7 @@ import (
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/actions"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/common"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/fslocal"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/fsnil"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
@@ -146,9 +147,9 @@ func newDstFS() (common.RemoteFS, error) {
	return fs, nil
}

func newOriginFS() (common.OriginFS, error) {
	if len(*origin) == 0 {
		return &fsnil.FS{}, nil
	}
	fs, err := actions.NewRemoteFS(*origin)
	if err != nil {


@@ -159,6 +159,12 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
		showOriginalLabels, _ := strconv.ParseBool(r.FormValue("show_original_labels"))
		promscrape.WriteHumanReadableTargetsStatus(w, showOriginalLabels)
		return true
	case "/api/v1/targets":
		promscrapeAPIV1TargetsRequests.Inc()
		w.Header().Set("Content-Type", "application/json")
		state := r.FormValue("state")
		promscrape.WriteAPIV1Targets(w, state)
		return true
	case "/-/reload":
		promscrapeConfigReloadRequests.Inc()
		procutil.SelfSIGHUP()
@@ -191,7 +197,8 @@ var (
	influxQueryRequests = metrics.NewCounter(`vm_http_requests_total{path="/query", protocol="influx"}`)
	promscrapeTargetsRequests = metrics.NewCounter(`vm_http_requests_total{path="/targets"}`)
	promscrapeAPIV1TargetsRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/targets"}`)
	promscrapeConfigReloadRequests = metrics.NewCounter(`vm_http_requests_total{path="/-/reload"}`)


@@ -3,7 +3,7 @@
`vmrestore` restores data from backups created by [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md).
VictoriaMetrics `v1.29.0` and newer versions must be used for working with the restored data.

Restore process can be interrupted at any time. It is automatically resumed from the interruption point
when restarting `vmrestore` with the same args.


@@ -69,7 +69,9 @@ func newAggrFunc(afe func(tss []*timeseries) []*timeseries) aggrFunc {
		if err != nil {
			return nil, err
		}
		return aggrFuncExt(func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
			return afe(tss)
		}, tss, &afa.ae.Modifier, afa.ae.Limit, false)
	}
}
@@ -98,7 +100,8 @@ func removeGroupTags(metricName *storage.MetricName, modifier *metricsql.Modifie
	}
}

func aggrFuncExt(afe func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries, argOrig []*timeseries,
	modifier *metricsql.ModifierExpr, maxSeries int, keepOriginal bool) ([]*timeseries, error) {
	arg := copyTimeseriesMetricNames(argOrig, keepOriginal)

	// Perform grouping.
@@ -124,7 +127,7 @@ func aggrFuncExt(afe func(tss []*timeseries) []*timeseries, argOrig []*timeserie
	dstTssCount := 0
	rvs := make([]*timeseries, 0, len(m))
	for _, tss := range m {
		rv := afe(tss, modifier)
		rvs = append(rvs, rv...)
		srcTssCount += len(tss)
		dstTssCount += len(rv)
@@ -141,7 +144,7 @@ func aggrFuncAny(afa *aggrFuncArg) ([]*timeseries, error) {
	if err != nil {
		return nil, err
	}
	afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
		return tss[:1]
	}
	limit := afa.ae.Limit
@@ -178,10 +181,11 @@ func aggrFuncSum(tss []*timeseries) []*timeseries {
		sum := float64(0)
		count := 0
		for _, ts := range tss {
			v := ts.Values[i]
			if math.IsNaN(v) {
				continue
			}
			sum += v
			count++
		}
		if count == 0 {
@@ -449,7 +453,7 @@ func aggrFuncZScore(afa *aggrFuncArg) ([]*timeseries, error) {
	if err != nil {
		return nil, err
	}
	afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
		for i := range tss[0].Values {
			// Calculate avg and stddev for tss points at position i.
			// See `Rapid calculation methods` at https://en.wikipedia.org/wiki/Standard_deviation
@@ -550,7 +554,7 @@ func aggrFuncCountValues(afa *aggrFuncArg) ([]*timeseries, error) {
		// Do nothing
	}
	afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
		m := make(map[float64]bool)
		for _, ts := range tss {
			for _, v := range ts.Values {
@@ -602,7 +606,7 @@ func newAggrFuncTopK(isReverse bool) aggrFunc {
		if err != nil {
			return nil, err
		}
		afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
			for n := range tss[0].Values {
				sort.Slice(tss, func(i, j int) bool {
					a := tss[i].Values[n]
@@ -623,21 +627,32 @@
func newAggrFuncRangeTopK(f func(values []float64) float64, isReverse bool) aggrFunc {
	return func(afa *aggrFuncArg) ([]*timeseries, error) {
		args := afa.args
		if len(args) < 2 {
			return nil, fmt.Errorf(`unexpected number of args; got %d; want at least %d`, len(args), 2)
		}
		if len(args) > 3 {
			return nil, fmt.Errorf(`unexpected number of args; got %d; want no more than %d`, len(args), 3)
		}
		ks, err := getScalar(args[0], 0)
		if err != nil {
			return nil, err
		}
		remainingSumTagName := ""
		if len(args) == 3 {
			remainingSumTagName, err = getString(args[2], 2)
			if err != nil {
				return nil, err
			}
		}
		afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
			return getRangeTopKTimeseries(tss, modifier, ks, remainingSumTagName, f, isReverse)
		}
		return aggrFuncExt(afe, args[1], &afa.ae.Modifier, afa.ae.Limit, true)
	}
}

func getRangeTopKTimeseries(tss []*timeseries, modifier *metricsql.ModifierExpr, ks []float64, remainingSumTagName string,
	f func(values []float64) float64, isReverse bool) []*timeseries {
	type tsWithValue struct {
		ts *timeseries
		value float64
	}
@@ -661,28 +676,66 @@ func getRangeTopKTimeseries(tss []*timeseries, ks []float64, f func(values []flo
	for i := range maxs {
		tss[i] = maxs[i].ts
	}
	remainingSumTS := getRemainingSumTimeseries(tss, modifier, ks, remainingSumTagName)
	for i, k := range ks {
		fillNaNsAtIdx(i, k, tss)
	}
	if remainingSumTS != nil {
		tss = append(tss, remainingSumTS)
	}
	return removeNaNs(tss)
}

func getRemainingSumTimeseries(tss []*timeseries, modifier *metricsql.ModifierExpr, ks []float64, remainingSumTagName string) *timeseries {
	if len(remainingSumTagName) == 0 || len(tss) == 0 {
		return nil
	}
	var dst timeseries
	dst.CopyFromShallowTimestamps(tss[0])
	removeGroupTags(&dst.MetricName, modifier)
	dst.MetricName.RemoveTag(remainingSumTagName)
	dst.MetricName.AddTag(remainingSumTagName, remainingSumTagName)
	for i, k := range ks {
		kn := getIntK(k, len(tss))
		var sum float64
		count := 0
		for _, ts := range tss[:len(tss)-kn] {
			v := ts.Values[i]
			if math.IsNaN(v) {
				continue
			}
			sum += v
			count++
		}
		if count == 0 {
			sum = nan
		}
		dst.Values[i] = sum
	}
	return &dst
}

func fillNaNsAtIdx(idx int, k float64, tss []*timeseries) {
	kn := getIntK(k, len(tss))
	for _, ts := range tss[:len(tss)-kn] {
		ts.Values[idx] = nan
	}
}

func getIntK(k float64, kMax int) int {
	if math.IsNaN(k) {
		return 0
	}
	kn := int(k)
	if kn < 0 {
		return 0
	}
	if kn > kMax {
		return kMax
	}
	return kn
}

func minValue(values []float64) float64 {
	if len(values) == 0 {
		return nan
@@ -746,7 +799,7 @@ func aggrFuncOutliersK(afa *aggrFuncArg) ([]*timeseries, error) {
	if err != nil {
		return nil, err
	}
	afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
		// Calculate medians for each point across tss.
		medians := make([]float64, len(ks))
		h := histogram.GetFast()
@@ -771,7 +824,7 @@ func aggrFuncOutliersK(afa *aggrFuncArg) ([]*timeseries, error) {
			}
			return sum2
		}
		return getRangeTopKTimeseries(tss, &afa.ae.Modifier, ks, "", f, false)
	}
	return aggrFuncExt(afe, args[1], &afa.ae.Modifier, afa.ae.Limit, true)
}
@@ -792,7 +845,7 @@ func aggrFuncLimitK(afa *aggrFuncArg) ([]*timeseries, error) {
			maxK = k
		}
	}
	afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
		if len(tss) > maxK {
			tss = tss[:maxK]
		}
@@ -833,8 +886,8 @@ func aggrFuncMedian(afa *aggrFuncArg) ([]*timeseries, error) {
	return aggrFuncExt(afe, tss, &afa.ae.Modifier, afa.ae.Limit, false)
}

func newAggrQuantileFunc(phis []float64) func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
	return func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
		dst := tss[0]
		h := histogram.GetFast()
		defer histogram.PutFast(h)


@@ -4193,7 +4193,7 @@ func TestExecSuccess(t *testing.T) {
	})
	t.Run(`topk_max(1)`, func(t *testing.T) {
		t.Parallel()
		q := `topk_max(1, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"))`
		r1 := netstorage.Result{
			MetricName: metricNameExpected,
			Values:     []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
@@ -4206,6 +4206,84 @@ func TestExecSuccess(t *testing.T) {
		resultExpected := []netstorage.Result{r1}
		f(q, resultExpected)
	})
	t.Run(`topk_max(1, remaining_sum)`, func(t *testing.T) {
		t.Parallel()
		q := `sort_desc(topk_max(1, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"), "remaining_sum"))`
		r1 := netstorage.Result{
			MetricName: metricNameExpected,
			Values:     []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
			Timestamps: timestampsExpected,
		}
		r1.MetricName.Tags = []storage.Tag{{
			Key:   []byte("baz"),
			Value: []byte("sss"),
		}}
		r2 := netstorage.Result{
			MetricName: metricNameExpected,
			Values:     []float64{10, 10, 10, 10, 10, 10},
			Timestamps: timestampsExpected,
		}
		r2.MetricName.Tags = []storage.Tag{
			{
				Key:   []byte("remaining_sum"),
				Value: []byte("remaining_sum"),
			},
		}
		resultExpected := []netstorage.Result{r1, r2}
		f(q, resultExpected)
	})
	t.Run(`topk_max(2, remaining_sum)`, func(t *testing.T) {
		t.Parallel()
		q := `sort_desc(topk_max(2, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"), "remaining_sum"))`
		r1 := netstorage.Result{
			MetricName: metricNameExpected,
			Values:     []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
			Timestamps: timestampsExpected,
		}
		r1.MetricName.Tags = []storage.Tag{{
			Key:   []byte("baz"),
			Value: []byte("sss"),
		}}
		r2 := netstorage.Result{
			MetricName: metricNameExpected,
			Values:     []float64{10, 10, 10, 10, 10, 10},
			Timestamps: timestampsExpected,
		}
		r2.MetricName.Tags = []storage.Tag{
			{
				Key:   []byte("foo"),
				Value: []byte("bar"),
			},
		}
		resultExpected := []netstorage.Result{r1, r2}
		f(q, resultExpected)
	})
	t.Run(`topk_max(3, remaining_sum)`, func(t *testing.T) {
		t.Parallel()
		q := `sort_desc(topk_max(3, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss"), "remaining_sum"))`
		r1 := netstorage.Result{
			MetricName: metricNameExpected,
			Values:     []float64{nan, nan, nan, 10.666666666666666, 12, 13.333333333333334},
			Timestamps: timestampsExpected,
		}
		r1.MetricName.Tags = []storage.Tag{{
			Key:   []byte("baz"),
			Value: []byte("sss"),
		}}
		r2 := netstorage.Result{
			MetricName: metricNameExpected,
			Values:     []float64{10, 10, 10, 10, 10, 10},
			Timestamps: timestampsExpected,
		}
		r2.MetricName.Tags = []storage.Tag{
			{
				Key:   []byte("foo"),
				Value: []byte("bar"),
			},
		}
		resultExpected := []netstorage.Result{r1, r2}
		f(q, resultExpected)
	})
	t.Run(`bottomk_max(1)`, func(t *testing.T) {
		t.Parallel()
		q := `sort(bottomk_max(1, label_set(10, "foo", "bar") or label_set(time()/150, "baz", "sss")))`


@@ -519,12 +519,13 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
	}
	rfa.values = values[i:j]
	rfa.timestamps = timestamps[i:j]
	if j == len(timestamps) && j > 0 && (tEnd-timestamps[j-1] > stalenessInterval || i == j && len(timestamps) == 1) && rc.End-tEnd >= 2*rc.Step {
		// Drop trailing data points in the following cases:
		// - if the distance between the last raw sample and tEnd exceeds stalenessInterval
		// - if time series contains only a single raw sample
		// This should prevent from double counting when a label changes in time series (for instance,
		// during new deployment in K8S). See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
		// Do not drop trailing data points for instant queries. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
		rfa.prevValue = nan
		rfa.values = nil
		rfa.timestamps = nil


@@ -11,6 +11,7 @@ import (
	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage/promdb"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -20,7 +21,7 @@ import (
)

var (
	retentionPeriod = flagutil.NewDuration("retentionPeriod", 1, "Data with timestamps outside the retentionPeriod is automatically deleted")
	snapshotAuthKey = flag.String("snapshotAuthKey", "", "authKey, which must be passed in query string to /snapshot* pages")
	forceMergeAuthKey = flag.String("forceMergeAuthKey", "", "authKey, which must be passed in query string to /internal/force_merge pages")
@@ -45,12 +46,12 @@ func CheckTimeRange(tr storage.TimeRange) error {
	if !*denyQueriesOutsideRetention {
		return nil
	}
	minAllowedTimestamp := int64(fasttime.UnixTimestamp()*1000) - retentionPeriod.Msecs
	if tr.MinTimestamp > minAllowedTimestamp {
		return nil
	}
	return &httpserver.ErrorWithStatusCode{
		Err: fmt.Errorf("the given time range %s is outside the allowed -retentionPeriod=%s according to -denyQueriesOutsideRetention", &tr, retentionPeriod),
		StatusCode: http.StatusServiceUnavailable,
	}
}
@@ -73,12 +74,12 @@ func InitWithoutMetrics() {
	storage.SetBigMergeWorkersCount(*bigMergeConcurrency)
	storage.SetSmallMergeWorkersCount(*smallMergeConcurrency)

	logger.Infof("opening storage at %q with -retentionPeriod=%s", *DataPath, retentionPeriod)
	startTime := time.Now()
	WG = syncwg.WaitGroup{}
	strg, err := storage.OpenStorage(*DataPath, retentionPeriod.Msecs)
	if err != nil {
		logger.Fatalf("cannot open a storage at %s with -retentionPeriod=%s: %s", *DataPath, retentionPeriod, err)
	}
	Storage = strg
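The diff relies on `flagutil.NewDuration` exposing an `Msecs` field and a printable string form. A rough sketch of such a flag type, assuming plain numbers mean months and suffixes like `d` and `w` are supported (this is an illustration, not the actual `flagutil` code):

```go
package flagutil

import (
	"flag"
	"fmt"
	"strconv"
	"strings"
)

// One month is approximated as 31 days here; the real code may differ.
const msecsPerMonth = 31 * 24 * 3600 * 1000

// Duration is a duration flag which defaults to months,
// but also accepts suffixes such as "d" (days) and "w" (weeks).
type Duration struct {
	Msecs int64
	value string
}

// NewDuration registers a Duration flag with the given name,
// default value in months and description.
func NewDuration(name string, defaultMonths int, description string) *Duration {
	d := &Duration{
		Msecs: int64(defaultMonths) * msecsPerMonth,
		value: strconv.Itoa(defaultMonths),
	}
	flag.Var(d, name, description)
	return d
}

// String implements flag.Value.
func (d *Duration) String() string { return d.value }

// Set implements flag.Value.
func (d *Duration) Set(value string) error {
	unit := int64(msecsPerMonth) // plain numbers mean months
	num := value
	switch {
	case strings.HasSuffix(value, "d"):
		unit, num = 24*3600*1000, strings.TrimSuffix(value, "d")
	case strings.HasSuffix(value, "w"):
		unit, num = 7*24*3600*1000, strings.TrimSuffix(value, "w")
	}
	n, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return fmt.Errorf("cannot parse duration %q: %w", value, err)
	}
	d.Msecs = int64(n * float64(unit))
	d.value = value
	return nil
}
```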


@@ -56,7 +56,7 @@
"gnetId": 10229,
"graphTooltip": 0,
"id": null,
"iteration": 1603307754894,
"links": [
{
"icon": "doc",
@@ -925,7 +925,7 @@
"dashLength": 10,
"dashes": false,
"datasource": "$ds",
"description": "Shows how many ongoing insertions (not API /write calls) on disk are taking place, where:\n* `max` - equal to number of CPUs;\n* `current` - current number of goroutines busy with inserting rows into underlying storage.\n\nEvery successful API /write call results into flush on disk. However, these two actions are separated and controlled via different concurrency limiters. The `max` on this panel can't be changed and always equal to number of CPUs. \n\nWhen `current` hits `max` constantly, it means storage is overloaded and requires more CPU.\n\n",
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
"custom": {}, "custom": {},
@ -979,6 +979,7 @@
{ {
"expr": "sum(vm_concurrent_addrows_capacity{job=\"$job\", instance=\"$instance\"})", "expr": "sum(vm_concurrent_addrows_capacity{job=\"$job\", instance=\"$instance\"})",
"format": "time_series", "format": "time_series",
"interval": "",
"intervalFactor": 1, "intervalFactor": 1,
"legendFormat": "max", "legendFormat": "max",
"refId": "A" "refId": "A"
@ -995,7 +996,7 @@
"timeFrom": null, "timeFrom": null,
"timeRegions": [], "timeRegions": [],
"timeShift": null, "timeShift": null,
"title": "Concurrent inserts ($instance)", "title": "Concurrent flushes on disk ($instance)",
"tooltip": { "tooltip": {
"shared": true, "shared": true,
"sort": 2, "sort": 2,
@ -1164,7 +1165,7 @@
"h": 8, "h": 8,
"w": 12, "w": 12,
"x": 0, "x": 0,
"y": 36 "y": 3
}, },
"hiddenSeries": false, "hiddenSeries": false,
"id": 10, "id": 10,
@ -1250,7 +1251,7 @@
"dashLength": 10, "dashLength": 10,
"dashes": false, "dashes": false,
"datasource": "$ds", "datasource": "$ds",
"description": "How many datapoints are in RAM queue waiting to be written into storage. The number of pending data points should be in the range from 0 to `2*<ingestion_rate>`, since VictoriaMetrics pushes pending data to persistent storage every second.", "description": "Shows the time needed to reach the 100% of disk capacity based on the following params:\n* free disk space;\n* rows ingestion rate;\n* compression.\n\nUse this panel for capacity planning in order to estimate the time remaining for running out of the disk space.\n\n",
"fieldConfig": { "fieldConfig": {
"defaults": { "defaults": {
"custom": {}, "custom": {},
@ -1264,63 +1265,53 @@
"h": 8, "h": 8,
"w": 12, "w": 12,
"x": 12, "x": 12,
"y": 36 "y": 3
}, },
"hiddenSeries": false, "hiddenSeries": false,
"id": 34, "id": 73,
"legend": { "legend": {
"avg": false, "alignAsTable": true,
"current": false, "avg": true,
"current": true,
"hideZero": true,
"max": false, "max": false,
"min": false, "min": false,
"show": false, "show": false,
"total": false, "total": false,
"values": false "values": true
}, },
"lines": true, "lines": true,
"linewidth": 1, "linewidth": 1,
"links": [], "links": [],
"nullPointMode": "null", "nullPointMode": "null as zero",
"percentage": false, "percentage": false,
"pluginVersion": "7.1.1", "pluginVersion": "7.1.1",
"pointradius": 2, "pointradius": 2,
"points": false, "points": false,
"renderer": "flot", "renderer": "flot",
"seriesOverrides": [ "seriesOverrides": [],
{
"alias": "pending index entries",
"yaxis": 2
}
],
"spaceLength": 10, "spaceLength": 10,
"stack": false, "stack": false,
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"storage\"}", "expr": "vm_free_disk_space_bytes{job=\"$job\", instance=\"$instance\"} / (sum(rate(vm_rows_added_to_storage_total{job=\"$job\", instance=\"$instance\"}[1d])) * (sum(vm_data_size_bytes{job=\"$job\", instance=\"$instance\", type!=\"indexdb\"}) / sum(vm_rows{job=\"$job\", instance=\"$instance\", type!=\"indexdb\"})))",
"format": "time_series", "format": "time_series",
"hide": false, "hide": false,
"interval": "",
"intervalFactor": 1, "intervalFactor": 1,
"legendFormat": "pending datapoints", "legendFormat": "",
"refId": "A" "refId": "A"
},
{
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"indexdb\"}",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
"legendFormat": "pending index entries",
"refId": "B"
} }
], ],
"thresholds": [], "thresholds": [],
"timeFrom": null, "timeFrom": null,
"timeRegions": [], "timeRegions": [],
"timeShift": null, "timeShift": null,
"title": "Pending datapoints ($instance)", "title": "Storage full ETA ($instance)",
"tooltip": { "tooltip": {
"shared": true, "shared": true,
"sort": 0, "sort": 2,
"value_type": "individual" "value_type": "individual"
}, },
"type": "graph", "type": "graph",
@@ -1333,7 +1324,8 @@
},
"yaxes": [
{
"decimals": null,
"format": "s",
"label": null,
"logBase": 1,
"max": null,
@@ -1341,8 +1333,7 @@
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
@@ -1375,7 +1366,7 @@
"h": 8,
"w": 12,
"x": 0,
"y": 11
},
"hiddenSeries": false,
"id": 30,
@@ -1472,7 +1463,7 @@
"dashLength": 10,
"dashes": false,
"datasource": "$ds",
"description": "How many datapoints are in RAM queue waiting to be written into storage. The number of pending data points should be in the range from 0 to `2*<ingestion_rate>`, since VictoriaMetrics pushes pending data to persistent storage every second.",
@@ -1486,16 +1477,16 @@
"h": 8,
"w": 12,
"x": 12,
"y": 11
},
"hiddenSeries": false,
"id": 34,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
@@ -1508,27 +1499,41 @@
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "pending index entries",
"yaxis": 2
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"storage\"}",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
"legendFormat": "pending datapoints",
"refId": "A"
},
{
"expr": "vm_pending_rows{job=\"$job\", instance=~\"$instance\", type=\"indexdb\"}",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
"legendFormat": "pending index entries",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Pending datapoints ($instance)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
@@ -1549,7 +1554,8 @@
"show": true
},
{
"decimals": 3,
"format": "none",
"label": null,
"logBase": 1,
"max": null,
@@ -1582,7 +1588,7 @@
"h": 8,
"w": 12,
"x": 0,
"y": 19
},
"hiddenSeries": false,
"id": 53,
@@ -1669,6 +1675,196 @@
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$ds",
"description": "Data parts of LSM tree.\nHigh number of parts could be an evidence of slow merge performance - check the resource utilization.\n* `indexdb` - inverted index\n* `storage/small` - recently added parts of data ingested into storage(hot data)\n* `storage/big` - small parts gradually merged into big parts (cold data)",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 19
},
"hiddenSeries": false,
"id": 36,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pluginVersion": "7.1.1",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(vm_parts{job=\"$job\", instance=\"$instance\"}) by (type)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{type}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "LSM parts ($instance)",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$ds",
"description": "The number of on-going merges in storage nodes. It is expected to have high numbers for `storage/small` metric.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 27
},
"hiddenSeries": false,
"id": 62,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"percentage": false,
"pluginVersion": "7.1.1",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(vm_active_merges{job=\"$job\", instance=\"$instance\"}) by(type)",
"legendFormat": "{{type}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Active merges ($instance)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": 0,
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
@@ -1689,7 +1885,7 @@
"h": 8,
"w": 12,
"x": 12,
"y": 27
},
"hiddenSeries": false,
"id": 55,
@@ -1764,194 +1960,6 @@
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$ds",
"description": "The number of on-going merges in storage nodes. It is expected to have high numbers for `storage/small` metric.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 60
},
"hiddenSeries": false,
"id": 62,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"percentage": false,
"pluginVersion": "7.1.1",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(vm_active_merges{job=\"$job\", instance=\"$instance\"}) by(type)",
"legendFormat": "{{type}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Active merges ($instance)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": 0,
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$ds",
"description": "The number of rows merged per second by storage nodes.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 60
},
"hiddenSeries": false,
"id": 64,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"percentage": false,
"pluginVersion": "7.1.1",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(vm_rows_merged_total{job=\"$job\", instance=\"$instance\"}[5m])) by(type)",
"legendFormat": "{{type}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Merge speed ($instance)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": 0,
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
@@ -1972,7 +1980,7 @@
"h": 8,
"w": 12,
"x": 0,
-"y": 68
+"y": 35
},
"hiddenSeries": false,
"id": 58,
@@ -2050,6 +2058,100 @@
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$ds",
"description": "The number of rows merged per second by storage nodes.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 35
},
"hiddenSeries": false,
"id": 64,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"percentage": false,
"pluginVersion": "7.1.1",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(vm_rows_merged_total{job=\"$job\", instance=\"$instance\"}[5m])) by(type)",
"legendFormat": "{{type}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Merge speed ($instance)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": 0,
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
@@ -2070,7 +2172,7 @@
"h": 8,
"w": 12,
"x": 12,
-"y": 68
+"y": 43
},
"hiddenSeries": false,
"id": 67,
@@ -3293,4 +3395,4 @@
"title": "VictoriaMetrics",
"uid": "wNf0q_kZk",
"version": 1
}
@@ -21,7 +21,10 @@ services:
    image: victoriametrics/victoria-metrics
    ports:
      - 8428:8428
+      - 8089:8089
+      - 8089:8089/udp
      - 2003:2003
+      - 2003:2003/udp
      - 4242:4242
    volumes:
      - vmdata:/storage
@@ -30,6 +33,7 @@ services:
      - '--graphiteListenAddr=:2003'
      - '--opentsdbListenAddr=:4242'
      - '--httpListenAddr=:8428'
+      - '--influxListenAddr=:8089'
    networks:
      - vm_net
    restart: always
@@ -113,6 +113,7 @@ This functionality can be tried at [an editable Grafana dashboard](http://play-g
- `bottomk_max(k, q)` - returns bottom K time series with the min maximums on the given time range
- `bottomk_avg(k, q)` - returns bottom K time series with the min averages on the given time range
- `bottomk_median(k, q)` - returns bottom K time series with the min medians on the given time range
+All the `topk_*` and `bottomk_*` functions accept an optional third argument - the label name for an additional time series with the sum of the remaining time series outside the top/bottom K. For example, `topk_max(3, process_resident_memory_bytes, "remaining_sum")` would return up to 3 time series with the maximum value of `process_resident_memory_bytes` plus a fourth time series with the sum of the remaining time series, if any. The fourth time series will contain an additional `remaining_sum="remaining_sum"` label.
- `share_le_over_time(m[d], le)` - returns share (in the range 0..1) of values in `m` over `d`, which are smaller or equal to `le`. Useful for calculating SLI and SLO.
Example: `share_le_over_time(memory_usage_bytes[24h], 100*1024*1024)` returns the share of time series values for the last 24 hours when memory usage was below or equal to 100MB.
- `share_gt_over_time(m[d], gt)` - returns share (in the range 0..1) of values in `m` over `d`, which are bigger than `gt`. Useful for calculating SLI and SLO.
@@ -8,7 +8,7 @@
and their default values. Default flag values should fit the majority of cases. The minimum required flags to configure are:
* `-storageDataPath` - path to directory where VictoriaMetrics stores all the data.
-* `-retentionPeriod` - data retention in months.
+* `-retentionPeriod` - data retention.
For instance:
@@ -164,7 +164,7 @@ or [docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) wi
The following command-line flags are used the most:

* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
-* `-retentionPeriod` - retention period in months for stored data. Older data is automatically deleted. Default period is 1 month.
+* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.

Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values.
@@ -495,6 +495,7 @@ VictoriaMetrics supports the following handlers from [Prometheus querying API](h
* [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names)
* [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values)
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats)
+* [/api/v1/targets](https://prometheus.io/docs/prometheus/latest/querying/api/#targets) - see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter) for more details.

These handlers can be queried from Prometheus-compatible clients such as Grafana or curl.
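For illustration, a minimal Go sketch querying the newly added `/api/v1/targets` handler. It assumes a default single-node VictoriaMetrics listening on port 8428 with self-scraping configured via `-promscrape.config`; the address and setup are assumptions, not part of this commit:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// The same endpoint Prometheus exposes; works equally well with curl.
	resp, err := http.Get("http://localhost:8428/api/v1/targets")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := ioutil.ReadAll(resp.Body)
	// Expect a JSON object of the form:
	// {"status":"success","data":{"activeTargets":[...],"droppedTargets":[...]}}
	fmt.Println(string(body))
}
```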
@@ -1048,6 +1049,7 @@ The de-duplication reduces disk space usage if multiple identically configured P
write data to the same VictoriaMetrics instance. Note that these Prometheus instances must have identical
`external_labels` section in their configs, so they write data to the same time series.

### Retention

Retention is configured with `-retentionPeriod` command-line flag. For instance, `-retentionPeriod=3` means
@@ -1059,6 +1061,10 @@ For example if `-retentionPeriod` is set to 1, data for January is deleted on Ma
It is safe to extend `-retentionPeriod` on existing data. If `-retentionPeriod` is set to a lower
value than before, then data outside the configured period will eventually be deleted.

+VictoriaMetrics supports retention smaller than 1 month. For example, `-retentionPeriod=5d` would set data retention to 5 days.
+Older data is eventually deleted during [background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).

### Multiple retentions

Just start multiple VictoriaMetrics instances with distinct values for the following flags:
@@ -211,9 +211,13 @@ either via `vmagent` itself or via Prometheus, so the exported metrics could be
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for `vmagent` state overview.
If you have suggestions, improvements or found a bug - feel free to open an issue on github or add a review to the dashboard.

-`vmagent` also exports target statuses at `http://vmagent-host:8429/targets` page in plaintext format.
-`/targets` handler accepts optional `show_original_labels=1` query arg, which shows the original labels per each target
-before applying relabeling. This information may be useful for debugging target relabeling.
+`vmagent` also exports target statuses at the following handlers:
+
+* `http://vmagent-host:8429/targets`. This handler returns human-readable plaintext status for every active target.
+  This page is convenient to query from the command line with `wget`, `curl` or similar tools.
+  It accepts optional `show_original_labels=1` query arg, which shows the original labels per each target before applying relabeling.
+  This information may be useful for debugging target relabeling.
+* `http://vmagent-host:8429/api/v1/targets`. This handler returns data compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).

### Troubleshooting
@@ -224,7 +228,26 @@ before applying relabeling. This information may be useful for debugging target
since `vmagent` establishes at least a single TCP connection per each target.
* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
-by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`.
+by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
+and `http://vmagent-host:8429/api/v1/targets`.
+* If `vmagent` scrapes targets with millions of metrics per each target (for instance, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
+then it is recommended to enable `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all the scrape targets
+by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with the `stream_parse: true` option. For example:

+```yml
+scrape_configs:
+- job_name: 'big-federate'
+  stream_parse: true
+  static_configs:
+  - targets:
+    - big-prometheus1
+    - big-prometheus2
+  honor_labels: true
+  metrics_path: /federate
+  params:
+    'match[]': ['{__name__!=""}']
+```
* It is recommended to increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page constantly grows.
@@ -3,7 +3,7 @@
`vmrestore` restores data from backups created by [vmbackup](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmbackup/README.md).
VictoriaMetrics `v1.29.0` and newer versions must be used for working with the restored data.

-Restore process can be interrupted at any time. It is automatically resumed from the inerruption point
+Restore process can be interrupted at any time. It is automatically resumed from the interruption point
when restarting `vmrestore` with the same args.
lib/flagutil/duration.go (new file)
@@ -0,0 +1,69 @@
package flagutil

import (
	"flag"
	"fmt"
	"strconv"
	"strings"

	"github.com/VictoriaMetrics/metricsql"
)

// NewDuration returns new `duration` flag with the given name, defaultValue and description.
//
// DefaultValue is in months.
func NewDuration(name string, defaultValue float64, description string) *Duration {
	description += "\nThe following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months"
	d := Duration{
		Msecs:       int64(defaultValue * msecsPerMonth),
		valueString: fmt.Sprintf("%g", defaultValue),
	}
	flag.Var(&d, name, description)
	return &d
}

// Duration is a flag for holding duration.
type Duration struct {
	// Msecs contains parsed duration in milliseconds.
	Msecs int64

	valueString string
}

// String implements flag.Value interface
func (d *Duration) String() string {
	return d.valueString
}

// Set implements flag.Value interface
func (d *Duration) Set(value string) error {
	// An attempt to parse value in months.
	months, err := strconv.ParseFloat(value, 64)
	if err == nil {
		if months > maxMonths {
			return fmt.Errorf("duration months must be smaller than %d; got %g", maxMonths, months)
		}
		if months < 0 {
			return fmt.Errorf("duration months cannot be negative; got %g", months)
		}
		d.Msecs = int64(months * msecsPerMonth)
		d.valueString = value
		return nil
	}
	// Parse duration.
	value = strings.ToLower(value)
	if strings.HasSuffix(value, "m") {
		return fmt.Errorf("duration in months must be set without `m` suffix due to ambiguity with duration in minutes; got %s", value)
	}
	msecs, err := metricsql.PositiveDurationValue(value, 0)
	if err != nil {
		return err
	}
	d.Msecs = msecs
	d.valueString = value
	return nil
}

const maxMonths = 12 * 100

const msecsPerMonth = 31 * 24 * 3600 * 1000
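For reference, a minimal sketch of how this flag type behaves when wired into a binary. The program and flag registration below are hypothetical; only `flagutil.NewDuration` comes from the file above:

```go
package main

import (
	"flag"
	"fmt"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
)

func main() {
	// The default value is in months, mirroring how -retentionPeriod is documented.
	retention := flagutil.NewDuration("retentionPeriod", 1, "Data retention")
	flag.Parse()
	// `-retentionPeriod=5d` yields 5*24*3600*1000 msecs;
	// a bare `-retentionPeriod=2` is interpreted as 2 months;
	// `-retentionPeriod=1m` is rejected to avoid confusion with minutes.
	fmt.Printf("retention: %d msecs\n", retention.Msecs)
}
```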
lib/flagutil/duration_test.go (new file)
@@ -0,0 +1,57 @@
package flagutil

import (
	"strings"
	"testing"
)

func TestDurationSetFailure(t *testing.T) {
	f := func(value string) {
		t.Helper()
		var d Duration
		if err := d.Set(value); err == nil {
			t.Fatalf("expecting non-nil error in d.Set(%q)", value)
		}
	}
	f("")
	f("foobar")
	f("5foobar")
	f("ah")
	f("134xd")
	f("2.43sdfw")

	// Too big value in months
	f("12345")

	// Negative duration
	f("-1")
	f("-34h")

	// Duration in minutes is confused with duration in months
	f("1m")
}

func TestDurationSetSuccess(t *testing.T) {
	f := func(value string, expectedMsecs int64) {
		t.Helper()
		var d Duration
		if err := d.Set(value); err != nil {
			t.Fatalf("unexpected error in d.Set(%q): %s", value, err)
		}
		if d.Msecs != expectedMsecs {
			t.Fatalf("unexpected result; got %d; want %d", d.Msecs, expectedMsecs)
		}
		valueString := d.String()
		valueExpected := strings.ToLower(value)
		if valueString != valueExpected {
			t.Fatalf("unexpected valueString; got %q; want %q", valueString, valueExpected)
		}
	}
	f("0", 0)
	f("1", msecsPerMonth)
	f("123.456", 123.456*msecsPerMonth)
	f("1h", 3600*1000)
	f("1.5d", 1.5*24*3600*1000)
	f("2.3W", 2.3*7*24*3600*1000)
	f("0.25y", 0.25*365*24*3600*1000)
}
@@ -306,6 +306,9 @@ func maybeGzipResponseWriter(w http.ResponseWriter, r *http.Request) http.Respon
	if *disableResponseCompression {
		return w
	}
+	if r.Header.Get("Connection") == "Upgrade" {
+		return w
+	}
	ae := r.Header.Get("Accept-Encoding")
	if ae == "" {
		return w
@@ -35,12 +35,12 @@ func initOnce() {
	mem := sysTotalMemory()
	if allowedBytes.N <= 0 {
		if *allowedPercent < 1 || *allowedPercent > 200 {
-			logger.Panicf("FATAL: -memory.allowedPercent must be in the range [1...200]; got %f", *allowedPercent)
+			logger.Panicf("FATAL: -memory.allowedPercent must be in the range [1...200]; got %g", *allowedPercent)
		}
		percent := *allowedPercent / 100
		allowedMemory = int(float64(mem) * percent)
		remainingMemory = mem - allowedMemory
-		logger.Infof("limiting caches to %d bytes, leaving %d bytes to the OS according to -memory.allowedPercent=%f", allowedMemory, remainingMemory, *allowedPercent)
+		logger.Infof("limiting caches to %d bytes, leaving %d bytes to the OS according to -memory.allowedPercent=%g", allowedMemory, remainingMemory, *allowedPercent)
	} else {
		allowedMemory = allowedBytes.N
		remainingMemory = mem - allowedMemory
@ -1,13 +1,18 @@
package promscrape package promscrape
import ( import (
"context"
"crypto/tls" "crypto/tls"
"flag" "flag"
"fmt" "fmt"
"io"
"io/ioutil"
"net/http"
"strings" "strings"
"time" "time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/fasthttp" "github.com/VictoriaMetrics/fasthttp"
"github.com/VictoriaMetrics/metrics" "github.com/VictoriaMetrics/metrics"
) )
@ -22,11 +27,19 @@ var (
"This may be useful when targets has no support for HTTP keep-alive connection. "+ "This may be useful when targets has no support for HTTP keep-alive connection. "+
"It is possible to set `disable_keepalive: true` individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. "+ "It is possible to set `disable_keepalive: true` individually per each 'scrape_config` section in '-promscrape.config' for fine grained control. "+
"Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets") "Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets")
streamParse = flag.Bool("promscrape.streamParse", false, "Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful "+
"for reducing memory usage when millions of metrics are exposed per each scrape target. "+
"It is posible to set `stream_parse: true` individually per each `scrape_config` section in `-promscrape.config` for fine grained control")
) )
type client struct { type client struct {
// hc is the default client optimized for common case of scraping targets with moderate number of metrics.
hc *fasthttp.HostClient hc *fasthttp.HostClient
// sc (aka `stream client`) is used instead of hc if ScrapeWork.ParseStream is set.
// It may be useful for scraping targets with millions of metrics per target.
sc *http.Client
scrapeURL string scrapeURL string
host string host string
requestURI string requestURI string
@ -64,8 +77,23 @@ func newClient(sw *ScrapeWork) *client {
MaxResponseBodySize: maxScrapeSize.N, MaxResponseBodySize: maxScrapeSize.N,
MaxIdempotentRequestAttempts: 1, MaxIdempotentRequestAttempts: 1,
} }
var sc *http.Client
if *streamParse || sw.StreamParse {
sc = &http.Client{
Transport: &http.Transport{
TLSClientConfig: tlsCfg,
TLSHandshakeTimeout: 10 * time.Second,
IdleConnTimeout: 2 * sw.ScrapeInterval,
DisableCompression: *disableCompression || sw.DisableCompression,
DisableKeepAlives: *disableKeepAlive || sw.DisableKeepAlive,
DialContext: statStdDial,
},
Timeout: sw.ScrapeTimeout,
}
}
return &client{ return &client{
hc: hc, hc: hc,
sc: sc,
scrapeURL: sw.ScrapeURL, scrapeURL: sw.ScrapeURL,
host: host, host: host,
@ -76,6 +104,43 @@ func newClient(sw *ScrapeWork) *client {
} }
} }
func (c *client) GetStreamReader() (*streamReader, error) {
deadline := time.Now().Add(c.hc.ReadTimeout)
ctx, cancel := context.WithDeadline(context.Background(), deadline)
req, err := http.NewRequestWithContext(ctx, "GET", c.scrapeURL, nil)
if err != nil {
cancel()
return nil, fmt.Errorf("cannot create request for %q: %w", c.scrapeURL, err)
}
// The following `Accept` header has been copied from Prometheus sources.
// See https://github.com/prometheus/prometheus/blob/f9d21f10ecd2a343a381044f131ea4e46381ce09/scrape/scrape.go#L532 .
// This is needed as a workaround for scraping stupid Java-based servers such as Spring Boot.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/608 for details.
// Do not bloat the `Accept` header with OpenMetrics shit, since it looks like dead standard now.
req.Header.Set("Accept", "text/plain;version=0.0.4;q=1,*/*;q=0.1")
if c.authHeader != "" {
req.Header.Set("Authorization", c.authHeader)
}
resp, err := c.sc.Do(req)
if err != nil {
cancel()
return nil, fmt.Errorf("cannot scrape %q: %w", c.scrapeURL, err)
}
if resp.StatusCode != http.StatusOK {
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_scrapes_total{status_code="%d"}`, resp.StatusCode)).Inc()
respBody, _ := ioutil.ReadAll(resp.Body)
_ = resp.Body.Close()
cancel()
return nil, fmt.Errorf("unexpected status code returned when scraping %q: %d; expecting %d; response body: %q",
c.scrapeURL, resp.StatusCode, http.StatusOK, respBody)
}
scrapesOK.Inc()
return &streamReader{
r: resp.Body,
cancel: cancel,
}, nil
}
func (c *client) ReadData(dst []byte) ([]byte, error) { func (c *client) ReadData(dst []byte) ([]byte, error) {
deadline := time.Now().Add(c.hc.ReadTimeout) deadline := time.Now().Add(c.hc.ReadTimeout)
req := fasthttp.AcquireRequest() req := fasthttp.AcquireRequest()
@ -87,7 +152,7 @@ func (c *client) ReadData(dst []byte) ([]byte, error) {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/608 for details. // See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/608 for details.
// Do not bloat the `Accept` header with OpenMetrics shit, since it looks like dead standard now. // Do not bloat the `Accept` header with OpenMetrics shit, since it looks like dead standard now.
req.Header.Set("Accept", "text/plain;version=0.0.4;q=1,*/*;q=0.1") req.Header.Set("Accept", "text/plain;version=0.0.4;q=1,*/*;q=0.1")
if !*disableCompression || c.disableCompression { if !*disableCompression && !c.disableCompression {
req.Header.Set("Accept-Encoding", "gzip") req.Header.Set("Accept-Encoding", "gzip")
} }
if *disableKeepAlive || c.disableKeepAlive { if *disableKeepAlive || c.disableKeepAlive {
@ -131,7 +196,6 @@ func (c *client) ReadData(dst []byte) ([]byte, error) {
} }
return dst, fmt.Errorf("error when scraping %q: %w", c.scrapeURL, err) return dst, fmt.Errorf("error when scraping %q: %w", c.scrapeURL, err)
} }
dstLen := len(dst)
if ce := resp.Header.Peek("Content-Encoding"); string(ce) == "gzip" { if ce := resp.Header.Peek("Content-Encoding"); string(ce) == "gzip" {
var err error var err error
var src []byte var src []byte
@ -154,7 +218,7 @@ func (c *client) ReadData(dst []byte) ([]byte, error) {
if statusCode != fasthttp.StatusOK { if statusCode != fasthttp.StatusOK {
metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_scrapes_total{status_code="%d"}`, statusCode)).Inc() metrics.GetOrCreateCounter(fmt.Sprintf(`vm_promscrape_scrapes_total{status_code="%d"}`, statusCode)).Inc()
return dst, fmt.Errorf("unexpected status code returned when scraping %q: %d; expecting %d; response body: %q", return dst, fmt.Errorf("unexpected status code returned when scraping %q: %d; expecting %d; response body: %q",
c.scrapeURL, statusCode, fasthttp.StatusOK, dst[dstLen:]) c.scrapeURL, statusCode, fasthttp.StatusOK, dst)
} }
scrapesOK.Inc() scrapesOK.Inc()
fasthttp.ReleaseResponse(resp) fasthttp.ReleaseResponse(resp)
@ -185,3 +249,22 @@ func doRequestWithPossibleRetry(hc *fasthttp.HostClient, req *fasthttp.Request,
} }
} }
} }
type streamReader struct {
r io.ReadCloser
cancel context.CancelFunc
bytesRead int64
}
func (sr *streamReader) Read(p []byte) (int, error) {
n, err := sr.r.Read(p)
sr.bytesRead += int64(n)
return n, err
}
func (sr *streamReader) MustClose() {
sr.cancel()
if err := sr.r.Close(); err != nil {
logger.Errorf("cannot close reader: %s", err)
}
}
@@ -84,6 +84,7 @@ type ScrapeConfig struct {
	// These options are supported only by lib/promscrape.
	DisableCompression bool `yaml:"disable_compression"`
	DisableKeepAlive   bool `yaml:"disable_keepalive"`
+	StreamParse        bool `yaml:"stream_parse"`

	// This is set in loadConfig
	swc *scrapeWorkConfig
@@ -473,6 +474,7 @@ func getScrapeWorkConfig(sc *ScrapeConfig, baseDir string, globalCfg *GlobalConf
		sampleLimit:        sc.SampleLimit,
		disableCompression: sc.DisableCompression,
		disableKeepAlive:   sc.DisableKeepAlive,
+		streamParse:        sc.StreamParse,
	}
	return swc, nil
}
@@ -493,6 +495,7 @@ type scrapeWorkConfig struct {
	sampleLimit        int
	disableCompression bool
	disableKeepAlive   bool
+	streamParse        bool
}

func appendKubernetesScrapeWork(dst []ScrapeWork, sdc *kubernetes.SDConfig, baseDir string, swc *scrapeWorkConfig) ([]ScrapeWork, bool) {
@@ -642,6 +645,7 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
	labels = promrelabel.RemoveMetaLabels(labels[:0], labels)
	if len(labels) == 0 {
		// Drop target without labels.
+		droppedTargetsMap.Register(originalLabels)
		return dst, nil
	}
	// See https://www.robustperception.io/life-of-a-label
@@ -652,10 +656,12 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
	addressRelabeled := promrelabel.GetLabelValueByName(labels, "__address__")
	if len(addressRelabeled) == 0 {
		// Drop target without scrape address.
+		droppedTargetsMap.Register(originalLabels)
		return dst, nil
	}
	if strings.Contains(addressRelabeled, "/") {
		// Drop target with '/'
+		droppedTargetsMap.Register(originalLabels)
		return dst, nil
	}
	addressRelabeled = addMissingPort(schemeRelabeled, addressRelabeled)
@@ -663,6 +669,9 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
	if metricsPathRelabeled == "" {
		metricsPathRelabeled = "/metrics"
	}
+	if !strings.HasPrefix(metricsPathRelabeled, "/") {
+		metricsPathRelabeled = "/" + metricsPathRelabeled
+	}
	paramsRelabeled := getParamsFromLabels(labels, swc.params)
	optionalQuestion := "?"
	if len(paramsRelabeled) == 0 || strings.Contains(metricsPathRelabeled, "?") {
@@ -696,6 +705,7 @@ func appendScrapeWork(dst []ScrapeWork, swc *scrapeWorkConfig, target string, ex
		SampleLimit:        swc.sampleLimit,
		DisableCompression: swc.disableCompression,
		DisableKeepAlive:   swc.disableKeepAlive,
+		StreamParse:        swc.streamParse,
		jobNameOriginal:    swc.jobName,
	})
@@ -1276,6 +1276,7 @@ scrape_configs:
  sample_limit: 100
  disable_keepalive: true
  disable_compression: true
+  stream_parse: true
  static_configs:
  - targets:
    - 192.168.1.2  # SNMP device.
@@ -1328,9 +1329,49 @@
			SampleLimit:        100,
			DisableKeepAlive:   true,
			DisableCompression: true,
+			StreamParse:        true,
			jobNameOriginal:    "snmp",
		},
	})
+	f(`
+scrape_configs:
+- job_name: path wo slash
+  static_configs:
+  - targets: ["foo.bar:1234"]
+  relabel_configs:
+  - replacement: metricspath
+    target_label: __metrics_path__
+`, []ScrapeWork{
+		{
+			ScrapeURL:      "http://foo.bar:1234/metricspath",
+			ScrapeInterval: defaultScrapeInterval,
+			ScrapeTimeout:  defaultScrapeTimeout,
+			Labels: []prompbmarshal.Label{
+				{
+					Name:  "__address__",
+					Value: "foo.bar:1234",
+				},
+				{
+					Name:  "__metrics_path__",
+					Value: "metricspath",
+				},
+				{
+					Name:  "__scheme__",
+					Value: "http",
+				},
+				{
+					Name:  "instance",
+					Value: "foo.bar:1234",
+				},
+				{
+					Name:  "job",
+					Value: "path wo slash",
+				},
+			},
+			jobNameOriginal: "path wo slash",
+			AuthConfig:      &promauth.Config{},
+		},
+	})
}

var defaultRegexForRelabelConfig = regexp.MustCompile("^(.*)$")
@@ -284,6 +284,7 @@ func (sg *scraperGroup) update(sws []ScrapeWork) {
				"original labels for target1: %s; original labels for target2: %s",
				sw.ScrapeURL, sw.LabelsString(), promLabelsString(originalLabels), promLabelsString(sw.OriginalLabels))
		}
+		droppedTargetsMap.Register(sw.OriginalLabels)
		continue
	}
	swsMap[key] = sw.OriginalLabels
@@ -333,6 +334,7 @@ func newScraper(sw *ScrapeWork, group string, pushData func(wr *prompbmarshal.Wr
	sc.sw.Config = *sw
	sc.sw.ScrapeGroup = group
	sc.sw.ReadData = c.ReadData
+	sc.sw.GetStreamReader = c.GetStreamReader
	sc.sw.PushData = pushData
	return sc
}
@@ -82,19 +82,22 @@ type ScrapeWork struct {
	// Whether to disable HTTP keep-alive when querying ScrapeURL.
	DisableKeepAlive bool

+	// Whether to parse target responses in a streaming manner.
+	StreamParse bool

	// The original 'job_name'
	jobNameOriginal string
}

// key returns unique identifier for the given sw.
//
-// it can be used for comparing for equality two ScrapeWork objects.
+// it can be used for comparing two ScrapeWork objects for equality.
func (sw *ScrapeWork) key() string {
	// Do not take into account OriginalLabels.
	key := fmt.Sprintf("ScrapeURL=%s, ScrapeInterval=%s, ScrapeTimeout=%s, HonorLabels=%v, HonorTimestamps=%v, Labels=%s, "+
-		"AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v",
+		"AuthConfig=%s, MetricRelabelConfigs=%s, SampleLimit=%d, DisableCompression=%v, DisableKeepAlive=%v, StreamParse=%v",
		sw.ScrapeURL, sw.ScrapeInterval, sw.ScrapeTimeout, sw.HonorLabels, sw.HonorTimestamps, sw.LabelsString(),
-		sw.AuthConfig.String(), sw.metricRelabelConfigsString(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive)
+		sw.AuthConfig.String(), sw.metricRelabelConfigsString(), sw.SampleLimit, sw.DisableCompression, sw.DisableKeepAlive, sw.StreamParse)
	return key
}
@@ -132,6 +135,9 @@ type scrapeWork struct {
	// ReadData is called for reading the data.
	ReadData func(dst []byte) ([]byte, error)

+	// GetStreamReader is called if Config.StreamParse is set.
+	GetStreamReader func() (*streamReader, error)

	// PushData is called for pushing collected data.
	PushData func(wr *prompbmarshal.WriteRequest)
@@ -221,6 +227,15 @@ var (
)

func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error {
+	if *streamParse || sw.Config.StreamParse {
+		// Read data from scrape targets in a streaming manner.
+		// This case is optimized for targets exposing millions of metrics per target.
+		return sw.scrapeStream(scrapeTimestamp, realTimestamp)
+	}
+
+	// Common case: read all the data from the scrape target to memory (body) and then process it.
+	// This case should work more optimally than the stream parse code above for the common case
+	// when the scrape target exposes up to a few thousand metrics.
	body := leveledbytebufferpool.Get(sw.prevBodyLen)
	var err error
	body.B, err = sw.ReadData(body.B[:0])
@@ -281,6 +296,66 @@ func (sw *scrapeWork) scrapeInternal(scrapeTimestamp, realTimestamp int64) error
	return err
}

+func (sw *scrapeWork) scrapeStream(scrapeTimestamp, realTimestamp int64) error {
+	sr, err := sw.GetStreamReader()
+	if err != nil {
+		return fmt.Errorf("cannot read data: %s", err)
+	}
+	samplesScraped := 0
+	samplesPostRelabeling := 0
+	wc := writeRequestCtxPool.Get(sw.prevRowsLen)
+	var mu sync.Mutex
+	err = parser.ParseStream(sr, scrapeTimestamp, false, func(rows []parser.Row) error {
+		mu.Lock()
+		defer mu.Unlock()
+		samplesScraped += len(rows)
+		for i := range rows {
+			sw.addRowToTimeseries(wc, &rows[i], scrapeTimestamp, true)
+			if len(wc.labels) > 40000 {
+				// Limit the maximum size of wc.writeRequest.
+				// This should reduce memory usage when scraping targets with millions of metrics and/or labels.
+				// For example, when scraping the /federate handler from Prometheus - see https://prometheus.io/docs/prometheus/latest/federation/
+				samplesPostRelabeling += len(wc.writeRequest.Timeseries)
+				sw.updateSeriesAdded(wc)
+				startTime := time.Now()
+				sw.PushData(&wc.writeRequest)
+				pushDataDuration.UpdateDuration(startTime)
+				wc.resetNoRows()
+			}
+		}
+		return nil
+	})
+	scrapedSamples.Update(float64(samplesScraped))
+	endTimestamp := time.Now().UnixNano() / 1e6
+	duration := float64(endTimestamp-realTimestamp) / 1e3
+	scrapeDuration.Update(duration)
+	scrapeResponseSize.Update(float64(sr.bytesRead))
+	sr.MustClose()
+	up := 1
+	if err != nil {
+		if samplesScraped == 0 {
+			up = 0
+		}
+		scrapesFailed.Inc()
+	}
+	samplesPostRelabeling += len(wc.writeRequest.Timeseries)
+	sw.updateSeriesAdded(wc)
+	seriesAdded := sw.finalizeSeriesAdded(samplesPostRelabeling)
+	sw.addAutoTimeseries(wc, "up", float64(up), scrapeTimestamp)
+	sw.addAutoTimeseries(wc, "scrape_duration_seconds", duration, scrapeTimestamp)
+	sw.addAutoTimeseries(wc, "scrape_samples_scraped", float64(samplesScraped), scrapeTimestamp)
+	sw.addAutoTimeseries(wc, "scrape_samples_post_metric_relabeling", float64(samplesPostRelabeling), scrapeTimestamp)
+	sw.addAutoTimeseries(wc, "scrape_series_added", float64(seriesAdded), scrapeTimestamp)
+	startTime := time.Now()
+	sw.PushData(&wc.writeRequest)
+	pushDataDuration.UpdateDuration(startTime)
+	sw.prevRowsLen = len(wc.rows.Rows)
+	wc.reset()
+	writeRequestCtxPool.Put(wc)
+	tsmGlobal.Update(&sw.Config, sw.ScrapeGroup, up == 1, realTimestamp, int64(duration*1000), err)
+	return nil
+}

// leveledWriteRequestCtxPool allows reducing memory usage when writeRequestCtx
// structs contain mixed number of labels.
//
@@ -1,14 +1,48 @@
package promscrape

import (
+	"context"
	"net"
+	"sync"
	"sync/atomic"
+	"time"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
	"github.com/VictoriaMetrics/fasthttp"
	"github.com/VictoriaMetrics/metrics"
)

+func statStdDial(ctx context.Context, network, addr string) (net.Conn, error) {
+	d := getStdDialer()
+	conn, err := d.DialContext(ctx, network, addr)
+	dialsTotal.Inc()
+	if err != nil {
+		dialErrors.Inc()
+		return nil, err
+	}
+	conns.Inc()
+	sc := &statConn{
+		Conn: conn,
+	}
+	return sc, nil
+}

+func getStdDialer() *net.Dialer {
+	stdDialerOnce.Do(func() {
+		stdDialer = &net.Dialer{
+			Timeout:   30 * time.Second,
+			KeepAlive: 30 * time.Second,
+			DualStack: netutil.TCP6Enabled(),
+		}
+	})
+	return stdDialer
+}

+var (
+	stdDialer     *net.Dialer
+	stdDialerOnce sync.Once
+)

func statDial(addr string) (conn net.Conn, err error) {
	if netutil.TCP6Enabled() {
		conn, err = fasthttp.DialDualStack(addr)
@@ -6,6 +6,10 @@ import (
	"sort"
	"sync"
	"time"

+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
)

var tsmGlobal = newTargetStatusMap()
@@ -15,6 +19,26 @@ func WriteHumanReadableTargetsStatus(w io.Writer, showOriginalLabels bool) {
	tsmGlobal.WriteHumanReadable(w, showOriginalLabels)
}

+// WriteAPIV1Targets writes /api/v1/targets to w according to https://prometheus.io/docs/prometheus/latest/querying/api/#targets
+func WriteAPIV1Targets(w io.Writer, state string) {
+	if state == "" {
+		state = "any"
+	}
+	fmt.Fprintf(w, `{"status":"success","data":{"activeTargets":`)
+	if state == "active" || state == "any" {
+		tsmGlobal.WriteActiveTargetsJSON(w)
+	} else {
+		fmt.Fprintf(w, `[]`)
+	}
+	fmt.Fprintf(w, `,"droppedTargets":`)
+	if state == "dropped" || state == "any" {
+		droppedTargetsMap.WriteDroppedTargetsJSON(w)
+	} else {
+		fmt.Fprintf(w, `[]`)
+	}
+	fmt.Fprintf(w, `}}`)
+}
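To make the emitted JSON shape concrete, here is a hypothetical in-package test sketch (not part of this commit); it assumes no targets have been registered yet:

```go
package promscrape

import (
	"bytes"
	"strings"
	"testing"
)

// TestWriteAPIV1TargetsShape is a hypothetical sketch of the output shape.
func TestWriteAPIV1TargetsShape(t *testing.T) {
	var buf bytes.Buffer
	// An empty state defaults to "any", so both lists are emitted.
	WriteAPIV1Targets(&buf, "")
	s := buf.String()
	if !strings.HasPrefix(s, `{"status":"success","data":{"activeTargets":`) {
		t.Fatalf("unexpected prefix in output: %s", s)
	}
	if !strings.Contains(s, `"droppedTargets":`) {
		t.Fatalf("missing droppedTargets in output: %s", s)
	}
}
```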
type targetStatusMap struct {
	mu sync.Mutex
	m  map[uint64]targetStatus
@@ -73,6 +97,66 @@ func (tsm *targetStatusMap) StatusByGroup(group string, up bool) int {
	return count
}

+// WriteActiveTargetsJSON writes `activeTargets` contents to w according to https://prometheus.io/docs/prometheus/latest/querying/api/#targets
+func (tsm *targetStatusMap) WriteActiveTargetsJSON(w io.Writer) {
+	tsm.mu.Lock()
+	type keyStatus struct {
+		key string
+		st  targetStatus
+	}
+	kss := make([]keyStatus, 0, len(tsm.m))
+	for _, st := range tsm.m {
+		key := promLabelsString(st.sw.OriginalLabels)
+		kss = append(kss, keyStatus{
+			key: key,
+			st:  st,
+		})
+	}
+	tsm.mu.Unlock()
+
+	sort.Slice(kss, func(i, j int) bool {
+		return kss[i].key < kss[j].key
+	})
+	fmt.Fprintf(w, `[`)
+	for i, ks := range kss {
+		st := ks.st
+		fmt.Fprintf(w, `{"discoveredLabels":`)
+		writeLabelsJSON(w, st.sw.OriginalLabels)
+		fmt.Fprintf(w, `,"labels":`)
+		labelsFinalized := promrelabel.FinalizeLabels(nil, st.sw.Labels)
+		writeLabelsJSON(w, labelsFinalized)
+		fmt.Fprintf(w, `,"scrapePool":%q`, st.sw.Job())
+		fmt.Fprintf(w, `,"scrapeUrl":%q`, st.sw.ScrapeURL)
+		errMsg := ""
+		if st.err != nil {
+			errMsg = st.err.Error()
+		}
+		fmt.Fprintf(w, `,"lastError":%q`, errMsg)
+		fmt.Fprintf(w, `,"lastScrape":%q`, time.Unix(st.scrapeTime/1000, (st.scrapeTime%1000)*1e6).Format(time.RFC3339Nano))
+		fmt.Fprintf(w, `,"lastScrapeDuration":%g`, (time.Millisecond * time.Duration(st.scrapeDuration)).Seconds())
+		state := "up"
+		if !st.up {
+			state = "down"
+		}
+		fmt.Fprintf(w, `,"health":%q}`, state)
+		if i+1 < len(kss) {
+			fmt.Fprintf(w, `,`)
+		}
+	}
+	fmt.Fprintf(w, `]`)
+}

+func writeLabelsJSON(w io.Writer, labels []prompbmarshal.Label) {
+	fmt.Fprintf(w, `{`)
+	for i, label := range labels {
+		fmt.Fprintf(w, "%q:%q", label.Name, label.Value)
+		if i+1 < len(labels) {
+			fmt.Fprintf(w, `,`)
+		}
+	}
+	fmt.Fprintf(w, `}`)
+}
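For clarity, `writeLabelsJSON` renders labels as a flat JSON object; a hypothetical in-package sketch (the label values are made up):

```go
package promscrape

import (
	"bytes"
	"testing"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)

// TestWriteLabelsJSONSketch is a hypothetical example of the emitted format.
func TestWriteLabelsJSONSketch(t *testing.T) {
	var buf bytes.Buffer
	writeLabelsJSON(&buf, []prompbmarshal.Label{
		{Name: "job", Value: "node"},
		{Name: "instance", Value: "host:9100"},
	})
	if got, want := buf.String(), `{"job":"node","instance":"host:9100"}`; got != want {
		t.Fatalf("got %s; want %s", got, want)
	}
}
```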
func (tsm *targetStatusMap) WriteHumanReadable(w io.Writer, showOriginalLabels bool) {
	byJob := make(map[string][]targetStatus)
	tsm.mu.Lock()
@@ -143,3 +227,69 @@ type targetStatus struct {
func (st *targetStatus) getDurationFromLastScrape() time.Duration {
	return time.Since(time.Unix(st.scrapeTime/1000, (st.scrapeTime%1000)*1e6))
}

+type droppedTargets struct {
+	mu              sync.Mutex
+	m               map[string]droppedTarget
+	lastCleanupTime uint64
+}

+type droppedTarget struct {
+	originalLabels []prompbmarshal.Label
+	deadline       uint64
+}

+func (dt *droppedTargets) Register(originalLabels []prompbmarshal.Label) {
+	key := promLabelsString(originalLabels)
+	currentTime := fasttime.UnixTimestamp()
+	dt.mu.Lock()
+	dt.m[key] = droppedTarget{
+		originalLabels: originalLabels,
+		deadline:       currentTime + 10*60,
+	}
+	if currentTime-dt.lastCleanupTime > 60 {
+		for k, v := range dt.m {
+			if currentTime > v.deadline {
+				delete(dt.m, k)
+			}
+		}
+		dt.lastCleanupTime = currentTime
+	}
+	dt.mu.Unlock()
+}

+// WriteDroppedTargetsJSON writes `droppedTargets` contents to w according to https://prometheus.io/docs/prometheus/latest/querying/api/#targets
+func (dt *droppedTargets) WriteDroppedTargetsJSON(w io.Writer) {
+	dt.mu.Lock()
+	type keyStatus struct {
+		key            string
+		originalLabels []prompbmarshal.Label
+	}
+	kss := make([]keyStatus, 0, len(dt.m))
+	for _, v := range dt.m {
+		key := promLabelsString(v.originalLabels)
+		kss = append(kss, keyStatus{
+			key:            key,
+			originalLabels: v.originalLabels,
+		})
+	}
+	dt.mu.Unlock()
+
+	sort.Slice(kss, func(i, j int) bool {
+		return kss[i].key < kss[j].key
+	})
+	fmt.Fprintf(w, `[`)
+	for i, ks := range kss {
+		fmt.Fprintf(w, `{"discoveredLabels":`)
+		writeLabelsJSON(w, ks.originalLabels)
+		fmt.Fprintf(w, `}`)
+		if i+1 < len(kss) {
+			fmt.Fprintf(w, `,`)
+		}
+	}
+	fmt.Fprintf(w, `]`)
+}

+var droppedTargetsMap = &droppedTargets{
+	m: make(map[string]droppedTarget),
+}
@@ -23,7 +23,8 @@ var (
// ParseStream parses csv from req and calls callback for the parsed rows.
//
-// The callback can be called multiple times for streamed data from req.
+// The callback can be called concurrently multiple times for streamed data from req.
+// The callback can be called after ParseStream returns.
//
// callback shouldn't hold rows after returning.
func ParseStream(req *http.Request, callback func(rows []Row) error) error {
@@ -23,7 +23,8 @@ var (
// ParseStream parses Graphite lines from r and calls callback for the parsed rows.
//
-// The callback can be called multiple times for streamed data from r.
+// The callback can be called concurrently multiple times for streamed data from r.
+// The callback can be called after ParseStream returns.
//
// callback shouldn't hold rows after returning.
func ParseStream(r io.Reader, callback func(rows []Row) error) error {
@@ -24,7 +24,8 @@ var (
// ParseStream parses r with the given args and calls callback for the parsed rows.
//
-// The callback can be called multiple times for streamed data from r.
+// The callback can be called concurrently multiple times for streamed data from r.
+// The callback can be called after ParseStream returns.
//
// callback shouldn't hold rows after returning.
func ParseStream(r io.Reader, isGzipped bool, precision, db string, callback func(db string, rows []Row) error) error {
@@ -17,7 +17,8 @@ import (
// ParseStream parses /api/v1/import/native lines from req and calls callback for parsed blocks.
//
-// The callback can be called multiple times for streamed data from req.
+// The callback can be called concurrently multiple times for streamed data from req.
+// The callback can be called after ParseStream returns.
//
// callback shouldn't hold block after returning.
// callback can be called in parallel from multiple concurrent goroutines.
@@ -23,7 +23,8 @@ var (
// ParseStream parses OpenTSDB lines from r and calls callback for the parsed rows.
//
-// The callback can be called multiple times for streamed data from r.
+// The callback can be called concurrently multiple times for streamed data from r.
+// The callback can be called after ParseStream returns.
//
// callback shouldn't hold rows after returning.
func ParseStream(r io.Reader, callback func(rows []Row) error) error {
@@ -26,7 +26,8 @@ var (
// ParseStream parses OpenTSDB http lines from req and calls callback for the parsed rows.
//
-// The callback can be called multiple times for streamed data from req.
+// The callback can be called concurrently multiple times for streamed data from req.
+// The callback can be called after ParseStream returns.
//
// callback shouldn't hold rows after returning.
func ParseStream(req *http.Request, callback func(rows []Row) error) error {
@@ -16,7 +16,8 @@ import (
// ParseStream parses lines with Prometheus exposition format from r and calls callback for the parsed rows.
//
-// The callback can be called multiple times for streamed data from r.
+// The callback can be called concurrently multiple times for streamed data from r.
+// It is guaranteed that the callback isn't called after ParseStream returns.
//
// callback shouldn't hold rows after returning.
func ParseStream(r io.Reader, defaultTimestamp int64, isGzipped bool, callback func(rows []Row) error) error {
@@ -32,11 +33,17 @@ func ParseStream(r io.Reader, defaultTimestamp int64, isGzipped bool, callback f
	defer putStreamContext(ctx)
	for ctx.Read() {
		uw := getUnmarshalWork()
-		uw.callback = callback
+		uw.callback = func(rows []Row) error {
+			err := callback(rows)
+			ctx.wg.Done()
+			return err
+		}
		uw.defaultTimestamp = defaultTimestamp
		uw.reqBuf, ctx.reqBuf = ctx.reqBuf, uw.reqBuf
+		ctx.wg.Add(1)
		common.ScheduleUnmarshalWork(uw)
	}
+	ctx.wg.Wait() // wait for all the outstanding callback calls before returning
	return ctx.Error()
}
@@ -61,6 +68,8 @@ type streamContext struct {
	reqBuf  []byte
	tailBuf []byte
	err     error

+	wg sync.WaitGroup
}

func (ctx *streamContext) Error() error {
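To make the callback contract above concrete, here is a standalone sketch of feeding Prometheus text format through this parser. The metric lines are made up, and starting the unmarshal workers mirrors what vminsert does at startup (an assumption about the required wiring, not part of this commit):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
	parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
)

func main() {
	// ParseStream schedules unmarshal work on shared workers, so they must be started first.
	common.StartUnmarshalWorkers()
	defer common.StopUnmarshalWorkers()

	data := "foo{bar=\"baz\"} 123\nqux 456\n"
	// The callback may run concurrently; thanks to the WaitGroup added above,
	// ParseStream does not return until all callback calls have finished.
	err := parser.ParseStream(strings.NewReader(data), 0, false, func(rows []parser.Row) error {
		for _, r := range rows {
			fmt.Printf("%s = %v\n", r.Metric, r.Value)
		}
		return nil
	})
	if err != nil {
		fmt.Println("parse error:", err)
	}
}
```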
@@ -21,6 +21,9 @@ var maxInsertRequestSize = flagutil.NewBytes("maxInsertRequestSize", 32*1024*102
// ParseStream parses Prometheus remote_write message req and calls callback for the parsed timeseries.
//
+// The callback can be called concurrently multiple times for streamed data from req.
+// The callback can be called after ParseStream returns.
+//
// callback shouldn't hold tss after returning.
func ParseStream(req *http.Request, callback func(tss []prompb.TimeSeries) error) error {
	ctx := getPushCtx(req.Body)
@@ -20,10 +20,10 @@ var maxLineLen = flagutil.NewBytes("import.maxLineLen", 100*1024*1024, "The maxi
// ParseStream parses /api/v1/import lines from req and calls callback for the parsed rows.
//
-// The callback can be called multiple times for streamed data from req.
+// The callback can be called concurrently multiple times for streamed data from req.
+// The callback can be called after ParseStream returns.
//
// callback shouldn't hold rows after returning.
-// callback is called from multiple concurrent goroutines.
func ParseStream(req *http.Request, callback func(rows []Row) error) error {
	r := req.Body
	if req.Header.Get("Content-Encoding") == "gzip" {
@@ -23,7 +23,7 @@ const (
type Block struct {
	bh blockHeader

-	// nextIdx is the next row index for timestamps and values.
+	// nextIdx is the next index for reading timestamps and values.
	nextIdx int

	timestamps []int64
@ -15,12 +15,12 @@ import (
// //
// rowsMerged is atomically updated with the number of merged rows during the merge. // rowsMerged is atomically updated with the number of merged rows during the merge.
func mergeBlockStreams(ph *partHeader, bsw *blockStreamWriter, bsrs []*blockStreamReader, stopCh <-chan struct{}, func mergeBlockStreams(ph *partHeader, bsw *blockStreamWriter, bsrs []*blockStreamReader, stopCh <-chan struct{},
dmis *uint64set.Set, rowsMerged, rowsDeleted *uint64) error { dmis *uint64set.Set, retentionDeadline int64, rowsMerged, rowsDeleted *uint64) error {
ph.Reset() ph.Reset()
bsm := bsmPool.Get().(*blockStreamMerger) bsm := bsmPool.Get().(*blockStreamMerger)
bsm.Init(bsrs) bsm.Init(bsrs)
err := mergeBlockStreamsInternal(ph, bsw, bsm, stopCh, dmis, rowsMerged, rowsDeleted) err := mergeBlockStreamsInternal(ph, bsw, bsm, stopCh, dmis, retentionDeadline, rowsMerged, rowsDeleted)
bsm.reset() bsm.reset()
bsmPool.Put(bsm) bsmPool.Put(bsm)
bsw.MustClose() bsw.MustClose()
@@ -39,29 +39,10 @@ var bsmPool = &sync.Pool{
 var errForciblyStopped = fmt.Errorf("forcibly stopped")
 func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *blockStreamMerger, stopCh <-chan struct{},
-	dmis *uint64set.Set, rowsMerged, rowsDeleted *uint64) error {
-	// Search for the first block to merge
-	var pendingBlock *Block
-	for bsm.NextBlock() {
-		select {
-		case <-stopCh:
-			return errForciblyStopped
-		default:
-		}
-		if dmis.Has(bsm.Block.bh.TSID.MetricID) {
-			// Skip blocks for deleted metrics.
-			*rowsDeleted += uint64(bsm.Block.bh.RowsCount)
-			continue
-		}
-		pendingBlock = getBlock()
-		pendingBlock.CopyFrom(bsm.Block)
-		break
-	}
-	if pendingBlock != nil {
-		defer putBlock(pendingBlock)
-	}
-	// Merge blocks.
+	dmis *uint64set.Set, retentionDeadline int64, rowsMerged, rowsDeleted *uint64) error {
+	pendingBlockIsEmpty := true
+	pendingBlock := getBlock()
+	defer putBlock(pendingBlock)
 	tmpBlock := getBlock()
 	defer putBlock(tmpBlock)
 	for bsm.NextBlock() {
@@ -75,6 +56,17 @@ func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *bloc
 			*rowsDeleted += uint64(bsm.Block.bh.RowsCount)
 			continue
 		}
+		if bsm.Block.bh.MaxTimestamp < retentionDeadline {
+			// Skip blocks out of the given retention.
+			*rowsDeleted += uint64(bsm.Block.bh.RowsCount)
+			continue
+		}
+		if pendingBlockIsEmpty {
+			// Load the next block if pendingBlock is empty.
+			pendingBlock.CopyFrom(bsm.Block)
+			pendingBlockIsEmpty = false
+			continue
+		}
 		// Verify whether pendingBlock may be merged with bsm.Block (the current block).
 		if pendingBlock.bh.TSID.MetricID != bsm.Block.bh.TSID.MetricID {
@@ -104,16 +96,20 @@ func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *bloc
 		tmpBlock.bh.TSID = bsm.Block.bh.TSID
 		tmpBlock.bh.Scale = bsm.Block.bh.Scale
 		tmpBlock.bh.PrecisionBits = minUint8(pendingBlock.bh.PrecisionBits, bsm.Block.bh.PrecisionBits)
-		mergeBlocks(tmpBlock, pendingBlock, bsm.Block)
+		mergeBlocks(tmpBlock, pendingBlock, bsm.Block, retentionDeadline, rowsDeleted)
 		if len(tmpBlock.timestamps) <= maxRowsPerBlock {
 			// More entries may be added to tmpBlock. Swap it with pendingBlock,
 			// so more entries may be added to pendingBlock on the next iteration.
-			tmpBlock.fixupTimestamps()
+			if len(tmpBlock.timestamps) > 0 {
+				tmpBlock.fixupTimestamps()
+			} else {
+				pendingBlockIsEmpty = true
+			}
 			pendingBlock, tmpBlock = tmpBlock, pendingBlock
 			continue
 		}
-		// Write the first len(maxRowsPerBlock) of tmpBlock.timestamps to bsw,
+		// Write the first maxRowsPerBlock of tmpBlock.timestamps to bsw,
 		// leave the rest in pendingBlock.
 		tmpBlock.nextIdx = maxRowsPerBlock
 		pendingBlock.CopyFrom(tmpBlock)
@@ -127,18 +123,21 @@ func mergeBlockStreamsInternal(ph *partHeader, bsw *blockStreamWriter, bsm *bloc
 	if err := bsm.Error(); err != nil {
 		return fmt.Errorf("cannot read block to be merged: %w", err)
 	}
-	if pendingBlock != nil {
+	if !pendingBlockIsEmpty {
 		bsw.WriteExternalBlock(pendingBlock, ph, rowsMerged)
 	}
 	return nil
 }
 // mergeBlocks merges ib1 and ib2 to ob.
-func mergeBlocks(ob, ib1, ib2 *Block) {
+func mergeBlocks(ob, ib1, ib2 *Block, retentionDeadline int64, rowsDeleted *uint64) {
 	ib1.assertMergeable(ib2)
 	ib1.assertUnmarshaled()
 	ib2.assertUnmarshaled()
+	skipSamplesOutsideRetention(ib1, retentionDeadline, rowsDeleted)
+	skipSamplesOutsideRetention(ib2, retentionDeadline, rowsDeleted)
 	if ib1.bh.MaxTimestamp < ib2.bh.MinTimestamp {
 		// Fast path - ib1 values have smaller timestamps than ib2 values.
 		appendRows(ob, ib1)
@@ -176,6 +175,16 @@ func mergeBlocks(ob, ib1, ib2 *Block) {
 	}
 }
+func skipSamplesOutsideRetention(b *Block, retentionDeadline int64, rowsDeleted *uint64) {
+	timestamps := b.timestamps
+	nextIdx := b.nextIdx
+	for nextIdx < len(timestamps) && timestamps[nextIdx] < retentionDeadline {
+		nextIdx++
+	}
+	*rowsDeleted += uint64(nextIdx - b.nextIdx)
+	b.nextIdx = nextIdx
+}
 func appendRows(ob, ib *Block) {
 	ob.timestamps = append(ob.timestamps, ib.timestamps[ib.nextIdx:]...)
 	ob.values = append(ob.values, ib.values[ib.nextIdx:]...)
@@ -189,7 +198,7 @@ func unmarshalAndCalibrateScale(b1, b2 *Block) error {
 		return err
 	}
-	scale := decimal.CalibrateScale(b1.values, b1.bh.Scale, b2.values, b2.bh.Scale)
+	scale := decimal.CalibrateScale(b1.values[b1.nextIdx:], b1.bh.Scale, b2.values[b2.nextIdx:], b2.bh.Scale)
 	b1.bh.Scale = scale
 	b2.bh.Scale = scale
 	return nil
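
The retention-skipping helper above is easy to exercise in isolation. The following is a minimal standalone sketch, with `Block` reduced to the two fields the helper actually touches (so this is not the real `storage` package type): it shows how `nextIdx` is advanced past all samples older than `retentionDeadline`, while `rowsDeleted` accounts for the skipped samples.

```go
package main

import "fmt"

// block is a stripped-down stand-in for storage.Block: a read cursor
// (nextIdx) over a sorted slice of timestamps in Unix milliseconds.
type block struct {
	nextIdx    int
	timestamps []int64
}

// skipSamplesOutsideRetention mirrors the helper above: it advances nextIdx
// past every sample older than retentionDeadline and counts the skipped
// samples in rowsDeleted.
func skipSamplesOutsideRetention(b *block, retentionDeadline int64, rowsDeleted *uint64) {
	nextIdx := b.nextIdx
	for nextIdx < len(b.timestamps) && b.timestamps[nextIdx] < retentionDeadline {
		nextIdx++
	}
	*rowsDeleted += uint64(nextIdx - b.nextIdx)
	b.nextIdx = nextIdx
}

func main() {
	b := &block{timestamps: []int64{100, 200, 300, 400}}
	var rowsDeleted uint64
	skipSamplesOutsideRetention(b, 250, &rowsDeleted)
	// The samples at 100 and 200 fall outside the retention.
	fmt.Println(b.nextIdx, rowsDeleted) // Output: 2 2
}
```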


@@ -365,7 +365,7 @@ func TestMergeForciblyStop(t *testing.T) {
 	ch := make(chan struct{})
 	var rowsMerged, rowsDeleted uint64
 	close(ch)
-	if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, ch, nil, &rowsMerged, &rowsDeleted); !errors.Is(err, errForciblyStopped) {
+	if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, ch, nil, 0, &rowsMerged, &rowsDeleted); !errors.Is(err, errForciblyStopped) {
 		t.Fatalf("unexpected error in mergeBlockStreams: got %v; want %v", err, errForciblyStopped)
 	}
 	if rowsMerged != 0 {
@@ -385,7 +385,7 @@ func testMergeBlockStreams(t *testing.T, bsrs []*blockStreamReader, expectedBloc
 	bsw.InitFromInmemoryPart(&mp)
 	var rowsMerged, rowsDeleted uint64
-	if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, nil, nil, &rowsMerged, &rowsDeleted); err != nil {
+	if err := mergeBlockStreams(&mp.ph, &bsw, bsrs, nil, nil, 0, &rowsMerged, &rowsDeleted); err != nil {
 		t.Fatalf("unexpected error in mergeBlockStreams: %s", err)
 	}


@@ -41,7 +41,7 @@ func benchmarkMergeBlockStreams(b *testing.B, mps []*inmemoryPart, rowsPerLoop i
 		}
 		mpOut.Reset()
 		bsw.InitFromInmemoryPart(&mpOut)
-		if err := mergeBlockStreams(&mpOut.ph, &bsw, bsrs, nil, nil, &rowsMerged, &rowsDeleted); err != nil {
+		if err := mergeBlockStreams(&mpOut.ph, &bsw, bsrs, nil, nil, 0, &rowsMerged, &rowsDeleted); err != nil {
 			panic(fmt.Errorf("cannot merge block streams: %w", err))
 		}
 	}


@@ -196,6 +196,10 @@ func (mn *MetricName) CopyFrom(src *MetricName) {
 // AddTag adds new tag to mn with the given key and value.
 func (mn *MetricName) AddTag(key, value string) {
+	if key == string(metricGroupTagKey) {
+		mn.MetricGroup = append(mn.MetricGroup, value...)
+		return
+	}
 	tag := mn.addNextTag()
 	tag.Key = append(tag.Key[:0], key...)
 	tag.Value = append(tag.Value[:0], value...)
@@ -203,6 +207,10 @@ func (mn *MetricName) AddTag(key, value string) {
 // AddTagBytes adds new tag to mn with the given key and value.
 func (mn *MetricName) AddTagBytes(key, value []byte) {
+	if string(key) == string(metricGroupTagKey) {
+		mn.MetricGroup = append(mn.MetricGroup, value...)
+		return
+	}
 	tag := mn.addNextTag()
 	tag.Key = append(tag.Key[:0], key...)
 	tag.Value = append(tag.Value[:0], value...)
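
The point of the two guards above: the metric name lives in `MetricGroup`, not in the ordinary tags list, so adding a tag under the metric-group key must extend `MetricGroup` instead of creating a duplicate tag. Below is a simplified sketch of the behavior, assuming `metricGroupTagKey` is `__name__` (the Prometheus convention for the metric-name label) and using a plain slice append instead of the buffer-reusing `addNextTag`:

```go
package main

import "fmt"

// metricGroupTagKey is assumed here to be "__name__", matching the
// Prometheus convention for the metric name label.
var metricGroupTagKey = []byte("__name__")

type Tag struct {
	Key, Value []byte
}

type MetricName struct {
	MetricGroup []byte
	Tags        []Tag
}

// AddTag routes the special metric-group key into MetricGroup instead of
// appending a regular tag, as the patched storage.MetricName.AddTag does.
func (mn *MetricName) AddTag(key, value string) {
	if key == string(metricGroupTagKey) {
		mn.MetricGroup = append(mn.MetricGroup, value...)
		return
	}
	mn.Tags = append(mn.Tags, Tag{Key: []byte(key), Value: []byte(value)})
}

func main() {
	var mn MetricName
	mn.AddTag("__name__", "http_requests_total")
	mn.AddTag("job", "api")
	fmt.Printf("group=%s tags=%d\n", mn.MetricGroup, len(mn.Tags)) // group=http_requests_total tags=1
}
```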


@@ -134,6 +134,10 @@ type partition struct {
 	// The callback that returns deleted metric ids which must be skipped during merge.
 	getDeletedMetricIDs func() *uint64set.Set
+	// data retention in milliseconds.
+	// Used for deleting data outside the retention during background merge.
+	retentionMsecs int64
 	// Name is the name of the partition in the form YYYY_MM.
 	name string
@@ -206,7 +210,7 @@ func (pw *partWrapper) decRef() {
 // createPartition creates new partition for the given timestamp and the given paths
 // to small and big partitions.
-func createPartition(timestamp int64, smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set) (*partition, error) {
+func createPartition(timestamp int64, smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) (*partition, error) {
 	name := timestampToPartitionName(timestamp)
 	smallPartsPath := filepath.Clean(smallPartitionsPath) + "/" + name
 	bigPartsPath := filepath.Clean(bigPartitionsPath) + "/" + name
@@ -219,7 +223,7 @@ func createPartition(timestamp int64, smallPartitionsPath, bigPartitionsPath str
 		return nil, fmt.Errorf("cannot create directories for big parts %q: %w", bigPartsPath, err)
 	}
-	pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs)
+	pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs, retentionMsecs)
 	pt.tr.fromPartitionTimestamp(timestamp)
 	pt.startMergeWorkers()
 	pt.startRawRowsFlusher()
@@ -241,7 +245,7 @@ func (pt *partition) Drop() {
 }
 // openPartition opens the existing partition from the given paths.
-func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set) (*partition, error) {
+func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) (*partition, error) {
 	smallPartsPath = filepath.Clean(smallPartsPath)
 	bigPartsPath = filepath.Clean(bigPartsPath)
@@ -265,7 +269,7 @@ func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func
 		return nil, fmt.Errorf("cannot open big parts from %q: %w", bigPartsPath, err)
 	}
-	pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs)
+	pt := newPartition(name, smallPartsPath, bigPartsPath, getDeletedMetricIDs, retentionMsecs)
 	pt.smallParts = smallParts
 	pt.bigParts = bigParts
 	if err := pt.tr.fromPartitionName(name); err != nil {
@@ -278,13 +282,14 @@ func openPartition(smallPartsPath, bigPartsPath string, getDeletedMetricIDs func
 	return pt, nil
 }
-func newPartition(name, smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set) *partition {
+func newPartition(name, smallPartsPath, bigPartsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) *partition {
 	p := &partition{
 		name:                name,
 		smallPartsPath:      smallPartsPath,
 		bigPartsPath:        bigPartsPath,
 		getDeletedMetricIDs: getDeletedMetricIDs,
+		retentionMsecs:      retentionMsecs,
 		mergeIdx: uint64(time.Now().UnixNano()),
 		stopCh:   make(chan struct{}),
@@ -1129,7 +1134,8 @@ func (pt *partition) mergeParts(pws []*partWrapper, stopCh <-chan struct{}) erro
 		atomic.AddUint64(&pt.smallMergesCount, 1)
 		atomic.AddUint64(&pt.activeSmallMerges, 1)
 	}
-	err := mergeBlockStreams(&ph, bsw, bsrs, stopCh, dmis, rowsMerged, rowsDeleted)
+	retentionDeadline := timestampFromTime(startTime) - pt.retentionMsecs
+	err := mergeBlockStreams(&ph, bsw, bsrs, stopCh, dmis, retentionDeadline, rowsMerged, rowsDeleted)
 	if isBigPart {
 		atomic.AddUint64(&pt.activeBigMerges, ^uint64(0))
 	} else {
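
The deadline handed to the merge is simply "merge start time minus retention", in Unix milliseconds. A hedged sketch of the computation, assuming `timestampFromTime` converts a `time.Time` into Unix milliseconds:

```go
package main

import (
	"fmt"
	"time"
)

// timestampFromTime converts t into milliseconds since the Unix epoch,
// the timestamp unit used throughout the storage.
func timestampFromTime(t time.Time) int64 {
	return t.UnixNano() / 1e6
}

func main() {
	// With a 30-day retention, every sample whose timestamp is older than
	// retentionDeadline may be dropped during a background merge.
	retentionMsecs := int64(30 * 24 * 3600 * 1000)
	retentionDeadline := timestampFromTime(time.Now()) - retentionMsecs
	fmt.Println("drop samples with timestamp <", retentionDeadline)
}
```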


@@ -167,7 +167,8 @@ func testPartitionSearchEx(t *testing.T, ptt int64, tr TimeRange, partsCount, ma
 	})
 	// Create partition from rowss and test search on it.
-	pt, err := createPartition(ptt, "./small-table", "./big-table", nilGetDeletedMetricIDs)
+	retentionMsecs := timestampFromTime(time.Now()) - ptr.MinTimestamp + 3600*1000
+	pt, err := createPartition(ptt, "./small-table", "./big-table", nilGetDeletedMetricIDs, retentionMsecs)
 	if err != nil {
 		t.Fatalf("cannot create partition: %s", err)
 	}
@@ -191,7 +192,7 @@ func testPartitionSearchEx(t *testing.T, ptt int64, tr TimeRange, partsCount, ma
 	pt.MustClose()
 	// Open the created partition and test search on it.
-	pt, err = openPartition(smallPartsPath, bigPartsPath, nilGetDeletedMetricIDs)
+	pt, err = openPartition(smallPartsPath, bigPartsPath, nilGetDeletedMetricIDs, retentionMsecs)
 	if err != nil {
 		t.Fatalf("cannot open partition: %s", err)
 	}


@@ -27,7 +27,10 @@ import (
 	"github.com/VictoriaMetrics/fastcache"
 )
-const maxRetentionMonths = 12 * 100
+const (
+	msecsPerMonth     = 31 * 24 * 3600 * 1000
+	maxRetentionMsecs = 100 * 12 * msecsPerMonth
+)
 // Storage represents TSDB storage.
 type Storage struct {
@@ -47,9 +50,9 @@ type Storage struct {
 	slowPerDayIndexInserts uint64
 	slowMetricNameLoads    uint64
-	path            string
-	cachePath       string
-	retentionMonths int
+	path           string
+	cachePath      string
+	retentionMsecs int64
 	// lock file for exclusive access to the storage on the given path.
 	flockF *os.File
@@ -106,23 +109,19 @@ type Storage struct {
 	snapshotLock sync.Mutex
 }
-// OpenStorage opens storage on the given path with the given number of retention months.
-func OpenStorage(path string, retentionMonths int) (*Storage, error) {
-	if retentionMonths > maxRetentionMonths {
-		return nil, fmt.Errorf("too big retentionMonths=%d; cannot exceed %d", retentionMonths, maxRetentionMonths)
-	}
-	if retentionMonths <= 0 {
-		retentionMonths = maxRetentionMonths
-	}
+// OpenStorage opens storage on the given path with the given retentionMsecs.
+func OpenStorage(path string, retentionMsecs int64) (*Storage, error) {
 	path, err := filepath.Abs(path)
 	if err != nil {
 		return nil, fmt.Errorf("cannot determine absolute path for %q: %w", path, err)
 	}
+	if retentionMsecs <= 0 {
+		retentionMsecs = maxRetentionMsecs
+	}
 	s := &Storage{
-		path:            path,
-		cachePath:       path + "/cache",
-		retentionMonths: retentionMonths,
+		path:           path,
+		cachePath:      path + "/cache",
+		retentionMsecs: retentionMsecs,
 		stop: make(chan struct{}),
 	}
@@ -178,7 +177,7 @@ func OpenStorage(path string, retentionMonths int) (*Storage, error) {
 	// Load data
 	tablePath := path + "/data"
-	tb, err := openTable(tablePath, retentionMonths, s.getDeletedMetricIDs)
+	tb, err := openTable(tablePath, s.getDeletedMetricIDs, retentionMsecs)
 	if err != nil {
 		s.idb().MustClose()
 		return nil, fmt.Errorf("cannot open table at %q: %w", tablePath, err)
@@ -473,8 +472,9 @@ func (s *Storage) startRetentionWatcher() {
 }
 func (s *Storage) retentionWatcher() {
+	retentionMonths := int((s.retentionMsecs + (msecsPerMonth - 1)) / msecsPerMonth)
 	for {
-		d := nextRetentionDuration(s.retentionMonths)
+		d := nextRetentionDuration(retentionMonths)
 		select {
 		case <-s.stop:
 			return
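
The retention watcher still schedules in whole months, so the millisecond-based retention is converted with integer ceiling division; e.g. a 40-day retention rounds up to two months rather than truncating to one. The arithmetic in isolation:

```go
package main

import "fmt"

const msecsPerMonth = 31 * 24 * 3600 * 1000

func main() {
	retentionMsecs := int64(40 * 24 * 3600 * 1000) // 40 days
	// Ceiling division: (x + d - 1) / d rounds x/d up for positive values.
	retentionMonths := int((retentionMsecs + (msecsPerMonth - 1)) / msecsPerMonth)
	fmt.Println(retentionMonths) // Output: 2
}
```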


@@ -353,8 +353,8 @@ func TestStorageOpenMultipleTimes(t *testing.T) {
 func TestStorageRandTimestamps(t *testing.T) {
 	path := "TestStorageRandTimestamps"
-	retentionMonths := 60
-	s, err := OpenStorage(path, retentionMonths)
+	retentionMsecs := int64(60 * msecsPerMonth)
+	s, err := OpenStorage(path, retentionMsecs)
 	if err != nil {
 		t.Fatalf("cannot open storage: %s", err)
 	}
@@ -364,7 +364,7 @@ func TestStorageRandTimestamps(t *testing.T) {
 			t.Fatal(err)
 		}
 		s.MustClose()
-		s, err = OpenStorage(path, retentionMonths)
+		s, err = OpenStorage(path, retentionMsecs)
 	}
 	})
 	t.Run("concurrent", func(t *testing.T) {


@@ -22,6 +22,7 @@ type table struct {
 	bigPartitionsPath   string
 	getDeletedMetricIDs func() *uint64set.Set
+	retentionMsecs      int64
 	ptws     []*partitionWrapper
 	ptwsLock sync.Mutex
@@ -30,8 +31,7 @@ type table struct {
 	stop chan struct{}
-	retentionMilliseconds int64
-	retentionWatcherWG    sync.WaitGroup
+	retentionWatcherWG sync.WaitGroup
 }
 // partitionWrapper provides refcounting mechanism for the partition.
@@ -77,12 +77,12 @@ func (ptw *partitionWrapper) scheduleToDrop() {
 	atomic.AddUint64(&ptw.mustDrop, 1)
 }
-// openTable opens a table on the given path with the given retentionMonths.
+// openTable opens a table on the given path with the given retentionMsecs.
 //
 // The table is created if it doesn't exist.
 //
-// Data older than the retentionMonths may be dropped at any time.
-func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uint64set.Set) (*table, error) {
+// Data older than the retentionMsecs may be dropped at any time.
+func openTable(path string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) (*table, error) {
 	path = filepath.Clean(path)
 	// Create a directory for the table if it doesn't exist yet.
@@ -115,7 +115,7 @@ func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uin
 	}
 	// Open partitions.
-	pts, err := openPartitions(smallPartitionsPath, bigPartitionsPath, getDeletedMetricIDs)
+	pts, err := openPartitions(smallPartitionsPath, bigPartitionsPath, getDeletedMetricIDs, retentionMsecs)
 	if err != nil {
 		return nil, fmt.Errorf("cannot open partitions in the table %q: %w", path, err)
 	}
@@ -125,6 +125,7 @@ func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uin
 		smallPartitionsPath: smallPartitionsPath,
 		bigPartitionsPath:   bigPartitionsPath,
 		getDeletedMetricIDs: getDeletedMetricIDs,
+		retentionMsecs:      retentionMsecs,
 		flockF: flockF,
@@ -133,11 +134,6 @@ func openTable(path string, retentionMonths int, getDeletedMetricIDs func() *uin
 	for _, pt := range pts {
 		tb.addPartitionNolock(pt)
 	}
-	if retentionMonths <= 0 || retentionMonths > maxRetentionMonths {
-		retentionMonths = maxRetentionMonths
-	}
-	tb.retentionMilliseconds = int64(retentionMonths) * 31 * 24 * 3600 * 1e3
 	tb.startRetentionWatcher()
 	return tb, nil
 }
@@ -357,7 +353,7 @@ func (tb *table) AddRows(rows []rawRow) error {
 			continue
 		}
-		pt, err := createPartition(r.Timestamp, tb.smallPartitionsPath, tb.bigPartitionsPath, tb.getDeletedMetricIDs)
+		pt, err := createPartition(r.Timestamp, tb.smallPartitionsPath, tb.bigPartitionsPath, tb.getDeletedMetricIDs, tb.retentionMsecs)
 		if err != nil {
 			errors = append(errors, err)
 			continue
@@ -376,7 +372,7 @@ func (tb *table) AddRows(rows []rawRow) error {
 func (tb *table) getMinMaxTimestamps() (int64, int64) {
 	now := int64(fasttime.UnixTimestamp() * 1000)
-	minTimestamp := now - tb.retentionMilliseconds
+	minTimestamp := now - tb.retentionMsecs
 	maxTimestamp := now + 2*24*3600*1000 // allow max +2 days from now due to timezones shit :)
 	if minTimestamp < 0 {
 		// Negative timestamps aren't supported by the storage.
@@ -406,7 +402,7 @@ func (tb *table) retentionWatcher() {
 		case <-ticker.C:
 		}
-		minTimestamp := int64(fasttime.UnixTimestamp()*1000) - tb.retentionMilliseconds
+		minTimestamp := int64(fasttime.UnixTimestamp()*1000) - tb.retentionMsecs
 		var ptwsDrop []*partitionWrapper
 		tb.ptwsLock.Lock()
 		dst := tb.ptws[:0]
@@ -457,7 +453,7 @@ func (tb *table) PutPartitions(ptws []*partitionWrapper) {
 	}
 }
-func openPartitions(smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set) ([]*partition, error) {
+func openPartitions(smallPartitionsPath, bigPartitionsPath string, getDeletedMetricIDs func() *uint64set.Set, retentionMsecs int64) ([]*partition, error) {
 	// Certain partition directories in either `big` or `small` dir may be missing
 	// after restoring from backup. So populate partition names from both dirs.
 	ptNames := make(map[string]bool)
@@ -471,7 +467,7 @@ func openPartitions(smallPartitionsPath, bigPartitionsPath func() *uint64set.Set
 	for ptName := range ptNames {
 		smallPartsPath := smallPartitionsPath + "/" + ptName
 		bigPartsPath := bigPartitionsPath + "/" + ptName
-		pt, err := openPartition(smallPartsPath, bigPartsPath, getDeletedMetricIDs)
+		pt, err := openPartition(smallPartsPath, bigPartsPath, getDeletedMetricIDs, retentionMsecs)
 		if err != nil {
 			mustClosePartitions(pts)
 			return nil, fmt.Errorf("cannot open partition %q: %w", ptName, err)
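
Downstream of this plumbing, the table-level retention watcher periodically computes the minimum allowed timestamp and drops every partition whose data lies entirely below it. A simplified sketch of the selection step (partitions reduced here to the time ranges they cover; not the actual `partitionWrapper` refcounting):

```go
package main

import "fmt"

// partition is reduced to the time range it covers, in Unix milliseconds.
type partition struct {
	name                       string
	minTimestamp, maxTimestamp int64
}

// selectPartitionsToDrop returns the partitions whose newest sample is
// already older than minTimestamp, i.e. fully outside the retention.
func selectPartitionsToDrop(pts []partition, minTimestamp int64) []partition {
	var drop []partition
	for _, pt := range pts {
		if pt.maxTimestamp < minTimestamp {
			drop = append(drop, pt)
		}
	}
	return drop
}

func main() {
	pts := []partition{
		{"2020_08", 1596240000000, 1598918399999},
		{"2020_09", 1598918400000, 1601510399999},
	}
	// Pretend the retention cutoff falls in mid-September 2020.
	for _, pt := range selectPartitionsToDrop(pts, 1600000000000) {
		fmt.Println("drop", pt.name) // Output: drop 2020_08
	}
}
```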


@@ -66,7 +66,7 @@ func (ts *tableSearch) Init(tb *table, tsids []TSID, tr TimeRange) {
 	// Adjust tr.MinTimestamp, so it doesn't obtain data older
 	// than the tb retention.
 	now := int64(fasttime.UnixTimestamp() * 1000)
-	minTimestamp := now - tb.retentionMilliseconds
+	minTimestamp := now - tb.retentionMsecs
 	if tr.MinTimestamp < minTimestamp {
 		tr.MinTimestamp = minTimestamp
 	}
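
Clamping the query range this way ensures a search never returns samples that have already fallen out of retention, even if they are still physically present until the next merge or partition drop. A tiny sketch of the clamp:

```go
package main

import (
	"fmt"
	"time"
)

// TimeRange mirrors the storage search range, in Unix milliseconds.
type TimeRange struct {
	MinTimestamp, MaxTimestamp int64
}

func main() {
	retentionMsecs := int64(31 * 24 * 3600 * 1000) // one month
	now := time.Now().UnixNano() / 1e6
	tr := TimeRange{MinTimestamp: 0, MaxTimestamp: now} // "everything"
	// Adjust tr.MinTimestamp so the search doesn't touch data older
	// than the retention, as tableSearch.Init does above.
	if minTimestamp := now - retentionMsecs; tr.MinTimestamp < minTimestamp {
		tr.MinTimestamp = minTimestamp
	}
	fmt.Println("effective search start:", tr.MinTimestamp)
}
```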


@@ -181,7 +181,7 @@ func testTableSearchEx(t *testing.T, trData, trSearch TimeRange, partitionsCount
 	})
 	// Create a table from rowss and test search on it.
-	tb, err := openTable("./test-table", -1, nilGetDeletedMetricIDs)
+	tb, err := openTable("./test-table", nilGetDeletedMetricIDs, maxRetentionMsecs)
 	if err != nil {
 		t.Fatalf("cannot create table: %s", err)
 	}
@@ -202,7 +202,7 @@ func testTableSearchEx(t *testing.T, trData, trSearch TimeRange, partitionsCount
 	tb.MustClose()
 	// Open the created table and test search on it.
-	tb, err = openTable("./test-table", -1, nilGetDeletedMetricIDs)
+	tb, err = openTable("./test-table", nilGetDeletedMetricIDs, maxRetentionMsecs)
 	if err != nil {
 		t.Fatalf("cannot open table: %s", err)
 	}


@@ -47,7 +47,7 @@ func openBenchTable(b *testing.B, startTimestamp int64, rowsPerInsert, rowsCount
 		createBenchTable(b, path, startTimestamp, rowsPerInsert, rowsCount, tsidsCount)
 		createdBenchTables[path] = true
 	}
-	tb, err := openTable(path, -1, nilGetDeletedMetricIDs)
+	tb, err := openTable(path, nilGetDeletedMetricIDs, maxRetentionMsecs)
 	if err != nil {
 		b.Fatalf("cannot open table %q: %s", path, err)
 	}
@@ -70,7 +70,7 @@ var createdBenchTables = make(map[string]bool)
 func createBenchTable(b *testing.B, path string, startTimestamp int64, rowsPerInsert, rowsCount, tsidsCount int) {
 	b.Helper()
-	tb, err := openTable(path, -1, nilGetDeletedMetricIDs)
+	tb, err := openTable(path, nilGetDeletedMetricIDs, maxRetentionMsecs)
 	if err != nil {
 		b.Fatalf("cannot open table %q: %s", path, err)
 	}


@@ -7,7 +7,7 @@ import (
 func TestTableOpenClose(t *testing.T) {
 	const path = "TestTableOpenClose"
-	const retentionMonths = 123
+	const retentionMsecs = 123 * msecsPerMonth
 	if err := os.RemoveAll(path); err != nil {
 		t.Fatalf("cannot remove %q: %s", path, err)
 	}
@@ -17,7 +17,7 @@ func TestTableOpenClose(t *testing.T) {
 	}()
 	// Create a new table
-	tb, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
+	tb, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
 	if err != nil {
 		t.Fatalf("cannot create new table: %s", err)
 	}
@@ -27,7 +27,7 @@ func TestTableOpenClose(t *testing.T) {
 	// Re-open created table multiple times.
 	for i := 0; i < 10; i++ {
-		tb, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
+		tb, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
 		if err != nil {
 			t.Fatalf("cannot open created table: %s", err)
 		}
@@ -37,20 +37,20 @@ func TestTableOpenClose(t *testing.T) {
 func TestTableOpenMultipleTimes(t *testing.T) {
 	const path = "TestTableOpenMultipleTimes"
-	const retentionMonths = 123
+	const retentionMsecs = 123 * msecsPerMonth
 	defer func() {
 		_ = os.RemoveAll(path)
 	}()
-	tb1, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
+	tb1, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
 	if err != nil {
 		t.Fatalf("cannot open table the first time: %s", err)
 	}
 	defer tb1.MustClose()
 	for i := 0; i < 10; i++ {
-		tb2, err := openTable(path, retentionMonths, nilGetDeletedMetricIDs)
+		tb2, err := openTable(path, nilGetDeletedMetricIDs, retentionMsecs)
 		if err == nil {
 			tb2.MustClose()
 			t.Fatalf("expecting non-nil error when opening already opened table")


@@ -45,7 +45,7 @@ func benchmarkTableAddRows(b *testing.B, rowsPerInsert, tsidsCount int) {
 	b.SetBytes(int64(rowsCountExpected))
 	tablePath := "./benchmarkTableAddRows"
 	for i := 0; i < b.N; i++ {
-		tb, err := openTable(tablePath, -1, nilGetDeletedMetricIDs)
+		tb, err := openTable(tablePath, nilGetDeletedMetricIDs, maxRetentionMsecs)
 		if err != nil {
 			b.Fatalf("cannot open table %q: %s", tablePath, err)
 		}
@@ -93,7 +93,7 @@ func benchmarkTableAddRows(b *testing.B, rowsPerInsert, tsidsCount int) {
 	tb.MustClose()
 	// Open the table from files and verify the rows count on it
-	tb, err = openTable(tablePath, -1, nilGetDeletedMetricIDs)
+	tb, err = openTable(tablePath, nilGetDeletedMetricIDs, maxRetentionMsecs)
 	if err != nil {
 		b.Fatalf("cannot open table %q: %s", tablePath, err)
 	}

ports/OpenBSD/README.md (new file)

@@ -0,0 +1,11 @@
+# OpenBSD ports
+
+Tested with release 6.7.
+
+The VictoriaMetrics port must be placed in the `/usr/ports/sysutils` directory,
+and the file `/usr/ports/infrastructure/db/user.list`
+should be extended with the following new line:
+
+```
+855 _vmetrics _vmetrics sysutils/VictoriaMetrics
+```


@@ -0,0 +1,38 @@
+# $OpenBSD$
+
+COMMENT =	fast, cost-effective and scalable time series database
+
+GH_ACCOUNT =	VictoriaMetrics
+GH_PROJECT =	VictoriaMetrics
+GH_TAGNAME =	v1.44.0
+
+CATEGORIES =	sysutils
+
+HOMEPAGE =	https://victoriametrics.com/
+
+MAINTAINER =	VictoriaMetrics <info@victoriametrics.com>
+
+# Apache License 2.0
+PERMIT_PACKAGE =	Yes
+
+WANTLIB =	c pthread
+
+USE_GMAKE =	Yes
+
+MODULES =	lang/go
+MODGO_GOPATH =	${MODGO_WORKSPACE}
+
+do-build:
+	cd ${WRKSRC} && GOOS=openbsd ${MAKE_ENV} ${MAKE_PROGRAM} victoria-metrics-pure
+	cd ${WRKSRC} && GOOS=openbsd ${MAKE_ENV} ${MAKE_PROGRAM} vmbackup
+
+do-install:
+	${INSTALL_PROGRAM} ./pkg/vmlogger.pl ${PREFIX}/bin/vmetricslogger.pl
+	${INSTALL_PROGRAM} ${WRKSRC}/bin/victoria-metrics-pure ${PREFIX}/bin/vmetrics
+	${INSTALL_PROGRAM} ${WRKSRC}/bin/vmbackup ${PREFIX}/bin/vmetricsbackup
+	${INSTALL_DATA_DIR} ${PREFIX}/share/doc/vmetrics/
+	${INSTALL_DATA} ${WRKSRC}/README.md ${PREFIX}/share/doc/vmetrics/
+	${INSTALL_DATA} ${WRKSRC}/LICENSE ${PREFIX}/share/doc/vmetrics/
+	${INSTALL_DATA} ${WRKSRC}/docs/* ${PREFIX}/share/doc/vmetrics/
+
+.include <bsd.port.mk>


@@ -0,0 +1,2 @@
+SHA256 (VictoriaMetrics-1.44.0.tar.gz) = OIXIyqiijWvAPDgq5wMoDpv1rENcIOWIcXmz4T5v1lU=
+SIZE (VictoriaMetrics-1.44.0.tar.gz) = 8898365


@@ -0,0 +1,3 @@
+VictoriaMetrics is a fast,
+cost-effective and scalable time-series database.


@@ -0,0 +1,34 @@
+@comment $OpenBSD$
+@newgroup _vmetrics:855
+@newuser _vmetrics:855:_vmetrics:daemon:VictoriaMetrics:${VARBASE}/db/vmetrics:/sbin/nologin
+@sample ${SYSCONFDIR}/prometheus/
+@rcscript ${RCDIR}/vmetrics
+@bin bin/vmetricslogger.pl
+@bin bin/vmetrics
+@bin bin/vmetricsbackup
+share/doc/vmetrics/
+share/doc/vmetrics/Articles.md
+share/doc/vmetrics/CaseStudies.md
+share/doc/vmetrics/Cluster-VictoriaMetrics.md
+share/doc/vmetrics/ExtendedPromQL.md
+share/doc/vmetrics/FAQ.md
+share/doc/vmetrics/Home.md
+share/doc/vmetrics/LICENSE
+share/doc/vmetrics/MetricsQL.md
+share/doc/vmetrics/Quick-Start.md
+share/doc/vmetrics/README.md
+share/doc/vmetrics/Release-Guide.md
+share/doc/vmetrics/SampleSizeCalculations.md
+share/doc/vmetrics/Single-server-VictoriaMetrics.md
+share/doc/vmetrics/logo.png
+share/doc/vmetrics/robots.txt
+share/doc/vmetrics/vmagent.md
+share/doc/vmetrics/vmagent.png
+share/doc/vmetrics/vmalert.md
+share/doc/vmetrics/vmauth.md
+share/doc/vmetrics/vmbackup.md
+share/doc/vmetrics/vmrestore.md
+@mode 0755
+@owner _vmetrics
+@group _vmetrics
+@sample ${VARBASE}/db/vmetrics


@@ -0,0 +1,19 @@
+#!/bin/sh
+#
+# $OpenBSD$
+
+daemon="${TRUEPREFIX}/bin/vmetrics"
+daemon_flags="-storageDataPath=/var/db/vmetrics/ ${daemon_flags}"
+daemon_user=_vmetrics
+
+. /etc/rc.d/rc.subr
+
+pexp="${daemon}.*"
+rc_bg=YES
+rc_reload=NO
+
+rc_start() {
+	${rcexec} "${daemon} -loggerDisableTimestamps ${daemon_flags} < /dev/null 2>&1 | ${TRUEPREFIX}/bin/vmetricslogger.pl"
+}
+
+rc_cmd $1


@@ -0,0 +1,18 @@
+#!/usr/bin/perl
+
+use Sys::Syslog qw(:standard :macros);
+
+openlog("victoria-metrics", "pid", "daemon");
+
+while (my $l = <>) {
+	my @d = split /\t/, $l;
+	# go levels ("INFO", "WARN", "ERROR", "FATAL", "PANIC") are lowercased and mapped to syslog priorities.
+	my $lvl = lc $d[0];
+	$lvl = 'emerg'   if ($lvl eq 'panic');
+	$lvl = 'crit'    if ($lvl eq 'fatal');
+	$lvl = 'err'     if ($lvl eq 'error');
+	$lvl = 'warning' if ($lvl eq 'warn');
+	chomp $d[2];
+	syslog($lvl, '%s', $d[2]);
+}
+closelog();