Commit graph

216 commits

Author SHA1 Message Date
Aliaksandr Valialkin
9a1f6848ca app/vmselect/promql: fix results caching for multi-arg rollup functions such as quantile_over_time
Previously only a single arg was taken into account, so caching didn't work properly for multi-arg rollup funcs.
2020-01-03 20:44:54 +02:00
Aliaksandr Valialkin
3d0c7b095a app/vmselect/promql: use scrapeInterval instead of window in denominator when calculating rate for the first point on the time series
This should provide better estimation for `rate` in the beginning of time series.
2020-01-03 19:03:32 +02:00
Aliaksandr Valialkin
6ea7f23446 app/vmselect/promql: increase the estimated number of time series returned by aggr() by (something) from 100 to 1K, since 100 may result in OOM for high number of time series 2020-01-03 01:02:30 +02:00
Aliaksandr Valialkin
e0abf45d45 app/vmselect/promql: add share_le_over_time and share_gt_over_time functions for SLI and SLO calculations 2020-01-03 00:41:36 +02:00
Aliaksandr Valialkin
eb1a66c577 lib/metricsql: add ExpandWithExprs 2019-12-25 22:20:21 +02:00
Aliaksandr Valialkin
453d71d082 Rename lib/promql to lib/metricsql and apply small fixes 2019-12-25 22:09:09 +02:00
Mike Poindexter
009d1559db Split Extended PromQL parsing to a separate library 2019-12-25 22:09:07 +02:00
Aliaksandr Valialkin
ff18101d30 app/vmselect/promql: make sure AdjustStartEnd returns time range covering the same number of points as the initial time range
This should prevent the following panic at app/vmselect/promql/binary_op.go:255:

    BUG: len(leftVaues) must match len(rightValues) and len(dstValues)
2019-12-24 22:45:49 +02:00
Aliaksandr Valialkin
e24ee43109 app/vmselect/promql: adjust calculations for rate and increase for the first value
These calculations should trigger alerts on `/api/v1/query` for counters starting from values greater than 0.
2019-12-24 19:41:03 +02:00
Aliaksandr Valialkin
9a2554691c app/vmselect/promql: properly calculate rate on the first data point
It is calculated as `value / scrape_interval`, since the value was missing on the previous scrape,
i.e. we can assume its value was 0 at this time.
2019-12-24 15:55:15 +02:00
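The first-point rule described in the commit above can be illustrated with a minimal Go sketch. It is not the VictoriaMetrics implementation; the function name `firstPointRate` and its parameters are hypothetical, and the scrape interval is assumed to be known in seconds.

```go
package main

import "fmt"

// firstPointRate is a hypothetical helper, not taken from VictoriaMetrics.
// It estimates the per-second rate at the first sample of a counter by
// assuming the counter was 0 one scrape interval earlier.
func firstPointRate(value, scrapeIntervalSeconds float64) float64 {
	return value / scrapeIntervalSeconds
}

func main() {
	// A counter first seen with value 30 and a 15s scrape interval.
	fmt.Println(firstPointRate(30, 15)) // prints 2
}
```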
Aliaksandr Valialkin
97de50dd4c app/vmselect/netstorage: improve error message when reading data size in readBytes 2019-12-24 14:40:14 +02:00
Aliaksandr Valialkin
6358cf3d47 app/vmselect/netstorage: move MustAdviseSequentialRead to lib/fs 2019-12-23 23:16:26 +02:00
Aliaksandr Valialkin
cc8a1bae0e app/vmselect: add -search.maxExportDuration command-line flag for limiting /api/v1/export duration
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/275
2019-12-20 11:37:18 +02:00
Aliaksandr Valialkin
6a185b7809 app/vmselect: add ability to pass match[], start and end to /api/v1/labels
This makes the `/api/v1/labels` handler consistent with already existing functionality for `/api/v1/label/.../values`.

See https://github.com/prometheus/prometheus/issues/6178 for more details.
2019-12-15 00:20:43 +02:00
Aliaksandr Valialkin
b238997a84 all: rename Extended PromQL to PromQL extensions 2019-12-12 19:29:59 +02:00
Aliaksandr Valialkin
cffaeda0f1 all: publish Docker images for the following GOARCH: amd64, arm, arm64, ppc64le and 386
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/258
2019-12-11 23:33:11 +02:00
Aliaksandr Valialkin
c25b97829f app/vmselect/promql: return lower and upper bounds for the estimated percentile from histogram_quantile if third arg is passed
Updates https://github.com/prometheus/prometheus/issues/5706
2019-12-11 14:00:18 +02:00
Aliaksandr Valialkin
f79b61e2a1 app/vmselect/promql: return matrix instead of vector on subqueries to /api/v1/query like Prometheus does 2019-12-11 00:57:54 +02:00
Aliaksandr Valialkin
5d2ff573aa app/vmselect/promql: allow negative offsets
Updates https://github.com/prometheus/prometheus/issues/6282
2019-12-11 00:57:51 +02:00
Aliaksandr Valialkin
d39bba3547 app/vmselect/promql: add {topk|bottomk}_{min|max|avg|median} aggregate functions for returning the exact k time series on the given time range
The full list of functions added:
- `topk_min(k, q)` - returns top K time series with the max minimums on the given time range
- `topk_max(k, q)` - returns top K time series with the max maximums on the given time range
- `topk_avg(k, q)` - returns top K time series with the max averages on the given time range
- `topk_median(k, q)` - returns top K time series with the max medians on the given time range
- `bottomk_min(k, q)` - returns bottom K time series with the min minimums on the given time range
- `bottomk_max(k, q)` - returns bottom K time series with the min maximums on the given time range
- `bottomk_avg(k, q)` - returns bottom K time series with the min averages on the given time range
- `bottomk_median(k, q)` - returns bottom K time series with the min medians on the given time range
2019-12-05 19:27:45 +02:00
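As an illustration of what `topk_avg(k, q)` selects, here is a minimal Go sketch that ranks whole series by their average over the range and keeps the top K. The `series` type and `topKByAvg` function are hypothetical names for this example only and do not mirror the actual implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// series is a hypothetical container for one time series on the selected range.
type series struct {
	Name   string
	Values []float64
}

// avg returns the mean of values; empty input ranks last via -Inf.
func avg(values []float64) float64 {
	if len(values) == 0 {
		return math.Inf(-1)
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

// topKByAvg returns up to k series with the highest averages, highest first.
// The input slice is left unmodified.
func topKByAvg(k int, all []series) []series {
	sorted := append([]series(nil), all...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return avg(sorted[i].Values) > avg(sorted[j].Values)
	})
	if k < 0 {
		k = 0
	}
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	all := []series{
		{Name: "a", Values: []float64{1, 1, 1}},
		{Name: "b", Values: []float64{4, 5, 6}},
		{Name: "c", Values: []float64{3, 3, 3}},
	}
	for _, s := range topKByAvg(2, all) {
		fmt.Println(s.Name)
	}
}
```

The other variants would differ only in the per-series statistic (min, max or median) and, for `bottomk_*`, in the sort direction.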
Aliaksandr Valialkin
e0f43e1f66 app/vmselect: add placeholders for /api/v1/rules and /api/v1/alerts 2019-12-03 19:38:09 +02:00
Aliaksandr Valialkin
819bb36852 app/vmselect/promql: estimate per-series scrape interval as 0.6 quantile for the first 100 intervals
This should improve scrape interval estimation for time series with gaps.
2019-12-02 13:43:04 +02:00
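A minimal Go sketch of the estimation idea from the commit above: take the first 100 intervals between adjacent samples and pick their 0.6 quantile, so that a few large gaps do not inflate the estimate. The function name `estimateScrapeInterval`, the millisecond timestamps and the simple index-based quantile are assumptions for illustration; the real code may compute the quantile differently.

```go
package main

import (
	"fmt"
	"sort"
)

// estimateScrapeInterval is a hypothetical helper. It returns the 0.6 quantile
// of the first 100 intervals between adjacent timestamps (in milliseconds),
// or 0 if there are fewer than two timestamps.
func estimateScrapeInterval(timestamps []int64) int64 {
	if len(timestamps) < 2 {
		return 0
	}
	const maxIntervals = 100
	intervals := make([]int64, 0, maxIntervals)
	for i := 1; i < len(timestamps) && len(intervals) < maxIntervals; i++ {
		intervals = append(intervals, timestamps[i]-timestamps[i-1])
	}
	sort.Slice(intervals, func(i, j int) bool { return intervals[i] < intervals[j] })
	// Index-based quantile without interpolation (an assumption of this sketch).
	idx := int(0.6 * float64(len(intervals)-1))
	return intervals[idx]
}

func main() {
	// 15s scrapes with a single 60s gap between the third and fourth samples.
	ts := []int64{0, 15000, 30000, 90000, 105000, 120000}
	fmt.Println(estimateScrapeInterval(ts)) // prints 15000
}
```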
Aliaksandr Valialkin
1e2019b1b6 app/vmselect/promql: fix corner case for increase over time series with gaps
In this case `increase` could return an invalid high value for the first point after the gap.
2019-11-30 01:34:18 +02:00
Aliaksandr Valialkin
4c63caa37c deployment/docker/certs: update TLS certs source from alpine:3.9 to alpine:3.10 2019-11-29 19:55:36 +02:00
Aliaksandr Valialkin
def9ccd360 app/vmselect/prometheus: consistently apply nocache arg to /api/v1/query the same way as to /api/v1/query_range
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/241
2019-11-26 22:55:50 +02:00
Aliaksandr Valialkin
e0ac068112 app/vmselect/prometheus: fix content-type for /api/v1/export responses
The correct Content-Type should be `application/stream+json` instead of `application/json`.
Thanks to Joshua Ryder for pointing this out.
2019-11-26 17:44:27 +02:00
Aliaksandr Valialkin
28cc4c09b5 app/vmselect/promql: remove zero timeseries from prometheus_buckets output 2019-11-25 19:10:13 +02:00
Aliaksandr Valialkin
8811bec14e app/vmselect/prometheus: reduce default value for -search.latencyOffset from 60s to 30s
30 seconds should be enough for almost all the cases
2019-11-25 16:33:36 +02:00
Aliaksandr Valialkin
f7da9b2db2 app/vmselect/promql: allow nested parens 2019-11-25 16:13:33 +02:00
Aliaksandr Valialkin
d2619d6dce vendor: update github.com/VictoriaMetrics/metrics from v1.9.0 to v1.9.1 2019-11-25 15:22:50 +02:00
Aliaksandr Valialkin
f46fb6c740 app/vmselect/promql: re-use metrics.Histogram when calculating histogram function for each point on the graph
This should reduce the number of memory allocations.
2019-11-25 14:24:30 +02:00
Aliaksandr Valialkin
0f184affa7 app/vmselect/promql: optimize binary search over big number of samples during rollup calculations 2019-11-25 14:01:54 +02:00
Aliaksandr Valialkin
dbd07041ae app/vmselect/promql: adjust tests after the upgrade of github.com/VictoriaMetrics/metrics from v1.8.3 to v1.9.0 2019-11-25 13:44:08 +02:00
Aliaksandr Valialkin
8bb254d960 app/vmselect/promql: add histogram aggregate function, which is useful for building heatmaps from multiple time series 2019-11-24 00:04:15 +02:00
Aliaksandr Valialkin
414259f47b app/vmselect/promql: do not take into account buckets with negative counters in prometheus_buckets 2019-11-23 14:19:19 +02:00
Aliaksandr Valialkin
193d553f6d app/vmselect/promql: properly handle histogram_quantile(0, ...) with zero buckets 2019-11-23 14:02:25 +02:00
Aliaksandr Valialkin
f8298c7f13 app/vmselect: add vm_per_query_{rows,series}_processed_count histograms 2019-11-23 13:23:03 +02:00
Aliaksandr Valialkin
4d76977745 app/vmselect/promql: transparently apply prometheus_buckets in histogram_quantile 2019-11-23 11:49:16 +02:00
Aliaksandr Valialkin
5f6f03c692 app/vmselect/promql: add prometheus_buckets function for converting the upcoming histogram buckets from github.com/VictoriaMetrics/metrics to Prometheus-compatible buckets 2019-11-23 00:21:56 +02:00
Aliaksandr Valialkin
17d08c1fe0 app/vmselect: adjust end arg instead of adjusting start arg if start > end
`start` arg has higher chances to be set properly compared to `end` arg,
so it is expected that the `end` arg could be adjusted if it was set incorrectly.
2019-11-22 16:12:53 +02:00
Aliaksandr Valialkin
5ae47e8940 app/vmselect/prometheus: properly adjust too big time arg on /api/v1/query
Too big `time` must be adjusted to `now()-queryOffset`.
2019-11-19 00:42:07 +02:00
Aliaksandr Valialkin
77bb66a5be app/vmselect/promql: properly calculate integrate(q[d]) 2019-11-13 21:11:03 +02:00
Aliaksandr Valialkin
c33640664a app/vmselect/promql: use universal approach for determining maxByteSliceLen on 32-bit and 64-bit archs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/235
2019-11-13 20:26:07 +02:00
Aliaksandr Valialkin
87b39222be Revert "lib/fs: do not postpone directory removal on NFS error"
This reverts commit 21aeb02b46649ac9906cb37733f7b155a77a0db9.
2019-11-12 16:29:50 +02:00
Oleg Kovalov
74ba42d111 fix misspelled words (#229) 2019-11-12 00:18:24 +02:00
Aliaksandr Valialkin
5f52eb7653 lib/fs: do not postpone directory removal on NFS error
Continue trying to remove NFS directory on temporary errors for up to a minute.

The previous async removal process breaks in the following case during VictoriaMetrics startup:

- VictoriaMetrics opens index, finds incomplete merge transactions and starts replaying them.
- The transaction instructs removing old directories for parts, which were already merged into a bigger part.
- VictoriaMetrics removes these directories, but their removal is delayed due to NFS errors.
- VictoriaMetrics scans partition directory after all the incomplete merge transactions are finished
  and finds directories, which should have been removed, but still exist due to NFS errors.
- VictoriaMetrics panics when it finds an unexpected empty directory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
2019-11-10 13:27:16 +02:00
Aliaksandr Valialkin
33abbec6b4 app/vmselect/promql: adjust memory limits calculations for incremental aggregate functions
Incremental aggregate functions don't keep all the selected time series in memory -
they keep only up to GOMAXPROCS time series for incremental aggregations.

Take into account that the number of time series in RAM can be higher if they are split
into many groups with `by (...)` or `without (...)` modifiers.

This should reduce the number of `not enough memory for processing ... data points` false
positive errors.
2019-11-08 19:37:43 +02:00
Aliaksandr Valialkin
4a8251feff app/vmselect/promql: add lag(q[d]) function, which returns the lag between the current timestamp and the timestamp for the last data point in q 2019-11-01 12:21:43 +02:00
Aliaksandr Valialkin
4e6bf6f538 app/vmselect: add -search.latencyOffset flag for tuning the time after data collection when data points become visible in query results
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/218
2019-10-28 12:32:36 +02:00
Aliaksandr Valialkin
5b01b7fb01 all: add support for GOARCH=386 and fix all the issues related to 32-bit architectures such as GOARCH=arm
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/212
2019-10-17 18:27:49 +03:00
Aliaksandr Valialkin
99786c2864 app/vmselect/prometheus: add -search.maxLookback command-line flag for overriding dynamic calculations for max lookback interval
This flag is similar to `-search.lookback-delta` if set. The max lookback interval is determined dynamically
from interval between datapoints for each input time series if the flag isn't set.

The interval can be overridden on a per-query basis by passing `max_lookback=<duration>` query arg to `/api/v1/query` and `/api/v1/query_range`.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/209
2019-10-15 21:37:17 +03:00
Aliaksandr Valialkin
a5302a6651 app/vmselect/promql: take into account the previous point when calculating max_over_time and min_over_time
This lines up with `first_over_time` function used in `rollup_candlestick`, so `rollup=low` always returns
the minimum value.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/204
2019-10-08 12:30:16 +03:00
Aliaksandr Valialkin
483af3a97a app/vmselect/netstorage: hint the OS that tmpBlocksFile is read almost sequentially
This became the case after b7ee2e7af2.
2019-09-30 00:13:33 +03:00
Aliaksandr Valialkin
946ca438a6 app/vmselect/netstorage: marshal block outside tmpBlocksFile.WriteBlock
This also allows marshaling outside lock, thus reducing the amount of work under the lock.
2019-09-28 20:57:20 +03:00
Aliaksandr Valialkin
e92e39eddf app/vmselect/netstorage: reduce the number of disk seeks when the query processes big number of time series 2019-09-28 20:57:20 +03:00
Aliaksandr Valialkin
56dff57f77 app/vmselect/netstorage: reduce memory usage when fetching big number of data blocks from vmstorage
Dump data blocks directly to temporary file instead of buffering them in RAM
2019-09-28 12:21:57 +03:00
Aliaksandr Valialkin
ba460f62e6 app/vmselect/promql: do not generate timestamps for NaN values in timestamp function according to Prometheus logic 2019-09-27 18:55:16 +03:00
Aliaksandr Valialkin
bd1cf053f6 app/vmselect/promql: add increases_over_time and decreases_over_time functions
`increases_over_time(q[d])` returns the number of `q` increases during the given duration `d`.
`decreases_over_time(q[d])` returns the number of `q` decreases during the given duration `d`.
2019-09-25 20:38:51 +03:00
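The counting behind `increases_over_time` and `decreases_over_time` can be sketched in Go as follows. The function `countChanges` is a hypothetical name, and the sketch assumes it receives the raw sample values that fall into the window `d`; how the real implementation treats the sample just before the window is not stated in the commit message.

```go
package main

import "fmt"

// countChanges is a hypothetical helper. It walks adjacent samples and counts
// how many times the value went up and how many times it went down.
// Equal adjacent values count as neither.
func countChanges(values []float64) (increases, decreases int) {
	for i := 1; i < len(values); i++ {
		switch {
		case values[i] > values[i-1]:
			increases++
		case values[i] < values[i-1]:
			decreases++
		}
	}
	return increases, decreases
}

func main() {
	inc, dec := countChanges([]float64{1, 2, 2, 5, 3, 4})
	fmt.Println(inc, dec) // prints 3 1
}
```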
Aliaksandr Valialkin
ee4585db33 app/vmselect/promql: properly handle subqueries like aggr_func(rollup_func(metric[window:step]))
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/184
2019-09-13 21:42:11 +03:00
Aliaksandr Valialkin
828e5f6d26 app/vmselect/promql: binary operation fixes according to Prometheus behaviour
The following issues were fixed:
- VictoriaMetrics could leave superfluous labels when using `on` or `ignoring` modifiers
- VictoriaMetrics could return `duplicate timeseries` error when using `group_left` or `group_right` with non-empty label list
2019-09-13 17:43:09 +03:00
Aliaksandr Valialkin
b101064f8b all: report the number of bytes read on io.ReadFull error
This should simplify error investigation similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/175
2019-09-11 14:50:24 +03:00
Aliaksandr Valialkin
2c654258ef lib/fs: add MustStopDirRemover for waiting until pending directories are removed on graceful shutdown
This patch is mainly required for laggy NFS. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/162
2019-09-05 11:17:17 +03:00
Aliaksandr Valialkin
d0953e9f02 app/vmselect/promql: ignore grouping by destination label in count_values, since such a grouping is performed automatically 2019-09-04 19:59:02 +03:00
Aliaksandr Valialkin
f78ffe565f app/vmselect/promql: do not return artificial points beyond the last point in time series 2019-09-04 16:34:29 +03:00
Aliaksandr Valialkin
a7d5d611fe app/vmselect/prometheus: do not adjust start and end args in /api/v1/query_range if nocache=1 arg is set
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/171
2019-09-04 13:10:17 +03:00
Aliaksandr Valialkin
b08f085082 app/vmselect/promql: reset timeseries name on group_left and group_right as Prometheus does 2019-09-03 20:43:29 +03:00
Aliaksandr Valialkin
458d412bb6 app/vmselect/netstorage: adaptively adjust the maximum inmemory file size for storing temporary blocks
The maximum inmemory file size now depends on `-memory.allowedPercent`.
This should improve performance and reduce the number of filesystem calls
on machines with big amounts of RAM when performing heavy queries
over big number of samples and time series.
2019-09-03 13:32:18 +03:00
Aliaksandr Valialkin
604a4312f9 all: port to FreeBSD on GOARCH=amd64 2019-08-28 01:46:09 +03:00
Aliaksandr Valialkin
c197641978 all: return 503 http error if service is temporarily unavailable
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/156
2019-08-23 09:49:50 +03:00
Aliaksandr Valialkin
e9db22a551 app/vmselect/promql: attempt to repair invalid bucket counts passed to histogram_quantile
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/136
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/154
2019-08-22 14:39:24 +03:00
Aliaksandr Valialkin
90a4b00b10 app/vmselect/promql: fix panic on -search.disableCache
Reset the cache if it is disabled instead of stopping, since it is stopped on graceful shutdown.
2019-08-21 17:12:01 +03:00
Aliaksandr Valialkin
491b1762c8 app/vmselect/promql: explain why empty timeseries aren't removed in transformLabelValue 2019-08-21 11:29:41 +03:00
Aliaksandr Valialkin
db1de4277c app/vmselect/promql: remove NaNs from /api/v1/query_range output like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/153
2019-08-20 23:01:59 +03:00
Aliaksandr Valialkin
99331606e1 app/vmselect/promql: pre-allocate memory for map for checking for duplicate timeseries
This should reduce memory allocations for big number of timeseries
2019-08-20 23:01:57 +03:00
Aliaksandr Valialkin
1101765adb app/vmselect/promql: add label_value(q, label_name) func, which returns numeric values of the label named label_name in q 2019-08-20 00:28:44 +03:00
Aliaksandr Valialkin
940349ccb9 app/vmselect/promql: independently track offset hints for tStart and tEnd
This should improve performance if timeseries starts or ends on the selected time range
2019-08-19 13:40:24 +03:00
Aliaksandr Valialkin
6ae4b4190f app/vmselect/promql: optimize search for timestamp boundaries in rollupConfig.Do
This should improve the performance of queries over big number of time series
with big number of output points.
2019-08-19 13:03:38 +03:00
Aliaksandr Valialkin
005aabd305 app/vmselect/promql: add scrape_interval(q[d]) function, which would return scrape interval for q over d 2019-08-18 21:08:15 +03:00
Aliaksandr Valialkin
218cb4623a app/vmselect/promql: handle comparisons with NaN similar to Prometheus
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/150
2019-08-18 00:25:58 +03:00
Aliaksandr Valialkin
dcce92c63c app/vmselect/promql: add lifetime(q[d]) function, which returns the lifetime of q over d in seconds.
This function is useful for determining time series lifetime.
`d` must exceed the expected lifetime of the time series, otherwise
the function would return values close to `d`.
2019-08-16 11:59:51 +03:00
Aliaksandr Valialkin
0cb66a8f95 app/vmselect/promql: fix corner-case calculation for ideriv 2019-08-16 11:59:50 +03:00
Aliaksandr Valialkin
1b5b9ced27 app/vmselect/promql: properly handle corner cases for rollup functions 2019-08-15 23:31:28 +03:00
Aliaksandr Valialkin
b8bbe92de1 app/vmselect/promql: store compressed results in the cache
This should increase rollup results cache capacity.
2019-08-14 02:32:16 +03:00
Aliaksandr Valialkin
8c2158af24 all: use workingsetcache instead of fastcache
This should reduce the amount of RAM required for processing time series
with non-zero churn rate.

The previous cache behavior can be restored with `-cache.oldBehavior` command-line flag.
2019-08-13 21:40:28 +03:00
Aliaksandr Valialkin
8e05758ff5 app: add vm_concurrent_ metrics for visibility in concurrency limiters for vminsert and vmselect 2019-08-05 18:30:29 +03:00
Aliaksandr Valialkin
53c8f56436 app/vmselect: allow passing match[], start and end to /api/v1/label/<label_name>/values
`/api/v1/label/<label_name>/values?match[]=q` emulates `label_values(q, <label_name>)`
call in Grafana templating.
2019-08-04 23:07:00 +03:00
Aliaksandr Valialkin
880b1d80b1 app/vmselect: optimize /api/v1/series by skipping storage data
Fetch and process only time series metainfo.
2019-08-04 23:00:46 +03:00
Aliaksandr Valialkin
7f5afae1e3 app/vmselect/prometheus: prevent fetching and scanning all the data on /api/v1/series call by default 2019-08-04 19:42:45 +03:00
Aliaksandr Valialkin
000c154641 app/vmselect/promql: tune automatic window adjustment
Increase the window adjustment for small scrape intervals,
since they usually have higher jitter.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/139
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
2019-08-04 19:34:11 +03:00
Aliaksandr Valialkin
1d4ddadbb1 app/vmselect/promql: further increase the allowed jitter for scrape interval
Real-world production data shows higher jitter than 1/8 of scrape interval.
This may result in gaps on the graph. So increase the allowed jitter to 1/4
of scrape interval in order to reduce the probability of gaps on the graphs
over time series with high jitter for scrape_interval.
2019-08-02 20:16:41 +03:00
Aliaksandr Valialkin
ade7bc30db app/vmselect/promql: tolerate higher jitter in scrape interval
Allow jitter for up to 1/8 instead of 1/16 for the scrape interval.
This should improve graphs when `step` is smaller than the `scrape_interval`.
2019-08-01 23:25:53 +03:00
Aliaksandr Valialkin
c994fbf500 app/vmselect/promql: add vm_slow_queries_total metric for counting slow queries
The query is slow if its execution time exceeds `-search.logSlowQueryDuration`
2019-07-31 03:36:45 +03:00
Aliaksandr Valialkin
071a122119 app/vmselect/promql: return NaN from histogram_quantile if at least a single bucket is broken 2019-07-31 01:18:11 +03:00
Aliaksandr Valialkin
b9a16b93e7 app/vmselect/promql: allow adjusting window for default rollup function
Default rollup function is `last_over_time`. It must support adjusting
the provided window in order to prevent gaps on the graph
for window values smaller than scrape interval.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
2019-07-31 00:45:58 +03:00
Aliaksandr Valialkin
c901a6472f app/vmselect/promql: return NaN values if invalid bucket counts are passed to histogram_quantile
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/136
2019-07-30 22:05:55 +03:00
Aliaksandr Valialkin
5b8526e925 app/vmselect/netstorage: improve error message when reading data blocks from storage
Mention the block number in the error. This should simplify troubleshooting in this code.
2019-07-28 12:17:33 +03:00
Aliaksandr Valialkin
aac482517f app/vmselect/promql: return NaN from count() over zero time series
This aligns `count` behavior with Prometheus.
2019-07-25 22:02:34 +03:00
Aliaksandr Valialkin
0e52357f35 app/vmselect/promql: properly calculate incremental aggregations grouped by __name__
Previously the following query could fail when it matched multiple distinct metric names:

    sum(count_over_time{__name__!=''}) by (__name__)
2019-07-25 21:53:26 +03:00
Aliaksandr Valialkin
54f035d4ce all: small updates after PR #114 2019-07-24 17:43:43 +03:00
Aliaksandr Valialkin
cb8104cf77 app: clarify error messages when -storageNode arg is missing in vminsert and vmselect 2019-07-20 10:21:59 +03:00