This commit is contained in:
Aliaksandr Valialkin 2024-05-05 00:28:01 +02:00
parent f8dcf7be1d
commit bc7dfd5ba4
No known key found for this signature in database
GPG key ID: 52C003EE2BCDB9EB
20 changed files with 1225 additions and 450 deletions

View file

@ -20,12 +20,15 @@ according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/QuickSta
## tip
* FEATURE: return all the log fields by default in query results. Previously only [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields), [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields were returned by default.
* FEATURE: add support for returning only the requested log [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#querying-specific-fields).
* FEATURE: add support for calculating `count()`, `uniq()`, `sum()`, `avg()`, `min()`, `max()` and `uniq_array()` over [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). Grouping by arbitrary set of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) is supported. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#stats) for details.
* FEATURE: add support for returning only the requested log [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe).
* FEATURE: add support for calculating `count()`, `uniq()`, `sum()`, `avg()`, `min()`, `max()` and `uniq_array()` over [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). Grouping by arbitrary set of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) is supported. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe) for details.
* FEATURE: add support for sorting the returned results. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe).
* FEATURE: add support for limiting the number of returned results. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#limiters).
* FEATURE: optimize performance for [LogsQL query](https://docs.victoriametrics.com/victorialogs/logsql/), which contains multiple filters for [words](https://docs.victoriametrics.com/victorialogs/logsql/#word-filter) or [phrases](https://docs.victoriametrics.com/victorialogs/logsql/#phrase-filter) delimited with [`AND` operator](https://docs.victoriametrics.com/victorialogs/logsql/#logical-filter). For example, the `foo AND bar` query now finds [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with `foo` and `bar` words faster.
* FEATURE: add support for copying and renaming the selected log fields. See [these](https://docs.victoriametrics.com/victorialogs/logsql/#copy-pipe) and [these](https://docs.victoriametrics.com/victorialogs/logsql/#rename-pipe) docs.
* FEATURE: allow using `_` inside numbers. For example, `score:range[1_000, 5_000_000]` for [`range` filter](https://docs.victoriametrics.com/victorialogs/logsql/#range-filter).
* FEATURE: allow numbers in hexadecimal and binary form. For example, `response_size:range[0xff, 0b10001101101]` for [`range` filter](https://docs.victoriametrics.com/victorialogs/logsql/#range-filter).
* FEATURE: allow using duration and byte size suffixes in numeric values inside LogsQL queries. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#numeric-values).
* FEATURE: optimize performance for [LogsQL query](https://docs.victoriametrics.com/victorialogs/logsql/), which contains multiple filters for [words](https://docs.victoriametrics.com/victorialogs/logsql/#word-filter) or [phrases](https://docs.victoriametrics.com/victorialogs/logsql/#phrase-filter) delimited with [`AND` operator](https://docs.victoriametrics.com/victorialogs/logsql/#logical-filter). For example, the `foo AND bar` query now finds [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with `foo` and `bar` words faster.
* BUGFIX: prevent additional CPU usage for up to a few seconds after the query is canceled.
* BUGFIX: prevent from returning log entries with empty `_stream` field in the form `"_stream":""` in [search query results](https://docs.victoriametrics.com/victorialogs/querying/). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6042).

View file

@ -19,7 +19,7 @@ It provides the following features:
See [word filter](#word-filter), [phrase filter](#phrase-filter) and [prefix filter](#prefix-filter).
- Ability to combine filters into arbitrary complex [logical filters](#logical-filter).
- Ability to extract structured fields from unstructured logs at query time. See [these docs](#transformations).
- Ability to calculate various stats over the selected log entries. See [these docs](#stats).
- Ability to calculate various stats over the selected log entries. See [these docs](#stats-pipe).
## LogsQL tutorial
@ -177,17 +177,22 @@ These words are taken into account by full-text search filters such as
#### Query syntax
LogsQL query consists of the following parts delimited by `|`:
LogsQL query must contain [filters](#filters) for selecting the matching logs. At least a single filter is required.
For example, the following query selects all the logs for the last 5 minutes by using [`_time` filter](#time-filter):
- [Filters](#filters), which select log entries for further processing. This part is required in LogsQL. Other parts are optional.
- Optional [stream context](#stream-context), which allows selecting surrounding log lines for the matching log lines.
- Optional [transformations](#transformations) for the selected log fields.
For example, additional fields can be extracted or constructed from existing fields.
- Optional [post-filters](#post-filters) for post-filtering of the selected results. For example, post-filtering can filter
results based on the fields constructed by [transformations](#transformations).
- Optional [stats](#stats) transformations, which can calculate various stats across selected results.
- Optional [sorting](#sorting), which can sort the results by the specified fields.
- Optional [limiters](#limiters), which can apply various limits on the selected results.
```logsql
_time:5m
```
In addition to filters, a LogsQL query may contain an arbitrary mix of optional actions for processing the selected logs. These actions are delimited by `|` and are known as `pipes`.
For example, the following query uses [`stats` pipe](#stats-pipe) for returning the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word) for the last 5 minutes:
```logsql
_time:5m error | stats count() errors
```
See [the list of supported pipes in LogsQL](#pipes).
## Filters
@ -1025,6 +1030,435 @@ Performance tips:
- See [other performance tips](#performance-tips).
## Pipes
In addition to [filters](#filters), a LogsQL query may contain an arbitrary mix of `|`-delimited actions known as `pipes`.
For example, the following query uses [`stats`](#stats-pipe), [`sort`](#sort-pipe) and [`head`](#head-pipe) pipes
for returning top 10 [log streams](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields)
with the biggest number of logs during the last 5 minutes:
```logsql
_time:5m | stats by (_stream) count() per_stream_logs | sort by (per_stream_logs desc) | head 10
```
LogsQL supports the following pipes:
- [`copy`](#copy-pipe) copies [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`delete`](#delete-pipe) deletes [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`fields`](#fields-pipe) selects the given set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`head`](#head-pipe) limits the number of selected logs.
- [`rename`](#rename-pipe) renames [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`skip`](#skip-pipe) skips the given number of selected logs.
- [`sort`](#sort-pipe) sorts logs by the given [fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`stats`](#stats-pipe) calculates various stats over the selected logs.
### copy pipe
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be copied, then `| copy src1 as dst1, ..., srcN as dstN` [pipe](#pipes) can be used.
For example, the following query copies `host` field to `server` for logs over the last 5 minutes, so the output contains both `host` and `server` fields:
```logsql
_time:5m | copy host as server
```
Multiple fields can be copied with a single `| copy ...` pipe. For example, the following query copies
[`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) to `timestamp`, while [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
is copied to `message`:
```logsql
_time:5m | copy _time as timestamp, _msg as message
```
The `as` keyword is optional.
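For example, the following query should be equivalent to the previous one, since the `as` keyword can be omitted:

```logsql
_time:5m | copy _time timestamp, _msg message
```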
See also:
- [`rename` pipe](#rename-pipe)
- [`fields` pipe](#fields-pipe)
- [`delete` pipe](#delete-pipe)
### delete pipe
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be deleted, then `| delete field1, ..., fieldN` [pipe](#pipes) can be used.
For example, the following query deletes `host` and `app` fields from the logs over the last 5 minutes:
```logsql
_time:5m | delete host, app
```
See also:
- [`rename` pipe](#rename-pipe)
- [`fields` pipe](#fields-pipe)
### fields pipe
By default all the [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) are returned in the response.
It is possible to select the given set of log fields with `| fields field1, ..., fieldN` [pipe](#pipes). For example, the following query selects only `host`
and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields from logs for the last 5 minutes:
```logsql
_time:5m | fields host, _msg
```
See also:
- [`copy` pipe](#copy-pipe)
- [`rename` pipe](#rename-pipe)
- [`delete` pipe](#delete-pipe)
### head pipe
If only a subset of selected logs must be processed, then `| head N` [pipe](#pipes) can be used. For example, the following query returns up to 100 logs over the last 5 minutes:
```logsql
_time:5m | head 100
```
By default rows are selected in arbitrary order for performance reasons, so the query above can return different sets of logs every time it is executed.
The [`sort` pipe](#sort-pipe) can be used for making sure the logs are in the same order before applying `head ...` to them.
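For example, the following query should return 10 logs with the biggest [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) values over the last 5 minutes, since the logs are sorted by `_time` in descending order before `head` is applied:

```logsql
_time:5m | sort by (_time desc) | head 10
```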
See also:
- [`skip` pipe](#skip-pipe)
### rename pipe
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be renamed, then `| rename src1 as dst1, ..., srcN as dstN` [pipe](#pipes) can be used.
For example, the following query renames `host` field to `server` for logs over the last 5 minutes, so the output contains `server` field instead of `host` field:
```logsql
_time:5m | rename host as server
```
Multiple fields can be renamed with a single `| rename ...` pipe. For example, the following query renames `host` to `instance` and `app` to `job`:
```logsql
_time:5m | rename host as instance, app as job
```
See also:
- [`copy` pipe](#copy-pipe)
- [`fields` pipe](#fields-pipe)
- [`delete` pipe](#delete-pipe)
### skip pipe
If some number of selected logs must be skipped after [`sort`](#sort-pipe), then `| skip N` [pipe](#pipes) can be used. For example, the following query skips the first 100 logs
over the last 5 minutes after sorting them by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field):
```logsql
_time:5m | sort by (_time) | skip 100
```
Note that skipping rows without sorting makes little sense, since they can be returned in arbitrary order for performance reasons.
Rows can be sorted with [`sort` pipe](#sort-pipe).
See also:
- [`head` pipe](#head-pipe)
### sort pipe
By default logs are selected in arbitrary order for performance reasons. If logs must be sorted, then `| sort by (field1, ..., fieldN)` [pipe](#pipes) must be used.
For example, the following query returns logs for the last 5 minutes sorted by [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields)
and then by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field):
```logsql
_time:5m | sort by (_stream, _time)
```
Sorting in reverse order is supported - just add `desc` after the given log field. For example, the following query sorts logs in reverse order by the `request_duration_seconds` field:
```logsql
_time:5m | sort by (request_duration_seconds desc)
```
Note that sorting a big number of logs can be slow and can consume a lot of additional memory.
It is recommended to limit the number of logs before sorting with the following approaches:
- Reducing the selected time range with [time filter](#time-filter).
- Using more specific [filters](#filters), so they select fewer logs.
See also:
- [`stats` pipe](#stats-pipe)
- [`head` pipe](#head-pipe)
- [`skip` pipe](#skip-pipe)
### stats pipe
`| stats ...` pipe allows calculating various stats over the selected logs. For example, the following LogsQL query
uses [`count` stats function](#count-stats) for calculating the number of logs for the last 5 minutes:
```logsql
_time:5m | stats count() logs_total
```
`| stats ...` pipe has the following basic format:
```logsql
... | stats
stats_func1(...) as result_name1,
...
stats_funcN(...) as result_nameN
```
Where `stats_func*` is any of the supported [stats functions](#stats-pipe-functions), while `result_name*` is the name of the log field
to store the result of the corresponding stats function. The `as` keyword is optional.
For example, the following query calculates the following stats for logs over the last 5 minutes:
- the number of logs with the help of [`count` stats function](#count-stats);
- the number of unique [log streams](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields) with the help of [`uniq` stats function](#uniq-stats):
```logsql
_time:5m | stats count() logs_total, uniq(_stream) streams_total
```
See also:
- [`sort` pipe](#sort-pipe)
#### Stats by fields
The following LogsQL syntax can be used for calculating independent stats per group of log fields:
```logsql
... | stats by (field1, ..., fieldM)
stats_func1(...) as result_name1,
...
stats_funcN(...) as result_nameN
```
This calculates `stats_func*` per each `(field1, ..., fieldM)` group of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
For example, the following query calculates the number of logs and unique ip addresses over the last 5 minutes,
grouped by `(host, path)` fields:
```logsql
_time:5m | stats by (host, path) count() logs_total, uniq(ip) ips_total
```
#### Stats by time buckets
The following syntax can be used for calculating stats grouped by time buckets:
```logsql
... | stats by (_time:step)
stats_func1(...) as result_name1,
...
stats_funcN(...) as result_nameN
```
This calculates `stats_func*` per each `step` of [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) field.
The `step` can have any [duration value](#duration-values). For example, the following LogsQL query returns per-minute number of logs and unique ip addresses
over the last 5 minutes:
```logsql
_time:5m | stats by (_time:1m) count() logs_total, uniq(ip) ips_total
```
#### Stats by time buckets with timezone offset
VictoriaLogs stores [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) values as [Unix time](https://en.wikipedia.org/wiki/Unix_time)
in nanoseconds. This time corresponds to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) time zone. Sometimes it is needed to calculate stats
grouped by days or weeks in a non-UTC timezone. This is possible with the following syntax:
```logsql
... | stats by (_time:step offset timezone_offset) ...
```
For example, the following query calculates per-day number of logs over the last week, in `UTC+02:00` [time zone](https://en.wikipedia.org/wiki/Time_zone):
```logsql
_time:1w | stats by (_time:1d offset 2h) count() logs_total
```
#### Stats by field buckets
Every log field inside `| stats by (...)` can be bucketed in the same way as the `_time` field in [this example](#stats-by-time-buckets).
Any [numeric value](#numeric-values) can be used as `step` value for the bucket. For example, the following query calculates
the number of requests for the last hour, bucketed by 10KB of `request_size_bytes` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model):
```logsql
_time:1h | stats by (request_size_bytes:10KB) count() requests
```
#### Stats by IPv4 buckets
Stats can be bucketed by [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) containing [IPv4 addresses](https://en.wikipedia.org/wiki/IP_address)
via the `ip_field_name:/network_mask` syntax inside `by(...)` clause. For example, the following query returns the number of log entries per `/24` subnetwork
extracted from the `ip` [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) during the last 5 minutes:
```logsql
_time:5m | stats by (ip:/24) count() requests_per_subnet
```
## stats pipe functions
LogsQL supports the following functions for [`stats` pipe](#stats-pipe):
- [`avg`](#avg-stats) calculates the average value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`count`](#count-stats) calculates the number of log entries.
- [`max`](#max-stats) calculates the maximum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`min`](#min-stats) calculates the minimum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`sum`](#sum-stats) calculates the sum for the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`uniq`](#uniq-stats) calculates the number of unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`uniq_array`](#uniq_array-stats) returns unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
### avg stats
`avg(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the average value across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are ignored.
For example, the following query returns the average value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats avg(duration) avg_duration
```
See also:
- [`min`](#min-stats)
- [`max`](#max-stats)
- [`sum`](#sum-stats)
- [`count`](#count-stats)
### count stats
`count()` calculates the number of selected logs.
For example, the following query returns the number of logs over the last 5 minutes:
```logsql
_time:5m | stats count() logs
```
It is possible to calculate the number of logs with non-empty values for some [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
with the `count(fieldName)` syntax. For example, the following query returns the number of logs with non-empty `username` field over the last 5 minutes:
```logsql
_time:5m | stats count(username) logs_with_username
```
If multiple fields are enumerated inside `count()`, then it counts the number of logs with at least a single non-empty field mentioned inside `count()`.
For example, the following query returns the number of logs with non-empty `username` or `password` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over the last 5 minutes:
```logsql
_time:5m | stats count(username, password) logs_with_username_or_password
```
See also:
- [`sum`](#sum-stats)
- [`avg`](#avg-stats)
### max stats
`max(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the maximum value across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are ignored.
For example, the following query returns the maximum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats max(duration) max_duration
```
See also:
- [`min`](#min-stats)
- [`avg`](#avg-stats)
- [`sum`](#sum-stats)
- [`count`](#count-stats)
### min stats
`min(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the minimum value across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are ignored.
For example, the following query returns the minimum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats min(duration) min_duration
```
See also:
- [`max`](#max-stats)
- [`avg`](#avg-stats)
- [`sum`](#sum-stats)
- [`count`](#count-stats)
### sum stats
`sum(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the sum of numeric values across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
For example, the following query returns the sum of numeric values for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats sum(duration) sum_duration
```
See also:
- [`count`](#count-stats)
- [`avg`](#avg-stats)
- [`max`](#max-stats)
- [`min`](#min-stats)
### uniq stats
`uniq(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the number of unique non-empty `(field1, ..., fieldN)` tuples.
For example, the following query returns the number of unique non-empty values for `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over the last 5 minutes:
```logsql
_time:5m | stats uniq(ip) ips
```
The following query returns the number of unique `(host, path)` pairs for the corresponding [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over the last 5 minutes:
```logsql
_time:5m | stats uniq(host, path) unique_host_path_pairs
```
See also:
- [`uniq_array`](#uniq_array-stats)
- [`count`](#count-stats)
### uniq_array stats
`uniq_array(field1, ..., fieldN)` [stats pipe](#stats-pipe) returns the unique non-empty values across
the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
The returned values are sorted and encoded as a JSON array such as `["1.2.4.5","5.6.7.8"]`.
For example, the following query returns unique non-empty values for the `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats uniq_array(ip) unique_ips
```
See also:
- [`uniq`](#uniq-stats)
- [`count`](#count-stats)
## Stream context
LogsQL will support the ability to select the given number of surrounding log lines for the selected log lines
@ -1046,11 +1480,9 @@ LogsQL will support the following transformations for the [selected](#filters) l
- Creating a new field from existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
according to the provided format.
- Creating a new field according to math calculations over existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- Copying of the existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- Parsing duration strings into floating-point seconds for further [stats calculations](#stats).
- Parsing duration strings into floating-point seconds for further [stats calculations](#stats-pipe).
- Creating a boolean field with the result of arbitrary [post-filters](#post-filters) applied to the current fields.
Boolean fields may be useful for [conditional stats calculation](#stats).
- Creating an integer field with the length of the given field value. This can be useful for [stats calculations](#stats).
- Creating an integer field with the length of the given field value. This can be useful for [stats calculations](#stats-pipe).
See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
@ -1069,166 +1501,7 @@ See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) fo
## Stats
### stats functions
LogsQL supports the following stats functions:
- [`count`](#count) - calculates the number of log entries.
- [`uniq`](#uniq) - calculates the number of unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`sum`](#sum) - calculates the sum for the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`avg`](#avg) - calculates the average value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`min`](#min) - calculates the minimum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`max`](#max) - calculates the maximum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`uniq_array`](#uniq_array) - returns unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
#### count
Examples:
- `error | stats count() as errors_total` returns the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `error | stats by (_stream) count() as errors_by_stream` returns the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word) grouped by [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields).
- `error | stats by (datacenter, namespace) count(trace_id, user_id) as errors_with_trace_and_user` returns the number
of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) containing the `error` [word](#word),
which contain non-empty `trace_id` or `user_id` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model), grouped by `datacenter` and `namespace` fields.
See also [`sum`](#sum) and [`avg`](#avg).
#### uniq
Examples:
- `error | stats uniq(client_ip) as unique_ips` returns the number of unique values for `client_ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `error | stats by (app) uniq(path, host) as unique_path_hosts` - returns the number of unique `(path, host)` pairs
for [field values](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word), grouped by `app` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- `error | fields path, host | stats uniq(*) unique_path_hosts` - returns the number of unique `(path, host)` pairs
for [field values](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word).
See also [`uniq_array`](#uniq_array).
#### sum
Examples:
- `error | stats sum(duration) duration_total` - returns the sum of `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) values
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) sum(response_size) response_size_sum` - returns the sum of `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) values
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [count](#count) and [avg](#avg).
#### avg
Examples:
- `error | stats avg(duration) duration_avg` - returns the average value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) avg(response_size) avg_response_size` - returns the average value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [sum](#sum) and [count](#count).
#### max
Examples:
- `error | stats max(duration) duration_max` - returns the maximum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) max(response_size) max_response_size` - returns the maximum value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [min](#min).
#### min
Examples:
- `error | stats min(duration) duration_min` - returns the minimum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) min(response_size) min_response_size` - returns the minimum value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [max](#max).
#### uniq_array
Examples:
- `_time:1h | stats uniq_array(client_ip) as unique_ips` returns unique values for `client_ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across logs for the last hour. The unique values are returned in JSON array such as `["1.2.4.5","5.6.7.8"]`.
- `_time:1h | stats by (host) uniq_array(path) as unique_paths` returns unique values for `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across logs for the last hour, grouped by `host` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
See also [uniq](#uniq) and [count](#count).
### Grouping stats by buckets
#### Time buckets
Stats can be bucketed by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) with the `_time:bucket_duration` syntax inside `by(...)` clause.
For example, the following query returns per-minute number of log messages with the `error` [word](#word) for the last 10 minutes:
```logsql
_time:10m error | stats by (_time:1m) count() errors_per_minute
```
It is possible to add offset (for example, [timezone offset](https://en.wikipedia.org/wiki/UTC_offset)) when bucketing by `_time`. For example, the following query calculates
the number of per-day log entries for the last week at '2h' offset aka `UTC+02:00` offset:
```logsql
_time:1w | stats by (_time:1d offset 2h) count() logs_per_day_kyiv_offset
```
#### Numeric buckets
Stats can be bucketed by any numeric [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) with the `field_name:bucket_size` syntax inside `by(...)` clause.
For example, the following query returns the number of log messages with the `status=200` [phrase](#phrase-filter) bucketed by `request_duration_seconds` numeric field with `0.5` step:
```logsql
_time:10m "status=200" | stats by (request_duration_seconds:0.5) count() requests
```
The `bucket_size` can contain the following convenient suffixes:
- `KB` - the `bucket_size` is multiplied by `1000` in this case. For example, `10KB`.
- `MB` - the `bucket_size` is multiplied by `1_000_000` in this case. For example, `10MB`.
- `GB` - the `bucket_size` is multiplied by `1_000_000_000` in this case. For example, `10GB`.
- `TB` - the `bucket_size` is multiplied by `1_000_000_000_000` in this case. For example, `10TB`.
- `KiB` - the `bucket_size` is multiplied by `1024` in this case. For example, `10KiB`.
- `MiB` - the `bucket_size` is multiplied by `1024*1024` in this case. For example, `10MiB`.
- `GiB` - the `bucket_size` is multiplied by `1024*1024*1024` in this case. For example, `10GiB`.
- `TiB` - the `bucket_size` is multiplied by `1024*1024*1024*1024` in this case. For example, `10TiB`.
#### IPv4 mask buckets
Stats can be bucketed by [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) with [IPv4 addresses](https://en.wikipedia.org/wiki/IP_address)
via the `ip_field_name:/network_mask` syntax inside `by(...)` clause. For example, the following query returns the number of log entries per `/24` subnetwork during the last 10 minutes:
```logsql
_time:10m | stats by (ip:/24) count() requests_per_subnet
```
### Calculating multiple stats
Stats calculations can be combined. For example, the following query calculates the number of log messages with the `error` [word](#word),
the number of unique values for `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) and the sum of `duration`
[field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model), grouped by `namespace` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model):
```logsql
error | stats by (namespace)
count() as errors_total,
uniq(ip) as unique_ips,
sum(duration) as duration_sum
```
### Stats TODO
Stats over the selected logs can be calculated via [`stats` pipe](#stats-pipe).
LogsQL will support calculating the following additional stats based on the [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
and fields created by [transformations](#transformations):
@ -1238,52 +1511,70 @@ and fields created by [transformations](#transformations):
It will be possible to specify an optional condition [filter](#post-filters) when calculating the stats.
For example, `sum(response_size) if (is_admin:true)` calculates the total response size for admins only.
It will be possible to group stats by the specified time buckets.
It is possible to perform stats calculations on the [selected log entries](#filters) at client side with `sort`, `uniq`, etc. Unix commands
according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line).
See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
## Sorting
By default VictoriaLogs sorts the returned results by [`_time` field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field)
if their total size doesn't exceed `-select.maxSortBufferSize` command-line value (by default it is set to one megabyte).
Otherwise sorting is skipped because of performance and efficiency concerns described [here](https://docs.victoriametrics.com/VictoriaLogs/querying/).
if their total size doesn't exceed `-select.maxSortBufferSize` command-line value (by default it is set to 1MB).
Otherwise sorting is skipped because of performance reasons.
It is possible to sort the [selected log entries](#filters) at client side with `sort` Unix command
according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line).
LogsQL will support results' sorting by the given set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
Use [`sort` pipe](#sort-pipe) for sorting the results.
## Limiters
LogsQL provides the following functionality for limiting the number of returned log entries:
LogsQL provides the following [pipes](#pipes) for limiting the number of returned log entries:
- `error | head 10` - returns up to 10 log entries with the `error` [word](#word).
- `error | skip 10` - skips the first 10 log entries with the `error` [word](#word).
It is recommended [sorting](#sorting) entries before limiting the number of returned log entries,
in order to get consistent results.
It is possible to limit the returned results with `head`, `tail`, `less`, etc. Unix commands
according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line).
See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
- [`fields`](#fields-pipe) and [`delete`](#delete-pipe) pipes allow limiting the set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) to return.
- [`head` pipe](#head-pipe) allows limiting the number of log entries to return.
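For example, the following query should return at most 10 logs with the `error` [word](#word) over the last 5 minutes, keeping only the [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields in the response:

```logsql
_time:5m error | fields _time, _msg | head 10
```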
## Querying specific fields
By default VictoriaLogs query response contains all the [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Specific log fields can be queried via [`fields` pipe](#fields-pipe).
If you want to select some specific fields, then add `| fields field1, field2, ... fieldN` to the query.
For example, the following query returns only [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field),
[`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields), `host` and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields:
## Numeric values
```logsql
error | fields _time, _stream, host, _msg
```
LogsQL accepts numeric values in the following formats:
- regular integers like `12345` or `-12345`
- regular floating point numbers like `0.123` or `-12.34`
- [short numeric format](#short-numeric-values)
- [duration format](#duration-values)
### Short numeric values
LogsQL accepts integer and floating point values with the following suffixes:
- `K` and `KB` - the value is multiplied by `10^3`
- `M` and `MB` - the value is multiplied by `10^6`
- `G` and `GB` - the value is multiplied by `10^9`
- `T` and `TB` - the value is multiplied by `10^12`
- `Ki` and `KiB` - the value is multiplied by `2^10`
- `Mi` and `MiB` - the value is multiplied by `2^20`
- `Gi` and `GiB` - the value is multiplied by `2^30`
- `Ti` and `TiB` - the value is multiplied by `2^40`
All the numbers may contain `_` delimiters, which may improve readability of the query. For example, `1_234_567` is equivalent to `1234567`,
while `1.234_567` is equivalent to `1.234567`.
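For example, the following query should select logs with the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value in the range from `1KB` (`1_000` bytes) to `10MiB` (`10*1024*1024` bytes) by using the [`range` filter](#range-filter):

```logsql
response_size:range[1KB, 10MiB]
```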
## Duration values
LogsQL accepts duration values with the following suffixes in places where a duration is allowed:
- `ns` - nanoseconds. For example, `123ns`.
- `µs` - microseconds. For example, `1.23µs`.
- `ms` - milliseconds. For example, `1.23456ms`.
- `s` - seconds. For example, `1.234s`.
- `m` - minutes. For example, `1.5m`.
- `h` - hours. For example, `1.5h`.
- `d` - days. For example, `1.5d`.
- `w` - weeks. For example, `1w`.
- `y` - years as 365 days. For example, `1.5y`
Multiple durations can be combined. For example, `1h33m55s`.
Internally duration values are converted into nanoseconds.
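For example, the following query should select logs over the last 5 minutes with the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value in the range from `100ms` to `1.5s` by using the [`range` filter](#range-filter):

```logsql
_time:5m duration:range[100ms, 1.5s]
```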
## Performance tips

View file

@ -30,6 +30,7 @@ See [these docs](https://docs.victoriametrics.com/VictoriaLogs/) for details.
The following functionality is planned in the future versions of VictoriaLogs:
- Support for [data ingestion](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/) from popular log collectors and formats:
- OpenTelemetry for logs
- Fluentd
- Syslog
- Journald (systemd)
@ -37,9 +38,6 @@ The following functionality is planned in the future versions of VictoriaLogs:
- [Stream context](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#stream-context).
- [Transformation functions](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#transformations).
- [Post-filtering](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#post-filters).
- [Stats calculations](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#stats).
- [Sorting](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#sorting).
- [Limiters](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#limiters).
- The ability to use subqueries inside [in()](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#multi-exact-filter) function.
- Live tailing for [LogsQL filters](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#filters) aka `tail -f`.
- Web UI with the following abilities:

View file

@ -3,6 +3,7 @@ package logstorage
import (
"encoding/binary"
"math"
"slices"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
@ -121,9 +122,6 @@ func (br *blockResult) fetchAllColumns(bs *blockSearch, bm *bitmap) {
func (br *blockResult) fetchRequestedColumns(bs *blockSearch, bm *bitmap) {
for _, columnName := range bs.bsw.so.resultColumnNames {
if columnName == "" {
columnName = "_msg"
}
switch columnName {
case "_stream":
if !br.addStreamColumn(bs) {
@ -275,10 +273,7 @@ func (br *blockResult) addColumn(bs *blockSearch, ch *columnHeader, bm *bitmap)
}
dictValues = valuesBuf[valuesBufLen:]
name := ch.name
if name == "" {
name = "_msg"
}
name := getCanonicalColumnName(ch.name)
br.cs = append(br.cs, blockResultColumn{
name: name,
valueType: ch.valueType,
@ -425,6 +420,7 @@ func (br *blockResult) getBucketedTimestampValues(bucketSize, bucketOffset float
timestamp := timestamps[i]
timestamp -= bucketOffsetInt
timestamp -= timestamp % bucketSizeInt
timestamp += bucketOffsetInt
if i > 0 && timestamp == prevTimestamp {
valuesBuf = append(valuesBuf, s)
continue
@ -516,6 +512,7 @@ func (br *blockResult) getBucketedUint8Values(encodedValues []string, bucketSize
n := uint64(v[0])
n -= bucketOffsetInt
n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s)
continue
@ -570,6 +567,7 @@ func (br *blockResult) getBucketedUint16Values(encodedValues []string, bucketSiz
n := uint64(encoding.UnmarshalUint16(b))
n -= bucketOffsetInt
n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s)
continue
@ -624,6 +622,7 @@ func (br *blockResult) getBucketedUint32Values(encodedValues []string, bucketSiz
n := uint64(encoding.UnmarshalUint32(b))
n -= bucketOffsetInt
n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s)
continue
@ -678,6 +677,7 @@ func (br *blockResult) getBucketedUint64Values(encodedValues []string, bucketSiz
n := encoding.UnmarshalUint64(b)
n -= bucketOffsetInt
n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s)
continue
@ -742,6 +742,8 @@ func (br *blockResult) getBucketedFloat64Values(encodedValues []string, bucketSi
fP10 -= fP10 % bucketSizeP10
f = float64(fP10) / p10
f += bucketOffset
if i > 0 && f == fPrev {
valuesBuf = append(valuesBuf, s)
continue
@ -794,6 +796,7 @@ func (br *blockResult) getBucketedIPv4Values(encodedValues []string, bucketSize,
n := binary.BigEndian.Uint32(b)
n -= bucketOffsetInt
n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s)
continue
@ -850,6 +853,7 @@ func (br *blockResult) getBucketedTimestampISO8601Values(encodedValues []string,
n := encoding.UnmarshalUint64(b)
n -= bucketOffsetInt
n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s)
continue
@ -887,6 +891,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if f, ok := tryParseFloat64(s); ok {
f -= bucketOffset
// emulate f % bucketSize for float64 values
_, e := decimal.FromFloat(bucketSize)
p10 := math.Pow10(int(-e))
@ -894,6 +899,8 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
fP10 -= fP10 % int64(bucketSize*p10)
f = float64(fP10) / p10
f += bucketOffset
bufLen := len(br.buf)
br.buf = marshalFloat64(br.buf, f)
return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -902,6 +909,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if nsecs, ok := tryParseTimestampISO8601(s); ok {
nsecs -= int64(bucketOffset)
nsecs -= nsecs % int64(bucketSize)
nsecs += int64(bucketOffset)
bufLen := len(br.buf)
br.buf = marshalTimestampISO8601(br.buf, nsecs)
return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -910,6 +918,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if nsecs, ok := tryParseTimestampRFC3339Nano(s); ok {
nsecs -= int64(bucketOffset)
nsecs -= nsecs % int64(bucketSize)
nsecs += int64(bucketOffset)
bufLen := len(br.buf)
br.buf = marshalTimestampRFC3339Nano(br.buf, nsecs)
return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -918,6 +927,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if n, ok := tryParseIPv4(s); ok {
n -= uint32(int32(bucketOffset))
n -= n % uint32(bucketSize)
n += uint32(int32(bucketOffset))
bufLen := len(br.buf)
br.buf = marshalIPv4(br.buf, n)
return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -926,6 +936,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if nsecs, ok := tryParseDuration(s); ok {
nsecs -= int64(bucketOffset)
nsecs -= nsecs % int64(bucketSize)
nsecs += int64(bucketOffset)
bufLen := len(br.buf)
br.buf = marshalDuration(br.buf, nsecs)
return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -942,7 +953,69 @@ func (br *blockResult) addEmptyStringColumn(columnName string) {
})
}
func (br *blockResult) updateColumns(columnNames []string) {
// copyColumns copies columns from srcColumnNames to dstColumnNames.
func (br *blockResult) copyColumns(srcColumnNames, dstColumnNames []string) {
if len(srcColumnNames) == 0 {
return
}
cs := br.cs
csOffset := len(cs)
for _, c := range br.getColumns() {
if idx := slices.Index(srcColumnNames, c.name); idx >= 0 {
c.name = dstColumnNames[idx]
cs = append(cs, c)
// continue is skipped intentionally in order to leave the original column in the columns list.
}
if !slices.Contains(dstColumnNames, c.name) {
cs = append(cs, c)
}
}
br.csOffset = csOffset
br.cs = cs
}
// renameColumns renames columns from srcColumnNames to dstColumnNames.
func (br *blockResult) renameColumns(srcColumnNames, dstColumnNames []string) {
if len(srcColumnNames) == 0 {
return
}
cs := br.cs
csOffset := len(cs)
for _, c := range br.getColumns() {
if idx := slices.Index(srcColumnNames, c.name); idx >= 0 {
c.name = dstColumnNames[idx]
cs = append(cs, c)
continue
}
if !slices.Contains(dstColumnNames, c.name) {
cs = append(cs, c)
}
}
br.csOffset = csOffset
br.cs = cs
}
// deleteColumns deletes columns with the given columnNames.
func (br *blockResult) deleteColumns(columnNames []string) {
if len(columnNames) == 0 {
return
}
cs := br.cs
csOffset := len(cs)
for _, c := range br.getColumns() {
if !slices.Contains(columnNames, c.name) {
cs = append(cs, c)
}
}
br.csOffset = csOffset
br.cs = cs
}
// setColumns sets the resulting columns to the given columnNames.
func (br *blockResult) setColumns(columnNames []string) {
if br.areSameColumns(columnNames) {
// Fast path - nothing to change.
return
@ -973,10 +1046,6 @@ func (br *blockResult) areSameColumns(columnNames []string) bool {
}
func (br *blockResult) getColumnByName(columnName string) blockResultColumn {
if columnName == "" {
columnName = "_msg"
}
cs := br.getColumns()
// iterate columns in reverse order, so overridden column results are returned instead of original column results.
@ -1110,37 +1179,6 @@ func (c *blockResultColumn) addValue(v string) {
c.values = c.valuesBuf
}
// getEncodedValues returns encoded values for the given column.
//
// The returned encoded values are valid until br.reset() is called.
func (c *blockResultColumn) getEncodedValues(br *blockResult) []string {
if c.encodedValues != nil {
return c.encodedValues
}
if !c.isTime {
logger.Panicf("BUG: encodedValues may be missing only for _time column; got %q column", c.name)
}
buf := br.buf
valuesBuf := br.valuesBuf
valuesBufLen := len(valuesBuf)
for _, timestamp := range br.timestamps {
bufLen := len(buf)
buf = encoding.MarshalInt64(buf, timestamp)
s := bytesutil.ToUnsafeString(buf[bufLen:])
valuesBuf = append(valuesBuf, s)
}
c.encodedValues = valuesBuf[valuesBufLen:]
br.valuesBuf = valuesBuf
br.buf = buf
return c.encodedValues
}
// getValueAtRow returns value for the value at the given rowIdx.
//
// The returned value is valid until br.reset() is called.

View file

@ -202,16 +202,65 @@ func (q *Query) String() string {
}
func (q *Query) getResultColumnNames() []string {
for _, p := range q.pipes {
switch t := p.(type) {
case *pipeFields:
return t.fields
case *pipeStats:
return t.neededFields()
input := []string{"*"}
pipes := q.pipes
for i := len(pipes) - 1; i >= 0; i-- {
fields, m := pipes[i].getNeededFields()
if len(fields) == 0 {
input = nil
}
if len(input) == 0 {
break
}
// transform upper input fields to the current input fields according to the given mapping.
if input[0] != "*" {
var dst []string
for _, f := range input {
if a, ok := m[f]; ok {
dst = append(dst, a...)
} else {
dst = append(dst, f)
}
}
input = normalizeFields(dst)
}
// intersect fields with input
if fields[0] != "*" {
m := make(map[string]struct{})
for _, f := range input {
m[f] = struct{}{}
}
var dst []string
for _, f := range fields {
if _, ok := m[f]; ok {
dst = append(dst, f)
}
}
input = normalizeFields(dst)
}
}
return input
}
func normalizeFields(a []string) []string {
m := make(map[string]struct{}, len(a))
dst := make([]string, 0, len(a))
for _, s := range a {
if s == "*" {
return []string{"*"}
}
if _, ok := m[s]; ok {
continue
}
m[s] = struct{}{}
dst = append(dst, s)
}
return dst
}
// ParseQuery parses s.
func ParseQuery(s string) (*Query, error) {
@ -522,14 +571,17 @@ func parseFilterLenRange(lex *lexer, fieldName string) (filter, error) {
if len(args) != 2 {
return nil, fmt.Errorf("unexpected number of args for %s(); got %d; want 2", funcName, len(args))
}
minLen, err := parseUint(args[0])
if err != nil {
return nil, fmt.Errorf("cannot parse minLen at %s(): %w", funcName, err)
}
maxLen, err := parseUint(args[1])
if err != nil {
return nil, fmt.Errorf("cannot parse maxLen at %s(): %w", funcName, err)
}
stringRepr := "(" + args[0] + ", " + args[1] + ")"
fr := &filterLenRange{
fieldName: fieldName,
@ -739,17 +791,18 @@ func parseFilterRange(lex *lexer, fieldName string) (filter, error) {
func parseFloat64(lex *lexer) (float64, string, error) {
s := getCompoundToken(lex)
f, err := strconv.ParseFloat(s, 64)
if err != nil {
if err == nil {
return f, s, nil
}
// Try parsing s as integer.
// This handles 0x..., 0b... and 0... prefixes.
// This handles 0x..., 0b... and 0... prefixes, alongside '_' delimiters.
n, err := parseInt(s)
if err == nil {
return float64(n), s, nil
}
return 0, "", fmt.Errorf("cannot parse %q as float64: %w", lex.token, err)
}
return f, s, nil
}
func parseFuncArg(lex *lexer, fieldName string, callback func(args string) (filter, error)) (filter, error) {
funcName := lex.token
@ -1184,7 +1237,22 @@ func parseUint(s string) (uint64, error) {
if strings.EqualFold(s, "inf") || strings.EqualFold(s, "+inf") {
return math.MaxUint64, nil
}
return strconv.ParseUint(s, 0, 64)
n, err := strconv.ParseUint(s, 0, 64)
if err == nil {
return n, nil
}
nn, ok := tryParseBytes(s)
if !ok {
nn, ok = tryParseDuration(s)
if !ok {
return 0, fmt.Errorf("cannot parse %q as unsigned integer: %w", s, err)
}
if nn < 0 {
return 0, fmt.Errorf("cannot parse negative value %q as unsigned integer", s)
}
}
return uint64(nn), nil
}
func parseInt(s string) (int64, error) {
@ -1193,7 +1261,18 @@ func parseInt(s string) (int64, error) {
return math.MaxInt64, nil
case strings.EqualFold(s, "-inf"):
return math.MinInt64, nil
default:
return strconv.ParseInt(s, 0, 64)
}
n, err := strconv.ParseInt(s, 0, 64)
if err == nil {
return n, nil
}
nn, ok := tryParseBytes(s)
if !ok {
nn, ok = tryParseDuration(s)
if !ok {
return 0, fmt.Errorf("cannot parse %q as integer: %w", s, err)
}
}
return nn, nil
}

View file

@ -3,6 +3,7 @@ package logstorage
import (
"math"
"reflect"
"slices"
"testing"
"time"
)
@ -540,6 +541,12 @@ func TestParseRangeFilter(t *testing.T) {
f(`:range(1, 2)`, ``, math.Nextafter(1, math.Inf(1)), math.Nextafter(2, math.Inf(-1)))
f(`range[1, 2)`, ``, 1, math.Nextafter(2, math.Inf(-1)))
f(`range("1", 2]`, ``, math.Nextafter(1, math.Inf(1)), 2)
f(`response_size:range[1KB, 10MiB]`, `response_size`, 1_000, 10*(1<<20))
f(`response_size:range[1G, 10Ti]`, `response_size`, 1_000_000_000, 10*(1<<40))
f(`response_size:range[10, inf]`, `response_size`, 10, math.Inf(1))
f(`duration:range[100ns, 1y2w2.5m3s5ms]`, `duration`, 100, 1*nsecsPerYear+2*nsecsPerWeek+2.5*nsecsPerMinute+3*nsecsPerSecond+5*nsecsPerMillisecond)
}
func TestParseQuerySuccess(t *testing.T) {
@ -749,6 +756,7 @@ func TestParseQuerySuccess(t *testing.T) {
f(`len_range(10, +InF)`, `len_range(10, +InF)`)
f(`len_range(10, 1_000_000)`, `len_range(10, 1_000_000)`)
f(`len_range(0x10,0b100101)`, `len_range(0x10, 0b100101)`)
f(`len_range(1.5KB, 22MB100KB)`, `len_range(1.5KB, 22MB100KB)`)
// range filter
f(`range(1.234, 5656.43454)`, `range(1.234, 5656.43454)`)
@ -760,6 +768,7 @@ func TestParseQuerySuccess(t *testing.T) {
f(`range(1_000, 0o7532)`, `range(1_000, 0o7532)`)
f(`range(0x1ff, inf)`, `range(0x1ff, inf)`)
f(`range(-INF,+inF)`, `range(-INF, +inF)`)
f(`range(1.5K, 22.5GiB)`, `range(1.5K, 22.5GiB)`)
// re filter
f("re('foo|ba(r.+)')", `re("foo|ba(r.+)")`)
@ -816,19 +825,34 @@ func TestParseQuerySuccess(t *testing.T) {
f(`foo | fields bar`, `foo | fields bar`)
f(`foo|FIELDS bar,Baz , "a,b|c"`, `foo | fields bar, Baz, "a,b|c"`)
f(`foo | Fields x.y, "abc:z/a", _b$c`, `foo | fields x.y, "abc:z/a", "_b$c"`)
f(`foo | fields "", a`, `foo | fields _msg, a`)
// multiple fields pipes
f(`foo | fields bar | fields baz, abc`, `foo | fields bar | fields baz, abc`)
// copy pipe
f(`* | copy foo as bar`, `* | copy foo as bar`)
f(`* | COPY foo as bar, x y | Copy a as b`, `* | copy foo as bar, x as y | copy a as b`)
// rename pipe
f(`* | rename foo as bar`, `* | rename foo as bar`)
f(`* | RENAME foo AS bar, x y | Rename a as b`, `* | rename foo as bar, x as y | rename a as b`)
// delete pipe
f(`* | delete foo`, `* | delete foo`)
f(`* | DELETE foo, bar`, `* | delete foo, bar`)
// head pipe
f(`foo | head 10`, `foo | head 10`)
f(`foo | HEAD 1123432`, `foo | head 1123432`)
f(`foo | HEAD 1_123_432`, `foo | head 1123432`)
f(`foo | head 10K`, `foo | head 10000`)
// multiple head pipes
f(`foo | head 100 | head 10 | head 234`, `foo | head 100 | head 10 | head 234`)
// skip pipe
f(`foo | skip 10`, `foo | skip 10`)
f(`foo | skip 12_345M`, `foo | skip 12345000000`)
// multiple skip pipes
f(`foo | skip 10 | skip 100`, `foo | skip 10 | skip 100`)
@ -839,6 +863,8 @@ func TestParseQuerySuccess(t *testing.T) {
f(`* | stats count() x`, `* | stats count(*) as x`)
f(`* | stats count(*) x`, `* | stats count(*) as x`)
f(`* | stats count(foo,*,bar) x`, `* | stats count(*) as x`)
f(`* | stats count('') foo`, `* | stats count(_msg) as foo`)
f(`* | stats count(foo) ''`, `* | stats count(foo) as _msg`)
// stats pipe sum
f(`* | stats Sum(foo) bar`, `* | stats sum(foo) as bar`)
@ -1107,6 +1133,23 @@ func TestParseQueryFailure(t *testing.T) {
f(`foo | fields bar,`)
f(`foo | fields bar,,`)
// invalid copy pipe
f(`foo | copy`)
f(`foo | copy foo`)
f(`foo | copy foo,`)
f(`foo | copy foo,,`)
// invalid rename pipe
f(`foo | rename`)
f(`foo | rename foo`)
f(`foo | rename foo,`)
f(`foo | rename foo,,`)
// invalid delete pipe
f(`foo | delete`)
f(`foo | delete foo,`)
f(`foo | delete foo,,`)
// missing head pipe value
f(`foo | head`)
@ -1175,3 +1218,25 @@ func TestParseQueryFailure(t *testing.T) {
f(`foo | stats by(bar,`)
f(`foo | stats by(bar)`)
}
func TestNormalizeFields(t *testing.T) {
f := func(fields, normalizedExpected []string) {
t.Helper()
normalized := normalizeFields(fields)
if !slices.Equal(normalized, normalizedExpected) {
t.Fatalf("unexpected normalized fields for %q; got %q; want %q", fields, normalized, normalizedExpected)
}
}
f(nil, nil)
f([]string{"foo"}, []string{"foo"})
// duplicate fields
f([]string{"foo", "bar", "foo", "x"}, []string{"foo", "bar", "x"})
f([]string{"foo", "foo", "x", "x", "x"}, []string{"foo", "x"})
// star field
f([]string{"*"}, []string{"*"})
f([]string{"foo", "*", "bar"}, []string{"*"})
}

View file

@ -8,6 +8,12 @@ type pipe interface {
// String returns string representation of the pipe.
String() string
// getNeededFields must return the input fields required by the given pipe, together with the mapping from its output fields to the input fields they are derived from.
//
// It must return []string{"*"} if the set of input fields cannot be determined at the given pipe.
// It must return a nil map if the pipe doesn't add new fields to the output.
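// For example, the 'fields' pipe needs only the listed fields, while the 'copy' pipe needs all the input fields
// and returns a mapping from every dstField to the srcField it is copied from.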
getNeededFields() ([]string, map[string][]string)
// newPipeProcessor must return new pipeProcessor for the given ppBase.
//
// workersCount is the number of goroutine workers, which will call writeBlock() method.
@ -68,12 +74,6 @@ func parsePipes(lex *lexer) ([]pipe, error) {
return nil, fmt.Errorf("missing token after '|'")
}
switch {
case lex.isKeyword("fields"):
pf, err := parsePipeFields(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'fields' pipe: %w", err)
}
pipes = append(pipes, pf)
case lex.isKeyword("stats"):
ps, err := parsePipeStats(lex)
if err != nil {
@ -92,6 +92,30 @@ func parsePipes(lex *lexer) ([]pipe, error) {
return nil, fmt.Errorf("cannot parse 'skip' pipe: %w", err)
}
pipes = append(pipes, ps)
case lex.isKeyword("fields"):
pf, err := parsePipeFields(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'fields' pipe: %w", err)
}
pipes = append(pipes, pf)
case lex.isKeyword("copy"):
pc, err := parsePipeCopy(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'copy' pipe: %w", err)
}
pipes = append(pipes, pc)
case lex.isKeyword("rename"):
pr, err := parsePipeRename(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'rename' pipe: %w", err)
}
pipes = append(pipes, pr)
case lex.isKeyword("delete"):
pd, err := parsePipeDelete(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'delete' pipe: %w", err)
}
pipes = append(pipes, pd)
default:
return nil, fmt.Errorf("unexpected pipe %q", lex.token)
}

View file

@ -0,0 +1,99 @@
package logstorage
import (
"fmt"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// pipeCopy implements '| copy ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#copy-pipe
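//
// Example: `* | copy foo as bar, a as b` copies the `foo` field into `bar` and `a` into `b`,
// keeping the original fields intact. The `as` keyword may be omitted.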
type pipeCopy struct {
// srcFields contains a list of source fields to copy
srcFields []string
// dstFields contains a list of destination fields
dstFields []string
}
func (pc *pipeCopy) String() string {
if len(pc.srcFields) == 0 {
logger.Panicf("BUG: pipeCopy must contain at least a single srcField")
}
a := make([]string, len(pc.srcFields))
for i, srcField := range pc.srcFields {
dstField := pc.dstFields[i]
a[i] = quoteTokenIfNeeded(srcField) + " as " + quoteTokenIfNeeded(dstField)
}
return "copy " + strings.Join(a, ", ")
}
func (pc *pipeCopy) getNeededFields() ([]string, map[string][]string) {
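// Copying doesn't narrow the set of input fields, so all of them are needed.
// The returned map records that each dstField is produced from the corresponding srcField.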
m := make(map[string][]string, len(pc.srcFields))
for i, dstField := range pc.dstFields {
m[dstField] = append(m[dstField], pc.srcFields[i])
}
return []string{"*"}, m
}
func (pc *pipeCopy) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeCopyProcessor{
pc: pc,
ppBase: ppBase,
}
}
type pipeCopyProcessor struct {
pc *pipeCopy
ppBase pipeProcessor
}
func (pcp *pipeCopyProcessor) writeBlock(workerID uint, br *blockResult) {
br.copyColumns(pcp.pc.srcFields, pcp.pc.dstFields)
pcp.ppBase.writeBlock(workerID, br)
}
func (pcp *pipeCopyProcessor) flush() error {
return nil
}
func parsePipeCopy(lex *lexer) (*pipeCopy, error) {
if !lex.isKeyword("copy") {
return nil, fmt.Errorf("expecting 'copy'; got %q", lex.token)
}
var srcFields []string
var dstFields []string
for {
lex.nextToken()
srcField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse src field name: %w", err)
}
if lex.isKeyword("as") {
lex.nextToken()
}
dstField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse dst field name: %w", err)
}
srcFields = append(srcFields, srcField)
dstFields = append(dstFields, dstField)
switch {
case lex.isKeyword("|", ")", ""):
pc := &pipeCopy{
srcFields: srcFields,
dstFields: dstFields,
}
return pc, nil
case lex.isKeyword(","):
default:
return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
}
}
}

View file

@ -0,0 +1,76 @@
package logstorage
import (
"fmt"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// pipeDelete implements '| delete ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#transformations
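//
// Example: `* | delete foo, bar` removes the `foo` and `bar` fields from the query results.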
type pipeDelete struct {
// fields contains a list of fields to delete
fields []string
}
func (pd *pipeDelete) String() string {
if len(pd.fields) == 0 {
logger.Panicf("BUG: pipeDelete must contain at least a single field")
}
return "delete " + fieldNamesString(pd.fields)
}
func (pd *pipeDelete) getNeededFields() ([]string, map[string][]string) {
return []string{"*"}, nil
}
func (pd *pipeDelete) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeDeleteProcessor{
pd: pd,
ppBase: ppBase,
}
}
type pipeDeleteProcessor struct {
pd *pipeDelete
ppBase pipeProcessor
}
func (pdp *pipeDeleteProcessor) writeBlock(workerID uint, br *blockResult) {
br.deleteColumns(pdp.pd.fields)
pdp.ppBase.writeBlock(workerID, br)
}
func (pdp *pipeDeleteProcessor) flush() error {
return nil
}
func parsePipeDelete(lex *lexer) (*pipeDelete, error) {
if !lex.isKeyword("delete") {
return nil, fmt.Errorf("expecting 'delete'; got %q", lex.token)
}
var fields []string
for {
lex.nextToken()
field, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse field name: %w", err)
}
fields = append(fields, field)
switch {
case lex.isKeyword("|", ")", ""):
pd := &pipeDelete{
fields: fields,
}
return pd, nil
case lex.isKeyword(","):
default:
return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
}
}
}

View file

@ -9,7 +9,7 @@ import (
// pipeFields implements '| fields ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#limiters
// See https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe
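//
// Example: `foo | fields bar, baz` returns only the `bar` and `baz` fields for logs matching `foo`.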
type pipeFields struct {
// fields contains list of fields to fetch
fields []string
@ -25,6 +25,13 @@ func (pf *pipeFields) String() string {
return "fields " + fieldNamesString(pf.fields)
}
func (pf *pipeFields) getNeededFields() ([]string, map[string][]string) {
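// If '*' is requested, the set of input fields cannot be narrowed; otherwise only the listed fields must be read.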
if pf.containsStar {
return []string{"*"}, nil
}
return pf.fields, nil
}
func (pf *pipeFields) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeFieldsProcessor{
pf: pf,
@ -39,7 +46,7 @@ type pipeFieldsProcessor struct {
func (pfp *pipeFieldsProcessor) writeBlock(workerID uint, br *blockResult) {
if !pfp.pf.containsStar {
br.updateColumns(pfp.pf.fields)
br.setColumns(pfp.pf.fields)
}
pfp.ppBase.writeBlock(workerID, br)
}
@ -49,11 +56,13 @@ func (pfp *pipeFieldsProcessor) flush() error {
}
func parsePipeFields(lex *lexer) (*pipeFields, error) {
if !lex.isKeyword("fields") {
return nil, fmt.Errorf("expecting 'fields'; got %q", lex.token)
}
var fields []string
for {
if !lex.mustNextToken() {
return nil, fmt.Errorf("missing field name")
}
lex.nextToken()
field, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse field name: %w", err)
@ -61,6 +70,9 @@ func parsePipeFields(lex *lexer) (*pipeFields, error) {
fields = append(fields, field)
switch {
case lex.isKeyword("|", ")", ""):
if slices.Contains(fields, "*") {
fields = []string{"*"}
}
pf := &pipeFields{
fields: fields,
containsStar: slices.Contains(fields, "*"),

View file

@ -16,6 +16,10 @@ func (ph *pipeHead) String() string {
return fmt.Sprintf("head %d", ph.n)
}
func (ph *pipeHead) getNeededFields() ([]string, map[string][]string) {
return []string{"*"}, nil
}
func (ph *pipeHead) newPipeProcessor(_ int, _ <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor {
if ph.n == 0 {
// Special case - notify the caller to stop writing data to the returned pipeHeadProcessor
@ -65,12 +69,14 @@ func (php *pipeHeadProcessor) flush() error {
}
func parsePipeHead(lex *lexer) (*pipeHead, error) {
if !lex.mustNextToken() {
return nil, fmt.Errorf("missing the number of head rows to return")
if !lex.isKeyword("head") {
return nil, fmt.Errorf("expecting 'head'; got %q", lex.token)
}
lex.nextToken()
n, err := parseUint(lex.token)
if err != nil {
return nil, fmt.Errorf("cannot parse the number of head rows to return %q: %w", lex.token, err)
return nil, fmt.Errorf("cannot parse the number of head rows to return from %q: %w", lex.token, err)
}
lex.nextToken()
ph := &pipeHead{

View file

@ -0,0 +1,99 @@
package logstorage
import (
"fmt"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// pipeRename implements '| rename ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#rename-pipe
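//
// Example: `* | rename foo as bar` moves the `foo` field into `bar`. The `as` keyword may be omitted.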
type pipeRename struct {
// srcFields contains a list of source fields to rename
srcFields []string
// dstFields contains a list of destination fields
dstFields []string
}
func (pr *pipeRename) String() string {
if len(pr.srcFields) == 0 {
logger.Panicf("BUG: pipeRename must contain at least a single srcField")
}
a := make([]string, len(pr.srcFields))
for i, srcField := range pr.srcFields {
dstField := pr.dstFields[i]
a[i] = quoteTokenIfNeeded(srcField) + " as " + quoteTokenIfNeeded(dstField)
}
return "rename " + strings.Join(a, ", ")
}
func (pr *pipeRename) getNeededFields() ([]string, map[string][]string) {
m := make(map[string][]string, len(pr.srcFields))
for i, dstField := range pr.dstFields {
m[dstField] = append(m[dstField], pr.srcFields[i])
}
return []string{"*"}, m
}
func (pr *pipeRename) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeRenameProcessor{
pr: pr,
ppBase: ppBase,
}
}
type pipeRenameProcessor struct {
pr *pipeRename
ppBase pipeProcessor
}
func (prp *pipeRenameProcessor) writeBlock(workerID uint, br *blockResult) {
br.renameColumns(prp.pr.srcFields, prp.pr.dstFields)
prp.ppBase.writeBlock(workerID, br)
}
func (prp *pipeRenameProcessor) flush() error {
return nil
}
func parsePipeRename(lex *lexer) (*pipeRename, error) {
if !lex.isKeyword("rename") {
return nil, fmt.Errorf("expecting 'rename'; got %q", lex.token)
}
var srcFields []string
var dstFields []string
for {
lex.nextToken()
srcField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse src field name: %w", err)
}
if lex.isKeyword("as") {
lex.nextToken()
}
dstField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse dst field name: %w", err)
}
srcFields = append(srcFields, srcField)
dstFields = append(dstFields, dstField)
switch {
case lex.isKeyword("|", ")", ""):
pr := &pipeRename{
srcFields: srcFields,
dstFields: dstFields,
}
return pr, nil
case lex.isKeyword(","):
default:
return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
}
}
}

View file

@ -16,6 +16,10 @@ func (ps *pipeSkip) String() string {
return fmt.Sprintf("skip %d", ps.n)
}
func (ps *pipeSkip) getNeededFields() ([]string, map[string][]string) {
return []string{"*"}, nil
}
func (ps *pipeSkip) newPipeProcessor(workersCount int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeSkipProcessor{
ps: ps,
@ -52,12 +56,14 @@ func (psp *pipeSkipProcessor) flush() error {
}
func parsePipeSkip(lex *lexer) (*pipeSkip, error) {
if !lex.mustNextToken() {
return nil, fmt.Errorf("missing the number of rows to skip")
if !lex.isKeyword("skip") {
return nil, fmt.Errorf("expecting 'skip'; got %q", lex.token)
}
lex.nextToken()
n, err := parseUint(lex.token)
if err != nil {
return nil, fmt.Errorf("cannot parse the number of rows to skip %q: %w", lex.token, err)
return nil, fmt.Errorf("cannot parse the number of rows to skip from %q: %w", lex.token, err)
}
lex.nextToken()
ps := &pipeSkip{

View file

@ -83,6 +83,27 @@ func (ps *pipeStats) String() string {
return s
}
func (ps *pipeStats) getNeededFields() ([]string, map[string][]string) {
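// Every stats result column is calculated from the `by(...)` fields plus the fields consumed by its stats function,
// so both are collected into the needed input fields and into the per-result mapping.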
var byFields []string
for _, bf := range ps.byFields {
byFields = append(byFields, bf.name)
}
neededFields := append([]string{}, byFields...)
m := make(map[string][]string)
for i, f := range ps.funcs {
funcFields := f.neededFields()
neededFields = append(neededFields, funcFields...)
resultName := ps.resultNames[i]
m[resultName] = append(m[resultName], byFields...)
m[resultName] = append(m[resultName], funcFields...)
}
return neededFields, m
}
const stateSizeBudgetChunk = 1 << 20
func (ps *pipeStats) newPipeProcessor(workersCount int, stopCh <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor {
@ -376,35 +397,13 @@ func (psp *pipeStatsProcessor) flush() error {
return nil
}
func (ps *pipeStats) neededFields() []string {
var neededFields []string
m := make(map[string]struct{})
for _, bf := range ps.byFields {
name := bf.name
if _, ok := m[name]; !ok {
m[name] = struct{}{}
neededFields = append(neededFields, name)
}
}
for _, f := range ps.funcs {
for _, fieldName := range f.neededFields() {
if _, ok := m[fieldName]; !ok {
m[fieldName] = struct{}{}
neededFields = append(neededFields, fieldName)
}
}
}
return neededFields
}
func parsePipeStats(lex *lexer) (*pipeStats, error) {
if !lex.mustNextToken() {
return nil, fmt.Errorf("missing stats config")
if !lex.isKeyword("stats") {
return nil, fmt.Errorf("expecting 'stats'; got %q", lex.token)
}
lex.nextToken()
var ps pipeStats
if lex.isKeyword("by") {
lex.nextToken()
@ -494,9 +493,7 @@ func parseStatsFunc(lex *lexer) (statsFunc, string, error) {
func parseResultName(lex *lexer) (string, error) {
if lex.isKeyword("as") {
if !lex.mustNextToken() {
return "", fmt.Errorf("missing token after 'as' keyword")
}
lex.nextToken()
}
resultName, err := parseFieldName(lex)
if err != nil {
@ -543,9 +540,7 @@ func parseByFields(lex *lexer) ([]*byField, error) {
}
var bfs []*byField
for {
if !lex.mustNextToken() {
return nil, fmt.Errorf("missing field name or ')'")
}
lex.nextToken()
if lex.isKeyword(")") {
lex.nextToken()
return bfs, nil
@ -657,6 +652,9 @@ func tryParseBucketSize(s string) (float64, bool) {
return 0, false
}
// parseFieldNamesForStatsFunc parses field names for statsFunc.
//
// It returns ["*"] if the field names list is empty or if it contains the "*" field.
func parseFieldNamesForStatsFunc(lex *lexer, funcName string) ([]string, error) {
if !lex.isKeyword(funcName) {
return nil, fmt.Errorf("unexpected func; got %q; want %q", lex.token, funcName)
@ -678,9 +676,7 @@ func parseFieldNamesInParens(lex *lexer) ([]string, error) {
}
var fields []string
for {
if !lex.mustNextToken() {
return nil, fmt.Errorf("missing field name or ')'")
}
lex.nextToken()
if lex.isKeyword(")") {
lex.nextToken()
return fields, nil
@ -708,8 +704,9 @@ func parseFieldName(lex *lexer) (string, error) {
if lex.isKeyword(",", "(", ")", "[", "]", "|", ":", "") {
return "", fmt.Errorf("unexpected token: %q", lex.token)
}
token := getCompoundPhrase(lex, false)
return token, nil
fieldName := getCompoundPhrase(lex, false)
fieldName = getCanonicalColumnName(fieldName)
return fieldName, nil
}
func fieldNamesString(fields []string) string {

View file

@ -35,12 +35,12 @@ func TestTryParseBucketSize_Success(t *testing.T) {
f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond))
// bytes
f("1b", 1)
f("1k", 1_000)
f("1Kb", 1_000)
f("1B", 1)
f("1K", 1_000)
f("1KB", 1_000)
f("5.5KiB", 5.5*(1<<10))
f("10MB500KB10B", 10*1_000_000+500*1_000+10)
f("10m0k", 10*1_000_000)
f("10M", 10*1_000_000)
f("-10MB", -10*1_000_000)
// ipv4 mask
@ -95,13 +95,13 @@ func TestTryParseBucketOffset_Success(t *testing.T) {
f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond))
// bytes
f("1b", 1)
f("1k", 1_000)
f("1Kb", 1_000)
f("1B", 1)
f("1K", 1_000)
f("1KB", 1_000)
f("5.5KiB", 5.5*(1<<10))
f("10MB500KB10B", 10*1_000_000+500*1_000+10)
f("10m0k", 10*1_000_000)
f("-10mb", -10*1_000_000)
f("10M", 10*1_000_000)
f("-10MB", -10*1_000_000)
}
func TestTryParseBucketOffset_Failure(t *testing.T) {

View file

@ -24,10 +24,7 @@ func (f *Field) Reset() {
// String returns string representation of f.
func (f *Field) String() string {
name := f.Name
if name == "" {
name = "_msg"
}
name := getCanonicalColumnName(f.Name)
return fmt.Sprintf("%q:%q", name, f.Value)
}
@ -121,3 +118,10 @@ func (rs *rows) mergeRows(timestampsA, timestampsB []int64, fieldsA, fieldsB [][
rs.appendRows(timestampsA, fieldsA)
}
}
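// getCanonicalColumnName returns the canonical name for the given column - an empty name maps to the `_msg` field.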
func getCanonicalColumnName(columnName string) string {
if columnName == "" {
return "_msg"
}
return columnName
}

View file

@ -18,7 +18,11 @@ func (sc *statsCount) String() string {
}
func (sc *statsCount) neededFields() []string {
return getFieldsIgnoreStar(sc.fields)
if sc.containsStar {
// There is no need to fetch any columns for count(*) - the number of matching rows can be calculated as len(blockResult.timestamps)
return nil
}
return sc.fields
}
func (sc *statsCount) newStatsProcessor() (statsProcessor, int) {
@ -204,13 +208,3 @@ func parseStatsCount(lex *lexer) (*statsCount, error) {
}
return sc, nil
}
func getFieldsIgnoreStar(fields []string) []string {
var result []string
for _, f := range fields {
if f != "*" {
result = append(result, f)
}
}
return result
}

View file

@ -88,8 +88,7 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
}
if len(fields) == 1 {
// Fast path for a single column.
// The unique key is formed as "<is_time> <value_type>? <encodedValue>",
// where <value_type> is skipped if <is_time> == 1.
// The unique key is formed as "<is_time> <value>".
// This guarantees that keys do not clash for different column types across blocks.
c := br.getColumnByName(fields[0])
if c.isTime {
@ -119,7 +118,7 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
return stateSizeIncrease
}
keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(valueTypeString))
keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{}
@ -131,13 +130,13 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
if c.valueType == valueTypeDict {
// count unique non-zero c.dictValues
keyBuf := sup.keyBuf[:0]
for i, v := range c.dictValues {
for _, v := range c.dictValues {
if v == "" {
// Do not count empty values
continue
}
keyBuf = append(keyBuf[:0], 0, byte(valueTypeDict))
keyBuf = append(keyBuf, byte(i))
keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{}
stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof(""))
@ -148,19 +147,18 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
}
// Count unique values across encodedValues
encodedValues := c.getEncodedValues(br)
isStringValueType := c.valueType == valueTypeString
values := c.getValues(br)
keyBuf := sup.keyBuf[:0]
for i, v := range encodedValues {
if isStringValueType && v == "" {
for i, v := range values {
if v == "" {
// Do not count empty values
continue
}
if i > 0 && encodedValues[i-1] == v {
if i > 0 && values[i-1] == v {
// This value has been already counted.
continue
}
keyBuf = append(keyBuf[:0], 0, byte(c.valueType))
keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{}
@ -249,8 +247,7 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
}
if len(fields) == 1 {
// Fast path for a single column.
// The unique key is formed as "<is_time> <value_type>? <encodedValue>",
// where <value_type> is skipped if <is_time> == 1.
// The unique key is formed as "<is_time> <value>".
// This guarantees that keys do not clash for different column types across blocks.
c := br.getColumnByName(fields[0])
if c.isTime {
@ -273,7 +270,7 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
return stateSizeIncrease
}
keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(valueTypeString))
keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{}
@ -285,13 +282,14 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
if c.valueType == valueTypeDict {
// count unique non-zero c.dictValues
dictIdx := c.encodedValues[rowIdx][0]
if c.dictValues[dictIdx] == "" {
v := c.dictValues[dictIdx]
if v == "" {
// Do not count empty values
return stateSizeIncrease
}
keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(valueTypeDict))
keyBuf = append(keyBuf, dictIdx)
keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{}
stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof(""))
@ -301,14 +299,13 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
}
// Count unique values for the given rowIdx
encodedValues := c.getEncodedValues(br)
v := encodedValues[rowIdx]
if c.valueType == valueTypeString && v == "" {
v := c.getValueAtRow(br, rowIdx)
if v == "" {
// Do not count empty values
return stateSizeIncrease
}
keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(c.valueType))
keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{}

View file

@ -731,88 +731,91 @@ func tryParseBytes(s string) (int64, bool) {
if !ok {
return 0, false
}
if len(tail) == 0 {
if _, frac := math.Modf(f); frac != 0 {
// deny floating-point numbers without any suffix.
return 0, false
}
}
s = tail
if len(s) == 0 {
n += int64(f)
continue
}
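// Byte-size suffixes are matched case-sensitively from the longest to the shortest form,
// so only the canonical spellings (B, K, KB, KiB, M, MB, MiB, G, GB, GiB, T, TB, TiB) are accepted.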
if len(s) >= 3 {
prefix := s[:3]
switch {
case strings.EqualFold(prefix, "kib"):
case strings.HasPrefix(s, "KiB"):
n += int64(f * (1 << 10))
s = s[3:]
continue
case strings.EqualFold(prefix, "mib"):
case strings.HasPrefix(s, "MiB"):
n += int64(f * (1 << 20))
s = s[3:]
continue
case strings.EqualFold(prefix, "gib"):
case strings.HasPrefix(s, "GiB"):
n += int64(f * (1 << 30))
s = s[3:]
continue
case strings.EqualFold(prefix, "tib"):
case strings.HasPrefix(s, "TiB"):
n += int64(f * (1 << 40))
s = s[3:]
continue
}
}
if len(s) >= 2 {
prefix := s[:2]
switch {
case strings.EqualFold(prefix, "ki"):
case strings.HasPrefix(s, "Ki"):
n += int64(f * (1 << 10))
s = s[2:]
continue
case strings.EqualFold(prefix, "mi"):
case strings.HasPrefix(s, "Mi"):
n += int64(f * (1 << 20))
s = s[2:]
continue
case strings.EqualFold(prefix, "gi"):
case strings.HasPrefix(s, "Gi"):
n += int64(f * (1 << 30))
s = s[2:]
continue
case strings.EqualFold(prefix, "ti"):
case strings.HasPrefix(s, "Ti"):
n += int64(f * (1 << 40))
s = s[2:]
continue
case strings.EqualFold(prefix, "kb"):
case strings.HasPrefix(s, "KB"):
n += int64(f * 1_000)
s = s[2:]
continue
case strings.EqualFold(prefix, "mb"):
case strings.HasPrefix(s, "MB"):
n += int64(f * 1_000_000)
s = s[2:]
continue
case strings.EqualFold(prefix, "gb"):
case strings.HasPrefix(s, "GB"):
n += int64(f * 1_000_000_000)
s = s[2:]
continue
case strings.EqualFold(prefix, "tb"):
case strings.HasPrefix(s, "TB"):
n += int64(f * 1_000_000_000_000)
s = s[2:]
continue
}
}
prefix := s[:1]
switch {
case strings.EqualFold(prefix, "b"):
case strings.HasPrefix(s, "B"):
n += int64(f)
s = s[1:]
continue
case strings.EqualFold(prefix, "k"):
case strings.HasPrefix(s, "K"):
n += int64(f * 1_000)
s = s[1:]
continue
case strings.EqualFold(prefix, "m"):
case strings.HasPrefix(s, "M"):
n += int64(f * 1_000_000)
s = s[1:]
continue
case strings.EqualFold(prefix, "g"):
case strings.HasPrefix(s, "G"):
n += int64(f * 1_000_000_000)
s = s[1:]
continue
case strings.EqualFold(prefix, "t"):
case strings.HasPrefix(s, "T"):
n += int64(f * 1_000_000_000_000)
s = s[1:]
continue
@ -859,48 +862,45 @@ func tryParseDuration(s string) (int64, bool) {
return 0, false
}
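// Duration suffixes are matched case-sensitively (ns, µs, ms, s, m, h, d, w, y),
// so uppercase variants such as `1H` are rejected.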
if len(s) >= 3 {
prefix := s[:3]
if strings.EqualFold(prefix, "µs") {
if strings.HasPrefix(s, "µs") {
nsecs += int64(f * nsecsPerMicrosecond)
s = s[3:]
continue
}
}
if len(s) >= 2 {
prefix := s[:2]
switch {
case strings.EqualFold(prefix, "ms"):
case strings.HasPrefix(s, "ms"):
nsecs += int64(f * nsecsPerMillisecond)
s = s[2:]
continue
case strings.EqualFold(prefix, "ns"):
case strings.HasPrefix(s, "ns"):
nsecs += int64(f)
s = s[2:]
continue
}
}
prefix := s[:1]
switch {
case strings.EqualFold(prefix, "y"):
case strings.HasPrefix(s, "y"):
nsecs += int64(f * nsecsPerYear)
s = s[1:]
case strings.EqualFold(prefix, "w"):
case strings.HasPrefix(s, "w"):
nsecs += int64(f * nsecsPerWeek)
s = s[1:]
continue
case strings.EqualFold(prefix, "d"):
case strings.HasPrefix(s, "d"):
nsecs += int64(f * nsecsPerDay)
s = s[1:]
continue
case strings.EqualFold(prefix, "h"):
case strings.HasPrefix(s, "h"):
nsecs += int64(f * nsecsPerHour)
s = s[1:]
continue
case strings.EqualFold(prefix, "m"):
case strings.HasPrefix(s, "m"):
nsecs += int64(f * nsecsPerMinute)
s = s[1:]
continue
case strings.EqualFold(prefix, "s"):
case strings.HasPrefix(s, "s"):
nsecs += int64(f * nsecsPerSecond)
s = s[1:]
continue

View file

@ -325,7 +325,6 @@ func TestTryParseDuration_Success(t *testing.T) {
// zero duration
f("0s", 0)
f("0S", 0)
f("0.0w0d0h0s0.0ms", 0)
f("-0w", 0)
@ -334,15 +333,9 @@ func TestTryParseDuration_Success(t *testing.T) {
f("1.5ms", 1.5*nsecsPerMillisecond)
f("1µs", nsecsPerMicrosecond)
f("1ns", 1)
f("1NS", 1)
f("1nS", 1)
f("1Ns", 1)
f("1h", nsecsPerHour)
f("1H", nsecsPerHour)
f("1.5d", 1.5*nsecsPerDay)
f("1.5D", 1.5*nsecsPerDay)
f("1.5w", 1.5*nsecsPerWeek)
f("1.5W", 1.5*nsecsPerWeek)
f("2.5y", 2.5*nsecsPerYear)
f("1m5.123456789s", nsecsPerMinute+5.123456789*nsecsPerSecond)
@ -417,62 +410,25 @@ func TestTryParseBytes_Success(t *testing.T) {
}
}
f("123.456", 123)
f("1_500", 1_500)
f("2.5b", 2)
f("2.5B", 2)
f("1.5k", 1_500)
f("1.5m", 1_500_000)
f("1.5g", 1_500_000_000)
f("1.5t", 1_500_000_000_000)
f("1.5K", 1_500)
f("1.5M", 1_500_000)
f("1.5G", 1_500_000_000)
f("1.5T", 1_500_000_000_000)
f("1.5kb", 1_500)
f("1.5mb", 1_500_000)
f("1.5gb", 1_500_000_000)
f("1.5tb", 1_500_000_000_000)
f("1.5Kb", 1_500)
f("1.5Mb", 1_500_000)
f("1.5Gb", 1_500_000_000)
f("1.5Tb", 1_500_000_000_000)
f("1.5KB", 1_500)
f("1.5MB", 1_500_000)
f("1.5GB", 1_500_000_000)
f("1.5TB", 1_500_000_000_000)
f("1.5ki", 1.5*(1<<10))
f("1.5mi", 1.5*(1<<20))
f("1.5gi", 1.5*(1<<30))
f("1.5ti", 1.5*(1<<40))
f("1.5Ki", 1.5*(1<<10))
f("1.5Mi", 1.5*(1<<20))
f("1.5Gi", 1.5*(1<<30))
f("1.5Ti", 1.5*(1<<40))
f("1.5KI", 1.5*(1<<10))
f("1.5MI", 1.5*(1<<20))
f("1.5GI", 1.5*(1<<30))
f("1.5TI", 1.5*(1<<40))
f("1.5kib", 1.5*(1<<10))
f("1.5mib", 1.5*(1<<20))
f("1.5gib", 1.5*(1<<30))
f("1.5tib", 1.5*(1<<40))
f("1.5kiB", 1.5*(1<<10))
f("1.5miB", 1.5*(1<<20))
f("1.5giB", 1.5*(1<<30))
f("1.5tiB", 1.5*(1<<40))
f("1.5KiB", 1.5*(1<<10))
f("1.5MiB", 1.5*(1<<20))
f("1.5GiB", 1.5*(1<<30))
@ -503,6 +459,37 @@ func TestTryParseBytes_Failure(t *testing.T) {
f("123qsb")
f("123sqsb")
f("123s5qsb")
// suffixes with invalid letter case
f("1b")
f("1k")
f("1m")
f("1g")
f("1t")
f("1kb")
f("1mb")
f("1gb")
f("1tb")
f("1ki")
f("1mi")
f("1gi")
f("1ti")
f("1kib")
f("1mib")
f("1gib")
f("1tib")
f("1KIB")
f("1MIB")
f("1GIB")
f("1TIB")
// fractional number without suffix
f("123.456")
}
func TestTryParseFloat64_Success(t *testing.T) {