Aliaksandr Valialkin 2024-05-05 00:28:01 +02:00
parent f8dcf7be1d
commit bc7dfd5ba4
20 changed files with 1225 additions and 450 deletions


@ -20,12 +20,15 @@ according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/QuickSta
## tip

* FEATURE: return all the log fields by default in query results. Previously only [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields), [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields were returned by default.
* FEATURE: add support for returning only the requested log [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe).
* FEATURE: add support for calculating `count()`, `uniq()`, `sum()`, `avg()`, `min()`, `max()` and `uniq_array()` over [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). Grouping by arbitrary set of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) is supported. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe) for details.
* FEATURE: add support for sorting the returned results. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe).
* FEATURE: add support for limiting the number of returned results. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#limiters).
* FEATURE: add support for copying and renaming the selected log fields. See [these](https://docs.victoriametrics.com/victorialogs/logsql/#copy-pipe) and [these](https://docs.victoriametrics.com/victorialogs/logsql/#rename-pipe) docs.
* FEATURE: allow using `_` inside numbers. For example, `score:range[1_000, 5_000_000]` for [`range` filter](https://docs.victoriametrics.com/victorialogs/logsql/#range-filter).
* FEATURE: allow numbers in hexadecimal and binary form. For example, `response_size:range[0xff, 0b10001101101]` for [`range` filter](https://docs.victoriametrics.com/victorialogs/logsql/#range-filter).
* FEATURE: allow using duration and byte size suffixes in numeric values inside LogsQL queries. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#numeric-values).
* FEATURE: optimize performance for [LogsQL query](https://docs.victoriametrics.com/victorialogs/logsql/), which contains multiple filters for [words](https://docs.victoriametrics.com/victorialogs/logsql/#word-filter) or [phrases](https://docs.victoriametrics.com/victorialogs/logsql/#phrase-filter) delimited with [`AND` operator](https://docs.victoriametrics.com/victorialogs/logsql/#logical-filter). For example, the `foo AND bar` query should find [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with both `foo` and `bar` words faster.
* BUGFIX: prevent additional CPU usage for up to a few seconds after canceling the query.
* BUGFIX: prevent returning log entries with empty `_stream` field in the form `"_stream":""` in [search query results](https://docs.victoriametrics.com/victorialogs/querying/). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6042).


@ -19,7 +19,7 @@ It provides the following features:
See [word filter](#word-filter), [phrase filter](#phrase-filter) and [prefix filter](#prefix-filter).
- Ability to combine filters into arbitrary complex [logical filters](#logical-filter).
- Ability to extract structured fields from unstructured logs at query time. See [these docs](#transformations).
- Ability to calculate various stats over the selected log entries. See [these docs](#stats-pipe).

## LogsQL tutorial
@ -177,17 +177,22 @@ These words are taken into account by full-text search filters such as
#### Query syntax

LogsQL query must contain [filters](#filters) for selecting the matching logs. At least a single filter is required.
For example, the following query selects all the logs for the last 5 minutes by using [`_time` filter](#time-filter):

```logsql
_time:5m
```

In addition to filters, a LogsQL query may contain an arbitrary mix of optional actions for processing the selected logs. These actions are delimited by `|` and are known as `pipes`.
For example, the following query uses [`stats` pipe](#stats-pipe) for returning the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word) for the last 5 minutes:

```logsql
_time:5m error | stats count() errors
```

See [the list of supported pipes in LogsQL](#pipes).
## Filters
@ -1025,6 +1030,435 @@ Performance tips:
- See [other performance tips](#performance-tips).
## Pipes
In addition to [filters](#filters), a LogsQL query may contain an arbitrary mix of `|`-delimited actions known as `pipes`.
For example, the following query uses [`stats`](#stats-pipe), [`sort`](#sort-pipe) and [`head`](#head-pipe) pipes
for returning top 10 [log streams](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields)
with the biggest number of logs during the last 5 minutes:
```logsql
_time:5m | stats by (_stream) count() per_stream_logs | sort by (per_stream_logs desc) | head 10
```
LogsQL supports the following pipes:
- [`copy`](#copy-pipe) copies [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`delete`](#delete-pipe) deletes [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`fields`](#fields-pipe) selects the given set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`head`](#head-pipe) limits the number of selected logs.
- [`rename`](#rename-pipe) renames [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`skip`](#skip-pipe) skips the given number of selected logs.
- [`sort`](#sort-pipe) sorts logs by the given [fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`stats`](#stats-pipe) calculates various stats over the selected logs.
### copy pipe
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be copied, then `| copy src1 as dst1, ..., srcN as dstN` [pipe](#pipes) can be used.
For example, the following query copies `host` field to `server` for logs over the last 5 minutes, so the output contains both `host` and `server` fields:
```logsql
_time:5m | copy host as server
```
Multiple fields can be copied with a single `| copy ...` pipe. For example, the following query copies
[`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) to `timestamp`, while [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
is copied to `message`:
```logsql
_time:5m | copy _time as timestamp, _msg as message
```
The `as` keyword is optional.
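For example, the following query is an equivalent form of the previous one with the `as` keyword omitted:

```logsql
_time:5m | copy _time timestamp, _msg message
```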
See also:
- [`rename` pipe](#rename-pipe)
- [`fields` pipe](#fields-pipe)
- [`delete` pipe](#delete-pipe)
### delete pipe
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be deleted, then `| delete field1, ..., fieldN` [pipe](#pipes) can be used.
For example, the following query deletes `host` and `app` fields from the logs over the last 5 minutes:
```logsql
_time:5m | delete host, app
```
See also:
- [`rename` pipe](#rename-pipe)
- [`fields` pipe](#fields-pipe)
### fields pipe
By default all the [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) are returned in the response.
It is possible to select the given set of log fields with `| fields field1, ..., fieldN` [pipe](#pipes). For example, the following query selects only `host`
and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields from logs for the last 5 minutes:
```logsql
_time:5m | fields host, _msg
```
See also:
- [`copy` pipe](#copy-pipe)
- [`rename` pipe](#rename-pipe)
- [`delete` pipe](#delete-pipe)
### head pipe
If only a subset of selected logs must be processed, then `| head N` [pipe](#pipes) can be used. For example, the following query returns up to 100 logs over the last 5 minutes:
```logsql
_time:5m | head 100
```
By default rows are selected in arbitrary order for performance reasons, so the query above can return different sets of logs every time it is executed.
[`sort` pipe](#sort-pipe) can be used for making sure the logs are in the same order before applying `head ...` to them.
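For example, the following query sorts the selected logs by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) before returning the first 100 of them:

```logsql
_time:5m | sort by (_time) | head 100
```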
See also:
- [`skip` pipe](#skip-pipe)
### rename pipe
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be renamed, then `| rename src1 as dst1, ..., srcN as dstN` [pipe](#pipes) can be used.
For example, the following query renames `host` field to `server` for logs over the last 5 minutes, so the output contains `server` field instead of `host` field:
```logsql
_time:5m | rename host as server
```
Multiple fields can be renamed with a single `| rename ...` pipe. For example, the following query renames `host` to `instance` and `app` to `job`:
```logsql
_time:5m | rename host as instance, app as job
```
See also:
- [`copy` pipe](#copy-pipe)
- [`fields` pipe](#fields-pipe)
- [`delete` pipe](#delete-pipe)
### skip pipe
If some number of selected logs must be skipped after [`sort`](#sort-pipe), then `| skip N` [pipe](#pipes) can be used. For example, the following query skips the first 100 logs
over the last 5 minutes after sorting them by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field):
```logsql
_time:5m | sort by (_time) | skip 100
```
Note that skipping rows without sorting makes little sense, since they can be returned in arbitrary order for performance reasons.
Rows can be sorted with [`sort` pipe](#sort-pipe).
See also:
- [`head` pipe](#head-pipe)
### sort pipe
By default logs are selected in arbitrary order for performance reasons. If logs must be sorted, then `| sort by (field1, ..., fieldN)` [pipe](#pipes) must be used.
For example, the following query returns logs for the last 5 minutes sorted by [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields)
and then by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field):
```logsql
_time:5m | sort by (_stream, _time)
```
Sorting in reverse order is supported - just add `desc` after the given log field. For example, the following query sorts logs in reverse order of the `request_duration_seconds` field:
```logsql
_time:5m | sort by (request_duration_seconds desc)
```
Note that sorting a big number of logs can be slow and can consume a lot of additional memory.
It is recommended to limit the number of logs before sorting with the following approaches (see the example after the list):
- Reducing the selected time range with [time filter](#time-filter).
- Using more specific [filters](#filters), so they select fewer logs.
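For example, the following query narrows the time range to one minute and filters by the `error` [word](#word-filter) before sorting by the `request_duration_seconds` field from the example above, and then keeps only the top 10 results with the [`head` pipe](#head-pipe):

```logsql
_time:1m error | sort by (request_duration_seconds desc) | head 10
```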
See also:
- [`stats` pipe](#stats-pipe)
- [`head` pipe](#head-pipe)
- [`skip` pipe](#skip-pipe)
### stats pipe
`| stats ...` pipe allows calculating various stats over the selected logs. For example, the following LogsQL query
uses [`count` stats function](#count-stats) for calculating the number of logs for the last 5 minutes:
```logsql
_time:5m | stats count() logs_total
```
`| stats ...` pipe has the following basic format:
```logsql
... | stats
stats_func1(...) as result_name1,
...
stats_funcN(...) as result_nameN
```
Where `stats_func*` is any of the supported [stats functions](#stats-pipe-functions), while `result_name*` is the name of the log field
to store the result of the corresponding stats function. The `as` keyword is optional.
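For example, the following query is an equivalent form of the first `stats` example above with the `as` keyword spelled out explicitly:

```logsql
_time:5m | stats count() as logs_total
```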
For example, the following query calculates the following stats for logs over the last 5 minutes:
- the number of logs with the help of [`count` stats function](#count-stats);
- the number of unique [log streams](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields) with the help of [`uniq` stats function](#uniq-stats):
```logsql
_time:5m | stats count() logs_total, uniq(_stream) streams_total
```
See also:
- [`sort` pipe](#sort-pipe)
#### Stats by fields
The following LogsQL syntax can be used for calculating independent stats per group of log fields:
```logsql
... | stats by (field1, ..., fieldM)
stats_func1(...) as result_name1,
...
stats_funcN(...) as result_nameN
```
This calculates `stats_func*` per each `(field1, ..., fieldM)` group of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
For example, the following query calculates the number of logs and the number of unique IP addresses over the last 5 minutes,
grouped by `(host, path)` fields:
```logsql
_time:5m | stats by (host, path) count() logs_total, uniq(ip) ips_total
```
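The per-group results can be post-processed with other [pipes](#pipes). For example, the following query sorts the groups by `logs_total` in reverse order and returns the top 5 of them via [`sort`](#sort-pipe) and [`head`](#head-pipe) pipes:

```logsql
_time:5m | stats by (host, path) count() logs_total, uniq(ip) ips_total | sort by (logs_total desc) | head 5
```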
#### Stats by time buckets
The following syntax can be used for calculating stats grouped by time buckets:
```logsql
... | stats by (_time:step)
stats_func1(...) as result_name1,
...
stats_funcN(...) as result_nameN
```
This calculates `stats_func*` for each `step` of the [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) field.
The `step` can have any [duration value](#duration-values). For example, the following LogsQL query returns the per-minute number of logs and unique IP addresses
over the last 5 minutes:
```logsql
_time:5m | stats by (_time:1m) count() logs_total, uniq(ip) ips_total
```
#### Stats by time buckets with timezone offset
VictoriaLogs stores [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) values as [Unix time](https://en.wikipedia.org/wiki/Unix_time)
in nanoseconds. This time corresponds to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) time zone. Sometimes it is needed to calculate stats
grouped by days or weeks in a non-UTC timezone. This is possible with the following syntax:
```logsql
... | stats by (_time:step offset timezone_offset) ...
```
For example, the following query calculates per-day number of logs over the last week, in `UTC+02:00` [time zone](https://en.wikipedia.org/wiki/Time_zone):
```logsql
_time:1w | stats by (_time:1d offset 2h) count() logs_total
```
#### Stats by field buckets
Every log field inside `| stats by (...)` can be bucketed in the same way as the `_time` field in [this example](#stats-by-time-buckets).
Any [numeric value](#numeric-values) can be used as `step` value for the bucket. For example, the following query calculates
the number of requests for the last hour, bucketed by 10KB of `request_size_bytes` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model):
```logsql
_time:1h | stats by (request_size_bytes:10KB) count() requests
```
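Fractional steps are also possible. For example, the following query (a sketch assuming a numeric `request_duration_seconds` field) buckets request durations with `0.5` step and counts requests in each bucket:

```logsql
_time:1h | stats by (request_duration_seconds:0.5) count() requests
```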
#### Stats by IPv4 buckets
Stats can be bucketed by [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) containing [IPv4 addresses](https://en.wikipedia.org/wiki/IP_address)
via the `ip_field_name:/network_mask` syntax inside `by(...)` clause. For example, the following query returns the number of log entries per `/24` subnetwork
extracted from the `ip` [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) during the last 5 minutes:
```logsql
_time:5m | stats by (ip:/24) count() requests_per_subnet
```
## stats pipe functions
LogsQL supports the following functions for [`stats` pipe](#stats-pipe):
- [`avg`](#avg-stats) calculates the average value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`count`](#count-stats) calculates the number of log entries.
- [`max`](#max-stats) calculates the maximum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`min`](#min-stats) calculates the minimum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`sum`](#sum-stats) calculates the sum for the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`uniq`](#uniq-stats) calculates the number of unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`uniq_array`](#uniq_array-stats) returns unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
### avg stats
`avg(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the average value across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are ignored.
For example, the following query returns the average value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats avg(duration) avg_duration
```
See also:
- [`min`](#min-stats)
- [`max`](#max-stats)
- [`sum`](#sum-stats)
- [`count`](#count-stats)
### count stats
`count()` calculates the number of selected logs.
For example, the following query returns the number of logs over the last 5 minutes:
```logsql
_time:5m | stats count() logs
```
It is possible to calculate the number of logs with non-empty values for some [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
with the `count(fieldName)` syntax. For example, the following query returns the number of logs with non-empty `username` field over the last 5 minutes:
```logsql
_time:5m | stats count(username) logs_with_username
```
If multiple fields are enumerated inside `count()`, then it counts the number of logs with at least a single non-empty field mentioned inside `count()`.
For example, the following query returns the number of logs with non-empty `username` or `password` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over the last 5 minutes:
```logsql
_time:5m | stats count(username, password) logs_with_username_or_password
```
See also:
- [`sum`](#sum-stats)
- [`avg`](#avg-stats)
### max stats
`max(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the maximum value across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are ignored.
For example, the following query returns the maximum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats max(duration) max_duration
```
See also:
- [`min`](#min-stats)
- [`avg`](#avg-stats)
- [`sum`](#sum-stats)
- [`count`](#count-stats)
### min stats
`min(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the minimum value across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
Non-numeric values are ignored.
For example, the following query returns the minimum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats min(duration) min_duration
```
See also:
- [`max`](#max-stats)
- [`avg`](#avg-stats)
- [`sum`](#sum-stats)
- [`count`](#count-stats)
### sum stats
`sum(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the sum of numeric values across
all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
For example, the following query returns the sum of numeric values for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats sum(duration) sum_duration
```
See also:
- [`count`](#count-stats)
- [`avg`](#avg-stats)
- [`max`](#max-stats)
- [`min`](#min-stats)
### uniq stats
`uniq(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the number of unique non-empty `(field1, ..., fieldN)` tuples.
For example, the following query returns the number of unique non-empty values for `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over the last 5 minutes:
```logsql
_time:5m | stats uniq(ip) ips
```
The following query returns the number of unique `(host, path)` pairs for the corresponding [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over the last 5 minutes:
```logsql
_time:5m | stats uniq(host, path) unique_host_path_pairs
```
See also:
- [`uniq_array`](#uniq_array-stats)
- [`count`](#count-stats)
### uniq_array stats
`uniq_array(field1, ..., fieldN)` [stats pipe](#stats-pipe) returns the unique non-empty values across
the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
The returned values are sorted and encoded as a JSON array.
For example, the following query returns unique non-empty values for the `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
over logs for the last 5 minutes:
```logsql
_time:5m | stats uniq_array(ip) unique_ips
```
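Like other stats functions, `uniq_array` can be combined with [grouping by fields](#stats-by-fields). For example, the following query (assuming `host` and `path` fields exist) returns unique `path` values per each `host`:

```logsql
_time:5m | stats by (host) uniq_array(path) unique_paths
```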
See also:
- [`uniq`](#uniq-stats)
- [`count`](#count-stats)
## Stream context

LogsQL will support the ability to select the given number of surrounding log lines for the selected log lines
@ -1046,11 +1480,9 @@ LogsQL will support the following transformations for the [selected](#filters) l
- Creating a new field from existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
according to the provided format.
- Creating a new field according to math calculations over existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- Parsing duration strings into floating-point seconds for further [stats calculations](#stats-pipe).
- Creating a boolean field with the result of arbitrary [post-filters](#post-filters) applied to the current fields.
- Creating an integer field with the length of the given field value. This can be useful for [stats calculations](#stats-pipe).

See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
@ -1069,166 +1501,7 @@ See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) fo
## Stats

Stats over the selected logs can be calculated via [`stats` pipe](#stats-pipe).

### stats functions
LogsQL supports the following stats functions:
- [`count`](#count) - calculates the number of log entries.
- [`uniq`](#uniq) - calculates the number of unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`sum`](#sum) - calculates the sum for the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`avg`](#avg) - calculates the average value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`min`](#min) - calculates the minimum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`max`](#max) - calculates the maximum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- [`uniq_array`](#uniq_array) - returns unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
#### count
Examples:
- `error | stats count() as errors_total` returns the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `error | stats by (_stream) count() as errors_by_stream` returns the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word) grouped by [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields).
- `error | stats by (datacenter, namespace) count(trace_id, user_id) as errors_with_trace_and_user` returns the number
of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) containing the `error` [word](#word),
which contain non-empty `trace_id` or `user_id` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model), grouped by `datacenter` and `namespace` fields.
See also [`sum`](#sum) and [`avg`](#avg).
#### uniq
Examples:
- `error | stats uniq(client_ip) as unique_ips` returns the number of unique values for `client_ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `error | stats by (app) uniq(path, host) as unique_path_hosts` - returns the number of unique `(path, host)` pairs
for [field values](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word), grouped by `app` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- `error | fields path, host | stats uniq(*) unique_path_hosts` - returns the number of unique `(path, host)` pairs
for [field values](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
with the `error` [word](#word).
See also [`uniq_array`](#uniq_array).
#### sum
Examples:
- `error | stats sum(duration) duration_total` - returns the sum of `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) values
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) sum(response_size) response_size_sum` - returns the sum of `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) values
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [count](#count) and [avg](#avg).
#### avg
Examples:
- `error | stats avg(duration) duration_avg` - returns the average value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) avg(response_size) avg_response_size` - returns the average value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [sum](#sum) and [count](#count).
#### max
Examples:
- `error | stats max(duration) duration_max` - returns the maximum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) max(response_size) max_response_size` - returns the maximum value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [min](#min).
#### min
Examples:
- `error | stats min(duration) duration_min` - returns the minimum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word).
- `GET | stats by (path) min(response_size) min_response_size` - returns the minimum value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped
by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value.
See also [max](#max).
#### uniq_array
Examples:
- `_time:1h | stats uniq_array(client_ip) as unique_ips` returns unique values for `client_ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across logs for the last hour. The unique values are returned in a JSON array such as `["1.2.4.5","5.6.7.8"]`.
- `_time:1h | stats by (host) uniq_array(path) as unique_paths` returns unique values for `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
across logs for the last hour, grouped by `host` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
See also [uniq](#uniq) and [count](#count).
### Grouping stats by buckets
#### Time buckets
Stats can be bucketed by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) with the `_time:bucket_duration` syntax inside `by(...)` clause.
For example, the following query returns per-minute number of log messages with the `error` [word](#word) for the last 10 minutes:
```logsql
_time:10m error | stats by (_time:1m) count() errors_per_minute
```
It is possible to add offset (for example, [timezone offset](https://en.wikipedia.org/wiki/UTC_offset)) when bucketing by `_time`. For example, the following query calculates
the number of per-day log entries for the last week at '2h' offset aka `UTC+02:00` offset:
```logsql
_time:1w | stats by (_time:1d offset 2h) count() logs_per_day_kyiv_offset
```
#### Numeric buckets
Stats can be bucketed by any numeric [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) with the `field_name:bucket_size` syntax inside `by(...)` clause.
For example, the following query returns the number of log messages with the `status=200` [phrase](#phrase-filter) bucketed by `request_duration_seconds` numeric field with `0.5` step:
```logsql
_time:10m "status=200" | stats by (request_duration_seconds:0.5) count() requests
```
The `bucket_size` can contain the following convenient suffixes:
- `KB` - the `bucket_size` is multiplied by `1000` in this case. For example, `10KB`.
- `MB` - the `bucket_size` is multiplied by `1_000_000` in this case. For example, `10MB`.
- `GB` - the `bucket_size` is multiplied by `1_000_000_000` in this case. For example, `10GB`.
- `TB` - the `bucket_size` is multiplied by `1_000_000_000_000` in this case. For example, `10TB`.
- `KiB` - the `bucket_size` is multiplied by `1024` in this case. For example, `10KiB`.
- `MiB` - the `bucket_size` is multiplied by `1024*1024` in this case. For example, `10MiB`.
- `GiB` - the `bucket_size` is multiplied by `1024*1024*1024` in this case. For example, `10GiB`.
- `TiB` - the `bucket_size` is multiplied by `1024*1024*1024*1024` in this case. For example, `10TiB`.
#### IPv4 mask buckets
Stats can be bucketed by [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) with [IPv4 addresses](https://en.wikipedia.org/wiki/IP_address)
via the `ip_field_name:/network_mask` syntax inside `by(...)` clause. For example, the following query returns the number of log entries per `/24` subnetwork during the last 10 minutes:
```logsql
_time:10m | stats by (ip:/24) count() requests_per_subnet
```
### Calculating multiple stats
Stats calculations can be combined. For example, the following query calculates the number of log messages with the `error` [word](#word),
the number of unique values for `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) and the sum of `duration`
[field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model), grouped by `namespace` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model):
```logsql
error | stats by (namespace)
count() as errors_total,
uniq(ip) as unique_ips,
sum(duration) as duration_sum
```
### Stats TODO
LogsQL will support calculating the following additional stats based on the [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
and fields created by [transformations](#transformations):
@ -1238,52 +1511,70 @@ and fields created by [transformations](#transformations):
It will be possible to specify an optional condition [filter](#post-filters) when calculating the stats.
For example, `sum(response_size) if (is_admin:true)` calculates the total response size for admins only.

It is possible to perform stats calculations on the [selected log entries](#filters) at client side with `sort`, `uniq`, etc. Unix commands
according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line).
See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
## Sorting

By default VictoriaLogs sorts the returned results by [`_time` field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field)
if their total size doesn't exceed `-select.maxSortBufferSize` command-line value (by default it is set to 1MB).
Otherwise sorting is skipped for performance reasons.

Use [`sort` pipe](#sort-pipe) for sorting the results.
## Limiters

LogsQL provides the following [pipes](#pipes) for limiting the number of returned log entries:

- [`fields`](#fields-pipe) and [`delete`](#delete-pipe) pipes allow limiting the set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) to return.
- [`head` pipe](#head-pipe) allows limiting the number of log entries to return.
## Querying specific fields

Specific log fields can be queried via [`fields` pipe](#fields-pipe).

## Numeric values

LogsQL accepts numeric values in the following formats:

- regular integers like `12345` or `-12345`
- regular floating point numbers like `0.123` or `-12.34`
- [short numeric format](#short-numeric-values)
- [duration format](#duration-values)
### Short numeric values
LogsQL accepts integer and floating point values with the following suffixes:
- `K` and `KB` - the value is multiplied by `10^3`
- `M` and `MB` - the value is multiplied by `10^6`
- `G` and `GB` - the value is multiplied by `10^9`
- `T` and `TB` - the value is multiplied by `10^12`
- `Ki` and `KiB` - the value is multiplied by `2^10`
- `Mi` and `MiB` - the value is multiplied by `2^20`
- `Gi` and `GiB` - the value is multiplied by `2^30`
- `Ti` and `TiB` - the value is multiplied by `2^40`
All the numbers may contain `_` delimiters, which may improve readability of the query. For example, `1_234_567` is equivalent to `1234567`,
while `1.234_567` is equivalent to `1.234567`.
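For example, short numeric values can be used in the [`range` filter](#range-filter). The following sketch (assuming a `response_size` field holding byte counts) selects logs with `response_size` between 4KiB and 1MiB:

```logsql
response_size:range[4KiB, 1MiB]
```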
### Duration values
LogsQL accepts duration values with the following suffixes in places where a duration is allowed:
- `ns` - nanoseconds. For example, `123ns`.
- `µs` - microseconds. For example, `1.23µs`.
- `ms` - milliseconds. For example, `1.23456ms`
- `s` - seconds. For example, `1.234s`
- `m` - minutes. For example, `1.5m`
- `h` - hours. For example, `1.5h`
- `d` - days. For example, `1.5d`
- `w` - weeks. For example, `1w`
- `y` - years as 365 days. For example, `1.5y`
Multiple durations can be combined. For example, `1h33m55s`.
Internally duration values are converted into nanoseconds.
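For example, combined durations can be used directly in the [`_time` filter](#time-filter). The following query selects logs for the last 1 hour and 30 minutes:

```logsql
_time:1h30m
```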
## Performance tips


@ -30,6 +30,7 @@ See [these docs](https://docs.victoriametrics.com/VictoriaLogs/) for details.
The following functionality is planned in the future versions of VictoriaLogs:

- Support for [data ingestion](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/) from popular log collectors and formats:
  - OpenTelemetry for logs
  - Fluentd
  - Syslog
  - Journald (systemd)
@ -37,9 +38,6 @@ The following functionality is planned in the future versions of VictoriaLogs:
- [Stream context](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#stream-context).
- [Transformation functions](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#transformations).
- [Post-filtering](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#post-filters).
- [Stats calculations](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#stats).
- [Sorting](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#sorting).
- [Limiters](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#limiters).
- The ability to use subqueries inside [in()](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#multi-exact-filter) function.
- Live tailing for [LogsQL filters](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#filters) aka `tail -f`.
- Web UI with the following abilities:


@ -3,6 +3,7 @@ package logstorage
import ( import (
"encoding/binary" "encoding/binary"
"math" "math"
"slices"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil" "github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal" "github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
@ -121,9 +122,6 @@ func (br *blockResult) fetchAllColumns(bs *blockSearch, bm *bitmap) {
func (br *blockResult) fetchRequestedColumns(bs *blockSearch, bm *bitmap) { func (br *blockResult) fetchRequestedColumns(bs *blockSearch, bm *bitmap) {
for _, columnName := range bs.bsw.so.resultColumnNames { for _, columnName := range bs.bsw.so.resultColumnNames {
if columnName == "" {
columnName = "_msg"
}
switch columnName { switch columnName {
case "_stream": case "_stream":
if !br.addStreamColumn(bs) { if !br.addStreamColumn(bs) {
@ -275,10 +273,7 @@ func (br *blockResult) addColumn(bs *blockSearch, ch *columnHeader, bm *bitmap)
} }
dictValues = valuesBuf[valuesBufLen:] dictValues = valuesBuf[valuesBufLen:]
name := ch.name name := getCanonicalColumnName(ch.name)
if name == "" {
name = "_msg"
}
br.cs = append(br.cs, blockResultColumn{ br.cs = append(br.cs, blockResultColumn{
name: name, name: name,
valueType: ch.valueType, valueType: ch.valueType,
@ -425,6 +420,7 @@ func (br *blockResult) getBucketedTimestampValues(bucketSize, bucketOffset float
timestamp := timestamps[i] timestamp := timestamps[i]
timestamp -= bucketOffsetInt timestamp -= bucketOffsetInt
timestamp -= timestamp % bucketSizeInt timestamp -= timestamp % bucketSizeInt
timestamp += bucketOffsetInt
if i > 0 && timestamp == prevTimestamp { if i > 0 && timestamp == prevTimestamp {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -516,6 +512,7 @@ func (br *blockResult) getBucketedUint8Values(encodedValues []string, bucketSize
n := uint64(v[0]) n := uint64(v[0])
n -= bucketOffsetInt n -= bucketOffsetInt
n -= n % bucketSizeInt n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev { if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -570,6 +567,7 @@ func (br *blockResult) getBucketedUint16Values(encodedValues []string, bucketSiz
n := uint64(encoding.UnmarshalUint16(b)) n := uint64(encoding.UnmarshalUint16(b))
n -= bucketOffsetInt n -= bucketOffsetInt
n -= n % bucketSizeInt n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev { if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -624,6 +622,7 @@ func (br *blockResult) getBucketedUint32Values(encodedValues []string, bucketSiz
n := uint64(encoding.UnmarshalUint32(b)) n := uint64(encoding.UnmarshalUint32(b))
n -= bucketOffsetInt n -= bucketOffsetInt
n -= n % bucketSizeInt n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev { if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -678,6 +677,7 @@ func (br *blockResult) getBucketedUint64Values(encodedValues []string, bucketSiz
n := encoding.UnmarshalUint64(b) n := encoding.UnmarshalUint64(b)
n -= bucketOffsetInt n -= bucketOffsetInt
n -= n % bucketSizeInt n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev { if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -742,6 +742,8 @@ func (br *blockResult) getBucketedFloat64Values(encodedValues []string, bucketSi
fP10 -= fP10 % bucketSizeP10 fP10 -= fP10 % bucketSizeP10
f = float64(fP10) / p10 f = float64(fP10) / p10
f += bucketOffset
if i > 0 && f == fPrev { if i > 0 && f == fPrev {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -794,6 +796,7 @@ func (br *blockResult) getBucketedIPv4Values(encodedValues []string, bucketSize,
n := binary.BigEndian.Uint32(b) n := binary.BigEndian.Uint32(b)
n -= bucketOffsetInt n -= bucketOffsetInt
n -= n % bucketSizeInt n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev { if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -850,6 +853,7 @@ func (br *blockResult) getBucketedTimestampISO8601Values(encodedValues []string,
n := encoding.UnmarshalUint64(b) n := encoding.UnmarshalUint64(b)
n -= bucketOffsetInt n -= bucketOffsetInt
n -= n % bucketSizeInt n -= n % bucketSizeInt
n += bucketOffsetInt
if i > 0 && n == nPrev { if i > 0 && n == nPrev {
valuesBuf = append(valuesBuf, s) valuesBuf = append(valuesBuf, s)
continue continue
@ -887,6 +891,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if f, ok := tryParseFloat64(s); ok { if f, ok := tryParseFloat64(s); ok {
f -= bucketOffset f -= bucketOffset
// emulate f % bucketSize for float64 values // emulate f % bucketSize for float64 values
_, e := decimal.FromFloat(bucketSize) _, e := decimal.FromFloat(bucketSize)
p10 := math.Pow10(int(-e)) p10 := math.Pow10(int(-e))
@ -894,6 +899,8 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
fP10 -= fP10 % int64(bucketSize*p10) fP10 -= fP10 % int64(bucketSize*p10)
f = float64(fP10) / p10 f = float64(fP10) / p10
f += bucketOffset
bufLen := len(br.buf) bufLen := len(br.buf)
br.buf = marshalFloat64(br.buf, f) br.buf = marshalFloat64(br.buf, f)
return bytesutil.ToUnsafeString(br.buf[bufLen:]) return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -902,6 +909,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if nsecs, ok := tryParseTimestampISO8601(s); ok { if nsecs, ok := tryParseTimestampISO8601(s); ok {
nsecs -= int64(bucketOffset) nsecs -= int64(bucketOffset)
nsecs -= nsecs % int64(bucketSize) nsecs -= nsecs % int64(bucketSize)
nsecs += int64(bucketOffset)
bufLen := len(br.buf) bufLen := len(br.buf)
br.buf = marshalTimestampISO8601(br.buf, nsecs) br.buf = marshalTimestampISO8601(br.buf, nsecs)
return bytesutil.ToUnsafeString(br.buf[bufLen:]) return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -910,6 +918,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if nsecs, ok := tryParseTimestampRFC3339Nano(s); ok { if nsecs, ok := tryParseTimestampRFC3339Nano(s); ok {
nsecs -= int64(bucketOffset) nsecs -= int64(bucketOffset)
nsecs -= nsecs % int64(bucketSize) nsecs -= nsecs % int64(bucketSize)
nsecs += int64(bucketOffset)
bufLen := len(br.buf) bufLen := len(br.buf)
br.buf = marshalTimestampRFC3339Nano(br.buf, nsecs) br.buf = marshalTimestampRFC3339Nano(br.buf, nsecs)
return bytesutil.ToUnsafeString(br.buf[bufLen:]) return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -918,6 +927,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if n, ok := tryParseIPv4(s); ok { if n, ok := tryParseIPv4(s); ok {
n -= uint32(int32(bucketOffset)) n -= uint32(int32(bucketOffset))
n -= n % uint32(bucketSize) n -= n % uint32(bucketSize)
n += uint32(int32(bucketOffset))
bufLen := len(br.buf) bufLen := len(br.buf)
br.buf = marshalIPv4(br.buf, n) br.buf = marshalIPv4(br.buf, n)
return bytesutil.ToUnsafeString(br.buf[bufLen:]) return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -926,6 +936,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float
if nsecs, ok := tryParseDuration(s); ok { if nsecs, ok := tryParseDuration(s); ok {
nsecs -= int64(bucketOffset) nsecs -= int64(bucketOffset)
nsecs -= nsecs % int64(bucketSize) nsecs -= nsecs % int64(bucketSize)
nsecs += int64(bucketOffset)
bufLen := len(br.buf) bufLen := len(br.buf)
br.buf = marshalDuration(br.buf, nsecs) br.buf = marshalDuration(br.buf, nsecs)
return bytesutil.ToUnsafeString(br.buf[bufLen:]) return bytesutil.ToUnsafeString(br.buf[bufLen:])
@ -942,7 +953,69 @@ func (br *blockResult) addEmptyStringColumn(columnName string) {
}) })
} }
func (br *blockResult) updateColumns(columnNames []string) { // copyColumns copies columns from srcColumnNames to dstColumnNames.
func (br *blockResult) copyColumns(srcColumnNames, dstColumnNames []string) {
if len(srcColumnNames) == 0 {
return
}
cs := br.cs
csOffset := len(cs)
for _, c := range br.getColumns() {
if idx := slices.Index(srcColumnNames, c.name); idx >= 0 {
c.name = dstColumnNames[idx]
cs = append(cs, c)
// continue is skipped intentionally in order to leave the original column in the columns list.
}
if !slices.Contains(dstColumnNames, c.name) {
cs = append(cs, c)
}
}
br.csOffset = csOffset
br.cs = cs
}
// renameColumns renames columns from srcColumnNames to dstColumnNames.
func (br *blockResult) renameColumns(srcColumnNames, dstColumnNames []string) {
if len(srcColumnNames) == 0 {
return
}
cs := br.cs
csOffset := len(cs)
for _, c := range br.getColumns() {
if idx := slices.Index(srcColumnNames, c.name); idx >= 0 {
c.name = dstColumnNames[idx]
cs = append(cs, c)
continue
}
if !slices.Contains(dstColumnNames, c.name) {
cs = append(cs, c)
}
}
br.csOffset = csOffset
br.cs = cs
}
// deleteColumns deletes columns with the given columnNames.
func (br *blockResult) deleteColumns(columnNames []string) {
if len(columnNames) == 0 {
return
}
cs := br.cs
csOffset := len(cs)
for _, c := range br.getColumns() {
if !slices.Contains(columnNames, c.name) {
cs = append(cs, c)
}
}
br.csOffset = csOffset
br.cs = cs
}
// setColumns sets the resulting columns to the given columnNames.
func (br *blockResult) setColumns(columnNames []string) {
if br.areSameColumns(columnNames) { if br.areSameColumns(columnNames) {
// Fast path - nothing to change. // Fast path - nothing to change.
return return
@ -973,10 +1046,6 @@ func (br *blockResult) areSameColumns(columnNames []string) bool {
} }
func (br *blockResult) getColumnByName(columnName string) blockResultColumn { func (br *blockResult) getColumnByName(columnName string) blockResultColumn {
if columnName == "" {
columnName = "_msg"
}
cs := br.getColumns() cs := br.getColumns()
// iterate columns in reverse order, so overridden column results are returned instead of original column results. // iterate columns in reverse order, so overridden column results are returned instead of original column results.
@ -1110,37 +1179,6 @@ func (c *blockResultColumn) addValue(v string) {
c.values = c.valuesBuf c.values = c.valuesBuf
} }
// getEncodedValues returns encoded values for the given column.
//
// The returned encoded values are valid until br.reset() is called.
func (c *blockResultColumn) getEncodedValues(br *blockResult) []string {
if c.encodedValues != nil {
return c.encodedValues
}
if !c.isTime {
logger.Panicf("BUG: encodedValues may be missing only for _time column; got %q column", c.name)
}
buf := br.buf
valuesBuf := br.valuesBuf
valuesBufLen := len(valuesBuf)
for _, timestamp := range br.timestamps {
bufLen := len(buf)
buf = encoding.MarshalInt64(buf, timestamp)
s := bytesutil.ToUnsafeString(buf[bufLen:])
valuesBuf = append(valuesBuf, s)
}
c.encodedValues = valuesBuf[valuesBufLen:]
br.valuesBuf = valuesBuf
br.buf = buf
return c.encodedValues
}
// getValueAtRow returns value for the value at the given rowIdx. // getValueAtRow returns value for the value at the given rowIdx.
// //
// The returned value is valid until br.reset() is called. // The returned value is valid until br.reset() is called.

View file

@ -202,15 +202,64 @@ func (q *Query) String() string {
} }
func (q *Query) getResultColumnNames() []string { func (q *Query) getResultColumnNames() []string {
for _, p := range q.pipes { input := []string{"*"}
switch t := p.(type) {
case *pipeFields: pipes := q.pipes
return t.fields for i := len(pipes) - 1; i >= 0; i-- {
case *pipeStats: fields, m := pipes[i].getNeededFields()
return t.neededFields() if len(fields) == 0 {
input = nil
}
if len(input) == 0 {
break
}
// Translate the field names needed by the downstream pipes into the corresponding input fields of this pipe according to the given mapping.
if input[0] != "*" {
var dst []string
for _, f := range input {
if a, ok := m[f]; ok {
dst = append(dst, a...)
} else {
dst = append(dst, f)
}
}
input = normalizeFields(dst)
}
// intersect fields with input
if fields[0] != "*" {
m := make(map[string]struct{})
for _, f := range input {
m[f] = struct{}{}
}
var dst []string
for _, f := range fields {
if _, ok := m[f]; ok {
dst = append(dst, f)
}
}
input = normalizeFields(dst)
} }
} }
return []string{"*"}
return input
}
func normalizeFields(a []string) []string {
m := make(map[string]struct{}, len(a))
dst := make([]string, 0, len(a))
for _, s := range a {
if s == "*" {
return []string{"*"}
}
if _, ok := m[s]; ok {
continue
}
m[s] = struct{}{}
dst = append(dst, s)
}
return dst
} }
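A minimal sketch of normalizeFields behavior (Go, inside package logstorage with fmt imported), mirroring TestNormalizeFields further below:

	fmt.Println(normalizeFields([]string{"foo", "bar", "foo", "x"})) // [foo bar x] - duplicates are dropped
	fmt.Println(normalizeFields([]string{"foo", "*", "bar"}))        // [*] - a star collapses the whole list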
// ParseQuery parses s. // ParseQuery parses s.
@ -522,14 +571,17 @@ func parseFilterLenRange(lex *lexer, fieldName string) (filter, error) {
if len(args) != 2 { if len(args) != 2 {
return nil, fmt.Errorf("unexpected number of args for %s(); got %d; want 2", funcName, len(args)) return nil, fmt.Errorf("unexpected number of args for %s(); got %d; want 2", funcName, len(args))
} }
minLen, err := parseUint(args[0]) minLen, err := parseUint(args[0])
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse minLen at %s(): %w", funcName, err) return nil, fmt.Errorf("cannot parse minLen at %s(): %w", funcName, err)
} }
maxLen, err := parseUint(args[1]) maxLen, err := parseUint(args[1])
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse maxLen at %s(): %w", funcName, err) return nil, fmt.Errorf("cannot parse maxLen at %s(): %w", funcName, err)
} }
stringRepr := "(" + args[0] + ", " + args[1] + ")" stringRepr := "(" + args[0] + ", " + args[1] + ")"
fr := &filterLenRange{ fr := &filterLenRange{
fieldName: fieldName, fieldName: fieldName,
@ -739,16 +791,17 @@ func parseFilterRange(lex *lexer, fieldName string) (filter, error) {
func parseFloat64(lex *lexer) (float64, string, error) { func parseFloat64(lex *lexer) (float64, string, error) {
s := getCompoundToken(lex) s := getCompoundToken(lex)
f, err := strconv.ParseFloat(s, 64) f, err := strconv.ParseFloat(s, 64)
if err != nil { if err == nil {
// Try parsing s as integer. return f, s, nil
// This handles 0x..., 0b... and 0... prefixes.
n, err := parseInt(s)
if err == nil {
return float64(n), s, nil
}
return 0, "", fmt.Errorf("cannot parse %q as float64: %w", lex.token, err)
} }
return f, s, nil
// Try parsing s as integer.
// This handles 0x..., 0b... and 0... prefixes, as well as '_' delimiters.
n, err := parseInt(s)
if err == nil {
return float64(n), s, nil
}
return 0, "", fmt.Errorf("cannot parse %q as float64: %w", lex.token, err)
} }
func parseFuncArg(lex *lexer, fieldName string, callback func(args string) (filter, error)) (filter, error) { func parseFuncArg(lex *lexer, fieldName string, callback func(args string) (filter, error)) (filter, error) {
@ -1184,7 +1237,22 @@ func parseUint(s string) (uint64, error) {
if strings.EqualFold(s, "inf") || strings.EqualFold(s, "+inf") { if strings.EqualFold(s, "inf") || strings.EqualFold(s, "+inf") {
return math.MaxUint64, nil return math.MaxUint64, nil
} }
return strconv.ParseUint(s, 0, 64)
n, err := strconv.ParseUint(s, 0, 64)
if err == nil {
return n, nil
}
nn, ok := tryParseBytes(s)
if !ok {
nn, ok = tryParseDuration(s)
if !ok {
return 0, fmt.Errorf("cannot parse %q as unsigned integer: %w", s, err)
}
if nn < 0 {
return 0, fmt.Errorf("cannot parse negative value %q as unsigned integer", s)
}
}
return uint64(nn), nil
} }
func parseInt(s string) (int64, error) { func parseInt(s string) (int64, error) {
@ -1193,7 +1261,18 @@ func parseInt(s string) (int64, error) {
return math.MaxInt64, nil return math.MaxInt64, nil
case strings.EqualFold(s, "-inf"): case strings.EqualFold(s, "-inf"):
return math.MinInt64, nil return math.MinInt64, nil
default:
return strconv.ParseInt(s, 0, 64)
} }
n, err := strconv.ParseInt(s, 0, 64)
if err == nil {
return n, nil
}
nn, ok := tryParseBytes(s)
if !ok {
nn, ok = tryParseDuration(s)
if !ok {
return 0, fmt.Errorf("cannot parse %q as integer: %w", s, err)
}
}
return nn, nil
} }
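With this change parseUint and parseInt fall back to byte-size and duration parsing, so numeric arguments may carry short suffixes. A sketch of the expected values (they match the head/skip pipe tests below):

	n, _ := parseUint("10K")    // n == 10_000
	n, _ = parseUint("12_345M") // n == 12_345_000_000
	n, _ = parseUint("1KiB")    // n == 1_024
	_ = n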

View file

@ -3,6 +3,7 @@ package logstorage
import ( import (
"math" "math"
"reflect" "reflect"
"slices"
"testing" "testing"
"time" "time"
) )
@ -540,6 +541,12 @@ func TestParseRangeFilter(t *testing.T) {
f(`:range(1, 2)`, ``, math.Nextafter(1, math.Inf(1)), math.Nextafter(2, math.Inf(-1))) f(`:range(1, 2)`, ``, math.Nextafter(1, math.Inf(1)), math.Nextafter(2, math.Inf(-1)))
f(`range[1, 2)`, ``, 1, math.Nextafter(2, math.Inf(-1))) f(`range[1, 2)`, ``, 1, math.Nextafter(2, math.Inf(-1)))
f(`range("1", 2]`, ``, math.Nextafter(1, math.Inf(1)), 2) f(`range("1", 2]`, ``, math.Nextafter(1, math.Inf(1)), 2)
f(`response_size:range[1KB, 10MiB]`, `response_size`, 1_000, 10*(1<<20))
f(`response_size:range[1G, 10Ti]`, `response_size`, 1_000_000_000, 10*(1<<40))
f(`response_size:range[10, inf]`, `response_size`, 10, math.Inf(1))
f(`duration:range[100ns, 1y2w2.5m3s5ms]`, `duration`, 100, 1*nsecsPerYear+2*nsecsPerWeek+2.5*nsecsPerMinute+3*nsecsPerSecond+5*nsecsPerMillisecond)
} }
func TestParseQuerySuccess(t *testing.T) { func TestParseQuerySuccess(t *testing.T) {
@ -749,6 +756,7 @@ func TestParseQuerySuccess(t *testing.T) {
f(`len_range(10, +InF)`, `len_range(10, +InF)`) f(`len_range(10, +InF)`, `len_range(10, +InF)`)
f(`len_range(10, 1_000_000)`, `len_range(10, 1_000_000)`) f(`len_range(10, 1_000_000)`, `len_range(10, 1_000_000)`)
f(`len_range(0x10,0b100101)`, `len_range(0x10, 0b100101)`) f(`len_range(0x10,0b100101)`, `len_range(0x10, 0b100101)`)
f(`len_range(1.5KB, 22MB100KB)`, `len_range(1.5KB, 22MB100KB)`)
// range filter // range filter
f(`range(1.234, 5656.43454)`, `range(1.234, 5656.43454)`) f(`range(1.234, 5656.43454)`, `range(1.234, 5656.43454)`)
@ -760,6 +768,7 @@ func TestParseQuerySuccess(t *testing.T) {
f(`range(1_000, 0o7532)`, `range(1_000, 0o7532)`) f(`range(1_000, 0o7532)`, `range(1_000, 0o7532)`)
f(`range(0x1ff, inf)`, `range(0x1ff, inf)`) f(`range(0x1ff, inf)`, `range(0x1ff, inf)`)
f(`range(-INF,+inF)`, `range(-INF, +inF)`) f(`range(-INF,+inF)`, `range(-INF, +inF)`)
f(`range(1.5K, 22.5GiB)`, `range(1.5K, 22.5GiB)`)
// re filter // re filter
f("re('foo|ba(r.+)')", `re("foo|ba(r.+)")`) f("re('foo|ba(r.+)')", `re("foo|ba(r.+)")`)
@ -816,19 +825,34 @@ func TestParseQuerySuccess(t *testing.T) {
f(`foo | fields bar`, `foo | fields bar`) f(`foo | fields bar`, `foo | fields bar`)
f(`foo|FIELDS bar,Baz , "a,b|c"`, `foo | fields bar, Baz, "a,b|c"`) f(`foo|FIELDS bar,Baz , "a,b|c"`, `foo | fields bar, Baz, "a,b|c"`)
f(`foo | Fields x.y, "abc:z/a", _b$c`, `foo | fields x.y, "abc:z/a", "_b$c"`) f(`foo | Fields x.y, "abc:z/a", _b$c`, `foo | fields x.y, "abc:z/a", "_b$c"`)
f(`foo | fields "", a`, `foo | fields _msg, a`)
// multiple fields pipes // multiple fields pipes
f(`foo | fields bar | fields baz, abc`, `foo | fields bar | fields baz, abc`) f(`foo | fields bar | fields baz, abc`, `foo | fields bar | fields baz, abc`)
// copy pipe
f(`* | copy foo as bar`, `* | copy foo as bar`)
f(`* | COPY foo as bar, x y | Copy a as b`, `* | copy foo as bar, x as y | copy a as b`)
// rename pipe
f(`* | rename foo as bar`, `* | rename foo as bar`)
f(`* | RENAME foo AS bar, x y | Rename a as b`, `* | rename foo as bar, x as y | rename a as b`)
// delete pipe
f(`* | delete foo`, `* | delete foo`)
f(`* | DELETE foo, bar`, `* | delete foo, bar`)
// head pipe // head pipe
f(`foo | head 10`, `foo | head 10`) f(`foo | head 10`, `foo | head 10`)
f(`foo | HEAD 1123432`, `foo | head 1123432`) f(`foo | HEAD 1_123_432`, `foo | head 1123432`)
f(`foo | head 10K`, `foo | head 10000`)
// multiple head pipes // multiple head pipes
f(`foo | head 100 | head 10 | head 234`, `foo | head 100 | head 10 | head 234`) f(`foo | head 100 | head 10 | head 234`, `foo | head 100 | head 10 | head 234`)
// skip pipe // skip pipe
f(`foo | skip 10`, `foo | skip 10`) f(`foo | skip 10`, `foo | skip 10`)
f(`foo | skip 12_345M`, `foo | skip 12345000000`)
// multiple skip pipes // multiple skip pipes
f(`foo | skip 10 | skip 100`, `foo | skip 10 | skip 100`) f(`foo | skip 10 | skip 100`, `foo | skip 10 | skip 100`)
@ -839,6 +863,8 @@ func TestParseQuerySuccess(t *testing.T) {
f(`* | stats count() x`, `* | stats count(*) as x`) f(`* | stats count() x`, `* | stats count(*) as x`)
f(`* | stats count(*) x`, `* | stats count(*) as x`) f(`* | stats count(*) x`, `* | stats count(*) as x`)
f(`* | stats count(foo,*,bar) x`, `* | stats count(*) as x`) f(`* | stats count(foo,*,bar) x`, `* | stats count(*) as x`)
f(`* | stats count('') foo`, `* | stats count(_msg) as foo`)
f(`* | stats count(foo) ''`, `* | stats count(foo) as _msg`)
// stats pipe sum // stats pipe sum
f(`* | stats Sum(foo) bar`, `* | stats sum(foo) as bar`) f(`* | stats Sum(foo) bar`, `* | stats sum(foo) as bar`)
@ -1107,6 +1133,23 @@ func TestParseQueryFailure(t *testing.T) {
f(`foo | fields bar,`) f(`foo | fields bar,`)
f(`foo | fields bar,,`) f(`foo | fields bar,,`)
// invalid copy pipe
f(`foo | copy`)
f(`foo | copy foo`)
f(`foo | copy foo,`)
f(`foo | copy foo,,`)
// invalid rename pipe
f(`foo | rename`)
f(`foo | rename foo`)
f(`foo | rename foo,`)
f(`foo | rename foo,,`)
// invalid delete pipe
f(`foo | delete`)
f(`foo | delete foo,`)
f(`foo | delete foo,,`)
// missing head pipe value // missing head pipe value
f(`foo | head`) f(`foo | head`)
@ -1175,3 +1218,25 @@ func TestParseQueryFailure(t *testing.T) {
f(`foo | stats by(bar,`) f(`foo | stats by(bar,`)
f(`foo | stats by(bar)`) f(`foo | stats by(bar)`)
} }
func TestNormalizeFields(t *testing.T) {
f := func(fields, normalizedExpected []string) {
t.Helper()
normalized := normalizeFields(fields)
if !slices.Equal(normalized, normalizedExpected) {
t.Fatalf("unexpected normalized fields for %q; got %q; want %q", fields, normalized, normalizedExpected)
}
}
f(nil, nil)
f([]string{"foo"}, []string{"foo"})
// duplicate fields
f([]string{"foo", "bar", "foo", "x"}, []string{"foo", "bar", "x"})
f([]string{"foo", "foo", "x", "x", "x"}, []string{"foo", "x"})
// star field
f([]string{"*"}, []string{"*"})
f([]string{"foo", "*", "bar"}, []string{"*"})
}

View file

@ -8,6 +8,12 @@ type pipe interface {
// String returns string representation of the pipe. // String returns string representation of the pipe.
String() string String() string
// getNeededFields must return the fields required at the pipe input together with the mapping from the pipe's output fields to the input fields they are derived from.
//
// It must return []string{"*"} if the set of required input fields cannot be determined for the given pipe.
// It must return a nil map if the pipe doesn't add new fields to the output.
getNeededFields() ([]string, map[string][]string)
// newPipeProcessor must return new pipeProcessor for the given ppBase. // newPipeProcessor must return new pipeProcessor for the given ppBase.
// //
// workersCount is the number of goroutine workers, which will call writeBlock() method. // workersCount is the number of goroutine workers, which will call writeBlock() method.
@ -68,12 +74,6 @@ func parsePipes(lex *lexer) ([]pipe, error) {
return nil, fmt.Errorf("missing token after '|'") return nil, fmt.Errorf("missing token after '|'")
} }
switch { switch {
case lex.isKeyword("fields"):
pf, err := parsePipeFields(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'fields' pipe: %w", err)
}
pipes = append(pipes, pf)
case lex.isKeyword("stats"): case lex.isKeyword("stats"):
ps, err := parsePipeStats(lex) ps, err := parsePipeStats(lex)
if err != nil { if err != nil {
@ -92,6 +92,30 @@ func parsePipes(lex *lexer) ([]pipe, error) {
return nil, fmt.Errorf("cannot parse 'skip' pipe: %w", err) return nil, fmt.Errorf("cannot parse 'skip' pipe: %w", err)
} }
pipes = append(pipes, ps) pipes = append(pipes, ps)
case lex.isKeyword("fields"):
pf, err := parsePipeFields(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'fields' pipe: %w", err)
}
pipes = append(pipes, pf)
case lex.isKeyword("copy"):
pc, err := parsePipeCopy(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'copy' pipe: %w", err)
}
pipes = append(pipes, pc)
case lex.isKeyword("rename"):
pr, err := parsePipeRename(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'rename' pipe: %w", err)
}
pipes = append(pipes, pr)
case lex.isKeyword("delete"):
pd, err := parsePipeDelete(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'delete' pipe: %w", err)
}
pipes = append(pipes, pd)
default: default:
return nil, fmt.Errorf("unexpected pipe %q", lex.token) return nil, fmt.Errorf("unexpected pipe %q", lex.token)
} }

View file

@ -0,0 +1,99 @@
package logstorage
import (
"fmt"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// pipeCopy implements '| copy ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#transformations
type pipeCopy struct {
// srcFields contains a list of source fields to copy
srcFields []string
// dstFields contains a list of destination fields
dstFields []string
}
func (pc *pipeCopy) String() string {
if len(pc.srcFields) == 0 {
logger.Panicf("BUG: pipeCopy must contain at least a single srcField")
}
a := make([]string, len(pc.srcFields))
for i, srcField := range pc.srcFields {
dstField := pc.dstFields[i]
a[i] = quoteTokenIfNeeded(srcField) + " as " + quoteTokenIfNeeded(dstField)
}
return "copy " + strings.Join(a, ", ")
}
func (pc *pipeCopy) getNeededFields() ([]string, map[string][]string) {
m := make(map[string][]string, len(pc.srcFields))
for i, dstField := range pc.dstFields {
m[dstField] = append(m[dstField], pc.srcFields[i])
}
return []string{"*"}, m
}
func (pc *pipeCopy) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeCopyProcessor{
pc: pc,
ppBase: ppBase,
}
}
type pipeCopyProcessor struct {
pc *pipeCopy
ppBase pipeProcessor
}
func (pcp *pipeCopyProcessor) writeBlock(workerID uint, br *blockResult) {
br.copyColumns(pcp.pc.srcFields, pcp.pc.dstFields)
pcp.ppBase.writeBlock(workerID, br)
}
func (pcp *pipeCopyProcessor) flush() error {
return nil
}
func parsePipeCopy(lex *lexer) (*pipeCopy, error) {
if !lex.isKeyword("copy") {
return nil, fmt.Errorf("expecting 'copy'; got %q", lex.token)
}
var srcFields []string
var dstFields []string
for {
lex.nextToken()
srcField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse src field name: %w", err)
}
if lex.isKeyword("as") {
lex.nextToken()
}
dstField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse dst field name: %w", err)
}
srcFields = append(srcFields, srcField)
dstFields = append(dstFields, dstField)
switch {
case lex.isKeyword("|", ")", ""):
pc := &pipeCopy{
srcFields: srcFields,
dstFields: dstFields,
}
return pc, nil
case lex.isKeyword(","):
default:
return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
}
}
}
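A minimal round-trip sketch for the copy pipe (the query string follows the parser tests in this commit; 'as' between the source and destination field is optional):

	q, err := ParseQuery(`* | copy foo as bar, x y`)
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // * | copy foo as bar, x as y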

View file

@ -0,0 +1,76 @@
package logstorage
import (
"fmt"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// pipeDelete implements '| delete ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#transformations
type pipeDelete struct {
// fields contains a list of fields to delete
fields []string
}
func (pd *pipeDelete) String() string {
if len(pd.fields) == 0 {
logger.Panicf("BUG: pipeDelete must contain at least a single field")
}
return "delete " + fieldNamesString(pd.fields)
}
func (pd *pipeDelete) getNeededFields() ([]string, map[string][]string) {
return []string{"*"}, nil
}
func (pd *pipeDelete) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeDeleteProcessor{
pd: pd,
ppBase: ppBase,
}
}
type pipeDeleteProcessor struct {
pd *pipeDelete
ppBase pipeProcessor
}
func (pdp *pipeDeleteProcessor) writeBlock(workerID uint, br *blockResult) {
br.deleteColumns(pdp.pd.fields)
pdp.ppBase.writeBlock(workerID, br)
}
func (pdp *pipeDeleteProcessor) flush() error {
return nil
}
func parsePipeDelete(lex *lexer) (*pipeDelete, error) {
if !lex.isKeyword("delete") {
return nil, fmt.Errorf("expecting 'delete'; got %q", lex.token)
}
var fields []string
for {
lex.nextToken()
field, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse field name: %w", err)
}
fields = append(fields, field)
switch {
case lex.isKeyword("|", ")", ""):
pd := &pipeDelete{
fields: fields,
}
return pd, nil
case lex.isKeyword(","):
default:
return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
}
}
}

View file

@ -9,7 +9,7 @@ import (
// pipeFields implements '| fields ...' pipe. // pipeFields implements '| fields ...' pipe.
// //
// See https://docs.victoriametrics.com/victorialogs/logsql/#limiters // See https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe
type pipeFields struct { type pipeFields struct {
// fields contains list of fields to fetch // fields contains list of fields to fetch
fields []string fields []string
@ -25,6 +25,13 @@ func (pf *pipeFields) String() string {
return "fields " + fieldNamesString(pf.fields) return "fields " + fieldNamesString(pf.fields)
} }
func (pf *pipeFields) getNeededFields() ([]string, map[string][]string) {
if pf.containsStar {
return []string{"*"}, nil
}
return pf.fields, nil
}
func (pf *pipeFields) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor { func (pf *pipeFields) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeFieldsProcessor{ return &pipeFieldsProcessor{
pf: pf, pf: pf,
@ -39,7 +46,7 @@ type pipeFieldsProcessor struct {
func (pfp *pipeFieldsProcessor) writeBlock(workerID uint, br *blockResult) { func (pfp *pipeFieldsProcessor) writeBlock(workerID uint, br *blockResult) {
if !pfp.pf.containsStar { if !pfp.pf.containsStar {
br.updateColumns(pfp.pf.fields) br.setColumns(pfp.pf.fields)
} }
pfp.ppBase.writeBlock(workerID, br) pfp.ppBase.writeBlock(workerID, br)
} }
@ -49,11 +56,13 @@ func (pfp *pipeFieldsProcessor) flush() error {
} }
func parsePipeFields(lex *lexer) (*pipeFields, error) { func parsePipeFields(lex *lexer) (*pipeFields, error) {
if !lex.isKeyword("fields") {
return nil, fmt.Errorf("expecting 'fields'; got %q", lex.token)
}
var fields []string var fields []string
for { for {
if !lex.mustNextToken() { lex.nextToken()
return nil, fmt.Errorf("missing field name")
}
field, err := parseFieldName(lex) field, err := parseFieldName(lex)
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse field name: %w", err) return nil, fmt.Errorf("cannot parse field name: %w", err)
@ -61,6 +70,9 @@ func parsePipeFields(lex *lexer) (*pipeFields, error) {
fields = append(fields, field) fields = append(fields, field)
switch { switch {
case lex.isKeyword("|", ")", ""): case lex.isKeyword("|", ")", ""):
if slices.Contains(fields, "*") {
fields = []string{"*"}
}
pf := &pipeFields{ pf := &pipeFields{
fields: fields, fields: fields,
containsStar: slices.Contains(fields, "*"), containsStar: slices.Contains(fields, "*"),

View file

@ -16,6 +16,10 @@ func (ph *pipeHead) String() string {
return fmt.Sprintf("head %d", ph.n) return fmt.Sprintf("head %d", ph.n)
} }
func (ph *pipeHead) getNeededFields() ([]string, map[string][]string) {
return []string{"*"}, nil
}
func (ph *pipeHead) newPipeProcessor(_ int, _ <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor { func (ph *pipeHead) newPipeProcessor(_ int, _ <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor {
if ph.n == 0 { if ph.n == 0 {
// Special case - notify the caller to stop writing data to the returned pipeHeadProcessor // Special case - notify the caller to stop writing data to the returned pipeHeadProcessor
@ -65,12 +69,14 @@ func (php *pipeHeadProcessor) flush() error {
} }
func parsePipeHead(lex *lexer) (*pipeHead, error) { func parsePipeHead(lex *lexer) (*pipeHead, error) {
if !lex.mustNextToken() { if !lex.isKeyword("head") {
return nil, fmt.Errorf("missing the number of head rows to return") return nil, fmt.Errorf("expecting 'head'; got %q", lex.token)
} }
lex.nextToken()
n, err := parseUint(lex.token) n, err := parseUint(lex.token)
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse the number of head rows to return %q: %w", lex.token, err) return nil, fmt.Errorf("cannot parse the number of head rows to return from %q: %w", lex.token, err)
} }
lex.nextToken() lex.nextToken()
ph := &pipeHead{ ph := &pipeHead{

View file

@ -0,0 +1,99 @@
package logstorage
import (
"fmt"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// pipeRename implements '| rename ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#transformations
type pipeRename struct {
// srcFields contains a list of source fields to rename
srcFields []string
// dstFields contains a list of destination fields
dstFields []string
}
func (pr *pipeRename) String() string {
if len(pr.srcFields) == 0 {
logger.Panicf("BUG: pipeRename must contain at least a single srcField")
}
a := make([]string, len(pr.srcFields))
for i, srcField := range pr.srcFields {
dstField := pr.dstFields[i]
a[i] = quoteTokenIfNeeded(srcField) + " as " + quoteTokenIfNeeded(dstField)
}
return "rename " + strings.Join(a, ", ")
}
func (pr *pipeRename) getNeededFields() ([]string, map[string][]string) {
m := make(map[string][]string, len(pr.srcFields))
for i, dstField := range pr.dstFields {
m[dstField] = append(m[dstField], pr.srcFields[i])
}
return []string{"*"}, m
}
func (pr *pipeRename) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeRenameProcessor{
pr: pr,
ppBase: ppBase,
}
}
type pipeRenameProcessor struct {
pr *pipeRename
ppBase pipeProcessor
}
func (prp *pipeRenameProcessor) writeBlock(workerID uint, br *blockResult) {
br.renameColumns(prp.pr.srcFields, prp.pr.dstFields)
prp.ppBase.writeBlock(workerID, br)
}
func (prp *pipeRenameProcessor) flush() error {
return nil
}
func parsePipeRename(lex *lexer) (*pipeRename, error) {
if !lex.isKeyword("rename") {
return nil, fmt.Errorf("expecting 'rename'; got %q", lex.token)
}
var srcFields []string
var dstFields []string
for {
lex.nextToken()
srcField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse src field name: %w", err)
}
if lex.isKeyword("as") {
lex.nextToken()
}
dstField, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse dst field name: %w", err)
}
srcFields = append(srcFields, srcField)
dstFields = append(dstFields, dstField)
switch {
case lex.isKeyword("|", ")", ""):
pr := &pipeRename{
srcFields: srcFields,
dstFields: dstFields,
}
return pr, nil
case lex.isKeyword(","):
default:
return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
}
}
}
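A sketch of what getNeededFields reports for a single rename (the field names are illustrative):

	pr := &pipeRename{srcFields: []string{"foo"}, dstFields: []string{"bar"}}
	fields, m := pr.getNeededFields()
	fmt.Println(fields) // [*]
	fmt.Println(m)      // map[bar:[foo]]

getResultColumnNames() uses this mapping to translate a downstream request for "bar" back into the source column "foo".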

View file

@ -16,6 +16,10 @@ func (ps *pipeSkip) String() string {
return fmt.Sprintf("skip %d", ps.n) return fmt.Sprintf("skip %d", ps.n)
} }
func (ps *pipeSkip) getNeededFields() ([]string, map[string][]string) {
return []string{"*"}, nil
}
func (ps *pipeSkip) newPipeProcessor(workersCount int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor { func (ps *pipeSkip) newPipeProcessor(workersCount int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
return &pipeSkipProcessor{ return &pipeSkipProcessor{
ps: ps, ps: ps,
@ -52,12 +56,14 @@ func (psp *pipeSkipProcessor) flush() error {
} }
func parsePipeSkip(lex *lexer) (*pipeSkip, error) { func parsePipeSkip(lex *lexer) (*pipeSkip, error) {
if !lex.mustNextToken() { if !lex.isKeyword("skip") {
return nil, fmt.Errorf("missing the number of rows to skip") return nil, fmt.Errorf("expecting 'rename'; got %q", lex.token)
} }
lex.nextToken()
n, err := parseUint(lex.token) n, err := parseUint(lex.token)
if err != nil { if err != nil {
return nil, fmt.Errorf("cannot parse the number of rows to skip %q: %w", lex.token, err) return nil, fmt.Errorf("cannot parse the number of rows to skip from %q: %w", lex.token, err)
} }
lex.nextToken() lex.nextToken()
ps := &pipeSkip{ ps := &pipeSkip{

View file

@ -83,6 +83,27 @@ func (ps *pipeStats) String() string {
return s return s
} }
func (ps *pipeStats) getNeededFields() ([]string, map[string][]string) {
var byFields []string
for _, bf := range ps.byFields {
byFields = append(byFields, bf.name)
}
neededFields := append([]string{}, byFields...)
m := make(map[string][]string)
for i, f := range ps.funcs {
funcFields := f.neededFields()
neededFields = append(neededFields, funcFields...)
resultName := ps.resultNames[i]
m[resultName] = append(m[resultName], byFields...)
m[resultName] = append(m[resultName], funcFields...)
}
return neededFields, m
}
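For illustration, a sketch of the method above for a simple stats pipe (the query text mirrors the parser tests; count(*) itself contributes no extra fields, see statsCount.neededFields later in this commit):

	q, _ := ParseQuery(`error | stats by (host) count() as hits`)
	ps := q.pipes[0].(*pipeStats)
	fields, m := ps.getNeededFields()
	fmt.Println(fields) // [host]
	fmt.Println(m)      // map[hits:[host]]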
const stateSizeBudgetChunk = 1 << 20 const stateSizeBudgetChunk = 1 << 20
func (ps *pipeStats) newPipeProcessor(workersCount int, stopCh <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor { func (ps *pipeStats) newPipeProcessor(workersCount int, stopCh <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor {
@ -376,35 +397,13 @@ func (psp *pipeStatsProcessor) flush() error {
return nil return nil
} }
func (ps *pipeStats) neededFields() []string {
var neededFields []string
m := make(map[string]struct{})
for _, bf := range ps.byFields {
name := bf.name
if _, ok := m[name]; !ok {
m[name] = struct{}{}
neededFields = append(neededFields, name)
}
}
for _, f := range ps.funcs {
for _, fieldName := range f.neededFields() {
if _, ok := m[fieldName]; !ok {
m[fieldName] = struct{}{}
neededFields = append(neededFields, fieldName)
}
}
}
return neededFields
}
func parsePipeStats(lex *lexer) (*pipeStats, error) { func parsePipeStats(lex *lexer) (*pipeStats, error) {
if !lex.mustNextToken() { if !lex.isKeyword("stats") {
return nil, fmt.Errorf("missing stats config") return nil, fmt.Errorf("expecting 'stats'; got %q", lex.token)
} }
lex.nextToken()
var ps pipeStats var ps pipeStats
if lex.isKeyword("by") { if lex.isKeyword("by") {
lex.nextToken() lex.nextToken()
@ -494,9 +493,7 @@ func parseStatsFunc(lex *lexer) (statsFunc, string, error) {
func parseResultName(lex *lexer) (string, error) { func parseResultName(lex *lexer) (string, error) {
if lex.isKeyword("as") { if lex.isKeyword("as") {
if !lex.mustNextToken() { lex.nextToken()
return "", fmt.Errorf("missing token after 'as' keyword")
}
} }
resultName, err := parseFieldName(lex) resultName, err := parseFieldName(lex)
if err != nil { if err != nil {
@ -543,9 +540,7 @@ func parseByFields(lex *lexer) ([]*byField, error) {
} }
var bfs []*byField var bfs []*byField
for { for {
if !lex.mustNextToken() { lex.nextToken()
return nil, fmt.Errorf("missing field name or ')'")
}
if lex.isKeyword(")") { if lex.isKeyword(")") {
lex.nextToken() lex.nextToken()
return bfs, nil return bfs, nil
@ -657,6 +652,9 @@ func tryParseBucketSize(s string) (float64, bool) {
return 0, false return 0, false
} }
// parseFieldNamesForStatsFunc parses field names for statsFunc.
//
// It returns ["*"] if the fields names list is empty or if it contains "*" field.
func parseFieldNamesForStatsFunc(lex *lexer, funcName string) ([]string, error) { func parseFieldNamesForStatsFunc(lex *lexer, funcName string) ([]string, error) {
if !lex.isKeyword(funcName) { if !lex.isKeyword(funcName) {
return nil, fmt.Errorf("unexpected func; got %q; want %q", lex.token, funcName) return nil, fmt.Errorf("unexpected func; got %q; want %q", lex.token, funcName)
@ -678,9 +676,7 @@ func parseFieldNamesInParens(lex *lexer) ([]string, error) {
} }
var fields []string var fields []string
for { for {
if !lex.mustNextToken() { lex.nextToken()
return nil, fmt.Errorf("missing field name or ')'")
}
if lex.isKeyword(")") { if lex.isKeyword(")") {
lex.nextToken() lex.nextToken()
return fields, nil return fields, nil
@ -708,8 +704,9 @@ func parseFieldName(lex *lexer) (string, error) {
if lex.isKeyword(",", "(", ")", "[", "]", "|", ":", "") { if lex.isKeyword(",", "(", ")", "[", "]", "|", ":", "") {
return "", fmt.Errorf("unexpected token: %q", lex.token) return "", fmt.Errorf("unexpected token: %q", lex.token)
} }
token := getCompoundPhrase(lex, false) fieldName := getCompoundPhrase(lex, false)
return token, nil fieldName = getCanonicalColumnName(fieldName)
return fieldName, nil
} }
func fieldNamesString(fields []string) string { func fieldNamesString(fields []string) string {

View file

@ -35,12 +35,12 @@ func TestTryParseBucketSize_Success(t *testing.T) {
f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond)) f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond))
// bytes // bytes
f("1b", 1) f("1B", 1)
f("1k", 1_000) f("1K", 1_000)
f("1Kb", 1_000) f("1KB", 1_000)
f("5.5KiB", 5.5*(1<<10)) f("5.5KiB", 5.5*(1<<10))
f("10MB500KB10B", 10*1_000_000+500*1_000+10) f("10MB500KB10B", 10*1_000_000+500*1_000+10)
f("10m0k", 10*1_000_000) f("10M", 10*1_000_000)
f("-10MB", -10*1_000_000) f("-10MB", -10*1_000_000)
// ipv4 mask // ipv4 mask
@ -95,13 +95,13 @@ func TestTryParseBucketOffset_Success(t *testing.T) {
f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond)) f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond))
// bytes // bytes
f("1b", 1) f("1B", 1)
f("1k", 1_000) f("1K", 1_000)
f("1Kb", 1_000) f("1KB", 1_000)
f("5.5KiB", 5.5*(1<<10)) f("5.5KiB", 5.5*(1<<10))
f("10MB500KB10B", 10*1_000_000+500*1_000+10) f("10MB500KB10B", 10*1_000_000+500*1_000+10)
f("10m0k", 10*1_000_000) f("10M", 10*1_000_000)
f("-10mb", -10*1_000_000) f("-10MB", -10*1_000_000)
} }
func TestTryParseBucketOffset_Failure(t *testing.T) { func TestTryParseBucketOffset_Failure(t *testing.T) {

View file

@ -24,10 +24,7 @@ func (f *Field) Reset() {
// String returns string representation of f. // String returns string representation of f.
func (f *Field) String() string { func (f *Field) String() string {
name := f.Name name := getCanonicalColumnName(f.Name)
if name == "" {
name = "_msg"
}
return fmt.Sprintf("%q:%q", name, f.Value) return fmt.Sprintf("%q:%q", name, f.Value)
} }
@ -121,3 +118,10 @@ func (rs *rows) mergeRows(timestampsA, timestampsB []int64, fieldsA, fieldsB [][
rs.appendRows(timestampsA, fieldsA) rs.appendRows(timestampsA, fieldsA)
} }
} }
func getCanonicalColumnName(columnName string) string {
if columnName == "" {
return "_msg"
}
return columnName
}
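This is why an empty field name now renders as _msg both in the parser and in query results (see e.g. the `fields "", a` and `stats count('')` test cases above). A trivial sketch:

	fmt.Println(getCanonicalColumnName(""))    // _msg
	fmt.Println(getCanonicalColumnName("foo")) // foo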

View file

@ -18,7 +18,11 @@ func (sc *statsCount) String() string {
} }
func (sc *statsCount) neededFields() []string { func (sc *statsCount) neededFields() []string {
return getFieldsIgnoreStar(sc.fields) if sc.containsStar {
// There is no need to fetch any columns for count(*) - the number of matching rows can be calculated as len(blockResult.timestamps)
return nil
}
return sc.fields
} }
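A sketch of the effect (the struct literal fields are taken from the code above):

	sc := &statsCount{fields: []string{"*"}, containsStar: true}
	fmt.Println(sc.neededFields()) // [] - count(*) reads no columns at all
	sc = &statsCount{fields: []string{"ip"}}
	fmt.Println(sc.neededFields()) // [ip]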
func (sc *statsCount) newStatsProcessor() (statsProcessor, int) { func (sc *statsCount) newStatsProcessor() (statsProcessor, int) {
@ -204,13 +208,3 @@ func parseStatsCount(lex *lexer) (*statsCount, error) {
} }
return sc, nil return sc, nil
} }
func getFieldsIgnoreStar(fields []string) []string {
var result []string
for _, f := range fields {
if f != "*" {
result = append(result, f)
}
}
return result
}

View file

@ -88,8 +88,7 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
} }
if len(fields) == 1 { if len(fields) == 1 {
// Fast path for a single column. // Fast path for a single column.
// The unique key is formed as "<is_time> <value_type>? <encodedValue>", // The unique key is formed as "<is_time> <value>",
// where <value_type> is skipped if <is_time> == 1.
// This guarantees that keys do not clash for different column types across blocks. // This guarantees that keys do not clash for different column types across blocks.
c := br.getColumnByName(fields[0]) c := br.getColumnByName(fields[0])
if c.isTime { if c.isTime {
@ -119,7 +118,7 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
return stateSizeIncrease return stateSizeIncrease
} }
keyBuf := sup.keyBuf[:0] keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(valueTypeString)) keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...) keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok { if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{} m[string(keyBuf)] = struct{}{}
@ -131,13 +130,13 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
if c.valueType == valueTypeDict { if c.valueType == valueTypeDict {
// count unique non-zero c.dictValues // count unique non-zero c.dictValues
keyBuf := sup.keyBuf[:0] keyBuf := sup.keyBuf[:0]
for i, v := range c.dictValues { for _, v := range c.dictValues {
if v == "" { if v == "" {
// Do not count empty values // Do not count empty values
continue continue
} }
keyBuf = append(keyBuf[:0], 0, byte(valueTypeDict)) keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, byte(i)) keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok { if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{} m[string(keyBuf)] = struct{}{}
stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof("")) stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof(""))
@ -148,19 +147,18 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
} }
// Count unique values across encodedValues // Count unique values across encodedValues
encodedValues := c.getEncodedValues(br) values := c.getValues(br)
isStringValueType := c.valueType == valueTypeString
keyBuf := sup.keyBuf[:0] keyBuf := sup.keyBuf[:0]
for i, v := range encodedValues { for i, v := range values {
if isStringValueType && v == "" { if v == "" {
// Do not count empty values // Do not count empty values
continue continue
} }
if i > 0 && encodedValues[i-1] == v { if i > 0 && values[i-1] == v {
// This value has been already counted. // This value has been already counted.
continue continue
} }
keyBuf = append(keyBuf[:0], 0, byte(c.valueType)) keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...) keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok { if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{} m[string(keyBuf)] = struct{}{}
@ -249,8 +247,7 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
} }
if len(fields) == 1 { if len(fields) == 1 {
// Fast path for a single column. // Fast path for a single column.
// The unique key is formed as "<is_time> <value_type>? <encodedValue>", // The unique key is formed as "<is_time> <value>",
// where <value_type> is skipped if <is_time> == 1.
// This guarantees that keys do not clash for different column types across blocks. // This guarantees that keys do not clash for different column types across blocks.
c := br.getColumnByName(fields[0]) c := br.getColumnByName(fields[0])
if c.isTime { if c.isTime {
@ -273,7 +270,7 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
return stateSizeIncrease return stateSizeIncrease
} }
keyBuf := sup.keyBuf[:0] keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(valueTypeString)) keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...) keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok { if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{} m[string(keyBuf)] = struct{}{}
@ -285,13 +282,14 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
if c.valueType == valueTypeDict { if c.valueType == valueTypeDict {
// count unique non-zero c.dictValues // count unique non-zero c.dictValues
dictIdx := c.encodedValues[rowIdx][0] dictIdx := c.encodedValues[rowIdx][0]
if c.dictValues[dictIdx] == "" { v := c.dictValues[dictIdx]
if v == "" {
// Do not count empty values // Do not count empty values
return stateSizeIncrease return stateSizeIncrease
} }
keyBuf := sup.keyBuf[:0] keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(valueTypeDict)) keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, dictIdx) keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok { if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{} m[string(keyBuf)] = struct{}{}
stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof("")) stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof(""))
@ -301,14 +299,13 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
} }
// Count unique values for the given rowIdx // Count unique values for the given rowIdx
encodedValues := c.getEncodedValues(br) v := c.getValueAtRow(br, rowIdx)
v := encodedValues[rowIdx] if v == "" {
if c.valueType == valueTypeString && v == "" {
// Do not count empty values // Do not count empty values
return stateSizeIncrease return stateSizeIncrease
} }
keyBuf := sup.keyBuf[:0] keyBuf := sup.keyBuf[:0]
keyBuf = append(keyBuf[:0], 0, byte(c.valueType)) keyBuf = append(keyBuf[:0], 0)
keyBuf = append(keyBuf, v...) keyBuf = append(keyBuf, v...)
if _, ok := m[string(keyBuf)]; !ok { if _, ok := m[string(keyBuf)]; !ok {
m[string(keyBuf)] = struct{}{} m[string(keyBuf)] = struct{}{}

View file

@ -731,88 +731,91 @@ func tryParseBytes(s string) (int64, bool) {
if !ok { if !ok {
return 0, false return 0, false
} }
if len(tail) == 0 {
if _, frac := math.Modf(f); frac != 0 {
// Reject numbers with a non-zero fractional part when no suffix follows.
return 0, false
}
}
s = tail s = tail
if len(s) == 0 { if len(s) == 0 {
n += int64(f) n += int64(f)
continue continue
} }
if len(s) >= 3 { if len(s) >= 3 {
prefix := s[:3]
switch { switch {
case strings.EqualFold(prefix, "kib"): case strings.HasPrefix(s, "KiB"):
n += int64(f * (1 << 10)) n += int64(f * (1 << 10))
s = s[3:] s = s[3:]
continue continue
case strings.EqualFold(prefix, "mib"): case strings.HasPrefix(s, "MiB"):
n += int64(f * (1 << 20)) n += int64(f * (1 << 20))
s = s[3:] s = s[3:]
continue continue
case strings.EqualFold(prefix, "gib"): case strings.HasPrefix(s, "GiB"):
n += int64(f * (1 << 30)) n += int64(f * (1 << 30))
s = s[3:] s = s[3:]
continue continue
case strings.EqualFold(prefix, "tib"): case strings.HasPrefix(s, "TiB"):
n += int64(f * (1 << 40)) n += int64(f * (1 << 40))
s = s[3:] s = s[3:]
continue continue
} }
} }
if len(s) >= 2 { if len(s) >= 2 {
prefix := s[:2]
switch { switch {
case strings.EqualFold(prefix, "ki"): case strings.HasPrefix(s, "Ki"):
n += int64(f * (1 << 10)) n += int64(f * (1 << 10))
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "mi"): case strings.HasPrefix(s, "Mi"):
n += int64(f * (1 << 20)) n += int64(f * (1 << 20))
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "gi"): case strings.HasPrefix(s, "Gi"):
n += int64(f * (1 << 30)) n += int64(f * (1 << 30))
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "ti"): case strings.HasPrefix(s, "Ti"):
n += int64(f * (1 << 40)) n += int64(f * (1 << 40))
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "kb"): case strings.HasPrefix(s, "KB"):
n += int64(f * 1_000) n += int64(f * 1_000)
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "mb"): case strings.HasPrefix(s, "MB"):
n += int64(f * 1_000_000) n += int64(f * 1_000_000)
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "gb"): case strings.HasPrefix(s, "GB"):
n += int64(f * 1_000_000_000) n += int64(f * 1_000_000_000)
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "tb"): case strings.HasPrefix(s, "TB"):
n += int64(f * 1_000_000_000_000) n += int64(f * 1_000_000_000_000)
s = s[2:] s = s[2:]
continue continue
} }
} }
prefix := s[:1]
switch { switch {
case strings.EqualFold(prefix, "b"): case strings.HasPrefix(s, "B"):
n += int64(f) n += int64(f)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "k"): case strings.HasPrefix(s, "K"):
n += int64(f * 1_000) n += int64(f * 1_000)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "m"): case strings.HasPrefix(s, "M"):
n += int64(f * 1_000_000) n += int64(f * 1_000_000)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "g"): case strings.HasPrefix(s, "G"):
n += int64(f * 1_000_000_000) n += int64(f * 1_000_000_000)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "t"): case strings.HasPrefix(s, "T"):
n += int64(f * 1_000_000_000_000) n += int64(f * 1_000_000_000_000)
s = s[1:] s = s[1:]
continue continue
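Size suffixes are now matched case-sensitively in their canonical spelling. A sketch of the resulting behavior, in line with the updated tests below:

	fmt.Println(tryParseBytes("1.5KB"))   // 1500 true
	fmt.Println(tryParseBytes("1.5KiB"))  // 1536 true
	fmt.Println(tryParseBytes("1kb"))     // 0 false - lowercase suffixes are no longer accepted
	fmt.Println(tryParseBytes("123.456")) // 0 false - fractional numbers need a suffix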
@ -859,48 +862,45 @@ func tryParseDuration(s string) (int64, bool) {
return 0, false return 0, false
} }
if len(s) >= 3 { if len(s) >= 3 {
prefix := s[:3] if strings.HasPrefix(s, "µs") {
if strings.EqualFold(prefix, "µs") {
nsecs += int64(f * nsecsPerMicrosecond) nsecs += int64(f * nsecsPerMicrosecond)
s = s[3:] s = s[3:]
continue continue
} }
} }
if len(s) >= 2 { if len(s) >= 2 {
prefix := s[:2]
switch { switch {
case strings.EqualFold(prefix, "ms"): case strings.HasPrefix(s, "ms"):
nsecs += int64(f * nsecsPerMillisecond) nsecs += int64(f * nsecsPerMillisecond)
s = s[2:] s = s[2:]
continue continue
case strings.EqualFold(prefix, "ns"): case strings.HasPrefix(s, "ns"):
nsecs += int64(f) nsecs += int64(f)
s = s[2:] s = s[2:]
continue continue
} }
} }
prefix := s[:1]
switch { switch {
case strings.EqualFold(prefix, "y"): case strings.HasPrefix(s, "y"):
nsecs += int64(f * nsecsPerYear) nsecs += int64(f * nsecsPerYear)
s = s[1:] s = s[1:]
case strings.EqualFold(prefix, "w"): case strings.HasPrefix(s, "w"):
nsecs += int64(f * nsecsPerWeek) nsecs += int64(f * nsecsPerWeek)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "d"): case strings.HasPrefix(s, "d"):
nsecs += int64(f * nsecsPerDay) nsecs += int64(f * nsecsPerDay)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "h"): case strings.HasPrefix(s, "h"):
nsecs += int64(f * nsecsPerHour) nsecs += int64(f * nsecsPerHour)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "m"): case strings.HasPrefix(s, "m"):
nsecs += int64(f * nsecsPerMinute) nsecs += int64(f * nsecsPerMinute)
s = s[1:] s = s[1:]
continue continue
case strings.EqualFold(prefix, "s"): case strings.HasPrefix(s, "s"):
nsecs += int64(f * nsecsPerSecond) nsecs += int64(f * nsecsPerSecond)
s = s[1:] s = s[1:]
continue continue

View file

@ -325,7 +325,6 @@ func TestTryParseDuration_Success(t *testing.T) {
// zero duration // zero duration
f("0s", 0) f("0s", 0)
f("0S", 0)
f("0.0w0d0h0s0.0ms", 0) f("0.0w0d0h0s0.0ms", 0)
f("-0w", 0) f("-0w", 0)
@ -334,15 +333,9 @@ func TestTryParseDuration_Success(t *testing.T) {
f("1.5ms", 1.5*nsecsPerMillisecond) f("1.5ms", 1.5*nsecsPerMillisecond)
f("1µs", nsecsPerMicrosecond) f("1µs", nsecsPerMicrosecond)
f("1ns", 1) f("1ns", 1)
f("1NS", 1)
f("1nS", 1)
f("1Ns", 1)
f("1h", nsecsPerHour) f("1h", nsecsPerHour)
f("1H", nsecsPerHour)
f("1.5d", 1.5*nsecsPerDay) f("1.5d", 1.5*nsecsPerDay)
f("1.5D", 1.5*nsecsPerDay)
f("1.5w", 1.5*nsecsPerWeek) f("1.5w", 1.5*nsecsPerWeek)
f("1.5W", 1.5*nsecsPerWeek)
f("2.5y", 2.5*nsecsPerYear) f("2.5y", 2.5*nsecsPerYear)
f("1m5.123456789s", nsecsPerMinute+5.123456789*nsecsPerSecond) f("1m5.123456789s", nsecsPerMinute+5.123456789*nsecsPerSecond)
@ -417,62 +410,25 @@ func TestTryParseBytes_Success(t *testing.T) {
} }
} }
f("123.456", 123)
f("1_500", 1_500) f("1_500", 1_500)
f("2.5b", 2)
f("2.5B", 2) f("2.5B", 2)
f("1.5k", 1_500)
f("1.5m", 1_500_000)
f("1.5g", 1_500_000_000)
f("1.5t", 1_500_000_000_000)
f("1.5K", 1_500) f("1.5K", 1_500)
f("1.5M", 1_500_000) f("1.5M", 1_500_000)
f("1.5G", 1_500_000_000) f("1.5G", 1_500_000_000)
f("1.5T", 1_500_000_000_000) f("1.5T", 1_500_000_000_000)
f("1.5kb", 1_500)
f("1.5mb", 1_500_000)
f("1.5gb", 1_500_000_000)
f("1.5tb", 1_500_000_000_000)
f("1.5Kb", 1_500)
f("1.5Mb", 1_500_000)
f("1.5Gb", 1_500_000_000)
f("1.5Tb", 1_500_000_000_000)
f("1.5KB", 1_500) f("1.5KB", 1_500)
f("1.5MB", 1_500_000) f("1.5MB", 1_500_000)
f("1.5GB", 1_500_000_000) f("1.5GB", 1_500_000_000)
f("1.5TB", 1_500_000_000_000) f("1.5TB", 1_500_000_000_000)
f("1.5ki", 1.5*(1<<10))
f("1.5mi", 1.5*(1<<20))
f("1.5gi", 1.5*(1<<30))
f("1.5ti", 1.5*(1<<40))
f("1.5Ki", 1.5*(1<<10)) f("1.5Ki", 1.5*(1<<10))
f("1.5Mi", 1.5*(1<<20)) f("1.5Mi", 1.5*(1<<20))
f("1.5Gi", 1.5*(1<<30)) f("1.5Gi", 1.5*(1<<30))
f("1.5Ti", 1.5*(1<<40)) f("1.5Ti", 1.5*(1<<40))
f("1.5KI", 1.5*(1<<10))
f("1.5MI", 1.5*(1<<20))
f("1.5GI", 1.5*(1<<30))
f("1.5TI", 1.5*(1<<40))
f("1.5kib", 1.5*(1<<10))
f("1.5mib", 1.5*(1<<20))
f("1.5gib", 1.5*(1<<30))
f("1.5tib", 1.5*(1<<40))
f("1.5kiB", 1.5*(1<<10))
f("1.5miB", 1.5*(1<<20))
f("1.5giB", 1.5*(1<<30))
f("1.5tiB", 1.5*(1<<40))
f("1.5KiB", 1.5*(1<<10)) f("1.5KiB", 1.5*(1<<10))
f("1.5MiB", 1.5*(1<<20)) f("1.5MiB", 1.5*(1<<20))
f("1.5GiB", 1.5*(1<<30)) f("1.5GiB", 1.5*(1<<30))
@ -503,6 +459,37 @@ func TestTryParseBytes_Failure(t *testing.T) {
f("123qsb") f("123qsb")
f("123sqsb") f("123sqsb")
f("123s5qsb") f("123s5qsb")
// invalid case for the suffix
f("1b")
f("1k")
f("1m")
f("1g")
f("1t")
f("1kb")
f("1mb")
f("1gb")
f("1tb")
f("1ki")
f("1mi")
f("1gi")
f("1ti")
f("1kib")
f("1mib")
f("1gib")
f("1tib")
f("1KIB")
f("1MIB")
f("1GIB")
f("1TIB")
// fractional number without suffix
f("123.456")
} }
func TestTryParseFloat64_Success(t *testing.T) { func TestTryParseFloat64_Success(t *testing.T) {