diff --git a/docs/VictoriaLogs/CHANGELOG.md b/docs/VictoriaLogs/CHANGELOG.md index 268faff38..d55b2a441 100644 --- a/docs/VictoriaLogs/CHANGELOG.md +++ b/docs/VictoriaLogs/CHANGELOG.md @@ -20,12 +20,15 @@ according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/QuickSta ## tip * FEATURE: return all the log fields by default in query results. Previously only [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields), [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields were returned by default. -* FEATURE: add support for returning only the requested log [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#querying-specific-fields). -* FEATURE: add support for calculating `count()`, `uniq()`, `sum()`, `avg()`, `min()`, `max()` and `uniq_array()` over [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). Grouping by arbitrary set of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) is supported. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#stats) for details. +* FEATURE: add support for returning only the requested log [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe). +* FEATURE: add support for calculating `count()`, `uniq()`, `sum()`, `avg()`, `min()`, `max()` and `uniq_array()` over [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). Grouping by arbitrary set of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) is supported. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe) for details. +* FEATURE: add support for sorting the returned results. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe). * FEATURE: add support for limiting the number of returned results. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#limiters). -* FEATURE: optimize performance for [LogsQL query](https://docs.victoriametrics.com/victorialogs/logsql/), which contains multiple filters for [words](https://docs.victoriametrics.com/victorialogs/logsql/#word-filter) or [phrases](https://docs.victoriametrics.com/victorialogs/logsql/#phrase-filter) delimited with [`AND` operator](https://docs.victoriametrics.com/victorialogs/logsql/#logical-filter). For example, `foo AND bar` query must find [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with `foo` and `bar` words at faster speed. +* FEATURE: add support for copying and renaming the selected log fields. See [these](https://docs.victoriametrics.com/victorialogs/logsql/#copy-pipe) and [these](https://docs.victoriametrics.com/victorialogs/logsql/#rename-pipe) docs. * FEATURE: allow using `_` inside numbers. For example, `score:range[1_000, 5_000_000]` for [`range` filter](https://docs.victoriametrics.com/victorialogs/logsql/#range-filter). * FEATURE: allow numbers in hexadecimal and binary form. For example, `response_size:range[0xff, 0b10001101101]` for [`range` filter](https://docs.victoriametrics.com/victorialogs/logsql/#range-filter). 
+* FEATURE: allow using duration and byte size suffixes in numeric values inside LogsQL queries. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#numeric-values).
+* FEATURE: optimize performance for [LogsQL query](https://docs.victoriametrics.com/victorialogs/logsql/), which contains multiple filters for [words](https://docs.victoriametrics.com/victorialogs/logsql/#word-filter) or [phrases](https://docs.victoriametrics.com/victorialogs/logsql/#phrase-filter) delimited with [`AND` operator](https://docs.victoriametrics.com/victorialogs/logsql/#logical-filter). For example, the `foo AND bar` query now finds [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with `foo` and `bar` words at a faster speed.
* BUGFIX: prevent additional CPU usage for up to a few seconds after canceling the query.
* BUGFIX: prevent returning log entries with empty `_stream` field in the form `"_stream":""` in [search query results](https://docs.victoriametrics.com/victorialogs/querying/). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6042).

diff --git a/docs/VictoriaLogs/LogsQL.md b/docs/VictoriaLogs/LogsQL.md
index bdbb9ace0..d99c05b4a 100644
--- a/docs/VictoriaLogs/LogsQL.md
+++ b/docs/VictoriaLogs/LogsQL.md
@@ -19,7 +19,7 @@ It provides the following features:
  See [word filter](#word-filter), [phrase filter](#phrase-filter) and [prefix filter](#prefix-filter).
- Ability to combine filters into arbitrary complex [logical filters](#logical-filter).
- Ability to extract structured fields from unstructured logs at query time. See [these docs](#transformations).
-- Ability to calculate various stats over the selected log entries. See [these docs](#stats).
+- Ability to calculate various stats over the selected log entries. See [these docs](#stats-pipe).

## LogsQL tutorial

@@ -177,17 +177,22 @@ These words are taken into account by full-text search filters such as

#### Query syntax

-LogsQL query consists of the following parts delimited by `|`:
+LogsQL query must contain [filters](#filters) for selecting the matching logs. At least a single filter is required.
+For example, the following query selects all the logs for the last 5 minutes by using [`_time` filter](#time-filter):

-- [Filters](#filters), which select log entries for further processing. This part is required in LogsQL. Other parts are optional.
-- Optional [stream context](#stream-context), which allows selecting surrounding log lines for the matching log lines.
-- Optional [transformations](#transformations) for the selected log fields.
-  For example, an additional fields can be extracted or constructed from existing fields.
-- Optional [post-filters](#post-filters) for post-filtering of the selected results. For example, post-filtering can filter
-  results based on the fields constructed by [transformations](#transformations).
-- Optional [stats](#stats) transformations, which can calculate various stats across selected results.
-- Optional [sorting](#sorting), which can sort the results by the sepcified fields.
-- Optional [limiters](#limiters), which can apply various limits on the selected results.
+```logsql
+_time:5m
+```
+
+In addition to filters, a LogsQL query may contain an arbitrary mix of optional actions for processing the selected logs. These actions are delimited by `|` and are known as `pipes`.
+For example, the following query uses [`stats` pipe](#stats-pipe) for returning the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
+with the `error` [word](#word) for the last 5 minutes:
+
+```logsql
+_time:5m error | stats count() errors
+```
+
+See [the list of supported pipes in LogsQL](#pipes).

## Filters

@@ -1025,6 +1030,435 @@ Performance tips:

- See [other performance tips](#performance-tips).

+## Pipes
+
+In addition to [filters](#filters), a LogsQL query may contain an arbitrary mix of `|`-delimited actions known as `pipes`.
+For example, the following query uses [`stats`](#stats-pipe), [`sort`](#sort-pipe) and [`head`](#head-pipe) pipes
+for returning top 10 [log streams](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields)
+with the biggest number of logs during the last 5 minutes:
+
+```logsql
+_time:5m | stats by (_stream) count() per_stream_logs | sort by (per_stream_logs desc) | head 10
+```
+
+LogsQL supports the following pipes:
+
+- [`copy`](#copy-pipe) copies [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
+- [`delete`](#delete-pipe) deletes [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
+- [`fields`](#fields-pipe) selects the given set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
+- [`head`](#head-pipe) limits the number of selected logs.
+- [`rename`](#rename-pipe) renames [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
+- [`skip`](#skip-pipe) skips the given number of selected logs.
+- [`sort`](#sort-pipe) sorts logs by the given [fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
+- [`stats`](#stats-pipe) calculates various stats over the selected logs.
+
+### copy pipe
+
+If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be copied, then `| copy src1 as dst1, ..., srcN as dstN` [pipe](#pipes) can be used.
+For example, the following query copies `host` field to `server` for logs over the last 5 minutes, so the output contains both `host` and `server` fields:
+
+```logsql
+_time:5m | copy host as server
+```
+
+Multiple fields can be copied with a single `| copy ...` pipe. For example, the following query copies
+[`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) to `timestamp`, while [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field)
+is copied to `message`:
+
+```logsql
+_time:5m | copy _time as timestamp, _msg as message
+```
+
+The `as` keyword is optional.
+
+See also:
+
+- [`rename` pipe](#rename-pipe)
+- [`fields` pipe](#fields-pipe)
+- [`delete` pipe](#delete-pipe)
+
+### delete pipe
+
+If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be deleted, then `| delete field1, ..., fieldN` [pipe](#pipes) can be used.
+For example, the following query deletes `host` and `app` fields from the logs over the last 5 minutes:
+
+```logsql
+_time:5m | delete host, app
+```
+
+See also:
+
+- [`rename` pipe](#rename-pipe)
+- [`fields` pipe](#fields-pipe)
+
+### fields pipe
+
+By default all the [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) are returned in the response.
+It is possible to select the given set of log fields with `| fields field1, ..., fieldN` [pipe](#pipes).
For example, the following query selects only `host`
+and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields from logs for the last 5 minutes:
+
+```logsql
+_time:5m | fields host, _msg
+```
+
+See also:
+
+- [`copy` pipe](#copy-pipe)
+- [`rename` pipe](#rename-pipe)
+- [`delete` pipe](#delete-pipe)
+
+### head pipe
+
+If only a subset of selected logs must be processed, then `| head N` [pipe](#pipes) can be used. For example, the following query returns up to 100 logs over the last 5 minutes:
+
+```logsql
+_time:5m | head 100
+```
+
+By default rows are selected in arbitrary order because of performance reasons, so the query above can return different sets of logs every time it is executed.
+[`sort` pipe](#sort-pipe) can be used for making sure the logs are in the same order before applying `head ...` to them.
+
+See also:
+
+- [`skip` pipe](#skip-pipe)
+
+### rename pipe
+
+If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) must be renamed, then `| rename src1 as dst1, ..., srcN as dstN` [pipe](#pipes) can be used.
+For example, the following query renames `host` field to `server` for logs over the last 5 minutes, so the output contains `server` field instead of `host` field:
+
+```logsql
+_time:5m | rename host as server
+```
+
+Multiple fields can be renamed with a single `| rename ...` pipe. For example, the following query renames `host` to `instance` and `app` to `job`:
+
+```logsql
+_time:5m | rename host as instance, app as job
+```
+
+See also:
+
+- [`copy` pipe](#copy-pipe)
+- [`fields` pipe](#fields-pipe)
+- [`delete` pipe](#delete-pipe)
+
+### skip pipe
+
+If some number of selected logs must be skipped after [`sort`](#sort-pipe), then `| skip N` [pipe](#pipes) can be used. For example, the following query skips the first 100 logs
+over the last 5 minutes after sorting them by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field):
+
+```logsql
+_time:5m | sort by (_time) | skip 100
+```
+
+Note that skipping rows without sorting makes little sense, since they can be returned in arbitrary order because of performance reasons.
+Rows can be sorted with [`sort` pipe](#sort-pipe).
+
+See also:
+
+- [`head` pipe](#head-pipe)
+
+### sort pipe
+
+By default logs are selected in arbitrary order because of performance reasons. If logs must be sorted, then `| sort by (field1, ..., fieldN)` [pipe](#pipes) must be used.
+For example, the following query returns logs for the last 5 minutes sorted by [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields)
+and then by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field):
+
+```logsql
+_time:5m | sort by (_stream, _time)
+```
+
+Sorting in reverse order is supported - just add `desc` after the given log field. For example, the following query sorts logs in reverse order of the `request_duration_seconds` field:
+
+```logsql
+_time:5m | sort by (request_duration_seconds desc)
+```
+
+Note that sorting a big number of logs can be slow and can consume a lot of additional memory.
+It is recommended to limit the number of logs before sorting with the following approaches (see also the example below):
+
+- Reducing the selected time range with [time filter](#time-filter).
+- Using more specific [filters](#filters), so they select fewer logs.
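+
+Additionally, if only the first `N` sorted logs are needed, then the `sort` pipe can be combined with the [`head` pipe](#head-pipe), so the remaining logs are dropped right after sorting. For example, the following query returns the 10 logs with the biggest `request_duration_seconds` values over the last 5 minutes:
+
+```logsql
+_time:5m | sort by (request_duration_seconds desc) | head 10
+```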
+
+See also:
+
+- [`stats` pipe](#stats-pipe)
+- [`head` pipe](#head-pipe)
+- [`skip` pipe](#skip-pipe)
+
+### stats pipe
+
+`| stats ...` pipe allows calculating various stats over the selected logs. For example, the following LogsQL query
+uses [`count` stats function](#count-stats) for calculating the number of logs for the last 5 minutes:
+
+```logsql
+_time:5m | stats count() logs_total
+```
+
+`| stats ...` pipe has the following basic format:
+
+```logsql
+... | stats
+  stats_func1(...) as result_name1,
+  ...
+  stats_funcN(...) as result_nameN
+```
+
+Where `stats_func*` is any of the supported [stats functions](#stats-pipe-functions), while `result_name*` is the name of the log field
+to store the result of the corresponding stats function. The `as` keyword is optional.
+
+For example, the following query calculates the following stats for logs over the last 5 minutes:
+
+- the number of logs with the help of [`count` stats function](#count-stats);
+- the number of unique [log streams](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields) with the help of [`uniq` stats function](#uniq-stats):
+
+```logsql
+_time:5m | stats count() logs_total, uniq(_stream) streams_total
+```
+
+See also:
+
+- [`sort` pipe](#sort-pipe)
+
+
+#### Stats by fields
+
+The following LogsQL syntax can be used for calculating independent stats per group of log fields:
+
+```logsql
+... | stats by (field1, ..., fieldM)
+  stats_func1(...) as result_name1,
+  ...
+  stats_funcN(...) as result_nameN
+```
+
+This calculates `stats_func*` per each `(field1, ..., fieldM)` group of [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+
+For example, the following query calculates the number of logs and unique ip addresses over the last 5 minutes,
+grouped by `(host, path)` fields:
+
+```logsql
+_time:5m | stats by (host, path) count() logs_total, uniq(ip) ips_total
+```
+
+#### Stats by time buckets
+
+The following syntax can be used for calculating stats grouped by time buckets:
+
+```logsql
+... | stats by (_time:step)
+  stats_func1(...) as result_name1,
+  ...
+  stats_funcN(...) as result_nameN
+```
+
+This calculates `stats_func*` per each `step` of [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) field.
+The `step` can have any [duration value](#duration-values). For example, the following LogsQL query returns per-minute number of logs and unique ip addresses
+over the last 5 minutes:
+
+```logsql
+_time:5m | stats by (_time:1m) count() logs_total, uniq(ip) ips_total
+```
+
+#### Stats by time buckets with timezone offset
+
+VictoriaLogs stores [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) values as [Unix time](https://en.wikipedia.org/wiki/Unix_time)
+in nanoseconds. This time corresponds to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) time zone. Sometimes it is needed to calculate stats
+grouped by days or weeks in a non-UTC timezone. This is possible with the following syntax:
+
+```logsql
+... | stats by (_time:step offset timezone_offset) ...
+```
+
+For example, the following query calculates per-day number of logs over the last week, in `UTC+02:00` [time zone](https://en.wikipedia.org/wiki/Time_zone):
+
+```logsql
+_time:1w | stats by (_time:1d offset 2h) count() logs_total
+```
+
+#### Stats by field buckets
+
+Every log field inside `| stats by (...)` can be bucketed in the same way as the `_time` field in [this example](#stats-by-time-buckets).
+Any [numeric value](#numeric-values) can be used as `step` value for the bucket. For example, the following query calculates
+the number of requests for the last hour, bucketed by `10KB` of the `request_size_bytes` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model):
+
+```logsql
+_time:1h | stats by (request_size_bytes:10KB) count() requests
+```
+
+#### Stats by IPv4 buckets
+
+Stats can be bucketed by [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) containing [IPv4 addresses](https://en.wikipedia.org/wiki/IP_address)
+via the `ip_field_name:/network_mask` syntax inside `by(...)` clause. For example, the following query returns the number of log entries per `/24` subnetwork
+extracted from the `ip` [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) during the last 5 minutes:
+
+```logsql
+_time:5m | stats by (ip:/24) count() requests_per_subnet
+```
+
+## stats pipe functions
+
+LogsQL supports the following functions for [`stats` pipe](#stats-pipe):
+
+- [`avg`](#avg-stats) calculates the average value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+- [`count`](#count-stats) calculates the number of log entries.
+- [`max`](#max-stats) calculates the maximum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+- [`min`](#min-stats) calculates the minimum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+- [`sum`](#sum-stats) calculates the sum for the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+- [`uniq`](#uniq-stats) calculates the number of unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+- [`uniq_array`](#uniq_array-stats) returns unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+
+### avg stats
+
+`avg(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the average value across
+all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+Non-numeric values are ignored.
+
+For example, the following query returns the average value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+over logs for the last 5 minutes:
+
+```logsql
+_time:5m | stats avg(duration) avg_duration
+```
+
+See also:
+
+- [`min`](#min-stats)
+- [`max`](#max-stats)
+- [`sum`](#sum-stats)
+- [`count`](#count-stats)
+
+### count stats
+
+`count()` calculates the number of selected logs.
+
+For example, the following query returns the number of logs over the last 5 minutes:
+
+```logsql
+_time:5m | stats count() logs
+```
+
+It is possible to calculate the number of logs with non-empty values for some [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+with the `count(fieldName)` syntax. For example, the following query returns the number of logs with non-empty `username` field over the last 5 minutes:
+
+```logsql
+_time:5m | stats count(username) logs_with_username
+```
+
+If multiple fields are enumerated inside `count()`, then it counts the number of logs with at least a single non-empty field mentioned inside `count()`.
+For example, the following query returns the number of logs with non-empty `username` or `password` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+over the last 5 minutes:
+
+```logsql
+_time:5m | stats count(username, password) logs_with_username_or_password
+```
+
+See also:
+
+- [`sum`](#sum-stats)
+- [`avg`](#avg-stats)
+
+### max stats
+
+`max(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the maximum value across
+all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+Non-numeric values are ignored.
+
+For example, the following query returns the maximum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+over logs for the last 5 minutes:
+
+```logsql
+_time:5m | stats max(duration) max_duration
+```
+
+See also:
+
+- [`min`](#min-stats)
+- [`avg`](#avg-stats)
+- [`sum`](#sum-stats)
+- [`count`](#count-stats)
+
+### min stats
+
+`min(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the minimum value across
+all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+Non-numeric values are ignored.
+
+For example, the following query returns the minimum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+over logs for the last 5 minutes:
+
+```logsql
+_time:5m | stats min(duration) min_duration
+```
+
+See also:
+
+- [`max`](#max-stats)
+- [`avg`](#avg-stats)
+- [`sum`](#sum-stats)
+- [`count`](#count-stats)
+
+### sum stats
+
+`sum(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the sum of numeric values across
+all the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+
+For example, the following query returns the sum of numeric values for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+over logs for the last 5 minutes:
+
+```logsql
+_time:5m | stats sum(duration) sum_duration
+```
+
+See also:
+
+- [`count`](#count-stats)
+- [`avg`](#avg-stats)
+- [`max`](#max-stats)
+- [`min`](#min-stats)
+
+### uniq stats
+
+`uniq(field1, ..., fieldN)` [stats pipe](#stats-pipe) calculates the number of unique non-empty `(field1, ..., fieldN)` tuples.
+
+For example, the following query returns the number of unique non-empty values for `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+over the last 5 minutes:
+
+```logsql
+_time:5m | stats uniq(ip) ips
+```
+
+The following query returns the number of unique `(host, path)` pairs for the corresponding [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
+over the last 5 minutes:
+
+```logsql
+_time:5m | stats uniq(host, path) unique_host_path_pairs
+```
+
+See also:
+
+- [`uniq_array`](#uniq_array-stats)
+- [`count`](#count-stats)
+
+### uniq_array stats
+
+`uniq_array(field1, ..., fieldN)` [stats pipe](#stats-pipe) returns the unique non-empty values across
+the mentioned [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+The returned values are sorted and encoded as a JSON array.
+ +For example, the following query returns unique non-empty values for the `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) +over logs for the last 5 minutes: + +```logsql +_time:5m | stats uniq_array(ip) unique_ips +``` + +See also: + +- [`uniq`](#uniq-stats) +- [`count`](#count-stats) + ## Stream context LogsQL will support the ability to select the given number of surrounding log lines for the selected log lines @@ -1046,11 +1480,9 @@ LogsQL will support the following transformations for the [selected](#filters) l - Creating a new field from existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) according to the provided format. - Creating a new field according to math calculations over existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). -- Copying of the existing [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). -- Parsing duration strings into floating-point seconds for further [stats calculations](#stats). +- Parsing duration strings into floating-point seconds for further [stats calculations](#stats-pipe). - Creating a boolean field with the result of arbitrary [post-filters](#post-filters) applied to the current fields. - Boolean fields may be useful for [conditional stats calculation](#stats). -- Creating an integer field with the length of the given field value. This can be useful for [stats calculations](#stats). +- Creating an integer field with the length of the given field value. This can be useful for [stats calculations](#stats-pipe). See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details. @@ -1069,166 +1501,7 @@ See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) fo ## Stats -### stats functions - -LogsQL supports the following stats functions: - -- [`count`](#count) - calculates the number of log entries. -- [`uniq`](#uniq) - calculates the number of unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). -- [`sum`](#sum) - calculates the sum for the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). -- [`avg`](#avg) - calculates the average value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). -- [`min`](#min) - calculates the minumum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). -- [`max`](#max) - calcualtes the maximum value over the given numeric [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). -- [`uniq_array`](#uniq_array) - returns unique non-empty values for the given [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). - -#### count - -Examples: - -- `error | stats count() as errors_total` returns the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word). -- `error | stats by (_stream) count() as errors_by_stream` returns the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) - with the `error` [word](#word) grouped by [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields). 
-- `error | stats by (datacenter, namespace) count(trace_id, user_id) as errors_with_trace_and_user` returns the number - of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) containing the `error` [word](#word), - which contain non-empty `trace_id` or `user_id` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model), grouped by `datacenter` and `namespace` fields. - -See also [`sum`](#sum) and [`avg`](#avg). - -#### uniq - -Examples: - -- `error | stats uniq(client_ip) as unique_ips` returns the number of unique values for `client_ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word). -- `error | stats by (app) uniq(path, host) as unique_path_hosts` - returns the number of unique `(path, host)` pairs - for [field values](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) - with the `error` [word](#word), grouped by `app` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). -- `error | fields path, host | stats uniq(*) unique_path_hosts` - returns the number of unique `(path, host)` pairs - for [field values](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) - with the `error` [word](#word). - -See also [`uniq_array`](#uniq_array). - -#### sum - -Examples: - -- `error | stats sum(duration) duration_total` - returns the sum of `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) values - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word). -- `GET | stats by (path) sum(response_size) response_size_sum` - returns the sum of `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) values - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped - by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value. - -See also [count](#count) and [avg](#avg). - -#### avg - -Examples: - -- `error | stats avg(duration) duration_avg` - returns the average value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word). -- `GET | stats by (path) avg(response_size) avg_response_size` - returns the average value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped - by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value. - -See also [sum](#sum) and [count](#count). - -#### max - -Examples: - -- `error | stats max(duration) duration_max` - returns the maximum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word). 
-- `GET | stats by (path) max(response_size) max_response_size` - returns the maximum value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped - by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value. - -See also [min](#min). - -#### min - -Examples: - -- `error | stats min(duration) duration_min` - returns the minimum value for the `duration` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word). -- `GET | stats by (path) min(response_size) min_response_size` - returns the minimum value for the `response_size` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `GET` [word](#word), grouped - by `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) value. - -See also [max](#max). - -#### uniq_array - -Examples: - -- `_time:1h | stats uniq_array(client_ip) as unique_ips` returns unique values for `client_ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across logs for the last hour. The unqiue values are returned in JSON array such as `["1.2.4.5","5.6.7.8"]`. -- `_time:1h | stats by (host) unique_array(path) as unique_paths` returns unique values for `path` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) - across logs for the last hour, grouped by `host` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). - -See also [uniq](#uniq) and [count](#count). - -### Grouping stats by buckets - -#### Time buckets - -Stats can be bucketed by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) with the `_time:bucket_duration` syntax inside `by(...)` clause. -For example, the following query returns per-minute number of log messages with the `error` [word](#word) for the last 10 minutes: - -```logsql -_time:10m error | stats by (_time:1m) count() errors_per_minute -``` - -It is possible to add offset (for example, [timezone offset](https://en.wikipedia.org/wiki/UTC_offset)) when bucketing by `_time`. For example, the following query calculates -the number of per-day log entries for the last week at '2h' offset aka `UTC+02:00` offset: - -```logsql -_time:1w | stats by (_time:1d offset 2h) count() logs_per_day_kyiv_offset -``` - -#### Numeric buckets - -Stats can be bucketed by any numeric [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) with the `field_name:bucket_size` syntax inside `by(...)` clause. -For example, the following query returns the number of log messages with the `status=200` [phrase](#phrase-filter) bucketed by `request_duration_seconds` numeric field with `0.5` step: - -```logsql -_time:10m "status=200" | stats by (request_duration_seconds:0.5) count() requests -``` - -The `bucket_size` can contain the following convenient suffixes: - -- `KB` - the `bucket_size` is multiplied by `1000` in this case. For example, `10KB`. -- `MB` - the `bucket_size` is multiplied by `1_000_000` in this case. For example, `10MB`. -- `GB` - the `bucket_size` is multiplied by `1_000_000_000` in this case. 
For example, `10GB`. -- `TB` - the `bucket_size` is multiplied by `1_000_000_000_000` in this case. For example, `10TB`. -- `KiB` - the `bucket_size` is multiplied by `1024` in this case. For example, `10KiB`. -- `MiB` - the `bucket_size` is multiplied by `1024*1024` in this case. For example, `10MiB`. -- `GiB` - the `bucket_size` is multiplied by `1024*1024*1024` in this case. For example, `10GiB`. -- `TiB` - the `bucket_size` is multiplied by `1024*1024*1024*1024` in this case. For example, `10TiB`. - -#### IPv4 mask buckets - -Stats can be bucketed by [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) with [IPv4 addresses](https://en.wikipedia.org/wiki/IP_address) -via the `ip_field_name:/network_mask` syntax inside `by(...)` clause. For example, the following query returns the number of log entries per `/24` subnetwork during the last 10 minutes: - -```logsql -_time:10m | stats by (ip:/24) count() requests_per_subnet -``` - -### Calculating multiple stats - -Stats calculations can be combined. For example, the following query calculates the number of log messages with the `error` [word](#word), -the number of unique values for `ip` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) and the sum of `duration` -[field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model), grouped by `namespace` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model): - -```logsql -error | stats by (namespace) - count() as errors_total, - uniq(ip) as unique_ips, - sum(duration) as duration_sum -``` - -### Stats TODO +Stats over the selected logs can be calculated via [`stats` pipe](#stats-pipe). LogsQL will support calculating the following additional stats based on the [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) and fields created by [transformations](#transformations): @@ -1238,52 +1511,70 @@ and fields created by [transformations](#transformations): It will be possible specifying an optional condition [filter](#post-filters) when calculating the stats. For example, `sum(response_size) if (is_admin:true)` calculates the total response size for admins only. -It will be possible to group stats by the specified time buckets. - It is possible to perform stats calculations on the [selected log entries](#filters) at client side with `sort`, `uniq`, etc. Unix commands according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line). -See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details. - ## Sorting By default VictoriaLogs sorts the returned results by [`_time` field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field) -if their total size doesn't exceed `-select.maxSortBufferSize` command-line value (by default it is set to one megabytes). -Otherwise sorting is skipped because of performance and efficiency concerns described [here](https://docs.victoriametrics.com/VictoriaLogs/querying/). +if their total size doesn't exceed `-select.maxSortBufferSize` command-line value (by default it is set to 1MB). +Otherwise sorting is skipped because of performance reasons. -It is possible to sort the [selected log entries](#filters) at client side with `sort` Unix command -according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line). 
-
-LogsQL will support results' sorting by the given set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
-
-See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
+Use [`sort` pipe](#sort-pipe) for sorting the results.

## Limiters

-LogsQL provides the following functionality for limiting the number of returned log entries:
+LogsQL provides the following [pipes](#pipes) for limiting the number of returned log entries:

-- `error | head 10` - returns up to 10 log entries with the `error` [word](#word).
-- `error | skip 10` - skips the first 10 log entris with the `error` [word](#word).
-
-It is recommended [sorting](#sorting) entries before limiting the number of returned log entries,
-in order to get consistent results.
-
-It is possible to limit the returned results with `head`, `tail`, `less`, etc. Unix commands
-according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line).
-
-See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details.
+- [`fields`](#fields-pipe) and [`delete`](#delete-pipe) pipes allow limiting the set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) to return.
+- [`head` pipe](#head-pipe) allows limiting the number of log entries to return.

## Querying specific fields

-By default VictoriaLogs query response contains all the [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
+Specific log fields can be queried via [`fields` pipe](#fields-pipe).

-If you want selecting some specific fields, then add `| fields field1, field2, ... fieldN` to the query.
-For example, the following query returns only [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field),
-[`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields), `host` and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields:
+## Numeric values

-```logsql
-error | fields _time, _stream, host, _msg
-```
+LogsQL accepts numeric values in the following formats:
+
+- regular integers like `12345` or `-12345`
+- regular floating point numbers like `0.123` or `-12.34`
+- [short numeric format](#short-numeric-values)
+- [duration format](#duration-values)
+
+### Short numeric values
+
+LogsQL accepts integer and floating point values with the following suffixes:
+
+- `K` and `KB` - the value is multiplied by `10^3`
+- `M` and `MB` - the value is multiplied by `10^6`
+- `G` and `GB` - the value is multiplied by `10^9`
+- `T` and `TB` - the value is multiplied by `10^12`
+- `Ki` and `KiB` - the value is multiplied by `2^10`
+- `Mi` and `MiB` - the value is multiplied by `2^20`
+- `Gi` and `GiB` - the value is multiplied by `2^30`
+- `Ti` and `TiB` - the value is multiplied by `2^40`
+
+All the numbers may contain `_` delimiters, which may improve readability of the query. For example, `1_234_567` is equivalent to `1234567`,
+while `1.234_567` is equivalent to `1.234567`.
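+
+For example, the following query uses short numeric values in the [`range` filter](#range-filter) for selecting logs with the `response_size` field in the range `[1KB, 10MiB]`, and in the [`head` pipe](#head-pipe) for returning up to `1K` (e.g. 1000) of such logs over the last 5 minutes:
+
+```logsql
+_time:5m response_size:range[1KB, 10MiB] | head 1K
+```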
+### Duration values
+
+LogsQL accepts duration values with the following suffixes at places where the duration is allowed:
+
+- `ns` - nanoseconds. For example, `123ns`.
+- `µs` - microseconds. For example, `1.23µs`.
+- `ms` - milliseconds. For example, `1.23456ms`.
+- `s` - seconds. For example, `1.234s`.
+- `m` - minutes. For example, `1.5m`.
+- `h` - hours. For example, `1.5h`.
+- `d` - days. For example, `1.5d`.
+- `w` - weeks. For example, `1w`.
+- `y` - years as 365 days. For example, `1.5y`.
+
+Multiple durations can be combined. For example, `1h33m55s`.
+
+Internally, duration values are converted into nanoseconds.

## Performance tips

diff --git a/docs/VictoriaLogs/Roadmap.md b/docs/VictoriaLogs/Roadmap.md
index a82dc3326..42b4b06f1 100644
--- a/docs/VictoriaLogs/Roadmap.md
+++ b/docs/VictoriaLogs/Roadmap.md
@@ -30,6 +30,7 @@ See [these docs](https://docs.victoriametrics.com/VictoriaLogs/) for details.
 The following functionality is planned in the future versions of VictoriaLogs:

 - Support for [data ingestion](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/) from popular log collectors and formats:
+  - OpenTelemetry for logs
   - Fluentd
   - Syslog
   - Journald (systemd)
@@ -37,9 +38,6 @@ The following functionality is planned in the future versions of VictoriaLogs:
   - [Stream context](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#stream-context).
   - [Transformation functions](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#transformations).
   - [Post-filtering](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#post-filters).
-  - [Stats calculations](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#stats).
-  - [Sorting](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#sorting).
-  - [Limiters](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#limiters).
 - The ability to use subqueries inside [in()](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#multi-exact-filter) function.
 - Live tailing for [LogsQL filters](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#filters) aka `tail -f`.
 - Web UI with the following abilities:

diff --git a/lib/logstorage/block_result.go b/lib/logstorage/block_result.go
index 7c3673176..ae65d2a38 100644
--- a/lib/logstorage/block_result.go
+++ b/lib/logstorage/block_result.go
@@ -3,6 +3,7 @@ package logstorage
 import (
 	"encoding/binary"
 	"math"
+	"slices"

 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
@@ -121,9 +122,6 @@ func (br *blockResult) fetchAllColumns(bs *blockSearch, bm *bitmap) {
 func (br *blockResult) fetchRequestedColumns(bs *blockSearch, bm *bitmap) {
 	for _, columnName := range bs.bsw.so.resultColumnNames {
-		if columnName == "" {
-			columnName = "_msg"
-		}
 		switch columnName {
 		case "_stream":
 			if !br.addStreamColumn(bs) {
@@ -275,10 +273,7 @@ func (br *blockResult) addColumn(bs *blockSearch, ch *columnHeader, bm *bitmap)
 	}
 	dictValues = valuesBuf[valuesBufLen:]

-	name := ch.name
-	if name == "" {
-		name = "_msg"
-	}
+	name := getCanonicalColumnName(ch.name)
 	br.cs = append(br.cs, blockResultColumn{
 		name:      name,
 		valueType: ch.valueType,
@@ -425,6 +420,7 @@ func (br *blockResult) getBucketedTimestampValues(bucketSize, bucketOffset float
 		timestamp := timestamps[i]
 		timestamp -= bucketOffsetInt
 		timestamp -= timestamp % bucketSizeInt
+		timestamp += bucketOffsetInt
 		if i > 0 && timestamp == prevTimestamp {
 			valuesBuf = append(valuesBuf, s)
 			continue
@@ -516,6 +512,7 @@ func (br *blockResult) getBucketedUint8Values(encodedValues []string, bucketSize
 		n := uint64(v[0])
 		n -= bucketOffsetInt
 		n -= n % bucketSizeInt
+		n += bucketOffsetInt
 		if i > 0 && n == nPrev {
 			valuesBuf = append(valuesBuf, s)
 			continue
@@ -570,6 +567,7 @@ func (br *blockResult) getBucketedUint16Values(encodedValues []string, bucketSiz
 		n := uint64(encoding.UnmarshalUint16(b))
 		n -= bucketOffsetInt
 		n -= n % bucketSizeInt
+		n += bucketOffsetInt
 		if i > 0 && n == nPrev {
 			valuesBuf =
append(valuesBuf, s) continue @@ -624,6 +622,7 @@ func (br *blockResult) getBucketedUint32Values(encodedValues []string, bucketSiz n := uint64(encoding.UnmarshalUint32(b)) n -= bucketOffsetInt n -= n % bucketSizeInt + n += bucketOffsetInt if i > 0 && n == nPrev { valuesBuf = append(valuesBuf, s) continue @@ -678,6 +677,7 @@ func (br *blockResult) getBucketedUint64Values(encodedValues []string, bucketSiz n := encoding.UnmarshalUint64(b) n -= bucketOffsetInt n -= n % bucketSizeInt + n += bucketOffsetInt if i > 0 && n == nPrev { valuesBuf = append(valuesBuf, s) continue @@ -742,6 +742,8 @@ func (br *blockResult) getBucketedFloat64Values(encodedValues []string, bucketSi fP10 -= fP10 % bucketSizeP10 f = float64(fP10) / p10 + f += bucketOffset + if i > 0 && f == fPrev { valuesBuf = append(valuesBuf, s) continue @@ -794,6 +796,7 @@ func (br *blockResult) getBucketedIPv4Values(encodedValues []string, bucketSize, n := binary.BigEndian.Uint32(b) n -= bucketOffsetInt n -= n % bucketSizeInt + n += bucketOffsetInt if i > 0 && n == nPrev { valuesBuf = append(valuesBuf, s) continue @@ -850,6 +853,7 @@ func (br *blockResult) getBucketedTimestampISO8601Values(encodedValues []string, n := encoding.UnmarshalUint64(b) n -= bucketOffsetInt n -= n % bucketSizeInt + n += bucketOffsetInt if i > 0 && n == nPrev { valuesBuf = append(valuesBuf, s) continue @@ -887,6 +891,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float if f, ok := tryParseFloat64(s); ok { f -= bucketOffset + // emulate f % bucketSize for float64 values _, e := decimal.FromFloat(bucketSize) p10 := math.Pow10(int(-e)) @@ -894,6 +899,8 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float fP10 -= fP10 % int64(bucketSize*p10) f = float64(fP10) / p10 + f += bucketOffset + bufLen := len(br.buf) br.buf = marshalFloat64(br.buf, f) return bytesutil.ToUnsafeString(br.buf[bufLen:]) @@ -902,6 +909,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float if nsecs, ok := tryParseTimestampISO8601(s); ok { nsecs -= int64(bucketOffset) nsecs -= nsecs % int64(bucketSize) + nsecs += int64(bucketOffset) bufLen := len(br.buf) br.buf = marshalTimestampISO8601(br.buf, nsecs) return bytesutil.ToUnsafeString(br.buf[bufLen:]) @@ -910,6 +918,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float if nsecs, ok := tryParseTimestampRFC3339Nano(s); ok { nsecs -= int64(bucketOffset) nsecs -= nsecs % int64(bucketSize) + nsecs += int64(bucketOffset) bufLen := len(br.buf) br.buf = marshalTimestampRFC3339Nano(br.buf, nsecs) return bytesutil.ToUnsafeString(br.buf[bufLen:]) @@ -918,6 +927,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float if n, ok := tryParseIPv4(s); ok { n -= uint32(int32(bucketOffset)) n -= n % uint32(bucketSize) + n += uint32(int32(bucketOffset)) bufLen := len(br.buf) br.buf = marshalIPv4(br.buf, n) return bytesutil.ToUnsafeString(br.buf[bufLen:]) @@ -926,6 +936,7 @@ func (br *blockResult) getBucketedValue(s string, bucketSize, bucketOffset float if nsecs, ok := tryParseDuration(s); ok { nsecs -= int64(bucketOffset) nsecs -= nsecs % int64(bucketSize) + nsecs += int64(bucketOffset) bufLen := len(br.buf) br.buf = marshalDuration(br.buf, nsecs) return bytesutil.ToUnsafeString(br.buf[bufLen:]) @@ -942,7 +953,69 @@ func (br *blockResult) addEmptyStringColumn(columnName string) { }) } -func (br *blockResult) updateColumns(columnNames []string) { +// copyColumns copies columns from srcColumnNames to dstColumnNames. 
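+//
+// The original columns with srcColumnNames are left in the result, while existing columns
+// with dstColumnNames are overridden by the copied columns.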
+func (br *blockResult) copyColumns(srcColumnNames, dstColumnNames []string) {
+	if len(srcColumnNames) == 0 {
+		return
+	}
+
+	cs := br.cs
+	csOffset := len(cs)
+	for _, c := range br.getColumns() {
+		if !slices.Contains(dstColumnNames, c.name) {
+			// Keep the original column if it isn't overridden by some copied column.
+			cs = append(cs, c)
+		}
+		if idx := slices.Index(srcColumnNames, c.name); idx >= 0 {
+			// Append a copy of the column under the destination name,
+			// while leaving the original column in the columns list.
+			c.name = dstColumnNames[idx]
+			cs = append(cs, c)
+		}
+	}
+	br.csOffset = csOffset
+	br.cs = cs
+}
+
+// renameColumns renames columns from srcColumnNames to dstColumnNames.
+func (br *blockResult) renameColumns(srcColumnNames, dstColumnNames []string) {
+	if len(srcColumnNames) == 0 {
+		return
+	}
+
+	cs := br.cs
+	csOffset := len(cs)
+	for _, c := range br.getColumns() {
+		if idx := slices.Index(srcColumnNames, c.name); idx >= 0 {
+			c.name = dstColumnNames[idx]
+			cs = append(cs, c)
+			continue
+		}
+		if !slices.Contains(dstColumnNames, c.name) {
+			cs = append(cs, c)
+		}
+	}
+	br.csOffset = csOffset
+	br.cs = cs
+}
+
+// deleteColumns deletes columns with the given columnNames.
+func (br *blockResult) deleteColumns(columnNames []string) {
+	if len(columnNames) == 0 {
+		return
+	}
+
+	cs := br.cs
+	csOffset := len(cs)
+	for _, c := range br.getColumns() {
+		if !slices.Contains(columnNames, c.name) {
+			cs = append(cs, c)
+		}
+	}
+	br.csOffset = csOffset
+	br.cs = cs
+}
+
+// setColumns sets the resulting columns to the given columnNames.
+func (br *blockResult) setColumns(columnNames []string) {
 	if br.areSameColumns(columnNames) {
 		// Fast path - nothing to change.
 		return
@@ -973,10 +1046,6 @@ func (br *blockResult) areSameColumns(columnNames []string) bool {
 }

 func (br *blockResult) getColumnByName(columnName string) blockResultColumn {
-	if columnName == "" {
-		columnName = "_msg"
-	}
-
 	cs := br.getColumns()

 	// iterate columns in reverse order, so overridden column results are returned instead of original column results.
@@ -1110,37 +1179,6 @@ func (c *blockResultColumn) addValue(v string) {
 	c.values = c.valuesBuf
 }

-// getEncodedValues returns encoded values for the given column.
-//
-// The returned encoded values are valid until br.reset() is called.
-func (c *blockResultColumn) getEncodedValues(br *blockResult) []string {
-	if c.encodedValues != nil {
-		return c.encodedValues
-	}
-
-	if !c.isTime {
-		logger.Panicf("BUG: encodedValues may be missing only for _time column; got %q column", c.name)
-	}
-
-	buf := br.buf
-	valuesBuf := br.valuesBuf
-	valuesBufLen := len(valuesBuf)
-
-	for _, timestamp := range br.timestamps {
-		bufLen := len(buf)
-		buf = encoding.MarshalInt64(buf, timestamp)
-		s := bytesutil.ToUnsafeString(buf[bufLen:])
-		valuesBuf = append(valuesBuf, s)
-	}
-
-	c.encodedValues = valuesBuf[valuesBufLen:]
-
-	br.valuesBuf = valuesBuf
-	br.buf = buf
-
-	return c.encodedValues
-}
-
 // getValueAtRow returns the value at the given rowIdx.
 //
 // The returned value is valid until br.reset() is called.
diff --git a/lib/logstorage/parser.go b/lib/logstorage/parser.go index 15d158da7..5161ccb1c 100644 --- a/lib/logstorage/parser.go +++ b/lib/logstorage/parser.go @@ -202,15 +202,64 @@ func (q *Query) String() string { } func (q *Query) getResultColumnNames() []string { - for _, p := range q.pipes { - switch t := p.(type) { - case *pipeFields: - return t.fields - case *pipeStats: - return t.neededFields() + input := []string{"*"} + + pipes := q.pipes + for i := len(pipes) - 1; i >= 0; i-- { + fields, m := pipes[i].getNeededFields() + if len(fields) == 0 { + input = nil + } + if len(input) == 0 { + break + } + + // transform upper input fields to the current input fields according to the given mapping. + if input[0] != "*" { + var dst []string + for _, f := range input { + if a, ok := m[f]; ok { + dst = append(dst, a...) + } else { + dst = append(dst, f) + } + } + input = normalizeFields(dst) + } + + // intersect fields with input + if fields[0] != "*" { + m := make(map[string]struct{}) + for _, f := range input { + m[f] = struct{}{} + } + var dst []string + for _, f := range fields { + if _, ok := m[f]; ok { + dst = append(dst, f) + } + } + input = normalizeFields(dst) } } - return []string{"*"} + + return input +} + +func normalizeFields(a []string) []string { + m := make(map[string]struct{}, len(a)) + dst := make([]string, 0, len(a)) + for _, s := range a { + if s == "*" { + return []string{"*"} + } + if _, ok := m[s]; ok { + continue + } + m[s] = struct{}{} + dst = append(dst, s) + } + return dst } // ParseQuery parses s. @@ -522,14 +571,17 @@ func parseFilterLenRange(lex *lexer, fieldName string) (filter, error) { if len(args) != 2 { return nil, fmt.Errorf("unexpected number of args for %s(); got %d; want 2", funcName, len(args)) } + minLen, err := parseUint(args[0]) if err != nil { return nil, fmt.Errorf("cannot parse minLen at %s(): %w", funcName, err) } + maxLen, err := parseUint(args[1]) if err != nil { return nil, fmt.Errorf("cannot parse maxLen at %s(): %w", funcName, err) } + stringRepr := "(" + args[0] + ", " + args[1] + ")" fr := &filterLenRange{ fieldName: fieldName, @@ -739,16 +791,17 @@ func parseFilterRange(lex *lexer, fieldName string) (filter, error) { func parseFloat64(lex *lexer) (float64, string, error) { s := getCompoundToken(lex) f, err := strconv.ParseFloat(s, 64) - if err != nil { - // Try parsing s as integer. - // This handles 0x..., 0b... and 0... prefixes. - n, err := parseInt(s) - if err == nil { - return float64(n), s, nil - } - return 0, "", fmt.Errorf("cannot parse %q as float64: %w", lex.token, err) + if err == nil { + return f, s, nil } - return f, s, nil + + // Try parsing s as integer. + // This handles 0x..., 0b... and 0... prefixes, alongside '_' delimiters. 
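+	// For example, `0x1ff`, `0b100101`, `0o7532` and `1_000_000`.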
+	n, err := parseInt(s)
+	if err == nil {
+		return float64(n), s, nil
+	}
+	return 0, "", fmt.Errorf("cannot parse %q as float64: %w", lex.token, err)
 }
 
 func parseFuncArg(lex *lexer, fieldName string, callback func(args string) (filter, error)) (filter, error) {
@@ -1184,7 +1237,22 @@ func parseUint(s string) (uint64, error) {
 	if strings.EqualFold(s, "inf") || strings.EqualFold(s, "+inf") {
 		return math.MaxUint64, nil
 	}
-	return strconv.ParseUint(s, 0, 64)
+
+	n, err := strconv.ParseUint(s, 0, 64)
+	if err == nil {
+		return n, nil
+	}
+	nn, ok := tryParseBytes(s)
+	if !ok {
+		nn, ok = tryParseDuration(s)
+		if !ok {
+			return 0, fmt.Errorf("cannot parse %q as unsigned integer: %w", s, err)
+		}
+		if nn < 0 {
+			return 0, fmt.Errorf("cannot parse negative value %q as unsigned integer", s)
+		}
+	}
+	return uint64(nn), nil
 }
 
 func parseInt(s string) (int64, error) {
@@ -1193,7 +1261,18 @@ func parseInt(s string) (int64, error) {
 		return math.MaxInt64, nil
 	case strings.EqualFold(s, "-inf"):
 		return math.MinInt64, nil
-	default:
-		return strconv.ParseInt(s, 0, 64)
 	}
+
+	n, err := strconv.ParseInt(s, 0, 64)
+	if err == nil {
+		return n, nil
+	}
+	nn, ok := tryParseBytes(s)
+	if !ok {
+		nn, ok = tryParseDuration(s)
+		if !ok {
+			return 0, fmt.Errorf("cannot parse %q as integer: %w", s, err)
+		}
+	}
+	return nn, nil
 }
diff --git a/lib/logstorage/parser_test.go b/lib/logstorage/parser_test.go
index 72cdaf667..359e14d4a 100644
--- a/lib/logstorage/parser_test.go
+++ b/lib/logstorage/parser_test.go
@@ -3,6 +3,7 @@ package logstorage
 import (
 	"math"
 	"reflect"
+	"slices"
 	"testing"
 	"time"
 )
@@ -540,6 +541,12 @@ func TestParseRangeFilter(t *testing.T) {
 	f(`:range(1, 2)`, ``, math.Nextafter(1, math.Inf(1)), math.Nextafter(2, math.Inf(-1)))
 	f(`range[1, 2)`, ``, 1, math.Nextafter(2, math.Inf(-1)))
 	f(`range("1", 2]`, ``, math.Nextafter(1, math.Inf(1)), 2)
+
+	f(`response_size:range[1KB, 10MiB]`, `response_size`, 1_000, 10*(1<<20))
+	f(`response_size:range[1G, 10Ti]`, `response_size`, 1_000_000_000, 10*(1<<40))
+	f(`response_size:range[10, inf]`, `response_size`, 10, math.Inf(1))
+
+	f(`duration:range[100ns, 1y2w2.5m3s5ms]`, `duration`, 100, 1*nsecsPerYear+2*nsecsPerWeek+2.5*nsecsPerMinute+3*nsecsPerSecond+5*nsecsPerMillisecond)
 }
 
 func TestParseQuerySuccess(t *testing.T) {
@@ -749,6 +756,7 @@ func TestParseQuerySuccess(t *testing.T) {
 	f(`len_range(10, +InF)`, `len_range(10, +InF)`)
 	f(`len_range(10, 1_000_000)`, `len_range(10, 1_000_000)`)
 	f(`len_range(0x10,0b100101)`, `len_range(0x10, 0b100101)`)
+	f(`len_range(1.5KB, 22MB100KB)`, `len_range(1.5KB, 22MB100KB)`)
 
 	// range filter
 	f(`range(1.234, 5656.43454)`, `range(1.234, 5656.43454)`)
@@ -760,6 +768,7 @@ func TestParseQuerySuccess(t *testing.T) {
 	f(`range(1_000, 0o7532)`, `range(1_000, 0o7532)`)
 	f(`range(0x1ff, inf)`, `range(0x1ff, inf)`)
 	f(`range(-INF,+inF)`, `range(-INF, +inF)`)
+	f(`range(1.5K, 22.5GiB)`, `range(1.5K, 22.5GiB)`)
 
 	// re filter
 	f("re('foo|ba(r.+)')", `re("foo|ba(r.+)")`)
@@ -816,19 +825,34 @@ func TestParseQuerySuccess(t *testing.T) {
 	f(`foo | fields bar`, `foo | fields bar`)
 	f(`foo|FIELDS bar,Baz , "a,b|c"`, `foo | fields bar, Baz, "a,b|c"`)
 	f(`foo | Fields x.y, "abc:z/a", _b$c`, `foo | fields x.y, "abc:z/a", "_b$c"`)
+	f(`foo | fields "", a`, `foo | fields _msg, a`)
 
 	// multiple fields pipes
 	f(`foo | fields bar | fields baz, abc`, `foo | fields bar | fields baz, abc`)
 
+	// copy pipe
+	f(`* | copy foo as bar`, `* | copy foo as bar`)
+	f(`* | COPY foo as bar, x y | Copy a as b`, `* | copy foo as bar, x as y | copy a as b`)
+
+	// rename pipe
+	f(`* | rename foo as bar`, `* | rename foo as bar`)
+	f(`* | RENAME foo AS bar, x y | Rename a as b`, `* | rename foo as bar, x as y | rename a as b`)
+
+	// delete pipe
+	f(`* | delete foo`, `* | delete foo`)
+	f(`* | DELETE foo, bar`, `* | delete foo, bar`)
+
 	// head pipe
 	f(`foo | head 10`, `foo | head 10`)
-	f(`foo | HEAD 1123432`, `foo | head 1123432`)
+	f(`foo | HEAD 1_123_432`, `foo | head 1123432`)
+	f(`foo | head 10K`, `foo | head 10000`)
 
 	// multiple head pipes
 	f(`foo | head 100 | head 10 | head 234`, `foo | head 100 | head 10 | head 234`)
 
 	// skip pipe
 	f(`foo | skip 10`, `foo | skip 10`)
+	f(`foo | skip 12_345M`, `foo | skip 12345000000`)
 
 	// multiple skip pipes
 	f(`foo | skip 10 | skip 100`, `foo | skip 10 | skip 100`)
@@ -839,6 +863,8 @@ func TestParseQuerySuccess(t *testing.T) {
 	f(`* | stats count() x`, `* | stats count(*) as x`)
 	f(`* | stats count(*) x`, `* | stats count(*) as x`)
 	f(`* | stats count(foo,*,bar) x`, `* | stats count(*) as x`)
+	f(`* | stats count('') foo`, `* | stats count(_msg) as foo`)
+	f(`* | stats count(foo) ''`, `* | stats count(foo) as _msg`)
 
 	// stats pipe sum
 	f(`* | stats Sum(foo) bar`, `* | stats sum(foo) as bar`)
@@ -1107,6 +1133,23 @@ func TestParseQueryFailure(t *testing.T) {
 	f(`foo | fields bar,`)
 	f(`foo | fields bar,,`)
 
+	// invalid copy pipe
+	f(`foo | copy`)
+	f(`foo | copy foo`)
+	f(`foo | copy foo,`)
+	f(`foo | copy foo,,`)
+
+	// invalid rename pipe
+	f(`foo | rename`)
+	f(`foo | rename foo`)
+	f(`foo | rename foo,`)
+	f(`foo | rename foo,,`)
+
+	// invalid delete pipe
+	f(`foo | delete`)
+	f(`foo | delete foo,`)
+	f(`foo | delete foo,,`)
+
 	// missing head pipe value
 	f(`foo | head`)
 
@@ -1175,3 +1218,25 @@ func TestParseQueryFailure(t *testing.T) {
 	f(`foo | stats by(bar,`)
 	f(`foo | stats by(bar)`)
 }
+
+func TestNormalizeFields(t *testing.T) {
+	f := func(fields, normalizedExpected []string) {
+		t.Helper()
+
+		normalized := normalizeFields(fields)
+		if !slices.Equal(normalized, normalizedExpected) {
+			t.Fatalf("unexpected normalized fields for %q; got %q; want %q", fields, normalized, normalizedExpected)
+		}
+	}
+
+	f(nil, nil)
+	f([]string{"foo"}, []string{"foo"})
+
+	// duplicate fields
+	f([]string{"foo", "bar", "foo", "x"}, []string{"foo", "bar", "x"})
+	f([]string{"foo", "foo", "x", "x", "x"}, []string{"foo", "x"})
+
+	// star field
+	f([]string{"*"}, []string{"*"})
+	f([]string{"foo", "*", "bar"}, []string{"*"})
+}
diff --git a/lib/logstorage/pipe.go b/lib/logstorage/pipe.go
index 1763529ed..3fc9750c7 100644
--- a/lib/logstorage/pipe.go
+++ b/lib/logstorage/pipe.go
@@ -8,6 +8,12 @@ type pipe interface {
 	// String returns string representation of the pipe.
 	String() string
 
+	// getNeededFields must return the required input fields alongside the mapping from output fields to input fields for the given pipe.
+	//
+	// It must return []string{"*"} if the set of input fields cannot be determined at the given pipe.
+	// It must return a nil map if the pipe doesn't add new fields to the output.
+	getNeededFields() ([]string, map[string][]string)
+
 	// newPipeProcessor must return new pipeProcessor for the given ppBase.
 	//
 	// workersCount is the number of goroutine workers, which will call writeBlock() method.
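To make the `getNeededFields` contract above concrete, here is a minimal self-contained sketch (not the library code; `fieldMapping`, `resolveSources` and the example pipeline are invented for illustration) of how the per-pipe output→input mappings compose when pipes are walked from last to first, the same direction `getResultColumnNames` uses:

```go
package main

import "fmt"

// fieldMapping mirrors the second value returned by getNeededFields():
// for every field a pipe writes, the list of input fields it derives from.
type fieldMapping map[string][]string

// resolveSources walks the pipes from last to first and translates the
// requested output fields back to the source fields they originate from.
// Fields without a mapping entry are assumed to pass through unchanged.
func resolveSources(requested []string, pipes []fieldMapping) []string {
	fields := requested
	for i := len(pipes) - 1; i >= 0; i-- {
		seen := make(map[string]struct{})
		var dst []string
		for _, f := range fields {
			srcs, ok := pipes[i][f]
			if !ok {
				srcs = []string{f}
			}
			for _, src := range srcs {
				if _, dup := seen[src]; !dup {
					seen[src] = struct{}{}
					dst = append(dst, src)
				}
			}
		}
		fields = dst
	}
	return fields
}

func main() {
	// Models `* | copy foo as bar | rename bar as baz`:
	// the requested output field `baz` resolves back to the source field `foo`.
	pipes := []fieldMapping{
		{"bar": {"foo"}}, // copy foo as bar
		{"baz": {"bar"}}, // rename bar as baz
	}
	fmt.Println(resolveSources([]string{"baz"}, pipes)) // [foo]
}
```

Running it prints `[foo]`: the only source column this hypothetical pipeline ultimately needs in order to produce its `baz` output.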
@@ -68,12 +74,6 @@ func parsePipes(lex *lexer) ([]pipe, error) {
 			return nil, fmt.Errorf("missing token after '|'")
 		}
 		switch {
-		case lex.isKeyword("fields"):
-			pf, err := parsePipeFields(lex)
-			if err != nil {
-				return nil, fmt.Errorf("cannot parse 'fields' pipe: %w", err)
-			}
-			pipes = append(pipes, pf)
 		case lex.isKeyword("stats"):
 			ps, err := parsePipeStats(lex)
 			if err != nil {
@@ -92,6 +92,30 @@ func parsePipes(lex *lexer) ([]pipe, error) {
 				return nil, fmt.Errorf("cannot parse 'skip' pipe: %w", err)
 			}
 			pipes = append(pipes, ps)
+		case lex.isKeyword("fields"):
+			pf, err := parsePipeFields(lex)
+			if err != nil {
+				return nil, fmt.Errorf("cannot parse 'fields' pipe: %w", err)
+			}
+			pipes = append(pipes, pf)
+		case lex.isKeyword("copy"):
+			pc, err := parsePipeCopy(lex)
+			if err != nil {
+				return nil, fmt.Errorf("cannot parse 'copy' pipe: %w", err)
+			}
+			pipes = append(pipes, pc)
+		case lex.isKeyword("rename"):
+			pr, err := parsePipeRename(lex)
+			if err != nil {
+				return nil, fmt.Errorf("cannot parse 'rename' pipe: %w", err)
+			}
+			pipes = append(pipes, pr)
+		case lex.isKeyword("delete"):
+			pd, err := parsePipeDelete(lex)
+			if err != nil {
+				return nil, fmt.Errorf("cannot parse 'delete' pipe: %w", err)
+			}
+			pipes = append(pipes, pd)
 		default:
 			return nil, fmt.Errorf("unexpected pipe %q", lex.token)
 		}
diff --git a/lib/logstorage/pipe_copy.go b/lib/logstorage/pipe_copy.go
new file mode 100644
index 000000000..474287cf0
--- /dev/null
+++ b/lib/logstorage/pipe_copy.go
@@ -0,0 +1,99 @@
+package logstorage
+
+import (
+	"fmt"
+	"strings"
+
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
+)
+
+// pipeCopy implements '| copy ...' pipe.
+//
+// See https://docs.victoriametrics.com/victorialogs/logsql/#copy-pipe
+type pipeCopy struct {
+	// srcFields contains a list of source fields to copy
+	srcFields []string
+
+	// dstFields contains a list of destination fields
+	dstFields []string
+}
+
+func (pc *pipeCopy) String() string {
+	if len(pc.srcFields) == 0 {
+		logger.Panicf("BUG: pipeCopy must contain at least a single srcField")
+	}
+
+	a := make([]string, len(pc.srcFields))
+	for i, srcField := range pc.srcFields {
+		dstField := pc.dstFields[i]
+		a[i] = quoteTokenIfNeeded(srcField) + " as " + quoteTokenIfNeeded(dstField)
+	}
+	return "copy " + strings.Join(a, ", ")
+}
+
+func (pc *pipeCopy) getNeededFields() ([]string, map[string][]string) {
+	m := make(map[string][]string, len(pc.srcFields))
+	for i, dstField := range pc.dstFields {
+		m[dstField] = append(m[dstField], pc.srcFields[i])
+	}
+	return []string{"*"}, m
+}
+
+func (pc *pipeCopy) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
+	return &pipeCopyProcessor{
+		pc:     pc,
+		ppBase: ppBase,
+	}
+}
+
+type pipeCopyProcessor struct {
+	pc     *pipeCopy
+	ppBase pipeProcessor
+}
+
+func (pcp *pipeCopyProcessor) writeBlock(workerID uint, br *blockResult) {
+	br.copyColumns(pcp.pc.srcFields, pcp.pc.dstFields)
+	pcp.ppBase.writeBlock(workerID, br)
+}
+
+func (pcp *pipeCopyProcessor) flush() error {
+	return nil
+}
+
+func parsePipeCopy(lex *lexer) (*pipeCopy, error) {
+	if !lex.isKeyword("copy") {
+		return nil, fmt.Errorf("expecting 'copy'; got %q", lex.token)
+	}
+
+	var srcFields []string
+	var dstFields []string
+	for {
+		lex.nextToken()
+		srcField, err := parseFieldName(lex)
+		if err != nil {
+			return nil, fmt.Errorf("cannot parse src field name: %w", err)
+		}
+		if lex.isKeyword("as") {
+			lex.nextToken()
+		}
+		dstField, err := parseFieldName(lex)
+		if err != nil {
+			return nil, fmt.Errorf("cannot parse dst field name: %w", err)
+		}
+
+		srcFields = append(srcFields, srcField)
+		dstFields = append(dstFields, dstField)
+
+		switch {
+		case lex.isKeyword("|", ")", ""):
+			pc := &pipeCopy{
+				srcFields: srcFields,
+				dstFields: dstFields,
+			}
+			return pc, nil
+		case lex.isKeyword(","):
+		default:
+			return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
+		}
+	}
+}
diff --git a/lib/logstorage/pipe_delete.go b/lib/logstorage/pipe_delete.go
new file mode 100644
index 000000000..0809f41dc
--- /dev/null
+++ b/lib/logstorage/pipe_delete.go
@@ -0,0 +1,76 @@
+package logstorage
+
+import (
+	"fmt"
+
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
+)
+
+// pipeDelete implements '| delete ...' pipe.
+//
+// See https://docs.victoriametrics.com/victorialogs/logsql/#transformations
+type pipeDelete struct {
+	// fields contains a list of fields to delete
+	fields []string
+}
+
+func (pd *pipeDelete) String() string {
+	if len(pd.fields) == 0 {
+		logger.Panicf("BUG: pipeDelete must contain at least a single field")
+	}
+
+	return "delete " + fieldNamesString(pd.fields)
+}
+
+func (pd *pipeDelete) getNeededFields() ([]string, map[string][]string) {
+	return []string{"*"}, nil
+}
+
+func (pd *pipeDelete) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
+	return &pipeDeleteProcessor{
+		pd:     pd,
+		ppBase: ppBase,
+	}
+}
+
+type pipeDeleteProcessor struct {
+	pd     *pipeDelete
+	ppBase pipeProcessor
+}
+
+func (pdp *pipeDeleteProcessor) writeBlock(workerID uint, br *blockResult) {
+	br.deleteColumns(pdp.pd.fields)
+	pdp.ppBase.writeBlock(workerID, br)
+}
+
+func (pdp *pipeDeleteProcessor) flush() error {
+	return nil
+}
+
+func parsePipeDelete(lex *lexer) (*pipeDelete, error) {
+	if !lex.isKeyword("delete") {
+		return nil, fmt.Errorf("expecting 'delete'; got %q", lex.token)
+	}
+
+	var fields []string
+	for {
+		lex.nextToken()
+		field, err := parseFieldName(lex)
+		if err != nil {
+			return nil, fmt.Errorf("cannot parse field name: %w", err)
+		}
+
+		fields = append(fields, field)
+
+		switch {
+		case lex.isKeyword("|", ")", ""):
+			pd := &pipeDelete{
+				fields: fields,
+			}
+			return pd, nil
+		case lex.isKeyword(","):
+		default:
+			return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
+		}
+	}
+}
diff --git a/lib/logstorage/pipe_fields.go b/lib/logstorage/pipe_fields.go
index c3bcb38b4..c05e348d4 100644
--- a/lib/logstorage/pipe_fields.go
+++ b/lib/logstorage/pipe_fields.go
@@ -9,7 +9,7 @@ import (
 // pipeFields implements '| fields ...' pipe.
 //
-// See https://docs.victoriametrics.com/victorialogs/logsql/#limiters
+// See https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe
 type pipeFields struct {
 	// fields contains list of fields to fetch
 	fields []string
@@ -25,6 +25,13 @@ func (pf *pipeFields) String() string {
 	return "fields " + fieldNamesString(pf.fields)
 }
 
+func (pf *pipeFields) getNeededFields() ([]string, map[string][]string) {
+	if pf.containsStar {
+		return []string{"*"}, nil
+	}
+	return pf.fields, nil
+}
+
 func (pf *pipeFields) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
 	return &pipeFieldsProcessor{
 		pf:     pf,
@@ -39,7 +46,7 @@ type pipeFieldsProcessor struct {
 
 func (pfp *pipeFieldsProcessor) writeBlock(workerID uint, br *blockResult) {
 	if !pfp.pf.containsStar {
-		br.updateColumns(pfp.pf.fields)
+		br.setColumns(pfp.pf.fields)
 	}
 	pfp.ppBase.writeBlock(workerID, br)
 }
@@ -49,11 +56,13 @@ func (pfp *pipeFieldsProcessor) flush() error {
 }
 
 func parsePipeFields(lex *lexer) (*pipeFields, error) {
+	if !lex.isKeyword("fields") {
+		return nil, fmt.Errorf("expecting 'fields'; got %q", lex.token)
+	}
+
 	var fields []string
 	for {
-		if !lex.mustNextToken() {
-			return nil, fmt.Errorf("missing field name")
-		}
+		lex.nextToken()
 		field, err := parseFieldName(lex)
 		if err != nil {
 			return nil, fmt.Errorf("cannot parse field name: %w", err)
@@ -61,6 +70,9 @@ func parsePipeFields(lex *lexer) (*pipeFields, error) {
 		fields = append(fields, field)
 		switch {
 		case lex.isKeyword("|", ")", ""):
+			if slices.Contains(fields, "*") {
+				fields = []string{"*"}
+			}
 			pf := &pipeFields{
 				fields:       fields,
 				containsStar: slices.Contains(fields, "*"),
diff --git a/lib/logstorage/pipe_head.go b/lib/logstorage/pipe_head.go
index 7a82f4b8c..110267e62 100644
--- a/lib/logstorage/pipe_head.go
+++ b/lib/logstorage/pipe_head.go
@@ -16,6 +16,10 @@ func (ph *pipeHead) String() string {
 	return fmt.Sprintf("head %d", ph.n)
 }
 
+func (ph *pipeHead) getNeededFields() ([]string, map[string][]string) {
+	return []string{"*"}, nil
+}
+
 func (ph *pipeHead) newPipeProcessor(_ int, _ <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor {
 	if ph.n == 0 {
 		// Special case - notify the caller to stop writing data to the returned pipeHeadProcessor
@@ -65,12 +69,14 @@ func (php *pipeHeadProcessor) flush() error {
 }
 
 func parsePipeHead(lex *lexer) (*pipeHead, error) {
-	if !lex.mustNextToken() {
-		return nil, fmt.Errorf("missing the number of head rows to return")
+	if !lex.isKeyword("head") {
+		return nil, fmt.Errorf("expecting 'head'; got %q", lex.token)
 	}
+
+	lex.nextToken()
 	n, err := parseUint(lex.token)
 	if err != nil {
-		return nil, fmt.Errorf("cannot parse the number of head rows to return %q: %w", lex.token, err)
+		return nil, fmt.Errorf("cannot parse the number of head rows to return from %q: %w", lex.token, err)
 	}
 	lex.nextToken()
 	ph := &pipeHead{
diff --git a/lib/logstorage/pipe_rename.go b/lib/logstorage/pipe_rename.go
new file mode 100644
index 000000000..1bfaba889
--- /dev/null
+++ b/lib/logstorage/pipe_rename.go
@@ -0,0 +1,99 @@
+package logstorage
+
+import (
+	"fmt"
+	"strings"
+
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
+)
+
+// pipeRename implements '| rename ...' pipe.
+//
+// See https://docs.victoriametrics.com/victorialogs/logsql/#rename-pipe
+type pipeRename struct {
+	// srcFields contains a list of source fields to rename
+	srcFields []string
+
+	// dstFields contains a list of destination fields
+	dstFields []string
+}
+
+func (pr *pipeRename) String() string {
+	if len(pr.srcFields) == 0 {
+		logger.Panicf("BUG: pipeRename must contain at least a single srcField")
+	}
+
+	a := make([]string, len(pr.srcFields))
+	for i, srcField := range pr.srcFields {
+		dstField := pr.dstFields[i]
+		a[i] = quoteTokenIfNeeded(srcField) + " as " + quoteTokenIfNeeded(dstField)
+	}
+	return "rename " + strings.Join(a, ", ")
+}
+
+func (pr *pipeRename) getNeededFields() ([]string, map[string][]string) {
+	m := make(map[string][]string, len(pr.srcFields))
+	for i, dstField := range pr.dstFields {
+		m[dstField] = append(m[dstField], pr.srcFields[i])
+	}
+	return []string{"*"}, m
+}
+
+func (pr *pipeRename) newPipeProcessor(_ int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
+	return &pipeRenameProcessor{
+		pr:     pr,
+		ppBase: ppBase,
+	}
+}
+
+type pipeRenameProcessor struct {
+	pr     *pipeRename
+	ppBase pipeProcessor
+}
+
+func (prp *pipeRenameProcessor) writeBlock(workerID uint, br *blockResult) {
+	br.renameColumns(prp.pr.srcFields, prp.pr.dstFields)
+	prp.ppBase.writeBlock(workerID, br)
+}
+
+func (prp *pipeRenameProcessor) flush() error {
+	return nil
+}
+
+func parsePipeRename(lex *lexer) (*pipeRename, error) {
+	if !lex.isKeyword("rename") {
+		return nil, fmt.Errorf("expecting 'rename'; got %q", lex.token)
+	}
+
+	var srcFields []string
+	var dstFields []string
+	for {
+		lex.nextToken()
+		srcField, err := parseFieldName(lex)
+		if err != nil {
+			return nil, fmt.Errorf("cannot parse src field name: %w", err)
+		}
+		if lex.isKeyword("as") {
+			lex.nextToken()
+		}
+		dstField, err := parseFieldName(lex)
+		if err != nil {
+			return nil, fmt.Errorf("cannot parse dst field name: %w", err)
+		}
+
+		srcFields = append(srcFields, srcField)
+		dstFields = append(dstFields, dstField)
+
+		switch {
+		case lex.isKeyword("|", ")", ""):
+			pr := &pipeRename{
+				srcFields: srcFields,
+				dstFields: dstFields,
+			}
+			return pr, nil
+		case lex.isKeyword(","):
+		default:
+			return nil, fmt.Errorf("unexpected token: %q; expecting ',', '|' or ')'", lex.token)
+		}
+	}
+}
diff --git a/lib/logstorage/pipe_skip.go b/lib/logstorage/pipe_skip.go
index a5a8403be..70f27c873 100644
--- a/lib/logstorage/pipe_skip.go
+++ b/lib/logstorage/pipe_skip.go
@@ -16,6 +16,10 @@ func (ps *pipeSkip) String() string {
 	return fmt.Sprintf("skip %d", ps.n)
 }
 
+func (ps *pipeSkip) getNeededFields() ([]string, map[string][]string) {
+	return []string{"*"}, nil
+}
+
 func (ps *pipeSkip) newPipeProcessor(workersCount int, _ <-chan struct{}, _ func(), ppBase pipeProcessor) pipeProcessor {
 	return &pipeSkipProcessor{
 		ps:     ps,
@@ -52,12 +56,14 @@ func (psp *pipeSkipProcessor) flush() error {
 }
 
 func parsePipeSkip(lex *lexer) (*pipeSkip, error) {
-	if !lex.mustNextToken() {
-		return nil, fmt.Errorf("missing the number of rows to skip")
+	if !lex.isKeyword("skip") {
+		return nil, fmt.Errorf("expecting 'skip'; got %q", lex.token)
 	}
+
+	lex.nextToken()
 	n, err := parseUint(lex.token)
 	if err != nil {
-		return nil, fmt.Errorf("cannot parse the number of rows to skip %q: %w", lex.token, err)
+		return nil, fmt.Errorf("cannot parse the number of rows to skip from %q: %w", lex.token, err)
 	}
 	lex.nextToken()
 	ps := &pipeSkip{
diff --git a/lib/logstorage/pipe_stats.go b/lib/logstorage/pipe_stats.go
index 0d6e8eedc..56ed93158 100644
--- a/lib/logstorage/pipe_stats.go
+++ b/lib/logstorage/pipe_stats.go
@@ -83,6 +83,27 @@ func (ps *pipeStats) String() string {
 	return s
 }
 
+func (ps *pipeStats) getNeededFields() ([]string, map[string][]string) {
+	var byFields []string
+	for _, bf := range ps.byFields {
+		byFields = append(byFields, bf.name)
+	}
+
+	neededFields := append([]string{}, byFields...)
+	m := make(map[string][]string)
+	for i, f := range ps.funcs {
+		funcFields := f.neededFields()
+
+		neededFields = append(neededFields, funcFields...)
+
+		resultName := ps.resultNames[i]
+		m[resultName] = append(m[resultName], byFields...)
+		m[resultName] = append(m[resultName], funcFields...)
+	}
+
+	return neededFields, m
+}
+
 const stateSizeBudgetChunk = 1 << 20
 
 func (ps *pipeStats) newPipeProcessor(workersCount int, stopCh <-chan struct{}, cancel func(), ppBase pipeProcessor) pipeProcessor {
@@ -376,35 +397,13 @@ func (psp *pipeStatsProcessor) flush() error {
 	return nil
 }
 
-func (ps *pipeStats) neededFields() []string {
-	var neededFields []string
-	m := make(map[string]struct{})
-
-	for _, bf := range ps.byFields {
-		name := bf.name
-		if _, ok := m[name]; !ok {
-			m[name] = struct{}{}
-			neededFields = append(neededFields, name)
-		}
-	}
-
-	for _, f := range ps.funcs {
-		for _, fieldName := range f.neededFields() {
-			if _, ok := m[fieldName]; !ok {
-				m[fieldName] = struct{}{}
-				neededFields = append(neededFields, fieldName)
-			}
-		}
-	}
-
-	return neededFields
-}
-
 func parsePipeStats(lex *lexer) (*pipeStats, error) {
-	if !lex.mustNextToken() {
-		return nil, fmt.Errorf("missing stats config")
+	if !lex.isKeyword("stats") {
+		return nil, fmt.Errorf("expecting 'stats'; got %q", lex.token)
 	}
+	lex.nextToken()
+
 	var ps pipeStats
 	if lex.isKeyword("by") {
 		lex.nextToken()
@@ -494,9 +493,7 @@ func parseStatsFunc(lex *lexer) (statsFunc, string, error) {
 
 func parseResultName(lex *lexer) (string, error) {
 	if lex.isKeyword("as") {
-		if !lex.mustNextToken() {
-			return "", fmt.Errorf("missing token after 'as' keyword")
-		}
+		lex.nextToken()
 	}
 	resultName, err := parseFieldName(lex)
 	if err != nil {
@@ -543,9 +540,7 @@ func parseByFields(lex *lexer) ([]*byField, error) {
 	}
 	var bfs []*byField
 	for {
-		if !lex.mustNextToken() {
-			return nil, fmt.Errorf("missing field name or ')'")
-		}
+		lex.nextToken()
 		if lex.isKeyword(")") {
 			lex.nextToken()
 			return bfs, nil
@@ -657,6 +652,9 @@ func tryParseBucketSize(s string) (float64, bool) {
 	return 0, false
 }
 
+// parseFieldNamesForStatsFunc parses field names for statsFunc.
+//
+// It returns ["*"] if the field names list is empty or contains the "*" field.
 func parseFieldNamesForStatsFunc(lex *lexer, funcName string) ([]string, error) {
 	if !lex.isKeyword(funcName) {
 		return nil, fmt.Errorf("unexpected func; got %q; want %q", lex.token, funcName)
@@ -678,9 +676,7 @@ func parseFieldNamesInParens(lex *lexer) ([]string, error) {
 	}
 	var fields []string
 	for {
-		if !lex.mustNextToken() {
-			return nil, fmt.Errorf("missing field name or ')'")
-		}
+		lex.nextToken()
 		if lex.isKeyword(")") {
 			lex.nextToken()
 			return fields, nil
@@ -708,8 +704,9 @@ func parseFieldName(lex *lexer) (string, error) {
 	if lex.isKeyword(",", "(", ")", "[", "]", "|", ":", "") {
 		return "", fmt.Errorf("unexpected token: %q", lex.token)
 	}
-	token := getCompoundPhrase(lex, false)
-	return token, nil
+	fieldName := getCompoundPhrase(lex, false)
+	fieldName = getCanonicalColumnName(fieldName)
+	return fieldName, nil
 }
 
 func fieldNamesString(fields []string) string {
diff --git a/lib/logstorage/pipe_stats_test.go b/lib/logstorage/pipe_stats_test.go
index eba8f2e4b..a320fca3d 100644
--- a/lib/logstorage/pipe_stats_test.go
+++ b/lib/logstorage/pipe_stats_test.go
@@ -35,12 +35,12 @@ func TestTryParseBucketSize_Success(t *testing.T) {
 	f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond))
 
 	// bytes
-	f("1b", 1)
-	f("1k", 1_000)
-	f("1Kb", 1_000)
+	f("1B", 1)
+	f("1K", 1_000)
+	f("1KB", 1_000)
 	f("5.5KiB", 5.5*(1<<10))
 	f("10MB500KB10B", 10*1_000_000+500*1_000+10)
-	f("10m0k", 10*1_000_000)
+	f("10M", 10*1_000_000)
 	f("-10MB", -10*1_000_000)
 
 	// ipv4 mask
@@ -95,13 +95,13 @@ func TestTryParseBucketOffset_Success(t *testing.T) {
 	f("-1h5m3.5s", -(nsecsPerHour + 5*nsecsPerMinute + 3.5*nsecsPerSecond))
 
 	// bytes
-	f("1b", 1)
-	f("1k", 1_000)
-	f("1Kb", 1_000)
+	f("1B", 1)
+	f("1K", 1_000)
+	f("1KB", 1_000)
 	f("5.5KiB", 5.5*(1<<10))
 	f("10MB500KB10B", 10*1_000_000+500*1_000+10)
-	f("10m0k", 10*1_000_000)
-	f("-10mb", -10*1_000_000)
+	f("10M", 10*1_000_000)
+	f("-10MB", -10*1_000_000)
 }
 
 func TestTryParseBucketOffset_Failure(t *testing.T) {
diff --git a/lib/logstorage/rows.go b/lib/logstorage/rows.go
index 76516bc8b..d8d61b015 100644
--- a/lib/logstorage/rows.go
+++ b/lib/logstorage/rows.go
@@ -24,10 +24,7 @@ func (f *Field) Reset() {
 
 // String returns string representation of f.
 func (f *Field) String() string {
-	name := f.Name
-	if name == "" {
-		name = "_msg"
-	}
+	name := getCanonicalColumnName(f.Name)
 	return fmt.Sprintf("%q:%q", name, f.Value)
 }
 
@@ -121,3 +118,10 @@ func (rs *rows) mergeRows(timestampsA, timestampsB []int64, fieldsA, fieldsB [][
 		rs.appendRows(timestampsA, fieldsA)
 	}
 }
+
+func getCanonicalColumnName(columnName string) string {
+	if columnName == "" {
+		return "_msg"
+	}
+	return columnName
+}
diff --git a/lib/logstorage/stats_count.go b/lib/logstorage/stats_count.go
index c87587a03..7a6d14890 100644
--- a/lib/logstorage/stats_count.go
+++ b/lib/logstorage/stats_count.go
@@ -18,7 +18,11 @@ func (sc *statsCount) String() string {
 }
 
 func (sc *statsCount) neededFields() []string {
-	return getFieldsIgnoreStar(sc.fields)
+	if sc.containsStar {
+		// There is no need to fetch any columns for count(*) - the number of matching rows can be calculated as len(blockResult.timestamps)
+		return nil
+	}
+	return sc.fields
 }
 
 func (sc *statsCount) newStatsProcessor() (statsProcessor, int) {
@@ -204,13 +208,3 @@ func parseStatsCount(lex *lexer) (*statsCount, error) {
 	}
 	return sc, nil
 }
-
-func getFieldsIgnoreStar(fields []string) []string {
-	var result []string
-	for _, f := range fields {
-		if f != "*" {
-			result = append(result, f)
-		}
-	}
-	return result
-}
diff --git a/lib/logstorage/stats_uniq.go b/lib/logstorage/stats_uniq.go
index 448696c8c..83d0e500e 100644
--- a/lib/logstorage/stats_uniq.go
+++ b/lib/logstorage/stats_uniq.go
@@ -88,8 +88,7 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
 	}
 	if len(fields) == 1 {
 		// Fast path for a single column.
-		// The unique key is formed as "<is_time> <value_type>? <value>",
-		// where <value_type> is skipped if <is_time> == 1.
+		// The unique key is formed as "<is_time> <value>".
 		// This guarantees that keys do not clash for different column types across blocks.
 		c := br.getColumnByName(fields[0])
 		if c.isTime {
@@ -119,7 +118,7 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
 			return stateSizeIncrease
 		}
 		keyBuf := sup.keyBuf[:0]
-		keyBuf = append(keyBuf[:0], 0, byte(valueTypeString))
+		keyBuf = append(keyBuf[:0], 0)
 		keyBuf = append(keyBuf, v...)
 		if _, ok := m[string(keyBuf)]; !ok {
 			m[string(keyBuf)] = struct{}{}
@@ -131,13 +130,13 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
 		if c.valueType == valueTypeDict {
 			// count unique non-zero c.dictValues
 			keyBuf := sup.keyBuf[:0]
-			for i, v := range c.dictValues {
+			for _, v := range c.dictValues {
 				if v == "" {
 					// Do not count empty values
 					continue
 				}
-				keyBuf = append(keyBuf[:0], 0, byte(valueTypeDict))
-				keyBuf = append(keyBuf, byte(i))
+				keyBuf = append(keyBuf[:0], 0)
+				keyBuf = append(keyBuf, v...)
 				if _, ok := m[string(keyBuf)]; !ok {
 					m[string(keyBuf)] = struct{}{}
 					stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof(""))
@@ -148,19 +147,18 @@ func (sup *statsUniqProcessor) updateStatsForAllRows(br *blockResult) int {
 		}
 
 		// Count unique values across encodedValues
-		encodedValues := c.getEncodedValues(br)
-		isStringValueType := c.valueType == valueTypeString
+		values := c.getValues(br)
 		keyBuf := sup.keyBuf[:0]
-		for i, v := range encodedValues {
-			if isStringValueType && v == "" {
+		for i, v := range values {
+			if v == "" {
 				// Do not count empty values
 				continue
 			}
-			if i > 0 && encodedValues[i-1] == v {
+			if i > 0 && values[i-1] == v {
 				// This value has been already counted.
 				continue
 			}
-			keyBuf = append(keyBuf[:0], 0, byte(c.valueType))
+			keyBuf = append(keyBuf[:0], 0)
 			keyBuf = append(keyBuf, v...)
 			if _, ok := m[string(keyBuf)]; !ok {
 				m[string(keyBuf)] = struct{}{}
@@ -249,8 +247,7 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
 	}
 	if len(fields) == 1 {
 		// Fast path for a single column.
-		// The unique key is formed as "<is_time> <value_type>? <value>",
-		// where <value_type> is skipped if <is_time> == 1.
+		// The unique key is formed as "<is_time> <value>".
 		// This guarantees that keys do not clash for different column types across blocks.
 		c := br.getColumnByName(fields[0])
 		if c.isTime {
@@ -273,7 +270,7 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
 			return stateSizeIncrease
 		}
 		keyBuf := sup.keyBuf[:0]
-		keyBuf = append(keyBuf[:0], 0, byte(valueTypeString))
+		keyBuf = append(keyBuf[:0], 0)
 		keyBuf = append(keyBuf, v...)
 		if _, ok := m[string(keyBuf)]; !ok {
 			m[string(keyBuf)] = struct{}{}
@@ -285,13 +282,14 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
 		if c.valueType == valueTypeDict {
 			// count unique non-zero c.dictValues
 			dictIdx := c.encodedValues[rowIdx][0]
-			if c.dictValues[dictIdx] == "" {
+			v := c.dictValues[dictIdx]
+			if v == "" {
 				// Do not count empty values
 				return stateSizeIncrease
 			}
 			keyBuf := sup.keyBuf[:0]
-			keyBuf = append(keyBuf[:0], 0, byte(valueTypeDict))
-			keyBuf = append(keyBuf, dictIdx)
+			keyBuf = append(keyBuf[:0], 0)
+			keyBuf = append(keyBuf, v...)
 			if _, ok := m[string(keyBuf)]; !ok {
 				m[string(keyBuf)] = struct{}{}
 				stateSizeIncrease += len(keyBuf) + int(unsafe.Sizeof(""))
@@ -301,14 +299,13 @@ func (sup *statsUniqProcessor) updateStatsForRow(br *blockResult, rowIdx int) in
 	}
 
 	// Count unique values for the given rowIdx
-	encodedValues := c.getEncodedValues(br)
-	v := encodedValues[rowIdx]
-	if c.valueType == valueTypeString && v == "" {
+	v := c.getValueAtRow(br, rowIdx)
+	if v == "" {
 		// Do not count empty values
 		return stateSizeIncrease
 	}
 	keyBuf := sup.keyBuf[:0]
-	keyBuf = append(keyBuf[:0], 0, byte(c.valueType))
+	keyBuf = append(keyBuf[:0], 0)
 	keyBuf = append(keyBuf, v...)
 	if _, ok := m[string(keyBuf)]; !ok {
 		m[string(keyBuf)] = struct{}{}
diff --git a/lib/logstorage/values_encoder.go b/lib/logstorage/values_encoder.go
index d5f65b037..e80b1a309 100644
--- a/lib/logstorage/values_encoder.go
+++ b/lib/logstorage/values_encoder.go
@@ -731,88 +731,91 @@ func tryParseBytes(s string) (int64, bool) {
 		if !ok {
 			return 0, false
 		}
+		if len(tail) == 0 {
+			if _, frac := math.Modf(f); frac != 0 {
+				// Reject fractional numbers without any suffix.
+				return 0, false
+			}
+		}
 		s = tail
 		if len(s) == 0 {
 			n += int64(f)
 			continue
 		}
 		if len(s) >= 3 {
-			prefix := s[:3]
 			switch {
-			case strings.EqualFold(prefix, "kib"):
+			case strings.HasPrefix(s, "KiB"):
 				n += int64(f * (1 << 10))
 				s = s[3:]
 				continue
-			case strings.EqualFold(prefix, "mib"):
+			case strings.HasPrefix(s, "MiB"):
 				n += int64(f * (1 << 20))
 				s = s[3:]
 				continue
-			case strings.EqualFold(prefix, "gib"):
+			case strings.HasPrefix(s, "GiB"):
 				n += int64(f * (1 << 30))
 				s = s[3:]
 				continue
-			case strings.EqualFold(prefix, "tib"):
+			case strings.HasPrefix(s, "TiB"):
 				n += int64(f * (1 << 40))
 				s = s[3:]
 				continue
 			}
 		}
 		if len(s) >= 2 {
-			prefix := s[:2]
 			switch {
-			case strings.EqualFold(prefix, "ki"):
+			case strings.HasPrefix(s, "Ki"):
 				n += int64(f * (1 << 10))
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "mi"):
+			case strings.HasPrefix(s, "Mi"):
 				n += int64(f * (1 << 20))
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "gi"):
+			case strings.HasPrefix(s, "Gi"):
 				n += int64(f * (1 << 30))
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "ti"):
+			case strings.HasPrefix(s, "Ti"):
 				n += int64(f * (1 << 40))
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "kb"):
+			case strings.HasPrefix(s, "KB"):
 				n += int64(f * 1_000)
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "mb"):
+			case strings.HasPrefix(s, "MB"):
 				n += int64(f * 1_000_000)
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "gb"):
+			case strings.HasPrefix(s, "GB"):
 				n += int64(f * 1_000_000_000)
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "tb"):
+			case strings.HasPrefix(s, "TB"):
 				n += int64(f * 1_000_000_000_000)
 				s = s[2:]
 				continue
 			}
 		}
-		prefix := s[:1]
 		switch {
-		case strings.EqualFold(prefix, "b"):
+		case strings.HasPrefix(s, "B"):
 			n += int64(f)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "k"):
+		case strings.HasPrefix(s, "K"):
 			n += int64(f * 1_000)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "m"):
+		case strings.HasPrefix(s, "M"):
 			n += int64(f * 1_000_000)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "g"):
+		case strings.HasPrefix(s, "G"):
 			n += int64(f * 1_000_000_000)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "t"):
+		case strings.HasPrefix(s, "T"):
 			n += int64(f * 1_000_000_000_000)
 			s = s[1:]
 			continue
@@ -859,48 +862,45 @@ func tryParseDuration(s string) (int64, bool) {
 			return 0, false
 		}
 		if len(s) >= 3 {
-			prefix := s[:3]
-			if strings.EqualFold(prefix, "µs") {
+			if strings.HasPrefix(s, "µs") {
 				nsecs += int64(f * nsecsPerMicrosecond)
 				s = s[3:]
 				continue
 			}
 		}
 		if len(s) >= 2 {
-			prefix := s[:2]
 			switch {
-			case strings.EqualFold(prefix, "ms"):
+			case strings.HasPrefix(s, "ms"):
 				nsecs += int64(f * nsecsPerMillisecond)
 				s = s[2:]
 				continue
-			case strings.EqualFold(prefix, "ns"):
+			case strings.HasPrefix(s, "ns"):
 				nsecs += int64(f)
 				s = s[2:]
 				continue
 			}
 		}
-		prefix := s[:1]
 		switch {
-		case strings.EqualFold(prefix, "y"):
+		case strings.HasPrefix(s, "y"):
 			nsecs += int64(f * nsecsPerYear)
 			s = s[1:]
-		case strings.EqualFold(prefix, "w"):
+		case strings.HasPrefix(s, "w"):
 			nsecs += int64(f * nsecsPerWeek)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "d"):
+		case strings.HasPrefix(s, "d"):
 			nsecs += int64(f * nsecsPerDay)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "h"):
+		case strings.HasPrefix(s, "h"):
 			nsecs += int64(f * nsecsPerHour)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "m"):
+		case strings.HasPrefix(s, "m"):
 			nsecs += int64(f * nsecsPerMinute)
 			s = s[1:]
 			continue
-		case strings.EqualFold(prefix, "s"):
+		case strings.HasPrefix(s, "s"):
 			nsecs += int64(f * nsecsPerSecond)
 			s = s[1:]
 			continue
diff --git a/lib/logstorage/values_encoder_test.go b/lib/logstorage/values_encoder_test.go
index d7c068b61..8de5d290d 100644
--- a/lib/logstorage/values_encoder_test.go
+++ b/lib/logstorage/values_encoder_test.go
@@ -325,7 +325,6 @@ func TestTryParseDuration_Success(t *testing.T) {
 
 	// zero duration
 	f("0s", 0)
-	f("0S", 0)
 	f("0.0w0d0h0s0.0ms", 0)
 	f("-0w", 0)
 
@@ -334,15 +333,9 @@ func TestTryParseDuration_Success(t *testing.T) {
 	f("1.5ms", 1.5*nsecsPerMillisecond)
 	f("1µs", nsecsPerMicrosecond)
 	f("1ns", 1)
-	f("1NS", 1)
-	f("1nS", 1)
-	f("1Ns", 1)
 	f("1h", nsecsPerHour)
-	f("1H", nsecsPerHour)
 	f("1.5d", 1.5*nsecsPerDay)
-	f("1.5D", 1.5*nsecsPerDay)
 	f("1.5w", 1.5*nsecsPerWeek)
-	f("1.5W", 1.5*nsecsPerWeek)
 	f("2.5y", 2.5*nsecsPerYear)
 	f("1m5.123456789s", nsecsPerMinute+5.123456789*nsecsPerSecond)
@@ -417,62 +410,25 @@ func TestTryParseBytes_Success(t *testing.T) {
 		}
 	}
 
-	f("123.456", 123)
 	f("1_500", 1_500)
-	f("2.5b", 2)
 	f("2.5B", 2)
 
-	f("1.5k", 1_500)
-	f("1.5m", 1_500_000)
-	f("1.5g", 1_500_000_000)
-	f("1.5t", 1_500_000_000_000)
-
 	f("1.5K", 1_500)
 	f("1.5M", 1_500_000)
 	f("1.5G", 1_500_000_000)
 	f("1.5T", 1_500_000_000_000)
 
-	f("1.5kb", 1_500)
-	f("1.5mb", 1_500_000)
-	f("1.5gb", 1_500_000_000)
-	f("1.5tb", 1_500_000_000_000)
-
-	f("1.5Kb", 1_500)
-	f("1.5Mb", 1_500_000)
-	f("1.5Gb", 1_500_000_000)
-	f("1.5Tb", 1_500_000_000_000)
-
 	f("1.5KB", 1_500)
 	f("1.5MB", 1_500_000)
 	f("1.5GB", 1_500_000_000)
 	f("1.5TB", 1_500_000_000_000)
 
-	f("1.5ki", 1.5*(1<<10))
-	f("1.5mi", 1.5*(1<<20))
-	f("1.5gi", 1.5*(1<<30))
-	f("1.5ti", 1.5*(1<<40))
-
 	f("1.5Ki", 1.5*(1<<10))
 	f("1.5Mi", 1.5*(1<<20))
 	f("1.5Gi", 1.5*(1<<30))
 	f("1.5Ti", 1.5*(1<<40))
 
-	f("1.5KI", 1.5*(1<<10))
-	f("1.5MI", 1.5*(1<<20))
-	f("1.5GI", 1.5*(1<<30))
-	f("1.5TI", 1.5*(1<<40))
-
-	f("1.5kib", 1.5*(1<<10))
-	f("1.5mib", 1.5*(1<<20))
-	f("1.5gib", 1.5*(1<<30))
-	f("1.5tib", 1.5*(1<<40))
-
-	f("1.5kiB", 1.5*(1<<10))
-	f("1.5miB", 1.5*(1<<20))
-	f("1.5giB", 1.5*(1<<30))
-	f("1.5tiB", 1.5*(1<<40))
-
 	f("1.5KiB", 1.5*(1<<10))
 	f("1.5MiB", 1.5*(1<<20))
 	f("1.5GiB", 1.5*(1<<30))
@@ -503,6 +459,37 @@ func TestTryParseBytes_Failure(t *testing.T) {
 	f("123qsb")
 	f("123sqsb")
 	f("123s5qsb")
+
+	// invalid case for the suffix
+	f("1b")
+
+	f("1k")
+	f("1m")
+	f("1g")
+	f("1t")
+
+	f("1kb")
+	f("1mb")
+	f("1gb")
+	f("1tb")
+
+	f("1ki")
+	f("1mi")
+	f("1gi")
+	f("1ti")
+
+	f("1kib")
+	f("1mib")
+	f("1gib")
+	f("1tib")
+
+	f("1KIB")
+	f("1MIB")
+	f("1GIB")
+	f("1TIB")
+
+	// fractional number without suffix
+	f("123.456")
 }
 
 func TestTryParseFloat64_Success(t *testing.T) {
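The rewritten success and failure cases above pin down the new suffix semantics: suffixes are matched case-sensitively, decimal units (`KB`, `MB`, ...) are powers of 1000 while binary units (`KiB`, `MiB`, ...) are powers of 1024, concatenated groups such as `10MB500KB10B` are summed, and fractional numbers without a suffix are rejected. Here is a minimal self-contained sketch of those rules (not the library's `tryParseBytes`; `parseBytes` and the `units` table are invented names, and negative values are left out):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Unit suffixes in match order: longer suffixes come first, so that
// "KiB" is not consumed as "K" followed by an unparseable tail.
var units = []struct {
	suffix string
	mul    float64
}{
	{"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30}, {"TiB", 1 << 40},
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
	{"KB", 1e3}, {"MB", 1e6}, {"GB", 1e9}, {"TB", 1e12},
	{"K", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12},
	{"B", 1},
}

// parseBytes sums concatenated "<number><suffix>" groups, e.g. "10MB500KB10B".
func parseBytes(s string) (int64, bool) {
	var n int64
	for len(s) > 0 {
		// Split off the leading number (digits, '.', and '_' delimiters).
		i := strings.IndexFunc(s, func(r rune) bool {
			return (r < '0' || r > '9') && r != '.' && r != '_'
		})
		if i < 0 {
			i = len(s)
		}
		f, err := strconv.ParseFloat(strings.ReplaceAll(s[:i], "_", ""), 64)
		if err != nil {
			return 0, false
		}
		s = s[i:]
		if len(s) == 0 {
			if f != float64(int64(f)) {
				return 0, false // fractional numbers require a suffix
			}
			n += int64(f)
			break
		}
		matched := false
		for _, u := range units {
			if strings.HasPrefix(s, u.suffix) { // case-sensitive on purpose
				n += int64(f * u.mul)
				s = s[len(u.suffix):]
				matched = true
				break
			}
		}
		if !matched {
			return 0, false
		}
	}
	return n, true
}

func main() {
	for _, s := range []string{"1.5KiB", "10MB500KB10B", "1kb", "123.456"} {
		n, ok := parseBytes(s)
		fmt.Println(s, n, ok)
	}
	// Output:
	// 1.5KiB 1536 true
	// 10MB500KB10B 10500010 true
	// 1kb 0 false
	// 123.456 0 false
}
```

Ordering the table from longest to shortest suffix is the detail that keeps `1.5KiB` from being misparsed as `1.5K` followed by the leftover tail `iB`.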