From 79787ce25a8e237f7bbd75abb998aeb02bebecf1 Mon Sep 17 00:00:00 2001 From: Aliaksandr Valialkin Date: Wed, 22 May 2024 17:17:59 +0200 Subject: [PATCH] wip --- docs/VictoriaLogs/CHANGELOG.md | 1 + docs/VictoriaLogs/LogsQL.md | 126 +++++++++++++++++++++------- lib/logstorage/parser_test.go | 106 +++++++++++------------ lib/logstorage/pipe_extract.go | 43 +++++----- lib/logstorage/pipe_extract_test.go | 69 ++++++++------- lib/logstorage/pipe_format.go | 42 +++++----- lib/logstorage/pipe_format_test.go | 44 +++++----- 7 files changed, 251 insertions(+), 180 deletions(-) diff --git a/docs/VictoriaLogs/CHANGELOG.md b/docs/VictoriaLogs/CHANGELOG.md index 108ef63a2..66971ff6a 100644 --- a/docs/VictoriaLogs/CHANGELOG.md +++ b/docs/VictoriaLogs/CHANGELOG.md @@ -19,6 +19,7 @@ according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/QuickSta ## tip +* FEATURE: add ability to generate output fields according to the provided format string. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#format-pipe). * FEATURE: add ability to extract fields with [`extract` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#extract-pipe) only if the given condition is met. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#conditional-extract). * FEATURE: add ability to unpack JSON fields with [`unpack_json` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#unpack_json-pipe) only if the given condition is met. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#conditional-unpack_json). * FEATURE: add ability to unpack [logfmt](https://brandur.org/logfmt) fields with [`unpack_logfmt` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#unpack_logfmt-pipe) only if the given condition is met. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#conditional-unpack_logfmt). 
diff --git a/docs/VictoriaLogs/LogsQL.md b/docs/VictoriaLogs/LogsQL.md index 5f89041bd..6cbee8e54 100644 --- a/docs/VictoriaLogs/LogsQL.md +++ b/docs/VictoriaLogs/LogsQL.md @@ -1056,6 +1056,7 @@ LogsQL supports the following pipes: - [`field_names`](#field_names-pipe) returns all the names of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). - [`fields`](#fields-pipe) selects the given set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). - [`filter`](#filter-pipe) applies additional [filters](#filters) to results. +- [`format`](#format-pipe) formats the output field from input [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). - [`limit`](#limit-pipe) limits the number selected logs. - [`offset`](#offset-pipe) skips the given number of selected logs. - [`rename`](#rename-pipe) renames [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). @@ -1110,21 +1111,21 @@ See also: ### extract pipe -`| extract from field_name "pattern"` [pipe](#pipes) allows extracting additional fields specified in the `pattern` from the given -`field_name` [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). Existing log fields remain unchanged -after the `| extract ...` pipe. +`| extract "pattern" from field_name` [pipe](#pipes) allows extracting arbitrary text into output fields according to the [`pattern`](#format-for-extract-pipe-pattern) from the given +[`field_name`](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). Existing log fields remain unchanged after the `| extract ...` pipe. -`| extract ...` pipe can be useful for extracting additional fields needed for further data processing with other pipes such as [`stats` pipe](#stats-pipe) or [`sort` pipe](#sort-pipe).
+`| extract ...` can be useful for extracting additional fields needed for further data processing with other pipes such as [`stats` pipe](#stats-pipe) or [`sort` pipe](#sort-pipe). For example, the following query selects logs with the `error` [word](#word) for the last day, extracts ip address from [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) into `ip` field and then calculates top 10 ip addresses with the biggest number of logs: ```logsql -_time:1d error | extract from _msg "ip= " | stats by (ip) count() logs | sort by (logs) desc limit 10 +_time:1d error | extract "ip= " from _msg | stats by (ip) count() logs | sort by (logs) desc limit 10 ``` -It is expected that `_msg` field contains `ip=...` substring, which ends with space. For example, `error ip=1.2.3.4 from user_id=42`. +It is expected that `_msg` field contains `ip=...` substring ending with space. For example, `error ip=1.2.3.4 from user_id=42`. +If there is no such substring in the current `_msg` field, then the `ip` output field will be empty. If the `| extract ...` pipe is applied to [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field), then the `from _msg` part can be omitted. For example, the following query is equivalent to the previous one: @@ -1133,6 +1134,12 @@ For example, the following query is equivalent to the previous one: _time:1d error | extract "ip= " | stats by (ip) count() logs | sort by (logs) desc limit 10 ``` +If the `pattern` contains double quotes, then it can be quoted into single quotes. 
For example, the following query extracts `ip` from the corresponding JSON field: + +```logsql +_time:5m | extract '"ip":""' +``` + See also: - [Format for extract pipe pattern](#format-for-extract-pipe-pattern) @@ -1140,23 +1147,27 @@ See also: - [`unpack_json` pipe](#unpack_json-pipe) - [`unpack_logfmt` pipe](#unpack_logfmt-pipe) -#### Conditional extract - -If some log entries must be skipped from [`extract` pipe](#extract-pipe), then add `if ()` filter to the end of `| extract ...` pipe. -The `` can contain arbitrary [filters](#filters). For example, the following query extracts `ip` field only -if the input [log entry](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) doesn't contain `ip` field or this field is empty: - -```logsql -_time:5m | extract "ip= " if (ip:"") -``` - #### Format for extract pipe pattern -The `pattern` part from [`| extract from src_field "pattern"` pipe](#extract-pipes) may contain arbitrary text, which matches as is to the `src_field` value. -Additionally to arbitrary text, the `pattern` may contain placeholders in the form `<...>`, which match any strings, including empty strings. -Placeholders may be named, such as ``, or anonymous, such as `<_>`. Named placeholders extract the matching text into -the corresponding [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). -Anonymous placeholders are useful for skipping arbitrary text during pattern matching. +The `pattern` part from [`extract ` pipe](#extract-pipe) has the following format: + +``` +text1text2...textNtextN+1 +``` + +Where `text1`, ... `textN+1` is arbitrary non-empty text, which matches as is to the input text. + +The `field1`, ... `fieldN` are placeholders, which match a substring of any length (including zero length) in the input text until the next `textX`. +Placeholders can be anonymous and named. Anonymous placeholders are written as `<_>`. 
They are used for convenience when some input text +must be skipped until the next `textX`. Named placeholders are written as `<some_name>`, where `some_name` is the name of the log field to store +the corresponding matching substring to. + +The matching starts from the first occurrence of the `text1` in the input text. If the `pattern` starts with `<field1>` and doesn't contain `text1`, +then the matching starts from the beginning of the input text. Matching is performed sequentially according to the `pattern`. If some `textX` isn't found +in the remaining input text, then the remaining named placeholders receive empty string values and the matching finishes prematurely. + +Matching finishes successfully when `textN+1` is found in the input text. +If the `pattern` ends with `<fieldN>` and doesn't contain `textN+1`, then the `<fieldN>` matches the remaining input text. For example, if [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) contains the following text: @@ -1164,34 +1175,44 @@ For example, if [`_msg` field](https://docs.victoriametrics.com/victorialogs/key 1.2.3.4 GET /foo/bar?baz 404 "Mozilla foo bar baz" some tail here ``` -Then the following `| extract ...` [pipe](#pipes) can be used for extracting `ip`, `path` and `user_agent` fields from it: +Then the following `pattern` can be used for extracting `ip`, `path` and `user_agent` fields from it: ``` -| extract ' <_> <_> ""' + <_> <_> "" ``` Note that the user-agent part of the log message is in double quotes. This means that it may contain special chars, including escaped double quote, e.g. `\"`. This may break proper matching of the string in double quotes. -VictoriaLogs automatically detects the whole string in quotes and automatically decodes it if the first char in the placeholder is double quote or backtick.
-So it is better to use the following `pattern` for proper matching of quoted strings: +VictoriaLogs automatically detects quoted strings and automatically unquotes them if the first matching char in the placeholder is double quote or backtick. +So it is better to use the following `pattern` for proper matching of quoted `user_agent` string: ``` -| extract " <_> <_> " + <_> <_> ``` -Note that the `user_agent` now matches double quotes, but VictoriaLogs automatically unquotes the matching string before storing it in the `user_agent` field. -This is useful for extracting JSON strings. For example, the following `pattern` properly extracts the `message` JSON string into `msg` field: +This is useful for extracting JSON strings. For example, the following `pattern` properly extracts the `message` JSON string into `msg` field, even if it contains special chars: ``` -| extract '"message":' +"message": ``` If some special chars such as `<` must be matched by the `pattern`, then they can be [html-escaped](https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references). -For example, the following `pattern` properly matches `a < 123.456` text: +For example, the following `pattern` properly matches `a < b` text by extracting `a` into `left` field and `b` into `right` field: ``` -| extract " < " + < +``` + +#### Conditional extract + +If some log entries must be skipped from [`extract` pipe](#extract-pipe), then add `if ()` filter after the `extract` word. +The `` can contain arbitrary [filters](#filters). 
For example, the following query extracts `ip` field +from [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) only +if the input [log entry](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) doesn't contain `ip` field or this field is empty: + +```logsql +_time:5m | extract if (ip:"") "ip= " ``` ### field_names pipe @@ -1249,6 +1270,49 @@ See also: - [`stats` pipe](#stats-pipe) - [`sort` pipe](#sort-pipe) +### format pipe + +`| format "pattern" as result_field` [pipe](#format-pipe) combines [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) +according to the `pattern` and stores it to the `result_field`. + +For example, the following query stores `request from :` text into [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field), +by substituting `` and `` with the corresponding [log field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) names: + +```logsql +_time:5m | format "request from :" as _msg +``` + +If the result of the `format` pattern is stored into [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field), +then `as _msg` part can be omitted. The following query is equivalent to the previous one: + +```logsql +_time:5m | format "request from :" +``` + +If some field values must be put into double quotes before formatting, then add `:q` after the corresponding field name. 
+For example, the following command generates a properly encoded JSON object from `_msg` and `stacktrace` [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) +and stores it into `my_json` output field: + +```logsql +_time:5m | format '{"_msg":<_msg:q>,"stacktrace":}' as my_json +``` + +See also: + +- [Conditional format](#conditional-format) +- [`extract` pipe](#extract-pipe) + +#### Conditional format + +If the [`format` pipe](#format-pipe) mustn't be applied to every [log entry](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model), +then add `if ()` just after the `format` word. +The `` can contain arbitrary [filters](#filters). For example, the following query stores the formatted result to `message` field +only if `ip` and `host` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) aren't empty: + +```logsql +_time:5m | format if (ip:* and host:*) "request from :" as message +``` + ### limit pipe If only a subset of selected logs must be processed, then `| limit N` [pipe](#pipes) can be used, where `N` can contain any [supported integer numeric value](#numeric-values).
diff --git a/lib/logstorage/parser_test.go b/lib/logstorage/parser_test.go index 108e9f546..93a360de0 100644 --- a/lib/logstorage/parser_test.go +++ b/lib/logstorage/parser_test.go @@ -1001,11 +1001,11 @@ func TestParseQuerySuccess(t *testing.T) { // extract pipe f(`* | extract "foobaz"`, `* | extract "foobaz"`) - f(`* | extract from _msg "foobaz"`, `* | extract "foobaz"`) - f(`* | extract from '' 'foobaz'`, `* | extract "foobaz"`) - f("* | extract from x `foobaz`", `* | extract from x "foobaz"`) - f("* | extract from x foobaz", `* | extract from x "foobaz"`) - f("* | extract from x foobaz if (a:b)", `* | extract from x "foobaz" if (a:b)`) + f(`* | extract "foobaz" from _msg`, `* | extract "foobaz"`) + f(`* | extract 'foobaz' from ''`, `* | extract "foobaz"`) + f("* | extract `foobaz` from x", `* | extract "foobaz" from x`) + f("* | extract foobaz from x", `* | extract "foobaz" from x`) + f("* | extract if (a:b) foobaz from x", `* | extract if (a:b) "foobaz" from x`) // unpack_json pipe f(`* | unpack_json`, `* | unpack_json`) @@ -1625,10 +1625,10 @@ func TestQueryGetNeededColumns(t *testing.T) { f(`* | format "foo" as s1`, `*`, `s1`) f(`* | format "foo" as s1`, `*`, ``) - f(`* | format "foo" if (x1:y) as s1`, `*`, `s1`) - f(`* | format "foo" if (x1:y) as s1`, `*`, `s1`) - f(`* | format "foo" if (s1:y) as s1`, `*`, ``) - f(`* | format "foo" if (x1:y) as s1`, `*`, ``) + f(`* | format if (x1:y) "foo" as s1`, `*`, `s1`) + f(`* | format if (x1:y) "foo" as s1`, `*`, `s1`) + f(`* | format if (s1:y) "foo" as s1`, `*`, ``) + f(`* | format if (x1:y) "foo" as s1`, `*`, ``) f(`* | format "foo" as s1 | fields f1`, `f1`, ``) f(`* | format "foo" as s1 | fields s1`, ``, ``) @@ -1638,8 +1638,8 @@ func TestQueryGetNeededColumns(t *testing.T) { f(`* | format "foo" as s1 | fields f1`, `f1`, ``) f(`* | format "foo" as s1 | fields s1`, `s1`, ``) - f(`* | format "foo" if (f1:x) as s1 | fields s1`, `f1`, ``) - f(`* | format "foo" if (f1:x) as s1 | fields s2`, `s2`, ``) + f(`* | format if 
(f1:x) "foo" as s1 | fields s1`, `f1`, ``) + f(`* | format if (f1:x) "foo" as s1 | fields s2`, `s2`, ``) f(`* | format "foo" as s1 | rm f1`, `*`, `f1,s1`) f(`* | format "foo" as s1 | rm s1`, `*`, `s1`) @@ -1649,52 +1649,52 @@ func TestQueryGetNeededColumns(t *testing.T) { f(`* | format "foo" as s1 | rm f1`, `*`, `f1`) f(`* | format "foo" as s1 | rm s1`, `*`, `s1`) - f(`* | format "foo" if (f1:x) as s1 | rm s1`, `*`, `s1`) - f(`* | format "foo" if (f1:x) as s1 | rm f1`, `*`, `s1`) - f(`* | format "foo" if (f1:x) as s1 | rm f2`, `*`, `f2,s1`) + f(`* | format if (f1:x) "foo" as s1 | rm s1`, `*`, `s1`) + f(`* | format if (f1:x) "foo" as s1 | rm f1`, `*`, `s1`) + f(`* | format if (f1:x) "foo" as s1 | rm f2`, `*`, `f2,s1`) - f(`* | extract from s1 "x"`, `*`, `f1,f2`) - f(`* | extract from s1 "x" if (f3:foo)`, `*`, `f1,f2`) - f(`* | extract from s1 "x" if (f1:foo)`, `*`, `f2`) - f(`* | extract from s1 "x" | fields foo`, `foo`, ``) - f(`* | extract from s1 "x" if (x:bar) | fields foo`, `foo`, ``) - f(`* | extract from s1 "x" | fields foo,s1`, `foo,s1`, ``) - f(`* | extract from s1 "x" if (x:bar) | fields foo,s1`, `foo,s1`, ``) - f(`* | extract from s1 "x" | fields foo,f1`, `foo,s1`, ``) - f(`* | extract from s1 "x" if (x:bar) | fields foo,f1`, `foo,s1,x`, ``) - f(`* | extract from s1 "x" | fields foo,f1,f2`, `foo,s1`, ``) - f(`* | extract from s1 "x" if (x:bar) | fields foo,f1,f2`, `foo,s1,x`, ``) - f(`* | extract from s1 "x" | rm foo`, `*`, `f1,f2,foo`) - f(`* | extract from s1 "x" if (x:bar) | rm foo`, `*`, `f1,f2,foo`) - f(`* | extract from s1 "x" | rm foo,s1`, `*`, `f1,f2,foo`) - f(`* | extract from s1 "x" if (x:bar) | rm foo,s1`, `*`, `f1,f2,foo`) - f(`* | extract from s1 "x" | rm foo,f1`, `*`, `f1,f2,foo`) - f(`* | extract from s1 "x" if (x:bar) | rm foo,f1`, `*`, `f1,f2,foo`) - f(`* | extract from s1 "x" | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`) - f(`* | extract from s1 "x" if (x:bar) | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`) + f(`* | extract "x" from s1`, `*`, `f1,f2`) + 
f(`* | extract if (f3:foo) "x" from s1`, `*`, `f1,f2`) + f(`* | extract if (f1:foo) "x" from s1`, `*`, `f2`) + f(`* | extract "x" from s1 | fields foo`, `foo`, ``) + f(`* | extract if (x:bar) "x" from s1 | fields foo`, `foo`, ``) + f(`* | extract "x" from s1| fields foo,s1`, `foo,s1`, ``) + f(`* | extract if (x:bar) "x" from s1 | fields foo,s1`, `foo,s1`, ``) + f(`* | extract "x" from s1 | fields foo,f1`, `foo,s1`, ``) + f(`* | extract if (x:bar) "x" from s1 | fields foo,f1`, `foo,s1,x`, ``) + f(`* | extract "x" from s1 | fields foo,f1,f2`, `foo,s1`, ``) + f(`* | extract if (x:bar) "x" from s1 | fields foo,f1,f2`, `foo,s1,x`, ``) + f(`* | extract "x" from s1 | rm foo`, `*`, `f1,f2,foo`) + f(`* | extract if (x:bar) "x" from s1 | rm foo`, `*`, `f1,f2,foo`) + f(`* | extract "x" from s1 | rm foo,s1`, `*`, `f1,f2,foo`) + f(`* | extract if (x:bar) "x" from s1 | rm foo,s1`, `*`, `f1,f2,foo`) + f(`* | extract "x" from s1 | rm foo,f1`, `*`, `f1,f2,foo`) + f(`* | extract if (x:bar) "x" from s1 | rm foo,f1`, `*`, `f1,f2,foo`) + f(`* | extract "x" from s1 | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`) + f(`* | extract if (x:bar) "x" from s1 | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`) - f(`* | extract from s1 "xy"`, `*`, ``) - f(`* | extract from s1 "xy" if (x:foo)`, `*`, ``) - f(`* | extract from s1 "xy" if (s1:foo)`, `*`, ``) - f(`* | extract from s1 "xy" if (s1:foo)`, `*`, `f1`) + f(`* | extract "xy" from s1 `, `*`, ``) + f(`* | extract if (x:foo) "xy" from s1`, `*`, ``) + f(`* | extract if (s1:foo) "xy" from s1`, `*`, ``) + f(`* | extract if (s1:foo) "xy" from s1`, `*`, `f1`) - f(`* | extract from s1 "xy" | fields s2`, `s2`, ``) - f(`* | extract from s1 "xy" | fields s1`, `s1`, ``) - f(`* | extract from s1 "xy" if (x:foo) | fields s1`, `s1,x`, ``) - f(`* | extract from s1 "xy" if (x:foo) | fields s2`, `s2`, ``) - f(`* | extract from s1 "xy" if (s1:foo) | fields s1`, `s1`, ``) - f(`* | extract from s1 "xy" if (s1:foo) | fields s2`, `s2`, ``) - f(`* | extract from s1 "xy" if (s1:foo) | 
fields s1`, `s1`, ``) - f(`* | extract from s1 "xy" if (s1:foo) | fields s2`, `s2`, ``) + f(`* | extract "xy" from s1 | fields s2`, `s2`, ``) + f(`* | extract "xy" from s1 | fields s1`, `s1`, ``) + f(`* | extract if (x:foo) "xy" from s1 | fields s1`, `s1,x`, ``) + f(`* | extract if (x:foo) "xy" from s1 | fields s2`, `s2`, ``) + f(`* | extract if (s1:foo) "xy" from s1 | fields s1`, `s1`, ``) + f(`* | extract if (s1:foo) "xy" from s1 | fields s2`, `s2`, ``) + f(`* | extract if (s1:foo) "xy" from s1 | fields s1`, `s1`, ``) + f(`* | extract if (s1:foo) "xy" from s1 | fields s2`, `s2`, ``) - f(`* | extract from s1 "xy" | rm s2`, `*`, `s2`) - f(`* | extract from s1 "xy" | rm s1`, `*`, `s1`) - f(`* | extract from s1 "xy" if (x:foo) | rm s1`, `*`, `s1`) - f(`* | extract from s1 "xy" if (x:foo) | rm s2`, `*`, `s2`) - f(`* | extract from s1 "xy" if (s1:foo) | rm s1`, `*`, `s1`) - f(`* | extract from s1 "xy" if (s1:foo) | rm s2`, `*`, `s2`) - f(`* | extract from s1 "xy" if (s1:foo) | rm s1`, `*`, `f1`) - f(`* | extract from s1 "xy" if (s1:foo) | rm s2`, `*`, `f1,s2`) + f(`* | extract "xy" from s1 | rm s2`, `*`, `s2`) + f(`* | extract "xy" from s1 | rm s1`, `*`, `s1`) + f(`* | extract if (x:foo) "xy" from s1 | rm s1`, `*`, `s1`) + f(`* | extract if (x:foo) "xy" from s1 | rm s2`, `*`, `s2`) + f(`* | extract if (s1:foo) "xy" from s1 | rm s1`, `*`, `s1`) + f(`* | extract if (s1:foo) "xy" from s1 | rm s2`, `*`, `s2`) + f(`* | extract if (s1:foo) "xy" from s1 | rm s1`, `*`, `f1`) + f(`* | extract if (s1:foo) "xy" from s1 | rm s2`, `*`, `f1,s2`) f(`* | unpack_json`, `*`, ``) f(`* | unpack_json from s1`, `*`, ``) diff --git a/lib/logstorage/pipe_extract.go b/lib/logstorage/pipe_extract.go index b7097c78a..4e172d240 100644 --- a/lib/logstorage/pipe_extract.go +++ b/lib/logstorage/pipe_extract.go @@ -4,7 +4,7 @@ import ( "fmt" ) -// pipeExtract processes '| extract from ' pipe. +// pipeExtract processes '| extract ...' pipe. 
// // See https://docs.victoriametrics.com/victorialogs/logsql/#extract-pipe type pipeExtract struct { @@ -19,13 +19,13 @@ type pipeExtract struct { func (pe *pipeExtract) String() string { s := "extract" - if !isMsgFieldName(pe.fromField) { - s += " from " + quoteTokenIfNeeded(pe.fromField) - } - s += " " + quoteTokenIfNeeded(pe.patternStr) if pe.iff != nil { s += " " + pe.iff.String() } + s += " " + quoteTokenIfNeeded(pe.patternStr) + if !isMsgFieldName(pe.fromField) { + s += " from " + quoteTokenIfNeeded(pe.fromField) + } return s } @@ -90,14 +90,14 @@ func parsePipeExtract(lex *lexer) (*pipeExtract, error) { } lex.nextToken() - fromField := "_msg" - if lex.isKeyword("from") { - lex.nextToken() - f, err := parseFieldName(lex) + // parse optional if (...) + var iff *ifFilter + if lex.isKeyword("if") { + f, err := parseIfFilter(lex) if err != nil { - return nil, fmt.Errorf("cannot parse 'from' field name: %w", err) + return nil, err } - fromField = f + iff = f } // parse pattern @@ -110,19 +110,22 @@ func parsePipeExtract(lex *lexer) (*pipeExtract, error) { return nil, fmt.Errorf("cannot parse 'pattern' %q: %w", patternStr, err) } + // parse optional 'from ...' part + fromField := "_msg" + if lex.isKeyword("from") { + lex.nextToken() + f, err := parseFieldName(lex) + if err != nil { + return nil, fmt.Errorf("cannot parse 'from' field name: %w", err) + } + fromField = f + } + pe := &pipeExtract{ fromField: fromField, ptn: ptn, patternStr: patternStr, - } - - // parse optional if (...) 
- if lex.isKeyword("if") { - iff, err := parseIfFilter(lex) - if err != nil { - return nil, err - } - pe.iff = iff + iff: iff, } return pe, nil diff --git a/lib/logstorage/pipe_extract_test.go b/lib/logstorage/pipe_extract_test.go index 86c7de364..559ce4027 100644 --- a/lib/logstorage/pipe_extract_test.go +++ b/lib/logstorage/pipe_extract_test.go @@ -11,8 +11,8 @@ func TestParsePipeExtractSuccess(t *testing.T) { } f(`extract "foo"`) - f(`extract from x "foo"`) - f(`extract from x "foo" if (y:in(a:foo bar | uniq by (qwe) limit 10))`) + f(`extract "foo" from x`) + f(`extract if (x:y) "foo" from baz`) } func TestParsePipeExtractFailure(t *testing.T) { @@ -23,11 +23,10 @@ func TestParsePipeExtractFailure(t *testing.T) { f(`extract`) f(`extract from`) + f(`extract from x`) + f(`extract from x "y"`) f(`extract if (x:y)`) - f(`extract if (x:y) "a"`) - f(`extract "a" if`) - f(`extract "a" if (foo`) - f(`extract "a" if "foo"`) + f(`extract "a" if (x:y)`) f(`extract "a"`) f(`extract ""`) f(`extract "<*>foo<_>bar"`) @@ -64,7 +63,7 @@ func TestPipeExtract(t *testing.T) { }) // single row, extract from non-existing field - f(`extract from x "foo="`, [][]Field{ + f(`extract "foo=" from x`, [][]Field{ { {"_msg", `foo=bar`}, }, @@ -76,7 +75,7 @@ func TestPipeExtract(t *testing.T) { }) // single row, pattern mismatch - f(`extract from x "foo="`, [][]Field{ + f(`extract "foo=" from x`, [][]Field{ { {"x", `foobar`}, }, @@ -88,7 +87,7 @@ func TestPipeExtract(t *testing.T) { }) // single row, partial partern match - f(`extract from x "foo= baz="`, [][]Field{ + f(`extract "foo= baz=" from x`, [][]Field{ { {"x", `a foo="a\"b\\c" cde baz=aa`}, }, @@ -101,7 +100,7 @@ func TestPipeExtract(t *testing.T) { }) // single row, overwirte existing column - f(`extract from x "foo= baz="`, [][]Field{ + f(`extract "foo= baz=" from x`, [][]Field{ { {"x", `a foo=cc baz=aa b`}, {"bar", "abc"}, @@ -115,7 +114,7 @@ func TestPipeExtract(t *testing.T) { }) // single row, if match - f(`extract from x "foo= 
baz=" if (x:baz)`, [][]Field{ + f(`extract if (x:baz) "foo= baz=" from x`, [][]Field{ { {"x", `a foo=cc baz=aa b`}, {"bar", "abc"}, @@ -129,7 +128,7 @@ func TestPipeExtract(t *testing.T) { }) // single row, if mismatch - f(`extract from x "foo= baz=" if (bar:"")`, [][]Field{ + f(`extract if (bar:"") "foo= baz=" from x`, [][]Field{ { {"x", `a foo=cc baz=aa b`}, {"bar", "abc"}, @@ -142,7 +141,7 @@ func TestPipeExtract(t *testing.T) { }) // multiple rows with distinct set of labels - f(`extract "ip= " if (!ip:keep)`, [][]Field{ + f(`extract if (!ip:keep) "ip= "`, [][]Field{ { {"foo", "bar"}, {"_msg", "request from ip=1.2.3.4 xxx"}, @@ -201,44 +200,44 @@ func TestPipeExtractUpdateNeededFields(t *testing.T) { } // all the needed fields - f("extract from x ''", "*", "", "*", "foo") - f("extract from x '' if (foo:bar)", "*", "", "*", "") + f("extract '' from x", "*", "", "*", "foo") + f("extract if (foo:bar) '' from x", "*", "", "*", "") // unneeded fields do not intersect with pattern and output fields - f("extract from x ''", "*", "f1,f2", "*", "f1,f2,foo") - f("extract from x '' if (f1:x)", "*", "f1,f2", "*", "f2,foo") - f("extract from x '' if (foo:bar f1:x)", "*", "f1,f2", "*", "f2") + f("extract '' from x", "*", "f1,f2", "*", "f1,f2,foo") + f("extract if (f1:x) '' from x", "*", "f1,f2", "*", "f2,foo") + f("extract if (foo:bar f1:x) '' from x", "*", "f1,f2", "*", "f2") // unneeded fields intersect with pattern - f("extract from x ''", "*", "f2,x", "*", "f2,foo") - f("extract from x '' if (f1:abc)", "*", "f2,x", "*", "f2,foo") - f("extract from x '' if (f2:abc)", "*", "f2,x", "*", "foo") + f("extract '' from x", "*", "f2,x", "*", "f2,foo") + f("extract if (f1:abc) '' from x", "*", "f2,x", "*", "f2,foo") + f("extract if (f2:abc) '' from x", "*", "f2,x", "*", "foo") // unneeded fields intersect with output fields - f("extract from x 'x'", "*", "f2,foo", "*", "bar,f2,foo") - f("extract from x 'x' if (f1:abc)", "*", "f2,foo", "*", "bar,f2,foo") - f("extract from x 'x' if 
(f2:abc foo:w)", "*", "f2,foo", "*", "bar") + f("extract 'x' from x", "*", "f2,foo", "*", "bar,f2,foo") + f("extract if (f1:abc) 'x' from x", "*", "f2,foo", "*", "bar,f2,foo") + f("extract if (f2:abc foo:w) 'x' from x", "*", "f2,foo", "*", "bar") // unneeded fields intersect with all the output fields - f("extract from x 'x'", "*", "f2,foo,bar", "*", "bar,f2,foo,x") - f("extract from x 'x if (a:b f2:q x:y foo:w)'", "*", "f2,foo,bar", "*", "bar,f2,foo,x") + f("extract 'x' from x", "*", "f2,foo,bar", "*", "bar,f2,foo,x") + f("extract if (a:b f2:q x:y foo:w) 'x' from x", "*", "f2,foo,bar", "*", "bar,f2,foo,x") // needed fields do not intersect with pattern and output fields - f("extract from x 'x'", "f1,f2", "", "f1,f2", "") - f("extract from x 'x' if (a:b)", "f1,f2", "", "f1,f2", "") - f("extract from x 'x' if (f1:b)", "f1,f2", "", "f1,f2", "") + f("extract 'x' from x", "f1,f2", "", "f1,f2", "") + f("extract if (a:b) 'x' from x", "f1,f2", "", "f1,f2", "") + f("extract if (f1:b) 'x' from x", "f1,f2", "", "f1,f2", "") // needed fields intersect with pattern field - f("extract from x 'x'", "f2,x", "", "f2,x", "") - f("extract from x 'x' if (a:b)", "f2,x", "", "f2,x", "") + f("extract 'x' from x", "f2,x", "", "f2,x", "") + f("extract if (a:b) 'x' from x", "f2,x", "", "f2,x", "") // needed fields intersect with output fields - f("extract from x 'x'", "f2,foo", "", "f2,x", "") - f("extract from x 'x' if (a:b)", "f2,foo", "", "a,f2,x", "") + f("extract 'x' from x", "f2,foo", "", "f2,x", "") + f("extract if (a:b) 'x' from x", "f2,foo", "", "a,f2,x", "") // needed fields intersect with pattern and output fields - f("extract from x 'x'", "f2,foo,x,y", "", "f2,x,y", "") - f("extract from x 'x' if (a:b foo:q)", "f2,foo,x,y", "", "a,f2,foo,x,y", "") + f("extract 'x' from x", "f2,foo,x,y", "", "f2,x,y", "") + f("extract if (a:b foo:q) 'x' from x", "f2,foo,x,y", "", "a,f2,foo,x,y", "") } func expectParsePipeFailure(t *testing.T, pipeStr string) { diff --git 
a/lib/logstorage/pipe_format.go b/lib/logstorage/pipe_format.go index c71aba4a3..3f830b199 100644 --- a/lib/logstorage/pipe_format.go +++ b/lib/logstorage/pipe_format.go @@ -21,11 +21,14 @@ type pipeFormat struct { } func (pf *pipeFormat) String() string { - s := "format " + quoteTokenIfNeeded(pf.formatStr) + s := "format" if pf.iff != nil { s += " " + pf.iff.String() } - s += " as " + quoteTokenIfNeeded(pf.resultField) + s += " " + quoteTokenIfNeeded(pf.formatStr) + if !isMsgFieldName(pf.resultField) { + s += " as " + quoteTokenIfNeeded(pf.resultField) + } return s } @@ -150,16 +153,6 @@ func parsePipeFormat(lex *lexer) (*pipeFormat, error) { } lex.nextToken() - // parse format - formatStr, err := getCompoundToken(lex) - if err != nil { - return nil, fmt.Errorf("cannot read 'format': %w", err) - } - steps, err := parsePatternSteps(formatStr) - if err != nil { - return nil, fmt.Errorf("cannot parse 'pattern' %q: %w", formatStr, err) - } - // parse optional if (...) var iff *ifFilter if lex.isKeyword("if") { @@ -170,14 +163,25 @@ func parsePipeFormat(lex *lexer) (*pipeFormat, error) { iff = f } - // parse resultField - if !lex.isKeyword("as") { - return nil, fmt.Errorf("missing 'as' keyword after 'format %q'", formatStr) - } - lex.nextToken() - resultField, err := parseFieldName(lex) + // parse format + formatStr, err := getCompoundToken(lex) if err != nil { - return nil, fmt.Errorf("cannot parse result field after 'format %q as': %w", formatStr, err) + return nil, fmt.Errorf("cannot read 'format': %w", err) + } + steps, err := parsePatternSteps(formatStr) + if err != nil { + return nil, fmt.Errorf("cannot parse 'pattern' %q: %w", formatStr, err) + } + + // parse optional 'as ...` part + resultField := "_msg" + if lex.isKeyword("as") { + lex.nextToken() + field, err := parseFieldName(lex) + if err != nil { + return nil, fmt.Errorf("cannot parse result field after 'format %q as': %w", formatStr, err) + } + resultField = field } pf := &pipeFormat{ diff --git 
a/lib/logstorage/pipe_format_test.go b/lib/logstorage/pipe_format_test.go index b8e78d666..d40f85bac 100644 --- a/lib/logstorage/pipe_format_test.go +++ b/lib/logstorage/pipe_format_test.go @@ -10,13 +10,14 @@ func TestParsePipeFormatSuccess(t *testing.T) { expectParsePipeSuccess(t, pipeStr) } + f(`format "foo"`) f(`format "" as x`) f(`format "<>" as x`) f(`format foo as x`) - f(`format "" as _msg`) - f(`format "bar" as _msg`) - f(`format "barbac" as _msg`) - f(`format "barbac" if (x:y) as _msg`) + f(`format ""`) + f(`format "bar"`) + f(`format "barbac"`) + f(`format if (x:y) "barbac"`) } func TestParsePipeFormatFailure(t *testing.T) { @@ -26,9 +27,8 @@ func TestParsePipeFormatFailure(t *testing.T) { } f(`format`) - f(`format foo`) + f(`format if`) f(`format foo bar`) - f(`format foo as`) f(`format foo if`) f(`format foo as x if (x:y)`) } @@ -108,7 +108,7 @@ func TestPipeFormat(t *testing.T) { }) // conditional format over multiple rows - f(`format "a: , b: , x: " if (!c:*) as c`, [][]Field{ + f(`format if (!c:*) "a: , b: , x: " as c`, [][]Field{ { {"b", "bar"}, {"a", "foo"}, @@ -147,41 +147,41 @@ func TestPipeFormatUpdateNeededFields(t *testing.T) { // all the needed fields f(`format "foo" as x`, "*", "", "*", "x") f(`format "foo" as x`, "*", "", "*", "x") - f(`format "foo" if (f2:z) as x`, "*", "", "*", "x") + f(`format if (f2:z) "foo" as x`, "*", "", "*", "x") // unneeded fields do not intersect with pattern and output field f(`format "foo" as x`, "*", "f1,f2", "*", "f1,f2,x") f(`format "foo" as x`, "*", "f1,f2", "*", "f1,f2,x") - f(`format "foo" if (f4:z) as x`, "*", "f1,f2", "*", "f1,f2,x") - f(`format "foo" if (f1:z) as x`, "*", "f1,f2", "*", "f2,x") + f(`format if (f4:z) "foo" as x`, "*", "f1,f2", "*", "f1,f2,x") + f(`format if (f1:z) "foo" as x`, "*", "f1,f2", "*", "f2,x") // unneeded fields intersect with pattern f(`format "foo" as x`, "*", "f1,f2", "*", "f2,x") - f(`format "foo" if (f4:z) as x`, "*", "f1,f2", "*", "f2,x") - f(`format "foo" if (f2:z) as 
x`, "*", "f1,f2", "*", "x") + f(`format if (f4:z) "foo" as x`, "*", "f1,f2", "*", "f2,x") + f(`format if (f2:z) "foo" as x`, "*", "f1,f2", "*", "x") // unneeded fields intersect with output field f(`format "foo" as x`, "*", "x,y", "*", "x,y") - f(`format "foo" if (f2:z) as x`, "*", "x,y", "*", "x,y") - f(`format "foo" if (y:z) as x`, "*", "x,y", "*", "x,y") + f(`format if (f2:z) "foo" as x`, "*", "x,y", "*", "x,y") + f(`format if (y:z) "foo" as x`, "*", "x,y", "*", "x,y") // needed fields do not intersect with pattern and output field f(`format "foo" as f2`, "x,y", "", "x,y", "") - f(`format "foo" if (f3:z) as f2`, "x,y", "", "x,y", "") - f(`format "foo" if (x:z) as f2`, "x,y", "", "x,y", "") + f(`format if (f3:z) "foo" as f2`, "x,y", "", "x,y", "") + f(`format if (x:z) "foo" as f2`, "x,y", "", "x,y", "") // needed fields intersect with pattern field f(`format "foo" as f2`, "f1,y", "", "f1,y", "") - f(`format "foo" if (f3:z) as f2`, "f1,y", "", "f1,y", "") - f(`format "foo" if (x:z) as f2`, "f1,y", "", "f1,y", "") + f(`format if (f3:z) "foo" as f2`, "f1,y", "", "f1,y", "") + f(`format if (x:z) "foo" as f2`, "f1,y", "", "f1,y", "") // needed fields intersect with output field f(`format "foo" as f2`, "f2,y", "", "f1,y", "") - f(`format "foo" if (f3:z) as f2`, "f2,y", "", "f1,f3,y", "") - f(`format "foo" if (x:z or y:w) as f2`, "f2,y", "", "f1,x,y", "") + f(`format if (f3:z) "foo" as f2`, "f2,y", "", "f1,f3,y", "") + f(`format if (x:z or y:w) "foo" as f2`, "f2,y", "", "f1,x,y", "") // needed fields intersect with pattern and output fields f(`format "foo" as f2`, "f1,f2,y", "", "f1,y", "") - f(`format "foo" if (f3:z) as f2`, "f1,f2,y", "", "f1,f3,y", "") - f(`format "foo" if (x:z or y:w) as f2`, "f1,f2,y", "", "f1,x,y", "") + f(`format if (f3:z) "foo" as f2`, "f1,f2,y", "", "f1,f3,y", "") + f(`format if (x:z or y:w) "foo" as f2`, "f1,f2,y", "", "f1,x,y", "") }
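
Reviewer note: the substitution behavior that the `format` pipe docs in this patch describe can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the code in `lib/logstorage/pipe_format.go`: `formatFields` is a hypothetical helper, the field names `ip`, `port` and `_msg` are made up for the example, missing fields are assumed to render as empty strings, and `strconv.Quote` stands in for whatever quoting the real `:q` suffix applies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatFields is a hypothetical helper (not part of lib/logstorage).
// It replaces every <name> placeholder in pattern with fields[name],
// and every <name:q> placeholder with the double-quoted field value.
// Assumption: a missing field is rendered as an empty string.
func formatFields(pattern string, fields map[string]string) string {
	var sb strings.Builder
	for len(pattern) > 0 {
		start := strings.IndexByte(pattern, '<')
		if start < 0 {
			// No more placeholders: copy the tail as is.
			sb.WriteString(pattern)
			break
		}
		end := strings.IndexByte(pattern[start:], '>')
		if end < 0 {
			// Unterminated placeholder: treat the rest as plain text.
			sb.WriteString(pattern)
			break
		}
		end += start
		sb.WriteString(pattern[:start])
		name := pattern[start+1 : end]
		if base, ok := strings.CutSuffix(name, ":q"); ok {
			// Assumption: strconv.Quote approximates the ':q' quoting.
			sb.WriteString(strconv.Quote(fields[base]))
		} else {
			sb.WriteString(fields[name])
		}
		pattern = pattern[end+1:]
	}
	return sb.String()
}

func main() {
	fields := map[string]string{
		"ip":   "1.2.3.4",
		"port": "8080",
		"_msg": `say "hi"`,
	}
	fmt.Println(formatFields("request from <ip>:<port>", fields))
	fmt.Println(formatFields(`{"_msg":<_msg:q>}`, fields))
}
```

The sketch skips html-unescaping of special chars and the `if (...)` condition handling; both are handled by the parser and processor touched in this patch.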