Aliaksandr Valialkin 2024-05-22 17:17:59 +02:00
parent 93a645dcfc
commit 79787ce25a
No known key found for this signature in database
GPG key ID: 52C003EE2BCDB9EB
7 changed files with 251 additions and 180 deletions


@ -19,6 +19,7 @@ according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/QuickSta
## tip
* FEATURE: add ability to generate output fields according to the provided format string. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#format-pipe).
* FEATURE: add ability to extract fields with [`extract` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#extract-pipe) only if the given condition is met. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#conditional-extract).
* FEATURE: add ability to unpack JSON fields with [`unpack_json` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#unpack_json-pipe) only if the given condition is met. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#conditional-unpack_json).
* FEATURE: add ability to unpack [logfmt](https://brandur.org/logfmt) fields with [`unpack_logfmt` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#unpack_logfmt-pipe) only if the given condition is met. See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#conditional-unpack_logfmt).


@ -1056,6 +1056,7 @@ LogsQL supports the following pipes:
- [`field_names`](#field_names-pipe) returns all the names of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`fields`](#fields-pipe) selects the given set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`filter`](#filter-pipe) applies additional [filters](#filters) to results.
- [`format`](#format-pipe) formats output field from input [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
- [`limit`](#limit-pipe) limits the number of selected logs.
- [`offset`](#offset-pipe) skips the given number of selected logs.
- [`rename`](#rename-pipe) renames [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
@ -1110,21 +1111,21 @@ See also:
### extract pipe
`| extract from field_name "pattern"` [pipe](#pipes) allows extracting additional fields specified in the `pattern` from the given
`field_name` [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). Existing log fields remain unchanged
after the `| extract ...` pipe.
`| extract "pattern" from field_name` [pipe](#pipes) allows extracting arbitrary text into output fields according to the [`pattern`](#format-for-extract-pipe-pattern) from the given
[`field_name`](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). Existing log fields remain unchanged after the `| extract ...` pipe.
`| extract ...` pipe can be useful for extracting additional fields needed for further data processing with other pipes such as [`stats` pipe](#stats-pipe) or [`sort` pipe](#sort-pipe).
`| extract ...` can be useful for extracting additional fields needed for further data processing with other pipes such as [`stats` pipe](#stats-pipe) or [`sort` pipe](#sort-pipe).
For example, the following query selects logs with the `error` [word](#word) for the last day,
extracts ip address from [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) into `ip` field and then calculates top 10 ip addresses
with the biggest number of logs:
```logsql
_time:1d error | extract from _msg "ip=<ip> " | stats by (ip) count() logs | sort by (logs) desc limit 10
_time:1d error | extract "ip=<ip> " from _msg | stats by (ip) count() logs | sort by (logs) desc limit 10
```
It is expected that `_msg` field contains `ip=...` substring, which ends with space. For example, `error ip=1.2.3.4 from user_id=42`.
It is expected that `_msg` field contains `ip=...` substring ending with space. For example, `error ip=1.2.3.4 from user_id=42`.
If there is no such substring in the current `_msg` field, then the `ip` output field will be empty.
If the `| extract ...` pipe is applied to [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field), then the `from _msg` part can be omitted.
For example, the following query is equivalent to the previous one:
@ -1133,6 +1134,12 @@ For example, the following query is equivalent to the previous one:
_time:1d error | extract "ip=<ip> " | stats by (ip) count() logs | sort by (logs) desc limit 10
```
If the `pattern` contains double quotes, then it can be quoted into single quotes. For example, the following query extracts `ip` from the corresponding JSON field:
```logsql
_time:5m | extract '"ip":"<ip>"'
```
See also:
- [Format for extract pipe pattern](#format-for-extract-pipe-pattern)
@ -1140,23 +1147,27 @@ See also:
- [`unpack_json` pipe](#unpack_json-pipe)
- [`unpack_logfmt` pipe](#unpack_logfmt-pipe)
#### Conditional extract
If some log entries must be skipped from [`extract` pipe](#extract-pipe), then add `if (<filters>)` filter to the end of `| extract ...` pipe.
The `<filters>` can contain arbitrary [filters](#filters). For example, the following query extracts `ip` field only
if the input [log entry](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) doesn't contain `ip` field or this field is empty:
```logsql
_time:5m | extract "ip=<ip> " if (ip:"")
```
#### Format for extract pipe pattern
The `pattern` part from [`| extract from src_field "pattern"` pipe](#extract-pipes) may contain arbitrary text, which matches as is to the `src_field` value.
Additionally to arbitrary text, the `pattern` may contain placeholders in the form `<...>`, which match any strings, including empty strings.
Placeholders may be named, such as `<ip>`, or anonymous, such as `<_>`. Named placeholders extract the matching text into
the corresponding [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
Anonymous placeholders are useful for skipping arbitrary text during pattern matching.
The `pattern` part of the [`extract` pipe](#extract-pipe) has the following format:
```
text1<field1>text2<field2>...textN<fieldN>textN+1
```
Where `text1`, ... `textN+1` is arbitrary non-empty text, which must match the input text as is.
The `field1`, ... `fieldN` are placeholders, which match a substring of any length (including zero length) in the input text until the next `textX`.
Placeholders can be anonymous or named. Anonymous placeholders are written as `<_>`. They are convenient when some input text
must be skipped until the next `textX`. Named placeholders are written as `<some_name>`, where `some_name` is the name of the log field
that receives the corresponding matching substring.
The matching starts from the first occurrence of the `text1` in the input text. If the `pattern` starts with `<field1>` and doesn't contain `text1`,
then the matching starts from the beginning of the input text. Matching is performed sequentially according to the `pattern`. If some `textX` isn't found
in the remaining input text, then the remaining named placeholders receive empty string values and the matching finishes prematurely.
Matching finishes successfully when `textN+1` is found in the input text.
If the `pattern` ends with `<fieldN>` and doesn't contain `textN+1`, then the `<fieldN>` matches the remaining input text.
For example, if [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) contains the following text:
@ -1164,34 +1175,44 @@ For example, if [`_msg` field](https://docs.victoriametrics.com/victorialogs/key
1.2.3.4 GET /foo/bar?baz 404 "Mozilla foo bar baz" some tail here
```
Then the following `| extract ...` [pipe](#pipes) can be used for extracting `ip`, `path` and `user_agent` fields from it:
Then the following `pattern` can be used for extracting `ip`, `path` and `user_agent` fields from it:
```
| extract '<ip> <_> <path> <_> "<user_agent>"'
<ip> <_> <path> <_> "<user_agent>"
```
Note that the user-agent part of the log message is in double quotes. This means that it may contain special chars, including escaped double quote, e.g. `\"`.
This may break proper matching of the string in double quotes.
VictoriaLogs automatically detects the whole string in quotes and automatically decodes it if the first char in the placeholder is double quote or backtick.
So it is better to use the following `pattern` for proper matching of quoted strings:
VictoriaLogs automatically detects quoted strings and automatically unquotes them if the first matching char in the placeholder is double quote or backtick.
So it is better to use the following `pattern` for proper matching of quoted `user_agent` string:
```
| extract "<ip> <_> <path> <_> <user_agent>"
<ip> <_> <path> <_> <user_agent>
```
Note that the `user_agent` now matches double quotes, but VictoriaLogs automatically unquotes the matching string before storing it in the `user_agent` field.
This is useful for extracting JSON strings. For example, the following `pattern` properly extracts the `message` JSON string into `msg` field:
This is useful for extracting JSON strings. For example, the following `pattern` properly extracts the `message` JSON string into `msg` field, even if it contains special chars:
```
| extract '"message":<msg>'
"message":<msg>
```
If some special chars such as `<` must be matched by the `pattern`, then they can be [html-escaped](https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references).
For example, the following `pattern` properly matches `a < 123.456` text:
For example, the following `pattern` properly matches `a < b` text by extracting `a` into `left` field and `b` into `right` field:
```
| extract "<left> &lt; <right>"
<left> &lt; <right>
```
#### Conditional extract
If some log entries must be skipped from [`extract` pipe](#extract-pipe), then add `if (<filters>)` filter after the `extract` word.
The `<filters>` can contain arbitrary [filters](#filters). For example, the following query extracts `ip` field
from [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) only
if the input [log entry](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) doesn't contain `ip` field or this field is empty:
```logsql
_time:5m | extract if (ip:"") "ip=<ip> "
```
### field_names pipe
@ -1249,6 +1270,49 @@ See also:
- [`stats` pipe](#stats-pipe)
- [`sort` pipe](#sort-pipe)
### format pipe
`| format "pattern" as result_field` [pipe](#format-pipe) combines [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
according to the `pattern` and stores the result in the `result_field`.
For example, the following query stores `request from <ip>:<port>` text into [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field),
by substituting `<ip>` and `<port>` with the values of the corresponding [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model):
```logsql
_time:5m | format "request from <ip>:<port>" as _msg
```
If the result of the `format` pattern is stored into [`_msg` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field),
then `as _msg` part can be omitted. The following query is equivalent to the previous one:
```logsql
_time:5m | format "request from <ip>:<port>"
```
If some field values must be put into double quotes before formatting, then add `:q` after the corresponding field name.
For example, the following command generates properly encoded JSON object from `_msg` and `stacktrace` [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model)
and stores it into `my_json` output field:
```logsql
_time:5m | format '{"_msg":<_msg:q>,"stacktrace":<stacktrace:q>}' as my_json
```
See also:
- [Conditional format](#conditional-format)
- [`extract` pipe](#extract-pipe)
#### Conditional format
If the [`format` pipe](#format-pipe) mustn't be applied to every [log entry](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model),
then add `if (<filters>)` just after the `format` word.
The `<filters>` can contain arbitrary [filters](#filters). For example, the following query stores the formatted result to `message` field
only if `ip` and `host` [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) aren't empty:
```logsql
_time:5m | format if (ip:* and host:*) "request from <ip>:<host>" as message
```
### limit pipe
If only a subset of selected logs must be processed, then `| limit N` [pipe](#pipes) can be used, where `N` can contain any [supported integer numeric value](#numeric-values).


@ -1001,11 +1001,11 @@ func TestParseQuerySuccess(t *testing.T) {
// extract pipe
f(`* | extract "foo<bar>baz"`, `* | extract "foo<bar>baz"`)
f(`* | extract from _msg "foo<bar>baz"`, `* | extract "foo<bar>baz"`)
f(`* | extract from '' 'foo<bar>baz'`, `* | extract "foo<bar>baz"`)
f("* | extract from x `foo<bar>baz`", `* | extract from x "foo<bar>baz"`)
f("* | extract from x foo<bar>baz", `* | extract from x "foo<bar>baz"`)
f("* | extract from x foo<bar>baz if (a:b)", `* | extract from x "foo<bar>baz" if (a:b)`)
f(`* | extract "foo<bar>baz" from _msg`, `* | extract "foo<bar>baz"`)
f(`* | extract 'foo<bar>baz' from ''`, `* | extract "foo<bar>baz"`)
f("* | extract `foo<bar>baz` from x", `* | extract "foo<bar>baz" from x`)
f("* | extract foo<bar>baz from x", `* | extract "foo<bar>baz" from x`)
f("* | extract if (a:b) foo<bar>baz from x", `* | extract if (a:b) "foo<bar>baz" from x`)
// unpack_json pipe
f(`* | unpack_json`, `* | unpack_json`)
@ -1625,10 +1625,10 @@ func TestQueryGetNeededColumns(t *testing.T) {
f(`* | format "foo<f1>" as s1`, `*`, `s1`)
f(`* | format "foo<s1>" as s1`, `*`, ``)
f(`* | format "foo" if (x1:y) as s1`, `*`, `s1`)
f(`* | format "foo<f1>" if (x1:y) as s1`, `*`, `s1`)
f(`* | format "foo<f1>" if (s1:y) as s1`, `*`, ``)
f(`* | format "foo<s1>" if (x1:y) as s1`, `*`, ``)
f(`* | format if (x1:y) "foo" as s1`, `*`, `s1`)
f(`* | format if (x1:y) "foo<f1>" as s1`, `*`, `s1`)
f(`* | format if (s1:y) "foo<f1>" as s1`, `*`, ``)
f(`* | format if (x1:y) "foo<s1>" as s1`, `*`, ``)
f(`* | format "foo" as s1 | fields f1`, `f1`, ``)
f(`* | format "foo" as s1 | fields s1`, ``, ``)
@ -1638,8 +1638,8 @@ func TestQueryGetNeededColumns(t *testing.T) {
f(`* | format "foo<s1>" as s1 | fields f1`, `f1`, ``)
f(`* | format "foo<s1>" as s1 | fields s1`, `s1`, ``)
f(`* | format "foo" if (f1:x) as s1 | fields s1`, `f1`, ``)
f(`* | format "foo" if (f1:x) as s1 | fields s2`, `s2`, ``)
f(`* | format if (f1:x) "foo" as s1 | fields s1`, `f1`, ``)
f(`* | format if (f1:x) "foo" as s1 | fields s2`, `s2`, ``)
f(`* | format "foo" as s1 | rm f1`, `*`, `f1,s1`)
f(`* | format "foo" as s1 | rm s1`, `*`, `s1`)
@ -1649,52 +1649,52 @@ func TestQueryGetNeededColumns(t *testing.T) {
f(`* | format "foo<s1>" as s1 | rm f1`, `*`, `f1`)
f(`* | format "foo<s1>" as s1 | rm s1`, `*`, `s1`)
f(`* | format "foo" if (f1:x) as s1 | rm s1`, `*`, `s1`)
f(`* | format "foo" if (f1:x) as s1 | rm f1`, `*`, `s1`)
f(`* | format "foo" if (f1:x) as s1 | rm f2`, `*`, `f2,s1`)
f(`* | format if (f1:x) "foo" as s1 | rm s1`, `*`, `s1`)
f(`* | format if (f1:x) "foo" as s1 | rm f1`, `*`, `s1`)
f(`* | format if (f1:x) "foo" as s1 | rm f2`, `*`, `f2,s1`)
f(`* | extract from s1 "<f1>x<f2>"`, `*`, `f1,f2`)
f(`* | extract from s1 "<f1>x<f2>" if (f3:foo)`, `*`, `f1,f2`)
f(`* | extract from s1 "<f1>x<f2>" if (f1:foo)`, `*`, `f2`)
f(`* | extract from s1 "<f1>x<f2>" | fields foo`, `foo`, ``)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | fields foo`, `foo`, ``)
f(`* | extract from s1 "<f1>x<f2>" | fields foo,s1`, `foo,s1`, ``)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | fields foo,s1`, `foo,s1`, ``)
f(`* | extract from s1 "<f1>x<f2>" | fields foo,f1`, `foo,s1`, ``)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | fields foo,f1`, `foo,s1,x`, ``)
f(`* | extract from s1 "<f1>x<f2>" | fields foo,f1,f2`, `foo,s1`, ``)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | fields foo,f1,f2`, `foo,s1,x`, ``)
f(`* | extract from s1 "<f1>x<f2>" | rm foo`, `*`, `f1,f2,foo`)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | rm foo`, `*`, `f1,f2,foo`)
f(`* | extract from s1 "<f1>x<f2>" | rm foo,s1`, `*`, `f1,f2,foo`)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | rm foo,s1`, `*`, `f1,f2,foo`)
f(`* | extract from s1 "<f1>x<f2>" | rm foo,f1`, `*`, `f1,f2,foo`)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | rm foo,f1`, `*`, `f1,f2,foo`)
f(`* | extract from s1 "<f1>x<f2>" | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`)
f(`* | extract from s1 "<f1>x<f2>" if (x:bar) | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`)
f(`* | extract "<f1>x<f2>" from s1`, `*`, `f1,f2`)
f(`* | extract if (f3:foo) "<f1>x<f2>" from s1`, `*`, `f1,f2`)
f(`* | extract if (f1:foo) "<f1>x<f2>" from s1`, `*`, `f2`)
f(`* | extract "<f1>x<f2>" from s1 | fields foo`, `foo`, ``)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | fields foo`, `foo`, ``)
f(`* | extract "<f1>x<f2>" from s1| fields foo,s1`, `foo,s1`, ``)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | fields foo,s1`, `foo,s1`, ``)
f(`* | extract "<f1>x<f2>" from s1 | fields foo,f1`, `foo,s1`, ``)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | fields foo,f1`, `foo,s1,x`, ``)
f(`* | extract "<f1>x<f2>" from s1 | fields foo,f1,f2`, `foo,s1`, ``)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | fields foo,f1,f2`, `foo,s1,x`, ``)
f(`* | extract "<f1>x<f2>" from s1 | rm foo`, `*`, `f1,f2,foo`)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | rm foo`, `*`, `f1,f2,foo`)
f(`* | extract "<f1>x<f2>" from s1 | rm foo,s1`, `*`, `f1,f2,foo`)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | rm foo,s1`, `*`, `f1,f2,foo`)
f(`* | extract "<f1>x<f2>" from s1 | rm foo,f1`, `*`, `f1,f2,foo`)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | rm foo,f1`, `*`, `f1,f2,foo`)
f(`* | extract "<f1>x<f2>" from s1 | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`)
f(`* | extract if (x:bar) "<f1>x<f2>" from s1 | rm foo,f1,f2`, `*`, `f1,f2,foo,s1`)
f(`* | extract from s1 "x<s1>y"`, `*`, ``)
f(`* | extract from s1 "x<s1>y" if (x:foo)`, `*`, ``)
f(`* | extract from s1 "x<s1>y" if (s1:foo)`, `*`, ``)
f(`* | extract from s1 "x<f1>y" if (s1:foo)`, `*`, `f1`)
f(`* | extract "x<s1>y" from s1 `, `*`, ``)
f(`* | extract if (x:foo) "x<s1>y" from s1`, `*`, ``)
f(`* | extract if (s1:foo) "x<s1>y" from s1`, `*`, ``)
f(`* | extract if (s1:foo) "x<f1>y" from s1`, `*`, `f1`)
f(`* | extract from s1 "x<s1>y" | fields s2`, `s2`, ``)
f(`* | extract from s1 "x<s1>y" | fields s1`, `s1`, ``)
f(`* | extract from s1 "x<s1>y" if (x:foo) | fields s1`, `s1,x`, ``)
f(`* | extract from s1 "x<s1>y" if (x:foo) | fields s2`, `s2`, ``)
f(`* | extract from s1 "x<s1>y" if (s1:foo) | fields s1`, `s1`, ``)
f(`* | extract from s1 "x<s1>y" if (s1:foo) | fields s2`, `s2`, ``)
f(`* | extract from s1 "x<f1>y" if (s1:foo) | fields s1`, `s1`, ``)
f(`* | extract from s1 "x<f1>y" if (s1:foo) | fields s2`, `s2`, ``)
f(`* | extract "x<s1>y" from s1 | fields s2`, `s2`, ``)
f(`* | extract "x<s1>y" from s1 | fields s1`, `s1`, ``)
f(`* | extract if (x:foo) "x<s1>y" from s1 | fields s1`, `s1,x`, ``)
f(`* | extract if (x:foo) "x<s1>y" from s1 | fields s2`, `s2`, ``)
f(`* | extract if (s1:foo) "x<s1>y" from s1 | fields s1`, `s1`, ``)
f(`* | extract if (s1:foo) "x<s1>y" from s1 | fields s2`, `s2`, ``)
f(`* | extract if (s1:foo) "x<f1>y" from s1 | fields s1`, `s1`, ``)
f(`* | extract if (s1:foo) "x<f1>y" from s1 | fields s2`, `s2`, ``)
f(`* | extract from s1 "x<s1>y" | rm s2`, `*`, `s2`)
f(`* | extract from s1 "x<s1>y" | rm s1`, `*`, `s1`)
f(`* | extract from s1 "x<s1>y" if (x:foo) | rm s1`, `*`, `s1`)
f(`* | extract from s1 "x<s1>y" if (x:foo) | rm s2`, `*`, `s2`)
f(`* | extract from s1 "x<s1>y" if (s1:foo) | rm s1`, `*`, `s1`)
f(`* | extract from s1 "x<s1>y" if (s1:foo) | rm s2`, `*`, `s2`)
f(`* | extract from s1 "x<f1>y" if (s1:foo) | rm s1`, `*`, `f1`)
f(`* | extract from s1 "x<f1>y" if (s1:foo) | rm s2`, `*`, `f1,s2`)
f(`* | extract "x<s1>y" from s1 | rm s2`, `*`, `s2`)
f(`* | extract "x<s1>y" from s1 | rm s1`, `*`, `s1`)
f(`* | extract if (x:foo) "x<s1>y" from s1 | rm s1`, `*`, `s1`)
f(`* | extract if (x:foo) "x<s1>y" from s1 | rm s2`, `*`, `s2`)
f(`* | extract if (s1:foo) "x<s1>y" from s1 | rm s1`, `*`, `s1`)
f(`* | extract if (s1:foo) "x<s1>y" from s1 | rm s2`, `*`, `s2`)
f(`* | extract if (s1:foo) "x<f1>y" from s1 | rm s1`, `*`, `f1`)
f(`* | extract if (s1:foo) "x<f1>y" from s1 | rm s2`, `*`, `f1,s2`)
f(`* | unpack_json`, `*`, ``)
f(`* | unpack_json from s1`, `*`, ``)


@ -4,7 +4,7 @@ import (
"fmt"
)
// pipeExtract processes '| extract from <field> <pattern>' pipe.
// pipeExtract processes '| extract ...' pipe.
//
// See https://docs.victoriametrics.com/victorialogs/logsql/#extract-pipe
type pipeExtract struct {
@ -19,13 +19,13 @@ type pipeExtract struct {
func (pe *pipeExtract) String() string {
s := "extract"
if !isMsgFieldName(pe.fromField) {
s += " from " + quoteTokenIfNeeded(pe.fromField)
}
s += " " + quoteTokenIfNeeded(pe.patternStr)
if pe.iff != nil {
s += " " + pe.iff.String()
}
s += " " + quoteTokenIfNeeded(pe.patternStr)
if !isMsgFieldName(pe.fromField) {
s += " from " + quoteTokenIfNeeded(pe.fromField)
}
return s
}
@ -90,14 +90,14 @@ func parsePipeExtract(lex *lexer) (*pipeExtract, error) {
}
lex.nextToken()
fromField := "_msg"
if lex.isKeyword("from") {
lex.nextToken()
f, err := parseFieldName(lex)
// parse optional if (...)
var iff *ifFilter
if lex.isKeyword("if") {
f, err := parseIfFilter(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'from' field name: %w", err)
return nil, err
}
fromField = f
iff = f
}
// parse pattern
@ -110,19 +110,22 @@ func parsePipeExtract(lex *lexer) (*pipeExtract, error) {
return nil, fmt.Errorf("cannot parse 'pattern' %q: %w", patternStr, err)
}
// parse optional 'from ...' part
fromField := "_msg"
if lex.isKeyword("from") {
lex.nextToken()
f, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse 'from' field name: %w", err)
}
fromField = f
}
pe := &pipeExtract{
fromField: fromField,
ptn: ptn,
patternStr: patternStr,
}
// parse optional if (...)
if lex.isKeyword("if") {
iff, err := parseIfFilter(lex)
if err != nil {
return nil, err
}
pe.iff = iff
iff: iff,
}
return pe, nil


@ -11,8 +11,8 @@ func TestParsePipeExtractSuccess(t *testing.T) {
}
f(`extract "foo<bar>"`)
f(`extract from x "foo<bar>"`)
f(`extract from x "foo<bar>" if (y:in(a:foo bar | uniq by (qwe) limit 10))`)
f(`extract "foo<bar>" from x`)
f(`extract if (x:y) "foo<bar>" from baz`)
}
func TestParsePipeExtractFailure(t *testing.T) {
@ -23,11 +23,10 @@ func TestParsePipeExtractFailure(t *testing.T) {
f(`extract`)
f(`extract from`)
f(`extract from x`)
f(`extract from x "y<foo>"`)
f(`extract if (x:y)`)
f(`extract if (x:y) "a<b>"`)
f(`extract "a<b>" if`)
f(`extract "a<b>" if (foo`)
f(`extract "a<b>" if "foo"`)
f(`extract "a<b>" if (x:y)`)
f(`extract "a"`)
f(`extract "<a><b>"`)
f(`extract "<*>foo<_>bar"`)
@ -64,7 +63,7 @@ func TestPipeExtract(t *testing.T) {
})
// single row, extract from non-existing field
f(`extract from x "foo=<bar>"`, [][]Field{
f(`extract "foo=<bar>" from x`, [][]Field{
{
{"_msg", `foo=bar`},
},
@ -76,7 +75,7 @@ func TestPipeExtract(t *testing.T) {
})
// single row, pattern mismatch
f(`extract from x "foo=<bar>"`, [][]Field{
f(`extract "foo=<bar>" from x`, [][]Field{
{
{"x", `foobar`},
},
@ -88,7 +87,7 @@ func TestPipeExtract(t *testing.T) {
})
// single row, partial pattern match
f(`extract from x "foo=<bar> baz=<xx>"`, [][]Field{
f(`extract "foo=<bar> baz=<xx>" from x`, [][]Field{
{
{"x", `a foo="a\"b\\c" cde baz=aa`},
},
@ -101,7 +100,7 @@ func TestPipeExtract(t *testing.T) {
})
// single row, overwrite existing column
f(`extract from x "foo=<bar> baz=<xx>"`, [][]Field{
f(`extract "foo=<bar> baz=<xx>" from x`, [][]Field{
{
{"x", `a foo=cc baz=aa b`},
{"bar", "abc"},
@ -115,7 +114,7 @@ func TestPipeExtract(t *testing.T) {
})
// single row, if match
f(`extract from x "foo=<bar> baz=<xx>" if (x:baz)`, [][]Field{
f(`extract if (x:baz) "foo=<bar> baz=<xx>" from x`, [][]Field{
{
{"x", `a foo=cc baz=aa b`},
{"bar", "abc"},
@ -129,7 +128,7 @@ func TestPipeExtract(t *testing.T) {
})
// single row, if mismatch
f(`extract from x "foo=<bar> baz=<xx>" if (bar:"")`, [][]Field{
f(`extract if (bar:"") "foo=<bar> baz=<xx>" from x`, [][]Field{
{
{"x", `a foo=cc baz=aa b`},
{"bar", "abc"},
@ -142,7 +141,7 @@ func TestPipeExtract(t *testing.T) {
})
// multiple rows with distinct set of labels
f(`extract "ip=<ip> " if (!ip:keep)`, [][]Field{
f(`extract if (!ip:keep) "ip=<ip> "`, [][]Field{
{
{"foo", "bar"},
{"_msg", "request from ip=1.2.3.4 xxx"},
@ -201,44 +200,44 @@ func TestPipeExtractUpdateNeededFields(t *testing.T) {
}
// all the needed fields
f("extract from x '<foo>'", "*", "", "*", "foo")
f("extract from x '<foo>' if (foo:bar)", "*", "", "*", "")
f("extract '<foo>' from x", "*", "", "*", "foo")
f("extract if (foo:bar) '<foo>' from x", "*", "", "*", "")
// unneeded fields do not intersect with pattern and output fields
f("extract from x '<foo>'", "*", "f1,f2", "*", "f1,f2,foo")
f("extract from x '<foo>' if (f1:x)", "*", "f1,f2", "*", "f2,foo")
f("extract from x '<foo>' if (foo:bar f1:x)", "*", "f1,f2", "*", "f2")
f("extract '<foo>' from x", "*", "f1,f2", "*", "f1,f2,foo")
f("extract if (f1:x) '<foo>' from x", "*", "f1,f2", "*", "f2,foo")
f("extract if (foo:bar f1:x) '<foo>' from x", "*", "f1,f2", "*", "f2")
// unneeded fields intersect with pattern
f("extract from x '<foo>'", "*", "f2,x", "*", "f2,foo")
f("extract from x '<foo>' if (f1:abc)", "*", "f2,x", "*", "f2,foo")
f("extract from x '<foo>' if (f2:abc)", "*", "f2,x", "*", "foo")
f("extract '<foo>' from x", "*", "f2,x", "*", "f2,foo")
f("extract if (f1:abc) '<foo>' from x", "*", "f2,x", "*", "f2,foo")
f("extract if (f2:abc) '<foo>' from x", "*", "f2,x", "*", "foo")
// unneeded fields intersect with output fields
f("extract from x '<foo>x<bar>'", "*", "f2,foo", "*", "bar,f2,foo")
f("extract from x '<foo>x<bar>' if (f1:abc)", "*", "f2,foo", "*", "bar,f2,foo")
f("extract from x '<foo>x<bar>' if (f2:abc foo:w)", "*", "f2,foo", "*", "bar")
f("extract '<foo>x<bar>' from x", "*", "f2,foo", "*", "bar,f2,foo")
f("extract if (f1:abc) '<foo>x<bar>' from x", "*", "f2,foo", "*", "bar,f2,foo")
f("extract if (f2:abc foo:w) '<foo>x<bar>' from x", "*", "f2,foo", "*", "bar")
// unneeded fields intersect with all the output fields
f("extract from x '<foo>x<bar>'", "*", "f2,foo,bar", "*", "bar,f2,foo,x")
f("extract from x '<foo>x<bar> if (a:b f2:q x:y foo:w)'", "*", "f2,foo,bar", "*", "bar,f2,foo,x")
f("extract '<foo>x<bar>' from x", "*", "f2,foo,bar", "*", "bar,f2,foo,x")
f("extract if (a:b f2:q x:y foo:w) '<foo>x<bar>' from x", "*", "f2,foo,bar", "*", "bar,f2,foo,x")
// needed fields do not intersect with pattern and output fields
f("extract from x '<foo>x<bar>'", "f1,f2", "", "f1,f2", "")
f("extract from x '<foo>x<bar>' if (a:b)", "f1,f2", "", "f1,f2", "")
f("extract from x '<foo>x<bar>' if (f1:b)", "f1,f2", "", "f1,f2", "")
f("extract '<foo>x<bar>' from x", "f1,f2", "", "f1,f2", "")
f("extract if (a:b) '<foo>x<bar>' from x", "f1,f2", "", "f1,f2", "")
f("extract if (f1:b) '<foo>x<bar>' from x", "f1,f2", "", "f1,f2", "")
// needed fields intersect with pattern field
f("extract from x '<foo>x<bar>'", "f2,x", "", "f2,x", "")
f("extract from x '<foo>x<bar>' if (a:b)", "f2,x", "", "f2,x", "")
f("extract '<foo>x<bar>' from x", "f2,x", "", "f2,x", "")
f("extract if (a:b) '<foo>x<bar>' from x", "f2,x", "", "f2,x", "")
// needed fields intersect with output fields
f("extract from x '<foo>x<bar>'", "f2,foo", "", "f2,x", "")
f("extract from x '<foo>x<bar>' if (a:b)", "f2,foo", "", "a,f2,x", "")
f("extract '<foo>x<bar>' from x", "f2,foo", "", "f2,x", "")
f("extract if (a:b) '<foo>x<bar>' from x", "f2,foo", "", "a,f2,x", "")
// needed fields intersect with pattern and output fields
f("extract from x '<foo>x<bar>'", "f2,foo,x,y", "", "f2,x,y", "")
f("extract from x '<foo>x<bar>' if (a:b foo:q)", "f2,foo,x,y", "", "a,f2,foo,x,y", "")
f("extract '<foo>x<bar>' from x", "f2,foo,x,y", "", "f2,x,y", "")
f("extract if (a:b foo:q) '<foo>x<bar>' from x", "f2,foo,x,y", "", "a,f2,foo,x,y", "")
}
func expectParsePipeFailure(t *testing.T, pipeStr string) {


@ -21,11 +21,14 @@ type pipeFormat struct {
}
func (pf *pipeFormat) String() string {
s := "format " + quoteTokenIfNeeded(pf.formatStr)
s := "format"
if pf.iff != nil {
s += " " + pf.iff.String()
}
s += " " + quoteTokenIfNeeded(pf.formatStr)
if !isMsgFieldName(pf.resultField) {
s += " as " + quoteTokenIfNeeded(pf.resultField)
}
return s
}
@ -150,16 +153,6 @@ func parsePipeFormat(lex *lexer) (*pipeFormat, error) {
}
lex.nextToken()
// parse format
formatStr, err := getCompoundToken(lex)
if err != nil {
return nil, fmt.Errorf("cannot read 'format': %w", err)
}
steps, err := parsePatternSteps(formatStr)
if err != nil {
return nil, fmt.Errorf("cannot parse 'pattern' %q: %w", formatStr, err)
}
// parse optional if (...)
var iff *ifFilter
if lex.isKeyword("if") {
@ -170,15 +163,26 @@ func parsePipeFormat(lex *lexer) (*pipeFormat, error) {
iff = f
}
// parse resultField
if !lex.isKeyword("as") {
return nil, fmt.Errorf("missing 'as' keyword after 'format %q'", formatStr)
// parse format
formatStr, err := getCompoundToken(lex)
if err != nil {
return nil, fmt.Errorf("cannot read 'format': %w", err)
}
steps, err := parsePatternSteps(formatStr)
if err != nil {
return nil, fmt.Errorf("cannot parse 'pattern' %q: %w", formatStr, err)
}
// parse optional 'as ...' part
resultField := "_msg"
if lex.isKeyword("as") {
lex.nextToken()
resultField, err := parseFieldName(lex)
field, err := parseFieldName(lex)
if err != nil {
return nil, fmt.Errorf("cannot parse result field after 'format %q as': %w", formatStr, err)
}
resultField = field
}
pf := &pipeFormat{
formatStr: formatStr,


@ -10,13 +10,14 @@ func TestParsePipeFormatSuccess(t *testing.T) {
expectParsePipeSuccess(t, pipeStr)
}
f(`format "foo<bar>"`)
f(`format "" as x`)
f(`format "<>" as x`)
f(`format foo as x`)
f(`format "<foo>" as _msg`)
f(`format "<foo>bar<baz>" as _msg`)
f(`format "bar<baz><xyz>bac" as _msg`)
f(`format "bar<baz><xyz>bac" if (x:y) as _msg`)
f(`format "<foo>"`)
f(`format "<foo>bar<baz>"`)
f(`format "bar<baz><xyz>bac"`)
f(`format if (x:y) "bar<baz><xyz>bac"`)
}
func TestParsePipeFormatFailure(t *testing.T) {
@ -26,9 +27,8 @@ func TestParsePipeFormatFailure(t *testing.T) {
}
f(`format`)
f(`format foo`)
f(`format if`)
f(`format foo bar`)
f(`format foo as`)
f(`format foo if`)
f(`format foo as x if (x:y)`)
}
@ -108,7 +108,7 @@ func TestPipeFormat(t *testing.T) {
})
// conditional format over multiple rows
f(`format "a: <a>, b: <b>, x: <a>" if (!c:*) as c`, [][]Field{
f(`format if (!c:*) "a: <a>, b: <b>, x: <a>" as c`, [][]Field{
{
{"b", "bar"},
{"a", "foo"},
@ -147,41 +147,41 @@ func TestPipeFormatUpdateNeededFields(t *testing.T) {
// all the needed fields
f(`format "foo" as x`, "*", "", "*", "x")
f(`format "<f1>foo" as x`, "*", "", "*", "x")
f(`format "<f1>foo" if (f2:z) as x`, "*", "", "*", "x")
f(`format if (f2:z) "<f1>foo" as x`, "*", "", "*", "x")
// unneeded fields do not intersect with pattern and output field
f(`format "foo" as x`, "*", "f1,f2", "*", "f1,f2,x")
f(`format "<f3>foo" as x`, "*", "f1,f2", "*", "f1,f2,x")
f(`format "<f3>foo" if (f4:z) as x`, "*", "f1,f2", "*", "f1,f2,x")
f(`format "<f3>foo" if (f1:z) as x`, "*", "f1,f2", "*", "f2,x")
f(`format if (f4:z) "<f3>foo" as x`, "*", "f1,f2", "*", "f1,f2,x")
f(`format if (f1:z) "<f3>foo" as x`, "*", "f1,f2", "*", "f2,x")
// unneeded fields intersect with pattern
f(`format "<f1>foo" as x`, "*", "f1,f2", "*", "f2,x")
f(`format "<f1>foo" if (f4:z) as x`, "*", "f1,f2", "*", "f2,x")
f(`format "<f1>foo" if (f2:z) as x`, "*", "f1,f2", "*", "x")
f(`format if (f4:z) "<f1>foo" as x`, "*", "f1,f2", "*", "f2,x")
f(`format if (f2:z) "<f1>foo" as x`, "*", "f1,f2", "*", "x")
// unneeded fields intersect with output field
f(`format "<f1>foo" as x`, "*", "x,y", "*", "x,y")
f(`format "<f1>foo" if (f2:z) as x`, "*", "x,y", "*", "x,y")
f(`format "<f1>foo" if (y:z) as x`, "*", "x,y", "*", "x,y")
f(`format if (f2:z) "<f1>foo" as x`, "*", "x,y", "*", "x,y")
f(`format if (y:z) "<f1>foo" as x`, "*", "x,y", "*", "x,y")
// needed fields do not intersect with pattern and output field
f(`format "<f1>foo" as f2`, "x,y", "", "x,y", "")
f(`format "<f1>foo" if (f3:z) as f2`, "x,y", "", "x,y", "")
f(`format "<f1>foo" if (x:z) as f2`, "x,y", "", "x,y", "")
f(`format if (f3:z) "<f1>foo" as f2`, "x,y", "", "x,y", "")
f(`format if (x:z) "<f1>foo" as f2`, "x,y", "", "x,y", "")
// needed fields intersect with pattern field
f(`format "<f1>foo" as f2`, "f1,y", "", "f1,y", "")
f(`format "<f1>foo" if (f3:z) as f2`, "f1,y", "", "f1,y", "")
f(`format "<f1>foo" if (x:z) as f2`, "f1,y", "", "f1,y", "")
f(`format if (f3:z) "<f1>foo" as f2`, "f1,y", "", "f1,y", "")
f(`format if (x:z) "<f1>foo" as f2`, "f1,y", "", "f1,y", "")
// needed fields intersect with output field
f(`format "<f1>foo" as f2`, "f2,y", "", "f1,y", "")
f(`format "<f1>foo" if (f3:z) as f2`, "f2,y", "", "f1,f3,y", "")
f(`format "<f1>foo" if (x:z or y:w) as f2`, "f2,y", "", "f1,x,y", "")
f(`format if (f3:z) "<f1>foo" as f2`, "f2,y", "", "f1,f3,y", "")
f(`format if (x:z or y:w) "<f1>foo" as f2`, "f2,y", "", "f1,x,y", "")
// needed fields intersect with pattern and output fields
f(`format "<f1>foo" as f2`, "f1,f2,y", "", "f1,y", "")
f(`format "<f1>foo" if (f3:z) as f2`, "f1,f2,y", "", "f1,f3,y", "")
f(`format "<f1>foo" if (x:z or y:w) as f2`, "f1,f2,y", "", "f1,x,y", "")
f(`format if (f3:z) "<f1>foo" as f2`, "f1,f2,y", "", "f1,f3,y", "")
f(`format if (x:z or y:w) "<f1>foo" as f2`, "f1,f2,y", "", "f1,x,y", "")
}