diff --git a/docs/VictoriaLogs/LogsQL.md b/docs/VictoriaLogs/LogsQL.md index da70a7697..74cec09c1 100644 --- a/docs/VictoriaLogs/LogsQL.md +++ b/docs/VictoriaLogs/LogsQL.md @@ -37,6 +37,8 @@ For example, the following query finds all the logs with `error` word: error ``` +See [how to send queries to VictoriaLogs](https://docs.victoriametrics.com/victorialogs/querying/). + If the queried [word](#word) clashes with LogsQL keywords, then just wrap it into quotes. For example, the following query finds all the log messages with `and` [word](#word): @@ -80,11 +82,32 @@ Typical LogsQL query constists of multiple [filters](#filters) joined with `AND` So LogsQL allows omitting `AND` words. For example, the following query is equivalent to the query above: ```logsql -error _time:5m +_time:5m error ``` -The query returns all the [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) by default. -See [how to query specific fields](#querying-specific-fields). +The query returns logs in arbitrary order because sorting of big amounts of logs may require non-trivial amounts of CPU and RAM. +The number of logs with `error` word over the last 5 minutes isn't usually too big (e.g. less than a few millions), so it is OK to sort them with [`sort` pipe](#sort-pipe). +The following query sorts the selected logs by [`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) field: + +```logsql +_time:5m error | sort by (_time) +``` + +It is unlikely you are going to investigate more than a few hundreds of logs returned by the query above. So you can limit the number of returned logs +with [`limit` pipe](#limit-pipe). The following query returns the last 10 logs with the `error` word over the last 5 minutes: + +```logsql +_time:5m error | sort by (_time) desc | limit 10 +``` + +By default VictoriaLogs returns all the [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). 
+If you need only the given set of fields, then add [`fields` pipe](#fields-pipe) to the end of the query. For example, the following query returns only +[`_time`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field), [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields) +and [`_msg`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) fields: + +```logsql +error _time:5m | fields _time, _stream, _msg +``` Suppose the query above selects too many rows because some buggy app pushes invalid error logs to VictoriaLogs. Suppose the app adds `buggy_app` [word](#word) to every log line. Then the following query removes all the logs from the buggy app, allowing us paying attention to the real errors: @@ -93,8 +116,10 @@ Then the following query removes all the logs from the buggy app, allowing us pa _time:5m error NOT buggy_app ``` -This query uses `NOT` [operator](#logical-filter) for removing log lines from the buggy app. The `NOT` operator is used frequently, so it can be substituted with `!` char. -So the following query is equivalent to the previous one: +This query uses `NOT` [operator](#logical-filter) for removing log lines from the buggy app. The `NOT` operator is used frequently, so it can be substituted with `!` char +(the `!` char is used instead of `-` char as a shorthand for `NOT` operator because it nicely combines with [`=`](https://docs.victoriametrics.com/victorialogs/logsql/#exact-filter) +and [`~`](https://docs.victoriametrics.com/victorialogs/logsql/#regexp-filter) filters like `!=` and `!~`). +The following query is equivalent to the previous one: ```logsql _time:5m error !buggy_app @@ -113,17 +138,15 @@ This query can be rewritten to more clear query with the `OR` [operator](#logica _time:5m error !(buggy_app OR foobar) ``` -Note that the parentheses are required here, since otherwise the query won't return the expected results. 
-The query `error !buggy_app OR foobar` is interpreted as `(error AND NOT buggy_app) OR foobar`. This query may return error logs -from the buggy app if they contain `foobar` [word](#word). This query also continues returning all the error logs from the second buggy app. -This is because of different priorities for `NOT`, `AND` and `OR` operators. -Read [these docs](#logical-filter) for more details. There is no need in remembering all these priority rules - -just wrap the needed query parts into explicit parentheses if you aren't sure in priority rules. +The parentheses are **required** here, since otherwise the query won't return the expected results. +The query `error !buggy_app OR foobar` is interpreted as `(error AND NOT buggy_app) OR foobar` according to [priorities for AND, OR and NOT operators](#logical-filter). +This query returns logs with `foobar` [word](#word), even if they do not contain `error` word or contain `buggy_app` word. +So it is recommended to wrap the needed query parts into explicit parentheses if you are unsure about the priority rules. As an additional bonus, explicit parentheses make queries easier to read and maintain. Queries above assume that the `error` [word](#word) is stored in the [log message](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field). -This word can be stored in other [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) such as `log.level`. -How to select error logs in this case? 
Just add the `log.level:` prefix in front of the `error` word: +If this word is stored in another [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) such as `log.level`, then add `log.level:` prefix +in front of the `error` word: ```logsq _time:5m log.level:error !(buggy_app OR foobar) @@ -158,8 +181,16 @@ If the `app` field is associated with the log stream, then the query above can b _time:5m log.level:error _stream:{app!~"buggy_app|foobar"} ``` -This query completely skips scanning for logs from `buggy_app` and `foobar` apps, thus significantly reducing disk read IO and CPU time -needed for performing the query. +This query skips scanning for [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) from `buggy_app` and `foobar` apps. +It inspects only `log.level` and [`_stream`](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields) labels. +This significantly reduces disk read IO and CPU time needed for performing the query. + +LogsQL also provides [functions for statistics calculation](#stats-pipe) over the selected logs. For example, the following query returns the number of logs +with the `error` word for the last 5 minutes: + +```logsql +_time:5m error | stats count() logs_with_error +``` Finally, it is recommended reading [performance tips](#performance-tips). @@ -177,13 +208,16 @@ These words are taken into account by full-text search filters such as #### Query syntax -LogsQL query must contain [filters](#filters) for selecting the matching logs. At least a single filter is required. +LogsQL query must contain at least a single [filter](#filters) for selecting the matching logs. For example, the following query selects all the logs for the last 5 minutes by using [`_time` filter](#time-filter): ```logsql _time:5m ``` +Tip: try [`*` filter](https://docs.victoriametrics.com/victorialogs/logsql/#any-value-filter), which selects all the logs stored in VictoriaLogs. 
+Do not worry - this doesn't crash VictoriaLogs, even if it contains trillions of logs. In the worst case it will return + Additionally to filters, LogQL query may contain arbitrary mix of optional actions for processing the selected logs. These actions are delimited by `|` and are known as [`pipes`](#pipes). For example, the following query uses [`stats` pipe](#stats-pipe) for returning the number of [log messages](https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field) with the `error` [word](#word) for the last 5 minutes: @@ -2492,3 +2526,5 @@ Internally duration values are converted into nanoseconds. This rule doesn't apply to [time filter](#time-filter) and [stream filter](#stream-filter), which can be put at any place of the query. - Move more specific filters, which match lower number of log entries, to the beginning of the query. This rule doesn't apply to [time filter](#time-filter) and [stream filter](#stream-filter), which can be put at any place of the query. +- If the selected logs are passed to [pipes](#pipes) for further transformations and statistics' calculations, then it is recommended + reducing the number of selected logs by using more specific [filters](#filters), which return lower number of logs to process by [pipes](#pipes). diff --git a/docs/VictoriaLogs/querying/README.md b/docs/VictoriaLogs/querying/README.md index bb6dc7adf..c115c2360 100644 --- a/docs/VictoriaLogs/querying/README.md +++ b/docs/VictoriaLogs/querying/README.md @@ -43,8 +43,8 @@ For example, the following query returns all the log entries with the `error` wo curl http://localhost:9428/select/logsql/query -d 'query=error' ``` -The response by default contains all the [log fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model). -See [how to query specific fields](https://docs.victoriametrics.com/victorialogs/logsql/#querying-specific-fields). 
+The response by default contains all the [fields](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) for the selected logs. +Use [`fields` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe) for selecting only the needed fields. The `query` argument can be passed either in the request url itself (aka HTTP GET request) or via request body with the `x-www-form-urlencoded` encoding (aka HTTP POST request). The HTTP POST is useful for sending long queries @@ -56,7 +56,8 @@ or similar tools. By default the `/select/logsql/query` returns all the log entries matching the given `query`. The response size can be limited in the following ways: -- By closing the response stream at any time. In this case VictoriaLogs stops query execution and frees all the resources occupied by the request. +- By closing the response stream at any time. VictoriaLogs stops query execution and frees all the resources occupied by the request as soon as it detects closed client connection. + So it is safe running [`*` query](https://docs.victoriametrics.com/victorialogs/logsql/#any-value-filter), which selects all the logs, even if trillions of logs are stored in VictoriaLogs. - By specifying the maximum number of log entries, which can be returned in the response via `limit` query arg. For example, the following request returns up to 10 matching log entries: ```sh @@ -68,7 +69,7 @@ By default the `/select/logsql/query` returns all the log entries matching the g ``` - By adding [`_time` filter](https://docs.victoriametrics.com/victorialogs/logsql/#time-filter). The time range for the query can be specified via optional `start` and `end` query ars formatted according to [these docs](https://docs.victoriametrics.com/single-server-victoriametrics/#timestamp-formats). -- By adding other [filters](https://docs.victoriametrics.com/victorialogs/logsql/#filters) to the query. 
+- By adding more specific [filters](https://docs.victoriametrics.com/victorialogs/logsql/#filters) to the query, which select a lower number of logs. The `/select/logsql/query` endpoint returns [a stream of JSON lines](https://jsonlines.org/), where each line contains JSON-encoded log entry in the form `{field1="value1",...,fieldN="valueN"}`. @@ -79,18 +80,18 @@ Example response: {"_msg":"some other error","_stream":"{}","_time":"2023-01-01T13:32:15Z"} ``` -The matching lines are sent to the response stream as soon as they are found in VictoriaLogs storage. +Log lines are sent to the response stream as soon as they are found in VictoriaLogs storage. This means that the returned response may contain billions of lines for queries matching too many log entries. The response can be interrupted at any time by closing the connection to VictoriaLogs server. -This allows post-processing the returned lines at the client side with the usual Unix commands such as `grep`, `jq`, `less`, `head`, etc. -See [these docs](#command-line) for more details. +This allows post-processing the returned lines at the client side with the usual Unix commands such as `grep`, `jq`, `less`, `head`, etc., +without worrying about resource usage at VictoriaLogs side. See [these docs](#command-line) for more details. -The returned lines aren't sorted, since sorting disables the ability to send matching log entries to response stream as soon as they are found. -Query results can be sorted either at VictoriaLogs side according [to these docs](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe) +The returned lines aren't sorted by default, since sorting disables the ability to send matching log entries to response stream as soon as they are found. +Query results can be sorted either at VictoriaLogs side via [`sort` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe) or at client side with the usual `sort` command according to [these docs](#command-line). 
By default the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/victorialogs/#multitenancy) is queried. -If you need querying other tenant, then specify the needed tenant via http request headers. For example, the following query searches +If you need querying other tenant, then specify it via `AccountID` and `ProjectID` http request headers. For example, the following query searches for log messages at `(AccountID=12, ProjectID=34)` tenant: ```sh @@ -100,9 +101,15 @@ curl http://localhost:9428/select/logsql/query -H 'AccountID: 12' -H 'ProjectID: The number of requests to `/select/logsql/query` can be [monitored](https://docs.victoriametrics.com/victorialogs/#monitoring) with `vl_http_requests_total{path="/select/logsql/query"}` metric. +See also: + - [Querying hits stats](#querying-hits-stats) - [Querying streams](#querying-streams) -- [HTTP API](#http-api) +- [Querying stream field names](#querying-stream-field-names) +- [Querying stream field values](#querying-stream-field-values) +- [Querying field names](#querying-field-names) +- [Querying field values](#querying-field-values) + ### Querying hits stats @@ -454,32 +461,25 @@ There are three modes of displaying query results: - `Table` - displays query results as a table. - `JSON` - displays raw JSON response from [HTTP API](#http-api). -This is the first version that has minimal functionality. It comes with the following limitations: - -- The number of query results is always limited to 1000 lines. Iteratively add - more specific [filters](https://docs.victoriametrics.com/victorialogs/logsql/#filters) to the query - in order to get full response with less than 1000 lines. -- Queries are always executed against [tenant](https://docs.victoriametrics.com/victorialogs/#multitenancy) `0`. - -These limitations will be removed in future versions. - -To get around the current limitations, you can use an alternative - the [command line interface](#command-line). 
+This is the first version that has minimal functionality and may contain bugs. +It is recommended trying [command line interface](#command-line), which has no known bugs :) ## Command-line VictoriaLogs integrates well with `curl` and other command-line tools during querying because of the following features: -- VictoriaLogs sends the matching log entries to the response stream as soon as they are found. - This allows forwarding the response stream to arbitrary [Unix pipes](https://en.wikipedia.org/wiki/Pipeline_(Unix)). -- VictoriaLogs automatically adjusts query execution speed to the speed of the client, which reads the response stream. +- Matching log entries are sent to the response stream as soon as they are found. + This allows forwarding the response stream to arbitrary [Unix pipes](https://en.wikipedia.org/wiki/Pipeline_(Unix)) + without waiting until the response finishes. +- Query execution speed is automatically adjusted to the speed of the client, which reads the response stream. For example, if the response stream is piped to `less` command, then the query is suspended until the `less` command reads the next block from the response stream. -- VictoriaLogs automatically cancels query execution when the client closes the response stream. +- Query is automatically canceled when the client closes the response stream. For example, if the query response is piped to `head` command, then VictoriaLogs stops executing the query when the `head` command closes the response stream. These features allow executing queries at command-line interface, which potentially select billions of rows, -without the risk of high resource usage (CPU, RAM, disk IO) at VictoriaLogs server. +without the risk of high resource usage (CPU, RAM, disk IO) at VictoriaLogs. For example, the following query can return very big number of matching log entries (e.g. 
billions) if VictoriaLogs contains many log messages with the `error` [word](https://docs.victoriametrics.com/victorialogs/logsql/#word): @@ -488,8 +488,8 @@ many log messages with the `error` [word](https://docs.victoriametrics.com/victo curl http://localhost:9428/select/logsql/query -d 'query=error' ``` -If the command returns "never-ending" response, then just press `ctrl+C` at any time in order to cancel the query. -VictoriaLogs notices that the response stream is closed, so it cancels the query and instantly stops consuming CPU, RAM and disk IO for this query. +If the command above returns "never-ending" response, then just press `ctrl+C` at any time in order to cancel the query. +VictoriaLogs notices that the response stream is closed, so it cancels the query and stops consuming CPU, RAM and disk IO for this query. Then just use `head` command for investigating the returned log messages and narrowing down the query: @@ -500,6 +500,12 @@ curl http://localhost:9428/select/logsql/query -d 'query=error' | head -10 The `head -10` command reads only the first 10 log messages from the response and then closes the response stream. This automatically cancels the query at VictoriaLogs side, so it stops consuming CPU, RAM and disk IO resources. +Alternatively, you can limit the number of returned logs at VictoriaLogs side via [`limit` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#limit-pipe): + +```sh +curl http://localhost:9428/select/logsql/query -d 'query=error | limit 10' +``` + Sometimes it may be more convenient to use `less` command instead of `head` during the investigation of the returned response: ```sh @@ -509,7 +515,7 @@ curl http://localhost:9428/select/logsql/query -d 'query=error' | less The `less` command reads the response stream on demand, when the user scrolls down the output. VictoriaLogs suspends query execution when `less` stops reading the response stream. It doesn't consume CPU and disk IO resources during this time. 
It resumes query execution -when the `less` continues reading the response stream. +after the `less` continues reading the response stream. Suppose that the initial investigation of the returned query results helped determining that the needed log messages contain `cannot open file` [phrase](https://docs.victoriametrics.com/victorialogs/logsql/#phrase-filter). @@ -543,7 +549,13 @@ See [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#stream-fi [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#time-filter) about `_time` filter and [these docs](https://docs.victoriametrics.com/victorialogs/logsql/#logical-filter) about `AND` operator. -The following example shows how to sort query results by the [`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field): +Alternatively, you can count the number of matching logs at VictoriaLogs side with [`stats` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe): + +```sh +curl http://localhost:9428/select/logsql/query -d 'query=_stream:{app="nginx"} AND _time:5m AND error | stats count() logs_with_error' +``` + +The following example shows how to sort query results by the [`_time` field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#time-field) with traditional Unix tools: ```sh curl http://localhost:9428/select/logsql/query -d 'query=error' | jq -r '._time + " " + ._msg' | sort | less @@ -558,8 +570,14 @@ can take non-trivial amounts of time if the `query` returns too many results. Th before sorting the results. See [these tips](https://docs.victoriametrics.com/victorialogs/logsql/#performance-tips) on how to narrow down query results. 
+Alternatively, sorting of matching logs can be performed at VictoriaLogs side via [`sort` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe): + +```sh +curl http://localhost:9428/select/logsql/query -d 'query=error | sort by (_time)' | less +``` + The following example calculates stats on the number of log messages received during the last 5 minutes -grouped by `log.level` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model): +grouped by `log.level` [field](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model) with traditional Unix tools: ```sh curl http://localhost:9428/select/logsql/query -d 'query=_time:5m log.level:*' | jq -r '."log.level"' | sort | uniq -c @@ -569,6 +587,12 @@ The query selects all the log messages with non-empty `log.level` field via ["an then pipes them to `jq` command, which extracts the `log.level` field value from the returned JSON stream, then the extracted `log.level` values are sorted with `sort` command and, finally, they are passed to `uniq -c` command for calculating the needed stats. +Alternatively, all the stats calculations above can be performed at VictoriaLogs side via [`stats by(...)`](https://docs.victoriametrics.com/victorialogs/logsql/#stats-by-fields): + +```sh +curl http://localhost:9428/select/logsql/query -d 'query=_time:5m log.level:* | stats by (log.level) count() matching_logs' +``` + See also: - [Key concepts](https://docs.victoriametrics.com/victorialogs/keyconcepts/).