
VictoriaLogs

VictoriaLogs is a log management and log analytics system from VictoriaMetrics.

It provides the following key features:

  • VictoriaLogs can accept logs from popular log collectors, which support the ElasticSearch data ingestion format. See these docs. Grafana Loki data ingestion format will be supported in the near future - see the Roadmap.
  • VictoriaLogs is much easier to set up and operate compared to ElasticSearch and Grafana Loki. See these docs.
  • VictoriaLogs provides an easy yet powerful query language with full-text search capabilities across all the log fields - see LogsQL docs.
  • VictoriaLogs can be seamlessly combined with good old Unix tools for log analysis such as grep, less, sort, jq, etc. See these docs for details.
  • VictoriaLogs' capacity and performance scale linearly with the available resources (CPU, RAM, disk IO, disk space). It runs smoothly on both a Raspberry Pi and a server with hundreds of CPU cores and terabytes of RAM.
  • VictoriaLogs can handle much bigger data volumes than ElasticSearch and Grafana Loki when running on comparable hardware.
  • VictoriaLogs supports multitenancy - see these docs.
  • VictoriaLogs supports out-of-order log ingestion, aka backfilling.

VictoriaLogs is at the Preview stage now. It is ready for evaluation in production and for verifying the claims given above. It isn't recommended yet to migrate from existing logging solutions to VictoriaLogs Preview in the general case. See the Roadmap for details.

If you have questions about VictoriaLogs, then feel free to ask them in the VictoriaMetrics community Slack chat.

Operation

How to run VictoriaLogs

The following options exist now:

  • To run the Docker image:

    docker run --rm -it -p 9428:9428 -v ./victoria-logs-data:/victoria-logs-data \
      docker.io/victoriametrics/victoria-logs:heads-public-single-node-0-ga638f5e2b
    
  • To build VictoriaLogs from source code:

    Check out the VictoriaLogs source code. It is located in the VictoriaMetrics repository:

    git clone https://github.com/VictoriaMetrics/VictoriaMetrics
    cd VictoriaMetrics
    

    Then build VictoriaLogs. The build command requires Go 1.20.

    make victoria-logs
    

    Then run the built binary:

    bin/victoria-logs
    

VictoriaLogs is now ready to accept and query logs at TCP port 9428! It has no external dependencies, so it may run in various environments without additional setup and configuration. VictoriaLogs automatically adapts to the available CPU and RAM resources. It also automatically sets up and creates the needed indexes during data ingestion.
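
To verify the setup, a test log entry can be ingested and queried back with curl. This is a minimal sketch, assuming the ElasticSearch-compatible bulk endpoint at the /insert/elasticsearch/_bulk path (the hosts prefix from the Filebeat/Logstash examples below with _bulk appended) and a made-up log message:

# assumes a local VictoriaLogs instance and the ES-compatible bulk endpoint
printf '%s\n' '{"create":{}}' '{"message":"a test log message"}' | \
  curl -s -H 'Content-Type: application/x-ndjson' --data-binary @- \
  'http://localhost:9428/insert/elasticsearch/_bulk?_msg_field=message'
# the "test" word filter should match the just-ingested entry
curl http://localhost:9428/select/logsql/query -d 'query=test'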

It is possible to change the TCP port via the -httpListenAddr command-line flag. For example, the following command starts VictoriaLogs, which accepts incoming requests at port 9200 (aka the ElasticSearch HTTP API port):

/path/to/victoria-logs -httpListenAddr=:9200

VictoriaLogs stores the ingested data in the victoria-logs-data directory by default. The directory can be changed via the -storageDataPath command-line flag. See these docs for details.

By default VictoriaLogs stores log entries with timestamps in the time range [now-7d, now], while dropping logs outside the given time range. I.e. it uses a retention of 7 days. Read these docs on how to control the retention for the ingested logs.

It is recommended to set up monitoring of VictoriaLogs according to these docs.

Data ingestion

VictoriaLogs supports the following data ingestion approaches:

  • Filebeat - see these docs.
  • Logstash - see these docs.

The ingested logs can be queried according to these docs.

See also data ingestion troubleshooting docs.

Filebeat setup

Specify the output.elasticsearch section in the filebeat.yml for sending the collected logs to VictoriaLogs:

output.elasticsearch:
  hosts: ["http://localhost:9428/insert/elasticsearch/"]
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.hostname,log.file.path"

Substitute the localhost:9428 address inside the hosts section with the real TCP address of VictoriaLogs.

See these docs for details on the parameters section.

It is recommended to verify whether the initial setup generates the needed log fields and uses the correct stream fields. This can be done by specifying the debug parameter:

output.elasticsearch:
  hosts: ["http://localhost:9428/insert/elasticsearch/"]
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.hostname,log.file.path"
    debug: "1"

If some log fields must be skipped during data ingestion, then they can be put into the ignore_fields parameter. For example, the following config instructs VictoriaLogs to ignore the log.offset and event.original fields in the ingested logs:

output.elasticsearch:
  hosts: ["http://localhost:9428/insert/elasticsearch/"]
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.name,log.file.path"
    ignore_fields: "log.offset,event.original"

When Filebeat ingests logs into VictoriaLogs at a high rate, it may be needed to tune the worker and bulk_max_size options. For example, the following config is optimized for a higher than usual ingestion rate:

output.elasticsearch:
  hosts: ["http://localhost:9428/insert/elasticsearch/"]
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.name,log.file.path"
  worker: 8
  bulk_max_size: 1000

If Filebeat sends logs to VictoriaLogs in another datacenter, then it may be useful to enable data compression via the compression_level option. This usually allows saving network bandwidth and costs by up to 5 times:

output.elasticsearch:
  hosts: ["http://localhost:9428/insert/elasticsearch/"]
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.name,log.file.path"
  compression_level: 1

By default the ingested logs are stored in the (AccountID=0, ProjectID=0) tenant. If you need to store logs in another tenant, then specify the needed tenant via the headers section of output.elasticsearch. For example, the following filebeat.yml config instructs Filebeat to store the data in the (AccountID=12, ProjectID=34) tenant:

output.elasticsearch:
  hosts: ["http://localhost:9428/insert/elasticsearch/"]
  headers:
    AccountID: 12
    ProjectID: 34
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.name,log.file.path"

The ingested log entries can be queried according to these docs.

See also data ingestion troubleshooting docs.

Logstash setup

Specify the output.elasticsearch section in the logstash.conf file for sending the collected logs to VictoriaLogs:

output {
  elasticsearch {
    hosts => ["http://localhost:9428/insert/elasticsearch/"]
    parameters => {
        "_msg_field" => "message"
        "_time_field" => "@timestamp"
        "_stream_fields" => "host.name,process.name"
    }
  }
}

Substitute the localhost:9428 address inside hosts with the real TCP address of VictoriaLogs.

See these docs for details on the parameters section.

It is recommended to verify whether the initial setup generates the needed log fields and uses the correct stream fields. This can be done by specifying the debug parameter:

output {
  elasticsearch {
    hosts => ["http://localhost:9428/insert/elasticsearch/"]
    parameters => {
        "_msg_field" => "message"
        "_time_field" => "@timestamp"
        "_stream_fields" => "host.name,process.name"
        "debug" => "1"
    }
  }
}

If some log fields must be skipped during data ingestion, then they can be put into the ignore_fields parameter. For example, the following config instructs VictoriaLogs to ignore the log.offset and event.original fields in the ingested logs:

output {
  elasticsearch {
    hosts => ["http://localhost:9428/insert/elasticsearch/"]
    parameters => {
        "_msg_field" => "message"
        "_time_field" => "@timestamp"
        "_stream_fields" => "host.hostname,process.name"
        "ignore_fields" => "log.offset,event.original"
    }
  }
}

If Logstash sends logs to VictoriaLogs in another datacenter, then it may be useful to enable data compression via the http_compression => true option. This usually allows saving network bandwidth and costs by up to 5 times:

output {
  elasticsearch {
    hosts => ["http://localhost:9428/insert/elasticsearch/"]
    parameters => {
        "_msg_field" => "message"
        "_time_field" => "@timestamp"
        "_stream_fields" => "host.hostname,process.name"
    }
    http_compression => true
  }
}

By default the ingested logs are stored in the (AccountID=0, ProjectID=0) tenant. If you need to store logs in another tenant, then specify the needed tenant via custom_headers in the output.elasticsearch section. For example, the following logstash.conf config instructs Logstash to store the data in the (AccountID=12, ProjectID=34) tenant:

output {
  elasticsearch {
    hosts => ["http://localhost:9428/insert/elasticsearch/"]
    custom_headers => {
        "AccountID" => "1"
        "ProjectID" => "2"
    }
    parameters => {
        "_msg_field" => "message"
        "_time_field" => "@timestamp"
        "_stream_fields" => "host.hostname,process.name"
    }
  }
}

The ingested log entries can be queried according to these docs.

See also data ingestion troubleshooting docs.

Data ingestion parameters

VictoriaLogs accepts the following parameters at data ingestion HTTP APIs:

  • _msg_field - it must contain the name of the log field with the log message generated by the log shipper. This is usually the message field for Filebeat and Logstash. If the _msg_field parameter isn't set, then VictoriaLogs reads the log message from the _msg field.

  • _time_field - it must contain the name of the log field with the log timestamp generated by the log shipper. This is usually the @timestamp field for Filebeat and Logstash. If the _time_field parameter isn't set, then VictoriaLogs reads the timestamp from the _time field. If this field doesn't exist, then the current timestamp is used.

  • _stream_fields - it should contain a comma-separated list of log field names, which uniquely identify every log stream collected by the log shipper. If the _stream_fields parameter isn't set, then all the ingested logs are written to the default log stream - {}.

  • ignore_fields - this parameter may contain a comma-separated list of log field names, which must be ignored during data ingestion.

  • debug - if this parameter is set to 1, then the ingested logs aren't stored in VictoriaLogs. Instead, the ingested data is logged by VictoriaLogs, so it can be investigated later.
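
The parameters above can also be passed as URL query args when sending data directly to the ingestion endpoints, which is handy for manual testing with curl. A minimal sketch, assuming the ElasticSearch-compatible bulk endpoint at /insert/elasticsearch/_bulk and made-up field values:

# debug=1 means the entry is logged by VictoriaLogs instead of being stored
printf '%s\n' '{"create":{}}' '{"message":"test","host.name":"host123","log.offset":123}' | \
  curl -s -H 'Content-Type: application/x-ndjson' --data-binary @- \
  'http://localhost:9428/insert/elasticsearch/_bulk?_msg_field=message&_stream_fields=host.name&ignore_fields=log.offset&debug=1'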

Data ingestion troubleshooting

VictoriaLogs provides the following command-line flags, which can help debugging data ingestion issues:

  • -logNewStreams - if this flag is passed to VictoriaLogs, then it logs all the newly registered log streams. This may help debugging high cardinality issues.
  • -logIngestedRows - if this flag is passed to VictoriaLogs, then it logs all the ingested log entries. See also debug parameter.
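
For example, the following command starts VictoriaLogs with both flags enabled. Note that these flags may generate a lot of log output, so it is better to remove them once the troubleshooting is finished:

/path/to/victoria-logs -logNewStreams -logIngestedRows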

VictoriaLogs exposes various metrics, which may help debugging data ingestion issues:

  • vl_rows_ingested_total - the number of ingested log entries since the last VictoriaLogs restart. If this number increases over time, then logs are successfully ingested into VictoriaLogs. The ingested logs can be inspected in the following ways:
    • By passing debug=1 parameter to every request to data ingestion endpoints. The ingested rows aren't stored in VictoriaLogs in this case. Instead, they are logged, so they can be investigated later. The vl_rows_dropped_total metric is incremented for each logged row.
    • By passing -logIngestedRows command-line flag to VictoriaLogs. In this case it logs all the ingested data, so it can be investigated later.
  • vl_streams_created_total - the number of created log streams since the last VictoriaLogs restart. If this metric grows rapidly during extended periods of time, then this may lead to high cardinality issues. The newly created log streams can be inspected in logs by passing -logNewStreams command-line flag to VictoriaLogs.
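
These metrics can be read from the /metrics page - see the Monitoring section below. For example, the following command inspects their current values:

curl -s http://localhost:9428/metrics | grep -E 'vl_rows_ingested_total|vl_streams_created_total'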

Querying

VictoriaLogs can be queried at the /select/logsql/query endpoint. The LogsQL query must be passed via query argument. For example, the following query returns all the log entries with the error word:

curl http://localhost:9428/select/logsql/query -d 'query=error'

The query argument can be passed either in the request URL itself (aka HTTP GET request) or via the request body with the x-www-form-urlencoded encoding (aka HTTP POST request). HTTP POST is useful for sending long queries when they do not fit the maximum URL length of the used clients and proxies.
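
For example, the query above can be sent as an HTTP GET request:

curl 'http://localhost:9428/select/logsql/query?query=error'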

See LogsQL docs for details on what can be passed to the query arg. The query arg must be properly encoded with percent encoding when passing it to curl or similar tools.
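
curl can take care of this encoding automatically via its --data-urlencode option. For example:

curl http://localhost:9428/select/logsql/query --data-urlencode 'query=error AND "cannot open file"'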

The /select/logsql/query endpoint returns a stream of JSON lines, where each line contains a JSON-encoded log entry in the form {field1="value1",...,fieldN="valueN"}. Example response:

{"_msg":"error: disconnect from 19.54.37.22: Auth fail [preauth]","_stream":"{}","_time":"2023-01-01T13:32:13Z"}
{"_msg":"some other error","_stream":"{}","_time":"2023-01-01T13:32:15Z"}

The matching lines are sent to the response stream as soon as they are found in VictoriaLogs storage. This means that the returned response may contain billions of lines for queries matching too many log entries. The response can be interrupted at any time by closing the connection to VictoriaLogs server. This allows post-processing the returned lines at the client side with the usual Unix commands such as grep, jq, less, head, etc. See these docs for more details.

The returned lines aren't sorted by default, since sorting disables the ability to send matching log entries to response stream as soon as they are found. Query results can be sorted either at VictoriaLogs side according to these docs or at client side with the usual sort command according to these docs.

By default the (AccountID=0, ProjectID=0) tenant is queried. If you need to query another tenant, then specify the needed tenant via HTTP request headers. For example, the following query searches for log messages in the (AccountID=12, ProjectID=34) tenant:

curl http://localhost:9428/select/logsql/query -H 'AccountID: 12' -H 'ProjectID: 34' -d 'query=error'

The number of requests to /select/logsql/query can be monitored with vl_http_requests_total{path="/select/logsql/query"} metric.

Querying via command-line

VictoriaLogs provides good integration with curl and other command-line tools because of the following features:

  • VictoriaLogs sends the matching log entries to the response stream as soon as they are found. This allows forwarding the response stream to arbitrary Unix pipes.
  • VictoriaLogs automatically adjusts query execution speed to the speed of the client, which reads the response stream. For example, if the response stream is piped to less command, then the query is suspended until the less command reads the next block from the response stream.
  • VictoriaLogs automatically cancels query execution when the client closes the response stream. For example, if the query response is piped to head command, then VictoriaLogs stops executing the query when the head command closes the response stream.

These features allow executing queries at the command line, which potentially select billions of rows, without the risk of high resource usage (CPU, RAM, disk IO) at the VictoriaLogs server.

For example, the following query can return a very big number of matching log entries (e.g. billions) if VictoriaLogs contains many log messages with the error word:

curl http://localhost:9428/select/logsql/query -d 'query=error'

If the command returns a "never-ending" response, then just press ctrl+C at any time in order to cancel the query. VictoriaLogs notices that the response stream is closed, so it cancels the query and instantly stops consuming CPU, RAM and disk IO for this query.

Then just use the head command for investigating the returned log messages and narrowing down the query:

curl http://localhost:9428/select/logsql/query -d 'query=error' | head -10

The head -10 command reads only the first 10 log messages from the response and then closes the response stream. This automatically cancels the query at VictoriaLogs side, so it stops consuming CPU, RAM and disk IO resources.

Sometimes it may be more convenient to use the less command instead of head during the investigation of the returned response:

curl http://localhost:9428/select/logsql/query -d 'query=error' | less

The less command reads the response stream on demand, when the user scrolls down the output. VictoriaLogs suspends query execution when less stops reading the response stream. It doesn't consume CPU and disk IO resources during this time. It resumes query execution when less continues reading the response stream.

Suppose that the initial investigation of the returned query results helped determine that the needed log messages contain the cannot open file phrase. Then the query can be narrowed down to error AND "cannot open file" (see these docs about the AND operator). Then run the updated command in order to continue the investigation:

curl http://localhost:9428/select/logsql/query -d 'query=error AND "cannot open file"' | head

Note that the query arg must be properly encoded with percent encoding when passing it to curl or similar tools.

This iteration - pipe the query to head or less, investigate the results, refine the query - can be repeated multiple times until the needed log messages are found.

The returned VictoriaLogs query response can be post-processed with any combination of Unix commands, which are usually used for log analysis - grep, jq, awk, sort, uniq, wc, etc.

For example, the following command uses the wc -l Unix command for counting the number of log messages with the error word received from streams with the app="nginx" field during the last 5 minutes:

curl http://localhost:9428/select/logsql/query -d 'query=_stream:{app="nginx"} AND _time:[now-5m,now] AND error' | wc -l

See these docs about _stream filter, these docs about _time filter and these docs about AND operator.

The following example shows how to sort query results by the _time field:

curl http://localhost:9428/select/logsql/query -d 'query=error' | jq -r '._time + " " + ._msg' | sort | less

This command uses jq for extracting _time and _msg fields from the returned results, and piping them to sort command.

Note that the sort command needs to read the whole response stream before returning the sorted results. So the command above can take a non-trivial amount of time if the query returns too many results. The solution is to narrow down the query before sorting the results. See these tips on how to narrow down query results.

The following example calculates stats on the number of log messages received during the last 5 minutes, grouped by the log.level field:

curl http://localhost:9428/select/logsql/query -d 'query=_time:[now-5m,now] log.level:*' | jq -r '."log.level"' | sort | uniq -c 

The query selects all the log messages with a non-empty log.level field via the "any value" filter, then pipes them to the jq command, which extracts the log.level field value from the returned JSON stream; the extracted log.level values are then sorted with the sort command and, finally, passed to the uniq -c command for calculating the needed stats.


Monitoring

VictoriaLogs exposes internal metrics in Prometheus exposition format at the http://localhost:9428/metrics page. It is recommended to set up monitoring of these metrics via VictoriaMetrics (see these docs), vmagent (see these docs) or via Prometheus.
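
For example, a minimal scrape config sketch for Prometheus-compatible scrapers; the job name and the target address are placeholders to adjust for the actual deployment:

scrape_configs:
  - job_name: victorialogs  # placeholder job name
    static_configs:
      - targets: ["localhost:9428"]  # VictoriaLogs address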

VictoriaLogs emits its own logs to stdout. It is recommended to investigate these logs during troubleshooting.

Retention

By default VictoriaLogs stores log entries with timestamps in the time range [now-7d, now], while dropping logs outside the given time range. I.e. it uses a retention of 7 days. The retention can be configured with the -retentionPeriod command-line flag. This flag accepts values starting from 1d (one day) up to 100y (100 years). See these docs for the supported duration formats.

For example, the following command starts VictoriaLogs with the retention of 8 weeks:

/path/to/victoria-logs -retentionPeriod=8w

VictoriaLogs stores the ingested logs in per-day partition directories. It automatically drops partition directories outside the configured retention.

VictoriaLogs automatically drops logs at the data ingestion stage if they have timestamps outside the configured retention. A sample of the dropped logs is logged with a WARN message in order to simplify troubleshooting. The vl_rows_dropped_total metric is incremented each time an ingested log entry is dropped because of a timestamp outside the retention. It is recommended to set up the following alerting rule at vmalert in order to be notified when logs with wrong timestamps are ingested into VictoriaLogs:

rate(vl_rows_dropped_total[5m]) > 0

By default VictoriaLogs doesn't accept log entries with timestamps bigger than now+2d, i.e. 2 days in the future. If you need to accept logs with bigger timestamps, then specify the desired "future retention" via the -futureRetention command-line flag. This flag accepts values starting from 1d. See these docs for the supported duration formats.

For example, the following command starts VictoriaLogs, which accepts logs with timestamps up to a year in the future:

/path/to/victoria-logs -futureRetention=1y

Storage

VictoriaLogs stores all its data in a single directory - victoria-logs-data. The path to the directory can be changed via the -storageDataPath command-line flag. For example, the following command starts VictoriaLogs, which stores the data at /var/lib/victoria-logs:

/path/to/victoria-logs -storageDataPath=/var/lib/victoria-logs

VictoriaLogs automatically creates the -storageDataPath directory on the first run if it is missing.