added more info and examples about data ingestion and collectors to VictoriaLogs docs (#4490)

Alexander Marshalov 2023-06-21 16:58:43 +02:00 committed by Aliaksandr Valialkin
parent a04a206cd2
commit c12b5250c7
7 changed files with 273 additions and 5 deletions


@ -9,6 +9,7 @@ before you start working with VictoriaLogs.
The following options are available:
- [To run Docker image](#docker-image)
- [To run in Kubernetes with helm-charts](#helm-charts)
- [To build VictoriaLogs from source code](#building-from-source-code)
### Docker image
@ -21,6 +22,11 @@ docker run --rm -it -p 9428:9428 -v ./victoria-logs-data:/victoria-logs-data \
docker.io/victoriametrics/victoria-logs:heads-public-single-node-0-ga638f5e2b
```
### Helm charts
You can run VictoriaLogs in a Kubernetes environment
with [helm-charts](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/README.md).
### Building from source code
Follow these steps to build VictoriaLogs from source code:
@ -50,6 +56,8 @@ It has no any external dependencies, so it may run in various environments witho
VictoriaLogs automatically adapts to the available CPU and RAM resources. It also automatically sets up and creates
the needed indexes during [data ingestion](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/).
## How to configure VictoriaLogs
It is possible to change the TCP port via `-httpListenAddr` command-line flag. For example, the following command
starts VictoriaLogs, which accepts incoming requests at port `9200` (aka ElasticSearch HTTP API port):
@ -66,3 +74,26 @@ E.g. it uses the retention of 7 days. Read [these docs](https://docs.victoriamet
for the [ingested](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/) logs.
It is recommended to set up monitoring of VictoriaLogs according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/#monitoring).
## How to send logs to VictoriaLogs
You can set up data ingestion for VictoriaLogs in the following ways:
- Configure one of the [supported log collectors](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#log-collectors-and-data-ingestion-formats) to send logs to VictoriaLogs.
- Configure your own log collector to send logs to VictoriaLogs via one of the [supported log ingestion protocols](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#http-apis).
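For the second option, a custom sender only has to POST newline-delimited JSON to VictoriaLogs. The sketch below is a minimal illustration, assuming the `/insert/jsonline` endpoint and the `_stream_fields`, `_msg_field` and `_time_field` parameters shown in the Fluentbit config of these docs, plus the default `localhost:9428` address. The helper names `encode_json_lines` and `build_request` and the sample log entry are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed address: the default VictoriaLogs port from the Docker example.
VLOGS_URL = "http://localhost:9428/insert/jsonline"


def encode_json_lines(entries):
    """Serialize log entries as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(entry) + "\n" for entry in entries).encode("utf-8")


def build_request(entries, stream_fields, msg_field, time_field):
    """Build (but do not send) a POST request for the JSON stream API."""
    query = urllib.parse.urlencode({
        "_stream_fields": ",".join(stream_fields),
        "_msg_field": msg_field,
        "_time_field": time_field,
    })
    return urllib.request.Request(
        f"{VLOGS_URL}?{query}",
        data=encode_json_lines(entries),
        method="POST",
    )


# Hypothetical log entry; the field names mirror the Fluentbit example.
request = build_request(
    [{"log": "cannot open file", "date": "2023-06-21T16:58:43Z", "stream": "stderr"}],
    stream_fields=["stream"],
    msg_field="log",
    time_field="date",
)
print(request.full_url)
# Sending needs a running VictoriaLogs instance:
# urllib.request.urlopen(request)
```

The request is built separately from sending it, so the payload and URL can be inspected without a running server.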
Here are demos for running popular supported log collectors in Docker with VictoriaLogs:
- [**Filebeat (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/filebeat-docker)
- [**Fluentbit (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/fluentbit-docker)
- [**Logstash (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/logstash)
- [**Vector (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/vector-docker)
You can also use the [helm chart](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/README.md)
as a demo for running Fluentbit in Kubernetes with VictoriaLogs:
- [Fluentbit (k8s)](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/values.yaml)
## How to query logs in VictoriaLogs
See details in [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying).


@ -17,8 +17,6 @@ The following functionality is planned in the future versions of VictoriaLogs:
- Support for [data ingestion](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/) from popular log collectors and formats:
  - Promtail (aka Grafana Loki)
  - Vector.dev
  - Fluentbit
  - Fluentd
  - Syslog
- Add missing functionality to [LogsQL](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html):


@ -1,5 +1,9 @@
# Filebeat setup
[Filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) log collector supports
[Elasticsearch output](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html) compatible with
VictoriaLogs [ingestion format](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api).
Specify the [`output.elasticsearch`](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html) section in the `filebeat.yml`
for sending the collected logs to [VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/):
@ -72,7 +76,7 @@ output.elasticsearch:
compression_level: 1
```
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via `headers` in the `output.elasticsearch` section.
For example, the following `filebeat.yml` config instructs Filebeat to store the data to `(AccountID=12, ProjectID=34)` tenant:
@ -88,6 +92,12 @@ output.elasticsearch:
_stream_fields: "host.name,log.file.path"
```
More info about output parameters can be found in [these docs](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/filebeat-docker) for
running Filebeat with VictoriaLogs via docker-compose and collecting logs into VictoriaLogs.
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.


@ -0,0 +1,70 @@
# Fluentbit setup
[Fluentbit](https://docs.fluentbit.io/manual) log collector supports [HTTP output](https://docs.fluentbit.io/manual/pipeline/outputs/http) compatible with
VictoriaLogs [JSON stream API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#json-stream-api).
Specify the [`output`](https://docs.fluentbit.io/manual/pipeline/outputs/http) section with `Name http` in the `fluentbit.conf`
for sending the collected logs to VictoriaLogs:
```conf
[Output]
Name http
Match *
host localhost
port 9428
uri /insert/jsonline/?_stream_fields=stream&_msg_field=log&_time_field=date
format json_lines
json_date_format iso8601
```
Substitute the address (`localhost`) and port (`9428`) inside the `Output` section with the real TCP address of VictoriaLogs.
The `_msg_field` parameter must contain the name of the field with the log message generated by Fluentbit. This is usually the `log` field, as in the config above.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#message-field) for details.
The `_time_field` parameter must contain the name of the field with the log timestamp generated by Fluentbit. This is usually the `date` field, as in the config above.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field) for details.
It is recommended to specify a comma-separated list of field names, which uniquely identify every log stream collected by Fluentbit, in the `_stream_fields` parameter.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) for details.
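The role of the three parameters can be illustrated with a small sketch. It only models the semantics described above (which field becomes the message, which becomes the timestamp, and which fields identify the stream); it is not how VictoriaLogs is implemented, and the function name `split_record`, the result keys and the sample record are hypothetical.

```python
def split_record(record, msg_field, time_field, stream_fields):
    """Split one JSON record into message, timestamp, stream fields and the rest."""
    fields = dict(record)  # copy, so the caller's record is left untouched
    message = fields.pop(msg_field, "")
    timestamp = fields.pop(time_field, None)
    # Stream fields identify the log stream; they also stay regular fields here.
    stream = {name: fields[name] for name in stream_fields if name in fields}
    return {"message": message, "time": timestamp, "stream": stream, "fields": fields}


# Hypothetical record in the shape produced by the config above.
record = {
    "date": "2023-06-21T16:58:43Z",
    "log": "cannot open file",
    "stream": "stderr",
    "container_name": "app",
}
print(split_record(record, msg_field="log", time_field="date", stream_fields=["stream"]))
```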
If Fluentbit sends logs to VictoriaLogs in another datacenter, then it may be useful to enable data compression via the `compress` option.
This usually reduces network bandwidth usage and costs by up to 5 times:
```conf
[Output]
Name http
Match *
host localhost
port 9428
uri /insert/jsonline/?_stream_fields=stream&_msg_field=log&_time_field=date
format json_lines
json_date_format iso8601
compress gzip
```
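The effect of compression on a JSON lines payload can be checked locally. The following sketch compresses a batch of synthetic, repetitive log entries with gzip and prints both sizes; the sample data and the helper name `gzip_json_lines` are hypothetical, and the actual ratio depends on the logs being sent.

```python
import gzip
import json


def gzip_json_lines(entries):
    """Return the raw JSON lines payload and its gzip-compressed form."""
    raw = "".join(json.dumps(entry) + "\n" for entry in entries).encode("utf-8")
    return raw, gzip.compress(raw)


# Synthetic entries; real logs compress better or worse depending on repetition.
entries = [{"log": f"GET /api/v1/items/{i} 200", "stream": "stdout"} for i in range(1000)]
raw, packed = gzip_json_lines(entries)
print(f"raw={len(raw)} bytes, gzip={len(packed)} bytes")
```

Fluentbit does the compression itself when `compress gzip` is set; the sketch is only a way to estimate the savings for a given log shape.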
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via `header` options in the `Output` section.
For example, the following `fluentbit.conf` config instructs Fluentbit to store the data to `(AccountID=12, ProjectID=34)` tenant:
```conf
[Output]
Name http
Match *
host localhost
port 9428
uri /insert/jsonline/?_stream_fields=stream&_msg_field=log&_time_field=date
format json_lines
json_date_format iso8601
header AccountID 12
header ProjectID 34
```
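A custom client would select the tenant the same way, by sending the `AccountID` and `ProjectID` HTTP headers with each request. The sketch below is a minimal illustration; the helper name `tenant_headers` is hypothetical, and the request is only built, not sent.

```python
import urllib.request


def tenant_headers(account_id=0, project_id=0):
    """Build the HTTP headers which select the tenant for ingested logs."""
    return {"AccountID": str(account_id), "ProjectID": str(project_id)}


request = urllib.request.Request(
    "http://localhost:9428/insert/jsonline",
    data=b'{"log": "hello"}\n',
    method="POST",
    headers=tenant_headers(12, 34),
)
# urllib stores header names in capitalized form ("Accountid"); this is
# harmless, since HTTP header names are case-insensitive.
print(request.header_items())
```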
More info about output tuning can be found in [these docs](https://docs.fluentbit.io/manual/pipeline/outputs/http).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/fluentbit-docker)
for running Fluentbit with VictoriaLogs via docker-compose and collecting logs from Docker containers into VictoriaLogs.
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.


@ -1,5 +1,10 @@
# Logstash setup
[Logstash](https://www.elastic.co/guide/en/logstash/8.8/introduction.html) log collector supports
[Opensearch output plugin](https://github.com/opensearch-project/logstash-output-opensearch) compatible with
[Elasticsearch bulk API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api)
in VictoriaLogs.
Specify the [`output.elasticsearch`](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html) section in the `logstash.conf` file
for sending the collected logs to [VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/):
@ -74,7 +79,7 @@ output {
}
```
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via `custom_headers` in the `output.elasticsearch` section.
For example, the following `logstash.conf` config instructs Logstash to store the data to `(AccountID=12, ProjectID=34)` tenant:
@ -95,6 +100,12 @@ output {
}
```
More info about output tuning can be found in [these docs](https://github.com/opensearch-project/logstash-output-opensearch/blob/main/README.md).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/logstash)
for running Logstash with VictoriaLogs via docker-compose and collecting logs into VictoriaLogs
(via [Elasticsearch bulk API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api)).
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.


@ -3,7 +3,11 @@
[VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/) can accept logs from the following log collectors:
- Filebeat. See [how to set up Filebeat for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Filebeat.html).
- Fluentbit. See [how to set up Fluentbit for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Fluentbit.html).
- Logstash. See [how to set up Logstash for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Logstash.html).
- Vector. See [how to set up Vector for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Vector.html).
See also [Log collectors and data ingestion formats](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#log-collectors-and-data-ingestion-formats) in VictoriaLogs.
The ingested logs can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
@ -21,7 +25,8 @@ VictoriaLogs accepts optional [HTTP parameters](#http-parameters) at data ingest
### Elasticsearch bulk API
VictoriaLogs accepts logs in [Elasticsearch bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
/ [OpenSearch Bulk API](http://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) format
at `http://localhost:9428/insert/elasticsearch/_bulk` endpoint.
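A bulk request body is newline-delimited JSON in which every log entry is preceded by an action line. The sketch below builds such a body with the `create` action; the helper name `encode_bulk` and the sample entry are hypothetical, and the field names are only an assumption about what a collector might send.

```python
import json


def encode_bulk(entries):
    """Encode log entries as a bulk API body: an action line before every entry."""
    lines = []
    for entry in entries:
        lines.append(json.dumps({"create": {}}))  # action line
        lines.append(json.dumps(entry))           # the log entry itself
    # The body must end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8") if lines else b""


body = encode_bulk([
    {"message": "cannot open file", "timestamp": "2023-06-21T16:58:43Z", "host": "host123"},
])
print(body.decode("utf-8"), end="")
```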
The following command pushes a single log line to Elasticsearch bulk API at VictoriaLogs:
@ -114,3 +119,14 @@ VictoriaLogs exposes various [metrics](https://docs.victoriametrics.com/Victoria
since the last VictoriaLogs restart. If this metric grows rapidly during extended periods of time, then this may lead
to [high cardinality issues](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#high-cardinality).
The newly created log streams can be inspected in logs by passing `-logNewStreams` command-line flag to VictoriaLogs.
## Log collectors and data ingestion formats
Here is the list of log collectors and the ingestion formats they can use for sending logs to VictoriaLogs:
| Collector | Elasticsearch | JSON Stream |
|------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| [filebeat](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Filebeat.html) | [Yes](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html) | No |
| [fluentbit](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Fluentbit.html) | No | [Yes](https://docs.fluentbit.io/manual/pipeline/outputs/http) |
| [logstash](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Logstash.html) | [Yes](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html) | No |
| [vector](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Vector.html) | [Yes](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/) | No |


@ -0,0 +1,132 @@
# Vector setup
[Vector](http://vector.dev) log collector supports
[Elasticsearch sink](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/) compatible with
[VictoriaLogs Elasticsearch bulk API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api).
Specify the [`sinks.vlogs`](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/) section with `type=elasticsearch` in the `vector.toml`
for sending the collected logs to VictoriaLogs:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
```
Substitute the `localhost:9428` address inside the `endpoints` option with the real TCP address of VictoriaLogs.
Replace `your_input` with the name of the `sources` section, which collects logs. See [these docs](https://vector.dev/docs/reference/configuration/sources/) for details.
The `_msg_field` parameter must contain the name of the field with the log message generated by Vector. This is usually the `message` field.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#message-field) for details.
The `_time_field` parameter must contain the name of the field with the log timestamp generated by Vector. This is usually the `timestamp` field, as in the config above.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field) for details.
It is recommended to specify a comma-separated list of field names, which uniquely identify every log stream collected by Vector, in the `_stream_fields` parameter.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) for details.
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) aren't needed,
then VictoriaLogs can be instructed to ignore them during data ingestion - just pass `ignore_fields` parameter with comma-separated list of fields to ignore.
For example, the following config instructs VictoriaLogs to ignore `log.offset` and `event.original` fields in the ingested logs:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
ignore_fields = "log.offset,event.original"
```
More details about `_msg_field`, `_time_field`, `_stream_fields` and `ignore_fields` are
available [here](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#http-parameters).
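The entries of `[sinks.vlogs.query]` reach VictoriaLogs as URL query parameters. The sketch below shows what the resulting request URL could look like, assuming the sink appends `_bulk` to the configured endpoint and URL-encodes the query table; the helper name `bulk_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def bulk_url(endpoint, query):
    """Join the sink endpoint, the `_bulk` path and the URL-encoded query args."""
    return endpoint.rstrip("/") + "/_bulk?" + urlencode(query)


url = bulk_url(
    "http://localhost:9428/insert/elasticsearch/",
    {
        "_msg_field": "message",
        "_time_field": "timestamp",
        "_stream_fields": "host,container_name",
        "ignore_fields": "log.offset,event.original",
    },
)
print(url)
# Decoding the query string returns the original comma-separated lists.
print(parse_qs(urlsplit(url).query))
```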
When Vector ingests logs into VictoriaLogs at a high rate, it may be necessary to tune the `batch.max_events` option.
For example, the following config is optimized for higher than usual ingestion rate:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
[sinks.vlogs.batch]
max_events = 1000
```
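The `max_events` option caps the number of events sent in one request. The sketch below only illustrates that idea by splitting a stream of events into batches of at most `max_events` items; it is not Vector's implementation, and the helper name `batches` is hypothetical.

```python
from itertools import islice


def batches(events, max_events):
    """Yield consecutive lists with at most `max_events` events each."""
    iterator = iter(events)
    while True:
        chunk = list(islice(iterator, max_events))
        if not chunk:
            return
        yield chunk


# 2500 synthetic events are sent as three requests instead of 2500.
sizes = [len(batch) for batch in batches(range(2500), 1000)]
print(sizes)
```

Larger batches mean fewer HTTP requests for the same number of log entries, at the cost of more memory per request on both sides.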
If Vector sends logs to VictoriaLogs in another datacenter, then it may be useful to enable data compression via the `compression` option.
This usually reduces network bandwidth usage and costs by up to 5 times:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
compression = "gzip"
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
```
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via the `[sinks.vlogs.request.headers]` section.
For example, the following `vector.toml` config instructs Vector to store the data to `(AccountID=12, ProjectID=34)` tenant:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
[sinks.vlogs.request.headers]
AccountID = "12"
ProjectID = "34"
```
More info about output tuning can be found in [these docs](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/vector-docker)
for running Vector with VictoriaLogs via docker-compose and collecting logs from Docker containers
into VictoriaLogs (via [Elasticsearch API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api)).
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.