VictoriaMetrics/docs/VictoriaLogs/data-ingestion

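VictoriaLogs can accept logs from the following log collectors:

  • Rsyslog
  • Syslog-ng
  • Filebeat
  • Fluentbit
  • Logstash
  • Vector
  • Promtail
  • OpenTelemetry Collector
  • Telegraf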

The ingested logs can be queried according to these docs.

HTTP APIs

VictoriaLogs supports the following data ingestion HTTP APIs:

  • Elasticsearch bulk API
  • JSON stream API
  • Loki JSON API

VictoriaLogs accepts optional HTTP parameters at data ingestion HTTP APIs.

Elasticsearch bulk API

VictoriaLogs accepts logs in Elasticsearch bulk API / OpenSearch Bulk API format at the http://localhost:9428/insert/elasticsearch/_bulk endpoint.

The following command pushes a single log line to VictoriaLogs:

echo '{"create":{}}
{"_msg":"cannot open file","_time":"0","host.name":"host123"}
' | curl -X POST -H 'Content-Type: application/json' --data-binary @- http://localhost:9428/insert/elasticsearch/_bulk

It is possible to push thousands of log lines in a single request to this API.

If the timestamp field is set to "0", then VictoriaLogs uses its own current timestamp for each ingested log line. Otherwise the timestamp field must be in ISO8601 format, for example 2023-06-20T15:32:10Z. An optional fractional part of seconds can be specified after the dot: 2023-06-20T15:32:10.123Z. A timezone offset can be specified instead of the Z suffix: 2023-06-20T15:32:10+02:00.
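The timestamp variants described above can be tried out with a small helper. This is only a sketch: build_bulk_entry is a hypothetical shell function, not part of VictoriaLogs, and it does no JSON escaping, so it assumes the message and host name contain no quotes or backslashes.

```shell
# build_bulk_entry is a hypothetical helper, not part of VictoriaLogs.
# It prints one Elasticsearch bulk entry: the action line and the log line.
#   $1 - log message (assumed to need no JSON escaping)
#   $2 - timestamp: "0" for the current server-side time, or an ISO8601 value
#   $3 - value for the host.name field
build_bulk_entry() {
  printf '{"create":{}}\n'
  printf '{"_msg":"%s","_time":"%s","host.name":"%s"}\n' "$1" "$2" "$3"
}

# Example (needs a running VictoriaLogs at localhost:9428, so it is commented out):
# {
#   build_bulk_entry 'cannot open file' '0' host123
#   build_bulk_entry 'disk is full' '2023-06-20T15:32:10.123Z' host123
#   build_bulk_entry 'retrying' '2023-06-20T15:32:10+02:00' host123
# } | curl -X POST -H 'Content-Type: application/json' --data-binary @- \
#   http://localhost:9428/insert/elasticsearch/_bulk
```

Grouping several build_bulk_entry calls in braces sends them as a single request body, which matches the note that many log lines can be pushed in one request.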

See these docs for details on fields, which must be present in the ingested log messages.

The API accepts various HTTP parameters, which can change the data ingestion behavior. See these docs for details.

The following command verifies that the data has been successfully ingested into VictoriaLogs by querying it:

curl http://localhost:9428/select/logsql/query -d 'query=host.name:host123'

The command should return the following response:

{"_msg":"cannot open file","_stream":"{}","_time":"2023-06-21T04:24:24Z","host.name":"host123"}

The response by default contains all the log fields. See how to query specific fields.

The duration of requests to /insert/elasticsearch/_bulk can be monitored with the vl_http_request_duration_seconds{path="/insert/elasticsearch/_bulk"} metric.

JSON stream API

VictoriaLogs accepts a JSON line stream (aka ndjson) at the http://localhost:9428/insert/jsonline endpoint.

The following command pushes multiple log lines to VictoriaLogs:

echo '{ "log": { "level": "info", "message": "hello world" }, "date": "0", "stream": "stream1" }
{ "log": { "level": "error", "message": "oh no!" }, "date": "0", "stream": "stream1" }
{ "log": { "level": "info", "message": "hello world" }, "date": "0", "stream": "stream2" }
' | curl -X POST -H 'Content-Type: application/stream+json' --data-binary @- \
 'http://localhost:9428/insert/jsonline?_stream_fields=stream&_time_field=date&_msg_field=log.message'

It is possible to push an unlimited number of log lines in a single request to this API.

If the timestamp field is set to "0", then VictoriaLogs uses its own current timestamp for each ingested log line. Otherwise the timestamp field must be in ISO8601 format, for example 2023-06-20T15:32:10Z. An optional fractional part of seconds can be specified after the dot: 2023-06-20T15:32:10.123Z. A timezone offset can be specified instead of the Z suffix: 2023-06-20T15:32:10+02:00.

See these docs for details on fields, which must be present in the ingested log messages.

The API accepts various HTTP parameters, which can change the data ingestion behavior. See these docs for details.

The following command verifies that the data has been successfully ingested into VictoriaLogs by querying it:

curl http://localhost:9428/select/logsql/query -d 'query=log.level:*'

The command should return the following response:

{"_msg":"hello world","_stream":"{stream=\"stream2\"}","_time":"2023-06-20T13:35:11.56789Z","log.level":"info"}
{"_msg":"hello world","_stream":"{stream=\"stream1\"}","_time":"2023-06-20T15:31:23Z","log.level":"info"}
{"_msg":"oh no!","_stream":"{stream=\"stream1\"}","_time":"2023-06-20T15:32:10.567Z","log.level":"error"}

The response by default contains all the log fields. See how to query specific fields.

The duration of requests to /insert/jsonline can be monitored with the vl_http_request_duration_seconds{path="/insert/jsonline"} metric.

Loki JSON API

VictoriaLogs accepts logs in Loki JSON API format at the http://localhost:9428/insert/loki/api/v1/push endpoint.

The following command pushes a single log line to the Loki JSON API at VictoriaLogs:

curl -H "Content-Type: application/json" -XPOST "http://localhost:9428/insert/loki/api/v1/push?_stream_fields=instance,job" --data-raw \
  '{"streams": [{ "stream": { "instance": "host123", "job": "app42" }, "values": [ [ "0", "foo fizzbuzz bar" ] ] }]}'

It is possible to push thousands of log streams and log lines in a single request to this API.

The API accepts various HTTP parameters, which can change the data ingestion behavior. See these docs for details. There is no need to specify the _msg_field and _time_field query args, since VictoriaLogs automatically extracts the log message and timestamp from the ingested Loki data.
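The request body from the example above can also be generated rather than typed by hand. The sketch below is illustrative only: build_loki_push is a hypothetical helper, not part of VictoriaLogs or Loki, and it assumes the labels and the log line need no JSON escaping. It also assumes that a non-zero timestamp is given as a string of Unix nanoseconds, which is what the Loki push format uses.

```shell
# build_loki_push is a hypothetical helper, not part of VictoriaLogs or Loki.
# It prints a Loki JSON push body with a single stream and a single log line.
#   $1 - instance label
#   $2 - job label
#   $3 - timestamp as a string: "0" as in the example above, or Unix nanoseconds
#   $4 - log line (assumed to need no JSON escaping)
build_loki_push() {
  printf '{"streams": [{ "stream": { "instance": "%s", "job": "%s" }, "values": [ [ "%s", "%s" ] ] }]}\n' \
    "$1" "$2" "$3" "$4"
}

# Current Unix time in seconds, padded with nine zeros to nanosecond precision.
now_ns="$(date +%s)000000000"

# Example (needs a running VictoriaLogs at localhost:9428, so it is commented out):
# build_loki_push host123 app42 "$now_ns" 'foo fizzbuzz bar' |
#   curl -H "Content-Type: application/json" -XPOST --data-binary @- \
#     "http://localhost:9428/insert/loki/api/v1/push?_stream_fields=instance,job"
```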

The following command verifies that the data has been successfully ingested into VictoriaLogs by querying it:

curl http://localhost:9428/select/logsql/query -d 'query=fizzbuzz'

The command should return the following response:

{"_msg":"foo fizzbuzz bar","_stream":"{instance=\"host123\",job=\"app42\"}","_time":"2023-07-20T23:01:19.288676497Z"}

The response by default contains all the log fields. See how to query specific fields.

The duration of requests to /insert/loki/api/v1/push can be monitored with the vl_http_request_duration_seconds{path="/insert/loki/api/v1/push"} metric.

HTTP parameters

VictoriaLogs accepts the following configuration parameters via HTTP headers or the URL query string at data ingestion HTTP APIs. When the same parameter is set in both places, the query string value takes priority over the HTTP header.

HTTP Query string parameters

List of supported Query string parameters:

  • _msg_field - it must contain the name of the log field with the log message generated by the log shipper. This is usually the message field for Filebeat and Logstash. If the _msg_field parameter isn't set, then VictoriaLogs reads the log message from the _msg field.

  • _time_field - it must contain the name of the log field with the log timestamp generated by the log shipper. This is usually the @timestamp field for Filebeat and Logstash. If the _time_field parameter isn't set, then VictoriaLogs reads the timestamp from the _time field. If this field doesn't exist, then the current timestamp is used.

  • _stream_fields - it should contain a comma-separated list of log field names, which uniquely identify every log stream collected by the log shipper. If the _stream_fields parameter isn't set, then all the ingested logs are written to the default log stream - {}.

  • ignore_fields - this parameter may contain the list of log field names, which must be ignored during data ingestion.

  • debug - if this parameter is set to 1, then the ingested logs aren't stored in VictoriaLogs. Instead, the ingested data is logged by VictoriaLogs, so it can be investigated later.

See also HTTP headers.
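As an illustration of how these query args combine, the sketch below assembles an ingestion URL from name=value pairs. ingest_url and the VL_ADDR variable are hypothetical names introduced for this example, and the helper assumes the values are already URL-safe (no spaces or reserved characters).

```shell
# ingest_url is a hypothetical helper, not part of VictoriaLogs.
# It prints an ingestion URL with the given query args appended.
#   $1 - endpoint path, e.g. /insert/jsonline
#   remaining args - name=value pairs (assumed to be already URL-safe)
# VL_ADDR is a hypothetical override for the VictoriaLogs address.
ingest_url() {
  url="${VL_ADDR:-http://localhost:9428}$1"
  shift
  sep='?'
  for arg in "$@"; do
    url="$url$sep$arg"
    sep='&'   # every arg after the first is joined with '&'
  done
  printf '%s\n' "$url"
}

# Example (needs a running VictoriaLogs, so it is commented out).
# With debug=1 the line is logged by VictoriaLogs instead of being stored:
# echo '{ "log": { "message": "hello world" }, "date": "0", "stream": "stream1" }' |
#   curl -X POST -H 'Content-Type: application/stream+json' --data-binary @- \
#     "$(ingest_url /insert/jsonline _stream_fields=stream _time_field=date _msg_field=log.message debug=1)"
```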

HTTP headers

List of supported HTTP headers:

  • AccountID - may contain the accountID of the tenant to ingest data into. See multitenancy docs for details.

  • ProjectID - may contain the projectID of the tenant to ingest data into. See multitenancy docs for details. VictoriaLogs accepts optional AccountID and ProjectID headers at data ingestion HTTP APIs.

  • VL-Msg-Field - it must contain the name of the log field with the log message generated by the log shipper. This is usually the message field for Filebeat and Logstash. If the VL-Msg-Field header isn't set, then VictoriaLogs reads the log message from the _msg field.

  • VL-Time-Field - it must contain the name of the log field with the log timestamp generated by the log shipper. This is usually the @timestamp field for Filebeat and Logstash. If the VL-Time-Field header isn't set, then VictoriaLogs reads the timestamp from the _time field. If this field doesn't exist, then the current timestamp is used.

  • VL-Stream-Fields - it should contain a comma-separated list of log field names, which uniquely identify every log stream collected by the log shipper. If the VL-Stream-Fields header isn't set, then all the ingested logs are written to the default log stream - {}.

  • VL-Ignore-Fields - this parameter may contain the list of log field names, which must be ignored during data ingestion.

  • VL-Debug - if this parameter is set to 1, then the ingested logs aren't stored in VictoriaLogs. Instead, the ingested data is logged by VictoriaLogs, so it can be investigated later.

See also HTTP Query string parameters.
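The same field mapping can be passed via headers instead of query args. The wrapper below is a sketch under stated assumptions: push_jsonline_with_headers and the CURL override are hypothetical names for this example, and the VictoriaLogs address is taken from the examples above.

```shell
# push_jsonline_with_headers is a hypothetical wrapper, not part of VictoriaLogs.
# It reads log lines from stdin and sends them to the JSON stream API,
# passing the field mapping via HTTP headers.
#   $1 - value for VL-Stream-Fields
#   $2 - value for VL-Time-Field
#   $3 - value for VL-Msg-Field
# Set CURL=echo to print the arguments instead of sending the request.
push_jsonline_with_headers() {
  ${CURL:-curl} -X POST \
    -H 'Content-Type: application/stream+json' \
    -H "VL-Stream-Fields: $1" \
    -H "VL-Time-Field: $2" \
    -H "VL-Msg-Field: $3" \
    --data-binary @- \
    'http://localhost:9428/insert/jsonline'
}

# Example (needs a running VictoriaLogs at localhost:9428, so it is commented out):
# echo '{ "log": { "message": "hello world" }, "date": "0", "stream": "stream1" }' |
#   push_jsonline_with_headers stream date log.message
```

Headers are convenient when the log shipper lets you set custom headers but not the request URL.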

Troubleshooting

The following command can be used for verifying whether the data is successfully ingested into VictoriaLogs:

curl http://localhost:9428/select/logsql/query -d 'query=*' | head

This command selects all the data ingested into VictoriaLogs via HTTP query API using any value filter, while head cancels query execution after reading the first 10 log lines. See these docs for more details on how head integrates with VictoriaLogs.

The response by default contains all the log fields. See how to query specific fields.

VictoriaLogs provides the following command-line flags, which can help debugging data ingestion issues:

  • -logNewStreams - if this flag is passed to VictoriaLogs, then it logs all the newly registered log streams. This may help debugging high cardinality issues.
  • -logIngestedRows - if this flag is passed to VictoriaLogs, then it logs all the ingested log entries. See also debug parameter.

VictoriaLogs exposes various metrics, which may help debugging data ingestion issues:

  • vl_rows_ingested_total - the number of ingested log entries since the last VictoriaLogs restart. If this number increases over time, then logs are successfully ingested into VictoriaLogs. The ingested logs can be inspected in the following ways:
    • By passing debug=1 parameter to every request to data ingestion APIs. The ingested rows aren't stored in VictoriaLogs in this case. Instead, they are logged, so they can be investigated later. The vl_rows_dropped_total metric is incremented for each logged row.
    • By passing -logIngestedRows command-line flag to VictoriaLogs. In this case it logs all the ingested data, so it can be investigated later.
  • vl_streams_created_total - the number of created log streams since the last VictoriaLogs restart. If this metric grows rapidly during extended periods of time, then this may lead to high cardinality issues. The newly created log streams can be inspected in logs by passing -logNewStreams command-line flag to VictoriaLogs.
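The metrics listed above can be watched from the command line. The sketch below is an illustration: filter_ingestion_metrics is a hypothetical helper, and it assumes that VictoriaLogs exposes its metrics in Prometheus text format at /metrics on the same port as the ingestion APIs.

```shell
# filter_ingestion_metrics is a hypothetical helper, not part of VictoriaLogs.
# It reads Prometheus-style metrics text from stdin and keeps only
# the ingestion-related metrics named above.
filter_ingestion_metrics() {
  grep -E '^vl_(rows_ingested|rows_dropped|streams_created)_total'
}

# Example (assumes metrics are served at /metrics; commented out because
# it needs a running VictoriaLogs at localhost:9428):
# curl -s http://localhost:9428/metrics | filter_ingestion_metrics
```

Running the command twice a few seconds apart and comparing the vl_rows_ingested_total values shows whether logs are still arriving.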

Log collectors and data ingestion formats

Here is the list of log collectors and their ingestion formats supported by VictoriaLogs:

| How to setup the collector | Format: Elasticsearch | Format: JSON Stream | Format: Loki | Format: syslog | Format: OpenTelemetry |
|---|---|---|---|---|---|
| Rsyslog | Yes | No | No | Yes | No |
| Syslog-ng | Yes, v1, v2 | No | No | Yes | No |
| Filebeat | Yes | No | No | No | No |
| Fluentbit | No | Yes | Yes | Yes | Yes |
| Logstash | Yes | No | No | Yes | Yes |
| Vector | Yes | Yes | Yes | No | Yes |
| Promtail | No | No | Yes | No | No |
| OpenTelemetry Collector | Yes | No | Yes | Yes | Yes |
| Telegraf | Yes | Yes | Yes | Yes | Yes |