lib/protoparser/opentelemetry: follow-up after 47892b4a4c

- Rename the -opentelemetry.sanitizeMetrics command-line flag to the clearer -opentelemetry.usePrometheusNaming
- Clarify the description of the change in docs/CHANGELOG.md
- Rename promrelabel.SanitizeLabelNameParts to the clearer promrelabel.SplitMetricNameToTokens
- Properly split metric names at the '_' char in promrelabel.SplitMetricNameToTokens
- Add tests for various edge cases of Prometheus metric name normalization,
  following the code at b865505850/pkg/translator/prometheus/normalize_name.go
- Extract the code responsible for Prometheus metric name normalization into a separate file (sanitize.go)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6035
Aliaksandr Valialkin 2024-04-03 02:18:33 +03:00
parent a5f65756f8
commit bb9bb600b3
GPG key ID: 52C003EE2BCDB9EB
8 changed files with 288 additions and 140 deletions

@@ -61,13 +61,13 @@ See also [LTS releases](https://docs.victoriametrics.com/lts-releases/).
* FEATURE: [vmctl](https://docs.victoriametrics.com/vmctl.html): support client-side TLS configuration for VictoriaMetrics destination specified via `--vm-*` cmd-line flags used in [InfluxDB](https://docs.victoriametrics.com/vmctl/#migrating-data-from-influxdb-1x), [Remote Read protocol](https://docs.victoriametrics.com/vmctl/#migrating-data-by-remote-read-protocol), [OpenTSDB](https://docs.victoriametrics.com/vmctl/#migrating-data-from-opentsdb), [Prometheus](https://docs.victoriametrics.com/vmctl/#migrating-data-from-prometheus) and [Promscale](https://docs.victoriametrics.com/vmctl/#migrating-data-from-promscale) migration modes.
* FEATURE: [vmctl](https://docs.victoriametrics.com/vmctl.html): split the [explore phase](https://docs.victoriametrics.com/vmctl/#migrating-data-from-victoriametrics) in `vm-native` mode by time intervals when [--vm-native-step-interval](https://docs.victoriametrics.com/vmctl/#using-time-based-chunking-of-migration) is specified. This should reduce the probability of exceeding complexity limits on the number of selected series during the explore phase. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5369).
* FEATURE: [graphite](https://docs.victoriametrics.com/#graphite-render-api-usage): add support for [aggregateSeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.aggregateSeriesLists), [diffSeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.diffSeriesLists), [multiplySeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.multiplySeriesLists) and [sumSeriesLists](https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.sumSeriesLists) functions. Thanks to @rbizos for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5809).
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add a command-line flag that enables sanitization of OpenTelemetry metric names and labels.
* FEATURE: [OpenTelemetry](https://docs.victoriametrics.com/#sending-data-via-opentelemetry): add `-opentelemetry.usePrometheusNaming` command-line flag, which can be used for enabling automatic conversion of the ingested metric names and labels into Prometheus-compatible format. See [these docs](https://docs.victoriametrics.com/#sending-data-via-opentelemetry) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037).
* BUGFIX: prevent automatic deletion of newly registered time series when they are queried immediately after being added. The probability of this bug increased significantly after [v1.99.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.99.0) because of optimizations related to registering new time series. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5948) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5959).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly set `Host` header in requests to scrape targets if it is specified via [`headers` option](https://docs.victoriametrics.com/sd_configs/#http-api-client-options). Thanks to @fholzer for [the bugreport](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5969) and [the fix](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5970).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly set `Host` header in requests to scrape targets when [`server_name` option at `tls_config`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config) is set. Previously the `Host` header was set incorrectly to the target hostname in this case.
* BUGFIX: do not drop `match[]` filter at [`/api/v1/series`](https://docs.victoriametrics.com/url-examples/#apiv1series) if `-search.ignoreExtraFiltersAtLabelsAPI` command-line flag is set, since missing `match[]` filter breaks `/api/v1/series` requests.
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): return proper responses for [AWS Firehose](https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#requestformat) requests according to [these docs](https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): return proper responses for [AWS Firehose](https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#requestformat) requests according to [these docs](https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html#responseformat). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6016) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6037).
* BUGFIX: [vmctl](https://docs.victoriametrics.com/vmctl.html): properly parse TLS key and CA files for [InfluxDB](https://docs.victoriametrics.com/vmctl/#migrating-data-from-influxdb-1x) and [OpenTSDB](https://docs.victoriametrics.com/vmctl/#migrating-data-from-opentsdb) migration modes.
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix VictoriaLogs UI query handling to correctly apply `_time` filter across all queries. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5920).
* BUGFIX: [Single-node VictoriaMetrics](https://docs.victoriametrics.com/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/cluster-victoriametrics/): limit the duration of requests to /api/v1/labels, /api/v1/label/.../values or /api/v1/series with the `-search.maxLabelsAPIDuration` value. Previously, the `-search.maxExportDuration` value was used by mistake. The bug was introduced in [v1.99.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.99.0). Thanks to @kbweave for the [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5992).
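The `-opentelemetry.usePrometheusNaming` entry covers label names as well as metric names. The label rules implemented by sanitizePrometheusLabelName in the new sanitize.go can be restated as a small standalone sketch. It assumes that promrelabel.SanitizeLabelName replaces every char outside `[a-zA-Z0-9_]` with '_' (that regexp is not part of this commit); promLabelName is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// unsupportedChars is an ASSUMED copy of the char class used by
// promrelabel.SanitizeLabelName; it is not shown in this commit.
var unsupportedChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// promLabelName is a hypothetical standalone restatement of the label rules:
// replace unsupported chars with '_', then make sure the name does not start
// with a digit or with a single '_' (a "__" prefix is kept as-is).
func promLabelName(name string) string {
	if name == "" {
		return ""
	}
	name = unsupportedChars.ReplaceAllString(name, "_")
	switch {
	case name[0] >= '0' && name[0] <= '9':
		return "key_" + name
	case strings.HasPrefix(name, "_") && !strings.HasPrefix(name, "__"):
		return "key" + name
	}
	return name
}

func main() {
	for _, s := range []string{"service.name", "foo_bar/baz:abc", "1foo", "_foo", "__bar"} {
		fmt.Printf("%q -> %q\n", s, promLabelName(s))
	}
}
```

Under that assumption the loop maps "service.name" to "service_name", "foo_bar/baz:abc" to "foo_bar_baz_abc", "1foo" to "key_1foo", "_foo" to "key_foo" and leaves "__bar" unchanged, which lines up with the cases in TestSanitizePrometheusLabelName.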

@@ -1549,6 +1549,9 @@ VictoriaMetrics supports data ingestion via [OpenTelemetry protocol for metrics]
VictoriaMetrics expects `protobuf`-encoded requests at `/opentelemetry/v1/metrics`.
Set HTTP request header `Content-Encoding: gzip` when sending gzip-compressed data to `/opentelemetry/v1/metrics`.
VictoriaMetrics stores the ingested OpenTelemetry [raw samples](https://docs.victoriametrics.com/keyconcepts/#raw-samples) as is without any transformations.
Pass `-opentelemetry.usePrometheusNaming` command-line flag to VictoriaMetrics for automatic conversion of metric names and labels into Prometheus-compatible format.
See [How to use OpenTelemetry metrics with VictoriaMetrics](https://docs.victoriametrics.com/guides/getting-started-with-opentelemetry/).
## JSON line format

@@ -1557,6 +1557,9 @@ VictoriaMetrics supports data ingestion via [OpenTelemetry protocol for metrics]
VictoriaMetrics expects `protobuf`-encoded requests at `/opentelemetry/v1/metrics`.
Set HTTP request header `Content-Encoding: gzip` when sending gzip-compressed data to `/opentelemetry/v1/metrics`.
VictoriaMetrics stores the ingested OpenTelemetry [raw samples](https://docs.victoriametrics.com/keyconcepts/#raw-samples) as is without any transformations.
Pass `-opentelemetry.usePrometheusNaming` command-line flag to VictoriaMetrics for automatic conversion of metric names and labels into Prometheus-compatible format.
See [How to use OpenTelemetry metrics with VictoriaMetrics](https://docs.victoriametrics.com/guides/getting-started-with-opentelemetry/).
## JSON line format

@@ -663,11 +663,15 @@ func SanitizeLabelName(name string) string {
	return labelNameSanitizer.Transform(name)
}
// SanitizeLabelNameParts returns label name slice generated from metric name divided by unsupported characters
func SanitizeLabelNameParts(name string) []string {
	return unsupportedLabelNameChars.Split(name, -1)
// SplitMetricNameToTokens returns tokens generated from metric name divided by unsupported Prometheus characters
//
// See https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
func SplitMetricNameToTokens(name string) []string {
	return nonAlphaNumChars.Split(name, -1)
}
var nonAlphaNumChars = regexp.MustCompile(`[^a-zA-Z0-9]`)
var labelNameSanitizer = bytesutil.NewFastStringTransformer(func(s string) string {
	return unsupportedLabelNameChars.ReplaceAllString(s, "_")
})

@@ -0,0 +1,138 @@
package stream

import (
	"flag"
	"slices"
	"strings"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/pb"
)

var (
	usePrometheusNaming = flag.Bool("opentelemetry.usePrometheusNaming", false, "Whether to convert metric names and labels into Prometheus-compatible format for the metrics ingested "+
		"via OpenTelemetry protocol; see https://docs.victoriametrics.com/#sending-data-via-opentelemetry")
)

// unitMap is obtained from https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/b8655058501bed61a06bb660869051491f46840b/pkg/translator/prometheus/normalize_name.go#L19
var unitMap = map[string]string{
	// Time
	"d":   "days",
	"h":   "hours",
	"min": "minutes",
	"s":   "seconds",
	"ms":  "milliseconds",
	"us":  "microseconds",
	"ns":  "nanoseconds",
	// Bytes
	"By":   "bytes",
	"KiBy": "kibibytes",
	"MiBy": "mebibytes",
	"GiBy": "gibibytes",
	"TiBy": "tibibytes",
	"KBy":  "kilobytes",
	"MBy":  "megabytes",
	"GBy":  "gigabytes",
	"TBy":  "terabytes",
	// SI
	"m": "meters",
	"V": "volts",
	"A": "amperes",
	"J": "joules",
	"W": "watts",
	"g": "grams",
	// Misc
	"Cel": "celsius",
	"Hz":  "hertz",
	"1":   "",
	"%":   "percent",
}

// perUnitMap is copied from https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/b8655058501bed61a06bb660869051491f46840b/pkg/translator/prometheus/normalize_name.go#L58
var perUnitMap = map[string]string{
	"s":  "second",
	"m":  "minute",
	"h":  "hour",
	"d":  "day",
	"w":  "week",
	"mo": "month",
	"y":  "year",
}

// See https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/b8655058501bed61a06bb660869051491f46840b/pkg/translator/prometheus/normalize_label.go#L26
func sanitizeLabelName(labelName string) string {
	if !*usePrometheusNaming {
		return labelName
	}
	return sanitizePrometheusLabelName(labelName)
}

func sanitizePrometheusLabelName(labelName string) string {
	if len(labelName) == 0 {
		return ""
	}
	labelName = promrelabel.SanitizeLabelName(labelName)
	if labelName[0] >= '0' && labelName[0] <= '9' {
		return "key_" + labelName
	} else if strings.HasPrefix(labelName, "_") && !strings.HasPrefix(labelName, "__") {
		return "key" + labelName
	}
	return labelName
}

// See https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/b8655058501bed61a06bb660869051491f46840b/pkg/translator/prometheus/normalize_name.go#L83
func sanitizeMetricName(m *pb.Metric) string {
	if !*usePrometheusNaming {
		return m.Name
	}
	return sanitizePrometheusMetricName(m)
}

func sanitizePrometheusMetricName(m *pb.Metric) string {
	nameTokens := promrelabel.SplitMetricNameToTokens(m.Name)
	unitTokens := strings.SplitN(m.Unit, "/", 2)
	if len(unitTokens) > 0 {
		mainUnit := strings.TrimSpace(unitTokens[0])
		if mainUnit != "" && !strings.ContainsAny(mainUnit, "{}") {
			if u, ok := unitMap[mainUnit]; ok {
				mainUnit = u
			}
			if mainUnit != "" && !slices.Contains(nameTokens, mainUnit) {
				nameTokens = append(nameTokens, mainUnit)
			}
		}
		if len(unitTokens) > 1 {
			perUnit := strings.TrimSpace(unitTokens[1])
			if perUnit != "" && !strings.ContainsAny(perUnit, "{}") {
				if u, ok := perUnitMap[perUnit]; ok {
					perUnit = u
				}
				if perUnit != "" && !slices.Contains(nameTokens, perUnit) {
					nameTokens = append(nameTokens, "per", perUnit)
				}
			}
		}
	}
	if m.Sum != nil && m.Sum.IsMonotonic {
		nameTokens = moveOrAppend(nameTokens, "total")
	} else if m.Unit == "1" && m.Gauge != nil {
		nameTokens = moveOrAppend(nameTokens, "ratio")
	}
	return strings.Join(nameTokens, "_")
}

func moveOrAppend(tokens []string, value string) []string {
	for i := range tokens {
		if tokens[i] == value {
			tokens = append(tokens[:i], tokens[i+1:]...)
			break
		}
	}
	return append(tokens, value)
}

@@ -0,0 +1,127 @@
package stream

import (
	"testing"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/pb"
)

func TestSanitizePrometheusLabelName(t *testing.T) {
	f := func(labelName, expectedResult string) {
		t.Helper()
		result := sanitizePrometheusLabelName(labelName)
		if result != expectedResult {
			t.Fatalf("unexpected result; got %q; want %q", result, expectedResult)
		}
	}
	f("", "")
	f("foo", "foo")
	f("foo_bar/baz:abc", "foo_bar_baz_abc")
	f("1foo", "key_1foo")
	f("_foo", "key_foo")
	f("__bar", "__bar")
}

func TestSanitizePrometheusMetricName(t *testing.T) {
	f := func(m *pb.Metric, expectedResult string) {
		t.Helper()
		result := sanitizePrometheusMetricName(m)
		if result != expectedResult {
			t.Fatalf("unexpected result; got %q; want %q", result, expectedResult)
		}
	}
	f(&pb.Metric{}, "")
	f(&pb.Metric{
		Name: "foo",
	}, "foo")
	f(&pb.Metric{
		Name: "foo",
		Unit: "s",
	}, "foo_seconds")
	f(&pb.Metric{
		Name: "foo_seconds",
		Unit: "s",
	}, "foo_seconds")
	f(&pb.Metric{
		Name: "foo",
		Sum: &pb.Sum{
			IsMonotonic: true,
		},
	}, "foo_total")
	f(&pb.Metric{
		Name: "foo_total",
		Sum: &pb.Sum{
			IsMonotonic: true,
		},
	}, "foo_total")
	f(&pb.Metric{
		Name: "foo",
		Sum: &pb.Sum{
			IsMonotonic: true,
		},
		Unit: "s",
	}, "foo_seconds_total")
	f(&pb.Metric{
		Name: "foo_seconds",
		Sum: &pb.Sum{
			IsMonotonic: true,
		},
		Unit: "s",
	}, "foo_seconds_total")
	f(&pb.Metric{
		Name: "foo_total",
		Sum: &pb.Sum{
			IsMonotonic: true,
		},
		Unit: "s",
	}, "foo_seconds_total")
	f(&pb.Metric{
		Name: "foo_seconds_total",
		Sum: &pb.Sum{
			IsMonotonic: true,
		},
		Unit: "s",
	}, "foo_seconds_total")
	f(&pb.Metric{
		Name: "foo_total_seconds",
		Sum: &pb.Sum{
			IsMonotonic: true,
		},
		Unit: "s",
	}, "foo_seconds_total")
	f(&pb.Metric{
		Name:  "foo",
		Gauge: &pb.Gauge{},
		Unit:  "1",
	}, "foo_ratio")
	f(&pb.Metric{
		Name: "foo",
		Unit: "m/s",
	}, "foo_meters_per_second")
	f(&pb.Metric{
		Name: "foo_second",
		Unit: "m/s",
	}, "foo_second_meters")
	f(&pb.Metric{
		Name: "foo_meters",
		Unit: "m/s",
	}, "foo_meters_per_second")
}

@@ -1,13 +1,10 @@
package stream
import (
	"flag"
	"fmt"
	"io"
	"strconv"
	"strings"
	"sync"
	"unicode"
	"github.com/VictoriaMetrics/metrics"
@@ -16,72 +13,11 @@ import (
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/pb"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
)
var (
	// sanitizeMetrics controls sanitizing metric and label names ingested via OpenTelemetry protocol.
	sanitizeMetrics = flag.Bool("opentelemetry.sanitizeMetrics", false, "Sanitize metric and label names for the ingested OpenTelemetry data")
)
// https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/b8655058501bed61a06bb660869051491f46840b/pkg/translator/prometheus/normalize_name.go#L19
var unitMap = []struct {
	prefix string
	units  map[string]string
}{
	{
		units: map[string]string{
			// Time
			"d":   "days",
			"h":   "hours",
			"min": "minutes",
			"s":   "seconds",
			"ms":  "milliseconds",
			"us":  "microseconds",
			"ns":  "nanoseconds",
			// Bytes
			"By":   "bytes",
			"KiBy": "kibibytes",
			"MiBy": "mebibytes",
			"GiBy": "gibibytes",
			"TiBy": "tibibytes",
			"KBy":  "kilobytes",
			"MBy":  "megabytes",
			"GBy":  "gigabytes",
			"TBy":  "terabytes",
			// SI
			"m": "meters",
			"V": "volts",
			"A": "amperes",
			"J": "joules",
			"W": "watts",
			"g": "grams",
			// Misc
			"Cel": "celsius",
			"Hz":  "hertz",
			"1":   "",
			"%":   "percent",
		},
	}, {
		prefix: "per",
		units: map[string]string{
			"s":  "second",
			"m":  "minute",
			"h":  "hour",
			"d":  "day",
			"w":  "week",
			"mo": "month",
			"y":  "year",
		},
	},
}
// ParseStream parses OpenTelemetry protobuf or json data from r and calls callback for the parsed rows.
//
// callback shouldn't hold tss items after returning.
@@ -355,74 +291,6 @@ func (wr *writeContext) parseRequestToTss(req *pb.ExportMetricsServiceRequest) {
	}
}
// https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/b8655058501bed61a06bb660869051491f46840b/pkg/translator/prometheus/normalize_label.go#L26
func sanitizeLabelName(labelName string) string {
	if !*sanitizeMetrics {
		return labelName
	}
	if len(labelName) == 0 {
		return labelName
	}
	labelName = promrelabel.SanitizeLabelName(labelName)
	if unicode.IsDigit(rune(labelName[0])) {
		return "key_" + labelName
	} else if strings.HasPrefix(labelName, "_") && !strings.HasPrefix(labelName, "__") {
		return "key" + labelName
	}
	return labelName
}
// https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/b8655058501bed61a06bb660869051491f46840b/pkg/translator/prometheus/normalize_name.go#L83
func sanitizeMetricName(metric *pb.Metric) string {
	if !*sanitizeMetrics {
		return metric.Name
	}
	nameTokens := promrelabel.SanitizeLabelNameParts(metric.Name)
	unitTokens := strings.SplitN(metric.Unit, "/", len(unitMap))
	for i, u := range unitTokens {
		unitToken := strings.TrimSpace(u)
		if unitToken == "" || strings.ContainsAny(unitToken, "{}") {
			continue
		}
		if unit, ok := unitMap[i].units[unitToken]; ok {
			unitToken = unit
		}
		if unitToken != "" && !containsToken(nameTokens, unitToken) {
			unitPrefix := unitMap[i].prefix
			if unitPrefix != "" {
				nameTokens = append(nameTokens, unitPrefix, unitToken)
			} else {
				nameTokens = append(nameTokens, unitToken)
			}
		}
	}
	if metric.Sum != nil && metric.Sum.IsMonotonic {
		nameTokens = moveOrAppend(nameTokens, "total")
	} else if metric.Unit == "1" && metric.Gauge != nil {
		nameTokens = moveOrAppend(nameTokens, "ratio")
	}
	return strings.Join(nameTokens, "_")
}
func containsToken(tokens []string, value string) bool {
	for _, token := range tokens {
		if token == value {
			return true
		}
	}
	return false
}
func moveOrAppend(tokens []string, value string) []string {
	for t := range tokens {
		if tokens[t] == value {
			tokens = append(tokens[:t], tokens[t+1:]...)
			break
		}
	}
	return append(tokens, value)
}
var wrPool sync.Pool
func getWriteContext() *writeContext {

@@ -15,9 +15,14 @@ import (
)
func TestParseStream(t *testing.T) {
	f := func(samples []*pb.Metric, tssExpected []prompbmarshal.TimeSeries, sanitize bool) {
	f := func(samples []*pb.Metric, tssExpected []prompbmarshal.TimeSeries, usePromNaming bool) {
		t.Helper()
		*sanitizeMetrics = sanitize
		prevPromNaming := *usePrometheusNaming
		*usePrometheusNaming = usePromNaming
		defer func() {
			*usePrometheusNaming = prevPromNaming
		}()
		checkSeries := func(tss []prompbmarshal.TimeSeries) error {
			if len(tss) != len(tssExpected) {
@@ -122,7 +127,7 @@ func TestParseStream(t *testing.T) {
		false,
	)
	// Test gauge with unit and sanitization
	// Test gauge with unit and prometheus naming
	f(
		[]*pb.Metric{
			generateGauge("my-gauge", "ms"),