lib/protoparser/graphite: added -graphite.sanitizeMetricName flag (#6489)

### Describe Your Changes

Added the `-graphite.sanitizeMetricName` flag for sanitizing metric names and tags ingested via the Graphite protocol.
Fixes #6077

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to the [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/contributing/).

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Andrii Chubatiuk 2024-07-02 15:56:41 +03:00 committed by GitHub
parent f3831bdd13
commit 476faf5578
GPG key ID: B5690EEEBB952194
6 changed files with 89 additions and 10 deletions

@@ -774,6 +774,13 @@ Example for writing data with Graphite plaintext protocol to local VictoriaMetri
 echo "foo.bar.baz;tag1=value1;tag2=value2 123 `date +%s`" | nc -N localhost 2003
``` ```
+To sanitize ingested metric names and labels according to Prometheus naming convention enable
+`-graphite.sanitizeMetricName` cmd-line flag. When enabled, VictoriaMetrics will apply the following modifications:
+- replace `/`,`@`,`*` with `_`;
+- drop `\`;
+- remove redundant dots, e.g: `metric..name` => `metric.name`;
+- replace characters not matching the expression `^a-zA-Z0-9:._` with `_`.
 VictoriaMetrics sets the current time if the timestamp is omitted.
 An arbitrary number of lines delimited by `\n` (aka newline char) can be sent in one go.
 After that the data may be read via [/api/v1/export](#how-to-export-data-in-json-line-format) endpoint:
@@ -2836,6 +2843,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
      Flag value can be read from the given file when using -forceMergeAuthKey=file:///abs/path/to/file or -forceMergeAuthKey=file://./relative/path/to/file . Flag value can be read from the given http/https url when using -forceMergeAuthKey=http://host/path or -forceMergeAuthKey=https://host/path
   -fs.disableMmap
      Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
+  -graphite.sanitizeMetricName
+     Sanitize metric names for the ingested Graphite data. See https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd
   -graphiteListenAddr string
      TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
   -graphiteListenAddr.useProxyProtocol
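
For clients written in Go, the tagged plaintext line from the `echo ... | nc` example in this README hunk could be built roughly as follows. This is only a sketch: `Tag` and `formatLine` are hypothetical names introduced here and are not part of VictoriaMetrics.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Tag is a hypothetical key/value pair used only in this sketch.
type Tag struct {
	Key, Value string
}

// formatLine builds one Graphite plaintext line with optional tags in the form
// "metric;tag1=value1;tag2=value2 value timestamp\n".
func formatLine(metric string, tags []Tag, value float64, ts int64) string {
	var sb strings.Builder
	sb.WriteString(metric)
	for _, t := range tags {
		sb.WriteByte(';')
		sb.WriteString(t.Key)
		sb.WriteByte('=')
		sb.WriteString(t.Value)
	}
	sb.WriteByte(' ')
	sb.WriteString(strconv.FormatFloat(value, 'g', -1, 64))
	sb.WriteByte(' ')
	sb.WriteString(strconv.FormatInt(ts, 10))
	sb.WriteByte('\n')
	return sb.String()
}

func main() {
	tags := []Tag{{"tag1", "value1"}, {"tag2", "value2"}}
	// The timestamp is an arbitrary Unix time in seconds chosen for the example.
	fmt.Print(formatLine("foo.bar.baz", tags, 123, 1719925001))
}
```

The resulting string can be written to a TCP connection opened against the address given in `-graphiteListenAddr`, which is what `nc -N localhost 2003` does in the shell example.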

@@ -43,6 +43,7 @@ See also [LTS releases](https://docs.victoriametrics.com/lts-releases/).
 * `vm_streamaggr_stale_samples_total` - shows the number of time series that became [stale](https://docs.victoriametrics.com/stream-aggregation/#staleness) during aggregation;
 * metrics related to stream aggregation got additional labels `match` (matching param), `group` (`by` or `without` param), `url` (address of `remoteWrite.url` where aggregation is applied), `position` (the position of the aggregation rule in config file).
 * These and other metrics were reflected on the [vmagent dashboard](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/dashboards/vmagent.json) in `stream aggregation` section.
+* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent/) and [Single-node VictoriaMetrics](https://docs.victoriametrics.com/): add `-graphite.sanitizeMetricName` cmd-line flag for sanitizing metrics ingested via [Graphite protocol](https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6077).
 * FEATURE: [VictoriaMetrics cluster](https://docs.victoriametrics.com/cluster-victoriametrics/): do not retry RPC calls to vmstorage nodes if [complexity limits](https://docs.victoriametrics.com/#resource-usage-limits) were exceeded.
 * BUGFIX: [docker-compose](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#docker-compose-environment-for-victoriametrics): fix incorrect link to vmui from [VictoriaMetrics plugin in Grafana](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#grafana).

@@ -777,6 +777,13 @@ Example for writing data with Graphite plaintext protocol to local VictoriaMetri
 echo "foo.bar.baz;tag1=value1;tag2=value2 123 `date +%s`" | nc -N localhost 2003
``` ```
+To sanitize ingested metric names and labels according to Prometheus naming convention enable
+`-graphite.sanitizeMetricName` cmd-line flag. When enabled, VictoriaMetrics will apply the following modifications:
+- replace `/`,`@`,`*` with `_`;
+- drop `\`;
+- remove redundant dots, e.g: `metric..name` => `metric.name`;
+- replace characters not matching the expression `^a-zA-Z0-9:._` with `_`.
 VictoriaMetrics sets the current time if the timestamp is omitted.
 An arbitrary number of lines delimited by `\n` (aka newline char) can be sent in one go.
 After that the data may be read via [/api/v1/export](#how-to-export-data-in-json-line-format) endpoint:
@@ -2839,6 +2846,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
      Flag value can be read from the given file when using -forceMergeAuthKey=file:///abs/path/to/file or -forceMergeAuthKey=file://./relative/path/to/file . Flag value can be read from the given http/https url when using -forceMergeAuthKey=http://host/path or -forceMergeAuthKey=https://host/path
   -fs.disableMmap
      Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
+  -graphite.sanitizeMetricName
+     Sanitize metric names for the ingested Graphite data. See https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd
   -graphiteListenAddr string
      TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
   -graphiteListenAddr.useProxyProtocol

@@ -785,6 +785,13 @@ Example for writing data with Graphite plaintext protocol to local VictoriaMetri
 echo "foo.bar.baz;tag1=value1;tag2=value2 123 `date +%s`" | nc -N localhost 2003
``` ```
+To sanitize ingested metric names and labels according to Prometheus naming convention enable
+`-graphite.sanitizeMetricName` cmd-line flag. When enabled, VictoriaMetrics will apply the following modifications:
+- replace `/`,`@`,`*` with `_`;
+- drop `\`;
+- remove redundant dots, e.g: `metric..name` => `metric.name`;
+- replace characters not matching the expression `^a-zA-Z0-9:._` with `_`.
 VictoriaMetrics sets the current time if the timestamp is omitted.
 An arbitrary number of lines delimited by `\n` (aka newline char) can be sent in one go.
 After that the data may be read via [/api/v1/export](#how-to-export-data-in-json-line-format) endpoint:
@@ -2847,6 +2854,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
      Flag value can be read from the given file when using -forceMergeAuthKey=file:///abs/path/to/file or -forceMergeAuthKey=file://./relative/path/to/file . Flag value can be read from the given http/https url when using -forceMergeAuthKey=http://host/path or -forceMergeAuthKey=https://host/path
   -fs.disableMmap
      Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
+  -graphite.sanitizeMetricName
+     Sanitize metric names for the ingested Graphite data. See https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd
   -graphiteListenAddr string
      TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
   -graphiteListenAddr.useProxyProtocol

@@ -1,14 +1,22 @@
 package graphite
 import (
+	"flag"
 	"fmt"
+	"regexp"
 	"strings"
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
 	"github.com/VictoriaMetrics/metrics"
 	"github.com/valyala/fastjson/fastfloat"
 )
+var (
+	sanitizeMetricName = flag.Bool("graphite.sanitizeMetricName", false, "Sanitize metric names for the ingested Graphite data. "+
+		"See https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd")
+)
 // graphite text line protocol may use white space or tab as separator
 // See https://github.com/grobian/carbon-c-relay/commit/f3ffe6cc2b52b07d14acbda649ad3fd6babdd528
 const graphiteSeparators = " \t"
@@ -76,6 +84,9 @@ func (r *Row) UnmarshalMetricAndTags(s string, tagsPool []Tag) ([]Tag, error) {
 	if len(r.Metric) == 0 {
 		return tagsPool, fmt.Errorf("metric cannot be empty")
 	}
+	if *sanitizeMetricName {
+		r.Metric = sanitizer.Transform(r.Metric)
+	}
 	return tagsPool, nil
 }
@@ -202,6 +213,9 @@ func (t *Tag) reset() {
 func (t *Tag) unmarshal(s string) {
 	t.reset()
+	if *sanitizeMetricName {
+		s = sanitizer.Transform(s)
+	}
 	n := strings.IndexByte(s, '=')
 	if n < 0 {
 		// Empty tag value.
@@ -240,3 +254,18 @@ func stripLeadingWhitespace(s string) string {
 	}
 	return ""
 }
+var sanitizer = bytesutil.NewFastStringTransformer(func(s string) string {
+	// Apply rule to drop some chars to preserve backwards compatibility
+	s = dropChars.Replace(s)
+	// Replace any remaining illegal chars
+	return allowedChars.ReplaceAllLiteralString(s, "_")
+})
+var (
+	dropChars = strings.NewReplacer(
+		`\`, "",
+		"..", ".",
+	)
+	allowedChars = regexp.MustCompile(`[^a-zA-Z0-9:._=\p{L}]`)
+)
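
To see what the sanitizer in this file does to concrete inputs, the two steps can be reproduced with the standard library alone. This is a rough sketch, not the committed code: it leaves out the `bytesutil.NewFastStringTransformer` wrapper, and `sanitize` and `disallowedChars` are names made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dropChars mirrors the replacer from the diff: it removes backslashes and
// turns each non-overlapping pair of dots into a single dot, in one pass.
var dropChars = strings.NewReplacer(
	`\`, "",
	"..", ".",
)

// disallowedChars uses the same pattern as allowedChars in the diff. It matches
// every character outside ASCII letters, digits, ':', '.', '_', '=' and
// Unicode letters.
var disallowedChars = regexp.MustCompile(`[^a-zA-Z0-9:._=\p{L}]`)

// sanitize is a hypothetical stand-in for sanitizer.Transform without caching.
func sanitize(s string) string {
	s = dropChars.Replace(s)
	return disallowedChars.ReplaceAllLiteralString(s, "_")
}

func main() {
	inputs := []string{"foo..bar", "f oo", `a\b`, "cpu/load@host*", "baz=aa=bb", "a...b"}
	for _, s := range inputs {
		fmt.Printf("%q => %q\n", s, sanitize(s))
	}
}
```

Two details may be worth a note in the docs. The pattern keeps `=` and Unicode letters, which the documented expression `^a-zA-Z0-9:._` does not mention; `=` has to survive because tags are sanitized as a whole `key=value` string before being split. Also, `strings.Replacer` makes a single pass over non-overlapping matches, so a run of three dots such as `a...b` becomes `a..b` rather than `a.b`.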

@@ -19,6 +19,8 @@ func TestUnmarshalMetricAndTagsFailure(t *testing.T) {
 }
 func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
+	sanitizeFlagValue := *sanitizeMetricName
+	*sanitizeMetricName = true
 	f := func(s string, rExpected *Row) {
 		t.Helper()
 		var r Row
@@ -31,10 +33,10 @@ func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
 		}
 	}
 	f(" ", &Row{
-		Metric: " ",
+		Metric: "_",
 	})
 	f("foo ;bar=baz", &Row{
-		Metric: "foo ",
+		Metric: "foo_",
 		Tags: []Tag{
 			{
 				Key:   "bar",
@@ -43,7 +45,7 @@ func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
 		},
 	})
 	f("f oo;bar=baz", &Row{
-		Metric: "f oo",
+		Metric: "f_oo",
 		Tags: []Tag{
 			{
 				Key:   "bar",
@@ -56,7 +58,7 @@ func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
 		Tags: []Tag{
 			{
 				Key:   "bar",
-				Value: "baz   ",
+				Value: "baz___",
 			},
 		},
 	})
@@ -65,7 +67,7 @@ func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
 		Tags: []Tag{
 			{
 				Key:   "bar",
-				Value: " baz",
+				Value: "_baz",
 			},
 		},
 	})
@@ -74,7 +76,7 @@ func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
 		Tags: []Tag{
 			{
 				Key:   "bar",
-				Value: "b az",
+				Value: "b_az",
 			},
 		},
 	})
@@ -82,7 +84,7 @@ func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
 		Metric: "foo",
 		Tags: []Tag{
 			{
-				Key:   "b ar",
+				Key:   "b_ar",
 				Value: "baz",
 			},
 		},
@@ -103,9 +105,25 @@ func TestUnmarshalMetricAndTagsSuccess(t *testing.T) {
 			},
 		},
 	})
+	f("foo..bar;bar=123;baz=aa=bb", &Row{
+		Metric: "foo.bar",
+		Tags: []Tag{
+			{
+				Key:   "bar",
+				Value: "123",
+			},
+			{
+				Key:   "baz",
+				Value: "aa=bb",
+			},
+		},
+	})
+	*sanitizeMetricName = sanitizeFlagValue
 }
 func TestRowsUnmarshalFailure(t *testing.T) {
+	sanitizeFlagValue := *sanitizeMetricName
+	*sanitizeMetricName = true
 	f := func(s string) {
 		t.Helper()
 		var rows Rows
@@ -129,9 +147,12 @@ func TestRowsUnmarshalFailure(t *testing.T) {
 	// invalid timestamp
 	f("aa 123 bar")
+	*sanitizeMetricName = sanitizeFlagValue
 }
 func TestRowsUnmarshalSuccess(t *testing.T) {
+	sanitizeFlagValue := *sanitizeMetricName
+	*sanitizeMetricName = true
 	f := func(s string, rowsExpected *Rows) {
 		t.Helper()
 		var rows Rows
@@ -184,17 +205,17 @@ func TestRowsUnmarshalSuccess(t *testing.T) {
 	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3102
 	f("s a;ta g1=aaa1;tag2=bb b2;tag3 1 23", &Rows{
 		Rows: []Row{{
-			Metric:    "s a",
+			Metric:    "s_a",
 			Value:     1,
 			Timestamp: 23,
 			Tags: []Tag{
 				{
-					Key:   "ta g1",
+					Key:   "ta_g1",
 					Value: "aaa1",
 				},
 				{
 					Key:   "tag2",
-					Value: "bb b2",
+					Value: "bb_b2",
 				},
 			},
 		}},
@@ -379,4 +400,5 @@ func TestRowsUnmarshalSuccess(t *testing.T) {
 			Timestamp: 1789,
 		}},
 	})
+	*sanitizeMetricName = sanitizeFlagValue
 }
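
A possible follow-up for these tests: each one restores `*sanitizeMetricName` with a trailing statement, which is skipped if a helper aborts the test early, for example through `t.Fatalf` (the helper bodies are elided in this diff, so that is an assumption). A deferred restore avoids leaking the flag value into later tests. The sketch below shows the pattern outside of the `testing` package; `setFlag` and `runWithSanitizing` are hypothetical names, and the local `sanitizeMetricName` only stands in for the real flag.

```go
package main

import "fmt"

// sanitizeMetricName stands in for the package-level flag pointer from the diff.
var sanitizeMetricName = new(bool)

// setFlag sets *p to v and returns a function that restores the previous value.
func setFlag(p *bool, v bool) (restore func()) {
	prev := *p
	*p = v
	return func() { *p = prev }
}

// runWithSanitizing enables the flag for the duration of the call and reports
// the value observed inside. The deferred restore also runs on early exits.
func runWithSanitizing() bool {
	defer setFlag(sanitizeMetricName, true)()
	return *sanitizeMetricName
}

func main() {
	during := runWithSanitizing()
	fmt.Println(during, *sanitizeMetricName)
}
```

Inside a real test the same helper could be registered with `t.Cleanup(setFlag(sanitizeMetricName, true))`, which keeps the set and restore steps on one line.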