2023-01-04 06:19:18 +00:00
package streamaggr
import (
2023-03-29 16:05:58 +00:00
"encoding/json"
2023-01-04 06:19:18 +00:00
"fmt"
"math"
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
"slices"
2023-01-05 10:08:24 +00:00
"sort"
2023-01-04 06:19:18 +00:00
"strconv"
"strings"
"sync"
2024-03-17 21:01:44 +00:00
"sync/atomic"
2023-01-04 06:19:18 +00:00
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
2023-01-07 06:39:13 +00:00
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
2023-01-04 06:19:18 +00:00
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
2024-01-24 10:31:27 +00:00
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
2024-01-21 19:58:26 +00:00
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs/fscore"
2023-01-04 06:19:18 +00:00
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
2024-03-04 03:42:55 +00:00
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
"github.com/VictoriaMetrics/metrics"
2023-07-24 23:44:09 +00:00
"gopkg.in/yaml.v2"
2023-01-04 06:19:18 +00:00
)
var supportedOutputs = [ ] string {
2024-07-14 15:23:59 +00:00
"avg" ,
"count_samples" ,
"count_series" ,
"histogram_bucket" ,
2023-01-04 06:19:18 +00:00
"increase" ,
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
"increase_prometheus" ,
2023-01-04 06:19:18 +00:00
"last" ,
"max" ,
2024-07-14 15:23:59 +00:00
"min" ,
"quantiles(phi1, ..., phiN)" ,
"rate_avg" ,
"rate_sum" ,
2023-01-04 06:19:18 +00:00
"stddev" ,
"stdvar" ,
2024-07-14 15:23:59 +00:00
"sum_samples" ,
"total" ,
"total_prometheus" ,
"unique_samples" ,
2023-01-04 06:19:18 +00:00
}
2024-07-01 12:56:17 +00:00
// maxLabelValueLen is maximum match expression label value length in stream aggregation metrics
const maxLabelValueLen = 64
2024-05-08 11:10:53 +00:00
var (
// lc contains information about all compressed labels for streaming aggregation
lc promutils . LabelsCompressor
_ = metrics . NewGauge ( ` vm_streamaggr_labels_compressor_size_bytes ` , func ( ) float64 {
return float64 ( lc . SizeBytes ( ) )
} )
_ = metrics . NewGauge ( ` vm_streamaggr_labels_compressor_items_count ` , func ( ) float64 {
return float64 ( lc . ItemsCount ( ) )
} )
)
2023-04-01 04:27:45 +00:00
// LoadFromFile loads Aggregators from the given path and uses the given pushFunc for pushing the aggregated data.
2023-01-04 06:19:18 +00:00
//
2024-03-04 03:42:55 +00:00
// opts can contain additional options. If opts is nil, then default options are used.
2023-01-25 17:14:49 +00:00
//
2023-01-04 06:19:18 +00:00
// The returned Aggregators must be stopped with MustStop() when no longer needed.
2024-07-01 12:56:17 +00:00
func LoadFromFile ( path string , pushFunc PushFunc , opts Options ) ( * Aggregators , error ) {
2024-01-21 19:58:26 +00:00
data , err := fscore . ReadFileOrHTTP ( path )
2023-01-04 06:19:18 +00:00
if err != nil {
2023-04-01 04:27:45 +00:00
return nil , fmt . Errorf ( "cannot load aggregators: %w" , err )
2023-01-04 06:19:18 +00:00
}
2024-01-24 10:31:27 +00:00
data , err = envtemplate . ReplaceBytes ( data )
if err != nil {
return nil , fmt . Errorf ( "cannot expand environment variables in %q: %w" , path , err )
}
2024-07-03 13:10:09 +00:00
as , err := LoadFromData ( data , pushFunc , opts )
2023-01-04 06:19:18 +00:00
if err != nil {
2024-03-04 22:45:22 +00:00
return nil , fmt . Errorf ( "cannot initialize aggregators from %q: %w; see https://docs.victoriametrics.com/stream-aggregation/#stream-aggregation-config" , path , err )
2023-01-04 06:19:18 +00:00
}
2024-01-24 10:31:27 +00:00
2023-04-01 04:27:45 +00:00
return as , nil
2023-01-04 06:19:18 +00:00
}
2024-03-04 03:42:55 +00:00
// Options contains optional settings for the Aggregators.
type Options struct {
// DedupInterval is deduplication interval for samples received for the same time series.
//
// The last sample per each series is left per each DedupInterval if DedupInterval > 0.
//
// By default deduplication is disabled.
2024-03-04 12:50:46 +00:00
//
// The deduplication can be set up individually per each aggregation via dedup_interval option.
2024-03-04 03:42:55 +00:00
DedupInterval time . Duration
2024-03-05 00:13:21 +00:00
// DropInputLabels is an optional list of labels to drop from samples before de-duplication and stream aggregation.
DropInputLabels [ ] string
2024-03-04 03:42:55 +00:00
// NoAlignFlushToInterval disables alignment of flushes to the aggregation interval.
//
// By default flushes are aligned to aggregation interval.
2024-03-04 12:50:46 +00:00
//
// The alignment of flushes can be disabled individually per each aggregation via no_align_flush_to_interval option.
2024-03-04 03:42:55 +00:00
NoAlignFlushToInterval bool
2024-03-04 18:01:12 +00:00
// FlushOnShutdown enables flush of incomplete aggregation state on startup and shutdown.
2024-03-04 03:42:55 +00:00
//
2024-03-04 14:12:41 +00:00
// By default incomplete state is dropped.
2024-03-04 12:50:46 +00:00
//
// The flush of incomplete state can be enabled individually per each aggregation via flush_on_shutdown option.
2024-03-04 03:42:55 +00:00
FlushOnShutdown bool
// KeepMetricNames instructs to leave metric names as is for the output time series without adding any suffix.
//
// By default the following suffix is added to every output time series:
//
// input_name:<interval>[_by_<by_labels>][_without_<without_labels>]_<output>
//
2024-03-17 21:01:44 +00:00
// This option can be overridden individually per each aggregation via keep_metric_names option.
2024-03-04 03:42:55 +00:00
KeepMetricNames bool
2024-03-17 21:01:44 +00:00
// IgnoreOldSamples instructs to ignore samples with timestamps older than the current aggregation interval.
//
// By default all the samples are taken into account.
//
// This option can be overridden individually per each aggregation via ignore_old_samples option.
IgnoreOldSamples bool
2024-04-22 11:52:04 +00:00
2024-07-02 19:21:35 +00:00
// IgnoreFirstIntervals sets the number of aggregation intervals to be ignored on start.
2024-04-22 11:52:04 +00:00
//
2024-07-02 19:21:35 +00:00
// By default, zero intervals are ignored.
2024-04-22 11:52:04 +00:00
//
2024-04-22 12:22:59 +00:00
// This option can be overridden individually per each aggregation via ignore_first_intervals option.
2024-04-22 11:52:04 +00:00
IgnoreFirstIntervals int
2024-07-01 12:56:17 +00:00
// Alias is name or url of remote write context
Alias string
// aggrID is aggregators id number starting from 1, which is used in metrics labels
aggrID int
2023-01-04 06:19:18 +00:00
}
// Config is a configuration for a single stream aggregation.
type Config struct {
// Match is a label selector for filtering time series for the given selector.
//
// If the match isn't set, then all the input time series are processed.
Match * promrelabel . IfExpression ` yaml:"match,omitempty" `
// Interval is the interval between aggregations.
Interval string ` yaml:"interval" `
2024-03-04 03:42:55 +00:00
// NoAlighFlushToInterval disables aligning of flushes to multiples of Interval.
// By default flushes are aligned to Interval.
//
// See also FlushOnShutdown.
NoAlignFlushToInterval * bool ` yaml:"no_align_flush_to_interval,omitempty" `
2024-03-04 18:01:12 +00:00
// FlushOnShutdown defines whether to flush incomplete aggregation state on startup and shutdown.
2024-03-04 14:12:41 +00:00
// By default incomplete aggregation state is dropped, since it may confuse users.
2024-03-04 03:42:55 +00:00
FlushOnShutdown * bool ` yaml:"flush_on_shutdown,omitempty" `
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// DedupInterval is an optional interval for deduplication.
DedupInterval string ` yaml:"dedup_interval,omitempty" `
2023-07-20 14:07:33 +00:00
// Staleness interval is interval after which the series state will be reset if no samples have been sent during it.
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// The parameter is only relevant for outputs: total, total_prometheus, increase, increase_prometheus and histogram_bucket.
2023-07-20 14:07:33 +00:00
StalenessInterval string ` yaml:"staleness_interval,omitempty" `
2023-01-04 06:19:18 +00:00
// Outputs is a list of output aggregate functions to produce.
//
// The following names are allowed:
//
2024-07-14 15:23:59 +00:00
// - avg - the average value across all the samples
// - count_samples - counts the input samples
// - count_series - counts the number of unique input series
// - histogram_bucket - creates VictoriaMetrics histogram for input samples
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// - increase - calculates the increase over input series
// - increase_prometheus - calculates the increase over input series, ignoring the first sample in new time series
2023-01-04 06:19:18 +00:00
// - last - the last biggest sample value
// - max - the maximum sample value
2024-07-14 15:23:59 +00:00
// - min - the minimum sample value
// - quantiles(phi1, ..., phiN) - quantiles' estimation for phi in the range [0..1]
// - rate_avg - calculates average of rate for input counters
// - rate_sum - calculates sum of rate for input counters
2023-01-04 06:19:18 +00:00
// - stddev - standard deviation across all the samples
// - stdvar - standard variance across all the samples
2024-07-14 15:23:59 +00:00
// - sum_samples - sums the input sample values
// - total - aggregates input counters
// - total_prometheus - aggregates input counters, ignoring the first sample in new time series
// - unique_samples - counts the number of unique sample values
2023-01-04 06:19:18 +00:00
//
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// The output time series will have the following names by default:
//
// input_name:<interval>[_by_<by_labels>][_without_<without_labels>]_<output>
2023-01-04 06:19:18 +00:00
//
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// See also KeepMetricNames
2023-01-04 06:19:18 +00:00
//
Outputs [ ] string ` yaml:"outputs" `
2024-03-04 03:42:55 +00:00
// KeepMetricNames instructs to leave metric names as is for the output time series without adding any suffix.
KeepMetricNames * bool ` yaml:"keep_metric_names,omitempty" `
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
2024-03-17 21:01:44 +00:00
// IgnoreOldSamples instructs to ignore samples with old timestamps outside the current aggregation interval.
IgnoreOldSamples * bool ` yaml:"ignore_old_samples,omitempty" `
2024-04-22 11:52:04 +00:00
// IgnoreFirstIntervals sets number of aggregation intervals to be ignored on start.
IgnoreFirstIntervals * int ` yaml:"ignore_first_intervals,omitempty" `
2023-01-04 06:19:18 +00:00
// By is an optional list of labels for grouping input series.
//
// See also Without.
//
// If neither By nor Without are set, then the Outputs are calculated
// individually per each input time series.
By [ ] string ` yaml:"by,omitempty" `
// Without is an optional list of labels, which must be excluded when grouping input series.
//
// See also By.
//
// If neither By nor Without are set, then the Outputs are calculated
// individually per each input time series.
Without [ ] string ` yaml:"without,omitempty" `
2024-03-05 00:13:21 +00:00
// DropInputLabels is an optional list with labels, which must be dropped before further processing of input samples.
//
// Labels are dropped before de-duplication and aggregation.
DropInputLabels * [ ] string ` yaml:"drop_input_labels,omitempty" `
2023-01-04 06:19:18 +00:00
// InputRelabelConfigs is an optional relabeling rules, which are applied on the input
// before aggregation.
InputRelabelConfigs [ ] promrelabel . RelabelConfig ` yaml:"input_relabel_configs,omitempty" `
// OutputRelabelConfigs is an optional relabeling rules, which are applied
// on the aggregated output before being sent to remote storage.
OutputRelabelConfigs [ ] promrelabel . RelabelConfig ` yaml:"output_relabel_configs,omitempty" `
}
2024-03-04 22:45:22 +00:00
// Aggregators aggregates metrics passed to Push and calls pushFunc for aggregated data.
2023-01-04 06:19:18 +00:00
type Aggregators struct {
2023-04-01 04:27:45 +00:00
as [ ] * aggregator
2024-03-04 22:45:22 +00:00
// configData contains marshaled configs.
2023-04-01 04:27:45 +00:00
// It is used in Equal() for comparing Aggregators.
configData [ ] byte
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
ms * metrics . Set
2023-01-04 06:19:18 +00:00
}
2024-07-03 13:10:09 +00:00
// LoadFromData loads aggregators from data.
func LoadFromData ( data [ ] byte , pushFunc PushFunc , opts Options ) ( * Aggregators , error ) {
2024-03-04 22:45:22 +00:00
var cfgs [ ] * Config
if err := yaml . UnmarshalStrict ( data , & cfgs ) ; err != nil {
return nil , fmt . Errorf ( "cannot parse stream aggregation config: %w" , err )
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
ms := metrics . NewSet ( )
2023-01-04 06:19:18 +00:00
as := make ( [ ] * aggregator , len ( cfgs ) )
for i , cfg := range cfgs {
2024-07-01 12:56:17 +00:00
opts . aggrID = i + 1
2024-03-04 03:42:55 +00:00
a , err := newAggregator ( cfg , pushFunc , ms , opts )
2023-01-04 06:19:18 +00:00
if err != nil {
2023-04-01 04:27:45 +00:00
// Stop already initialized aggregators before returning the error.
for _ , a := range as [ : i ] {
a . MustStop ( )
}
2023-01-04 06:19:18 +00:00
return nil , fmt . Errorf ( "cannot initialize aggregator #%d: %w" , i , err )
}
as [ i ] = a
}
2023-04-01 04:27:45 +00:00
configData , err := json . Marshal ( cfgs )
if err != nil {
logger . Panicf ( "BUG: cannot marshal the provided configs: %s" , err )
2023-03-29 16:05:58 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
2024-07-01 12:56:17 +00:00
metricLabels := fmt . Sprintf ( "url=%q" , opts . Alias )
_ = ms . NewGauge ( fmt . Sprintf ( ` vm_streamaggr_dedup_state_size_bytes { %s} ` , metricLabels ) , func ( ) float64 {
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
n := uint64 ( 0 )
for _ , aggr := range as {
if aggr . da != nil {
n += aggr . da . sizeBytes ( )
}
}
return float64 ( n )
} )
2024-07-01 12:56:17 +00:00
_ = ms . NewGauge ( fmt . Sprintf ( ` vm_streamaggr_dedup_state_items_count { %s} ` , metricLabels ) , func ( ) float64 {
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
n := uint64 ( 0 )
for _ , aggr := range as {
if aggr . da != nil {
n += aggr . da . itemsCount ( )
}
}
return float64 ( n )
} )
metrics . RegisterSet ( ms )
2023-04-01 04:27:45 +00:00
return & Aggregators {
as : as ,
configData : configData ,
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
ms : ms ,
2023-04-01 04:27:45 +00:00
} , nil
2023-01-04 06:19:18 +00:00
}
2024-05-20 12:03:28 +00:00
// IsEnabled returns true if Aggregators has at least one configured aggregator
func ( a * Aggregators ) IsEnabled ( ) bool {
if a == nil {
return false
}
if len ( a . as ) == 0 {
return false
}
return true
}
2023-01-04 06:19:18 +00:00
// MustStop stops a.
func ( a * Aggregators ) MustStop ( ) {
if a == nil {
return
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
metrics . UnregisterSet ( a . ms )
a . ms = nil
2023-04-01 04:27:45 +00:00
for _ , aggr := range a . as {
2023-01-04 06:19:18 +00:00
aggr . MustStop ( )
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
a . as = nil
2023-01-04 06:19:18 +00:00
}
2023-04-01 04:27:45 +00:00
// Equal returns true if a and b are initialized from identical configs.
func ( a * Aggregators ) Equal ( b * Aggregators ) bool {
if a == nil || b == nil {
return a == nil && b == nil
}
return string ( a . configData ) == string ( b . configData )
}
2023-01-04 06:19:18 +00:00
// Push pushes tss to a.
2023-07-24 23:44:09 +00:00
//
// Push sets matchIdxs[idx] to 1 if the corresponding tss[idx] was used in aggregations.
// Otherwise matchIdxs[idx] is set to 0.
//
// Push returns matchIdxs with len equal to len(tss).
// It re-uses the matchIdxs if it has enough capacity to hold len(tss) items.
// Otherwise it allocates new matchIdxs.
func ( a * Aggregators ) Push ( tss [ ] prompbmarshal . TimeSeries , matchIdxs [ ] byte ) [ ] byte {
matchIdxs = bytesutil . ResizeNoCopyMayOverallocate ( matchIdxs , len ( tss ) )
2024-03-04 22:45:22 +00:00
for i := range matchIdxs {
2023-07-24 23:44:09 +00:00
matchIdxs [ i ] = 0
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
if a == nil {
return matchIdxs
}
2023-07-24 23:44:09 +00:00
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
for _ , aggr := range a . as {
aggr . Push ( tss , matchIdxs )
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
2023-07-24 23:44:09 +00:00
return matchIdxs
2023-01-04 06:19:18 +00:00
}
// aggregator aggregates input series according to the config passed to NewAggregator
type aggregator struct {
match * promrelabel . IfExpression
2024-03-05 00:13:21 +00:00
dropInputLabels [ ] string
2023-01-04 06:19:18 +00:00
inputRelabeling * promrelabel . ParsedConfigs
outputRelabeling * promrelabel . ParsedConfigs
2024-03-17 21:01:44 +00:00
keepMetricNames bool
ignoreOldSamples bool
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
2023-01-04 06:19:18 +00:00
by [ ] string
without [ ] string
aggregateOnlyByTime bool
2024-03-04 03:42:55 +00:00
// da is set to non-nil if input samples must be de-duplicated
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
da * dedupAggr
2023-01-25 17:14:49 +00:00
2023-01-04 06:19:18 +00:00
// aggrStates contains aggregate states for the given outputs
2024-07-12 08:56:07 +00:00
aggrStates map [ string ] aggrState
2023-01-04 06:19:18 +00:00
2024-03-17 21:01:44 +00:00
// minTimestamp is used for ignoring old samples when ignoreOldSamples is set
minTimestamp atomic . Int64
2023-01-04 06:19:18 +00:00
// suffix contains a suffix, which should be added to aggregate metric names
//
2023-02-13 12:27:13 +00:00
// It contains the interval, labels in (by, without), plus output name.
2023-01-04 06:19:18 +00:00
// For example, foo_bar metric name is transformed to foo_bar:1m_by_job
// for `interval: 1m`, `by: [job]`
2023-04-01 04:27:45 +00:00
suffix string
2023-01-04 06:19:18 +00:00
wg sync . WaitGroup
stopCh chan struct { }
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
flushDuration * metrics . Histogram
dedupFlushDuration * metrics . Histogram
2024-06-06 12:06:11 +00:00
samplesLag * metrics . Histogram
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
flushTimeouts * metrics . Counter
dedupFlushTimeouts * metrics . Counter
2024-06-06 12:06:11 +00:00
ignoredOldSamples * metrics . Counter
ignoredNanSamples * metrics . Counter
2024-07-01 12:56:17 +00:00
matchedSamples * metrics . Counter
staleInputSamples map [ string ] * metrics . Counter
staleOutputSamples map [ string ] * metrics . Counter
flushedSamples map [ string ] * metrics . Counter
2023-01-04 06:19:18 +00:00
}
type aggrState interface {
2024-06-07 14:24:09 +00:00
// pushSamples must push samples to the aggrState.
//
// samples[].key must be cloned by aggrState, since it may change after returning from pushSamples.
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
pushSamples ( samples [ ] pushSample )
2024-06-07 14:24:09 +00:00
2024-03-04 17:12:06 +00:00
flushState ( ctx * flushCtx , resetState bool )
2023-01-04 06:19:18 +00:00
}
// PushFunc is called by Aggregators when it needs to push its state to metrics storage
type PushFunc func ( tss [ ] prompbmarshal . TimeSeries )
// newAggregator creates new aggregator for the given cfg, which pushes the aggregate data to pushFunc.
//
2024-03-04 03:42:55 +00:00
// opts can contain additional options. If opts is nil, then default options are used.
2023-01-25 17:14:49 +00:00
//
2023-01-04 06:19:18 +00:00
// The returned aggregator must be stopped when no longer needed by calling MustStop().
2024-07-01 12:56:17 +00:00
func newAggregator ( cfg * Config , pushFunc PushFunc , ms * metrics . Set , opts Options ) ( * aggregator , error ) {
2023-01-04 06:19:18 +00:00
// check cfg.Interval
2024-03-04 22:45:22 +00:00
if cfg . Interval == "" {
return nil , fmt . Errorf ( "missing `interval` option" )
}
2023-01-04 06:19:18 +00:00
interval , err := time . ParseDuration ( cfg . Interval )
if err != nil {
return nil , fmt . Errorf ( "cannot parse `interval: %q`: %w" , cfg . Interval , err )
}
2024-03-01 19:33:05 +00:00
if interval < time . Second {
return nil , fmt . Errorf ( "aggregation interval cannot be smaller than 1s; got %s" , interval )
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// check cfg.DedupInterval
2024-03-04 03:42:55 +00:00
dedupInterval := opts . DedupInterval
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
if cfg . DedupInterval != "" {
di , err := time . ParseDuration ( cfg . DedupInterval )
if err != nil {
return nil , fmt . Errorf ( "cannot parse `dedup_interval: %q`: %w" , cfg . DedupInterval , err )
}
dedupInterval = di
}
if dedupInterval > interval {
return nil , fmt . Errorf ( "dedup_interval=%s cannot exceed interval=%s" , dedupInterval , interval )
}
if dedupInterval > 0 && interval % dedupInterval != 0 {
return nil , fmt . Errorf ( "interval=%s must be a multiple of dedup_interval=%s" , interval , dedupInterval )
}
2023-07-20 14:07:33 +00:00
// check cfg.StalenessInterval
stalenessInterval := interval * 2
if cfg . StalenessInterval != "" {
stalenessInterval , err = time . ParseDuration ( cfg . StalenessInterval )
if err != nil {
return nil , fmt . Errorf ( "cannot parse `staleness_interval: %q`: %w" , cfg . StalenessInterval , err )
}
if stalenessInterval < interval {
2024-03-03 23:49:26 +00:00
return nil , fmt . Errorf ( "staleness_interval=%s cannot be smaller than interval=%s" , cfg . StalenessInterval , cfg . Interval )
2023-07-20 14:07:33 +00:00
}
}
2024-03-05 00:13:21 +00:00
// Check cfg.DropInputLabels
dropInputLabels := opts . DropInputLabels
if v := cfg . DropInputLabels ; v != nil {
dropInputLabels = * v
}
2023-01-04 06:19:18 +00:00
// initialize input_relabel_configs and output_relabel_configs
inputRelabeling , err := promrelabel . ParseRelabelConfigs ( cfg . InputRelabelConfigs )
if err != nil {
return nil , fmt . Errorf ( "cannot parse input_relabel_configs: %w" , err )
}
outputRelabeling , err := promrelabel . ParseRelabelConfigs ( cfg . OutputRelabelConfigs )
if err != nil {
return nil , fmt . Errorf ( "cannot parse output_relabel_configs: %w" , err )
}
// check by and without lists
2023-01-05 10:08:24 +00:00
by := sortAndRemoveDuplicates ( cfg . By )
without := sortAndRemoveDuplicates ( cfg . Without )
2023-01-04 06:19:18 +00:00
if len ( by ) > 0 && len ( without ) > 0 {
return nil , fmt . Errorf ( "`by: %s` and `without: %s` lists cannot be set simultaneously" , by , without )
}
aggregateOnlyByTime := ( len ( by ) == 0 && len ( without ) == 0 )
if ! aggregateOnlyByTime && len ( without ) == 0 {
by = addMissingUnderscoreName ( by )
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// check cfg.KeepMetricNames
2024-03-04 03:42:55 +00:00
keepMetricNames := opts . KeepMetricNames
if v := cfg . KeepMetricNames ; v != nil {
keepMetricNames = * v
}
if keepMetricNames {
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
if len ( cfg . Outputs ) != 1 {
2024-07-12 08:56:07 +00:00
return nil , fmt . Errorf ( "`outputs` list must contain only a single entry if `keep_metric_names` is set; got %q" , cfg . Outputs )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
}
if cfg . Outputs [ 0 ] == "histogram_bucket" || strings . HasPrefix ( cfg . Outputs [ 0 ] , "quantiles(" ) && strings . Contains ( cfg . Outputs [ 0 ] , "," ) {
return nil , fmt . Errorf ( "`keep_metric_names` cannot be applied to `outputs: %q`, since they can generate multiple time series" , cfg . Outputs )
}
}
2024-03-17 21:01:44 +00:00
// check cfg.IgnoreOldSamples
ignoreOldSamples := opts . IgnoreOldSamples
if v := cfg . IgnoreOldSamples ; v != nil {
ignoreOldSamples = * v
}
2024-04-22 11:52:04 +00:00
// check cfg.IgnoreFirstIntervals
ignoreFirstIntervals := opts . IgnoreFirstIntervals
if v := cfg . IgnoreFirstIntervals ; v != nil {
ignoreFirstIntervals = * v
}
2023-01-04 06:19:18 +00:00
// initialize outputs list
if len ( cfg . Outputs ) == 0 {
2024-07-12 08:56:07 +00:00
return nil , fmt . Errorf ( "`outputs` list must contain at least a single entry from the list %s" , supportedOutputs )
2023-01-04 06:19:18 +00:00
}
2024-07-12 08:56:07 +00:00
aggrStates := make ( map [ string ] aggrState , len ( cfg . Outputs ) )
for _ , output := range cfg . Outputs {
// check for duplicated output
if _ , ok := aggrStates [ output ] ; ok {
return nil , fmt . Errorf ( "`outputs` list contains duplicated aggregation function: %s" , output )
}
2023-01-04 06:19:18 +00:00
if strings . HasPrefix ( output , "quantiles(" ) {
if ! strings . HasSuffix ( output , ")" ) {
return nil , fmt . Errorf ( "missing closing brace for `quantiles()` output" )
}
argsStr := output [ len ( "quantiles(" ) : len ( output ) - 1 ]
if len ( argsStr ) == 0 {
return nil , fmt . Errorf ( "`quantiles()` must contain at least one phi" )
}
args := strings . Split ( argsStr , "," )
phis := make ( [ ] float64 , len ( args ) )
for j , arg := range args {
arg = strings . TrimSpace ( arg )
phi , err := strconv . ParseFloat ( arg , 64 )
if err != nil {
return nil , fmt . Errorf ( "cannot parse phi=%q for quantiles(%s): %w" , arg , argsStr , err )
}
if phi < 0 || phi > 1 {
return nil , fmt . Errorf ( "phi inside quantiles(%s) must be in the range [0..1]; got %v" , argsStr , phi )
}
phis [ j ] = phi
}
2024-07-12 08:56:07 +00:00
if _ , ok := aggrStates [ "quantiles" ] ; ok {
return nil , fmt . Errorf ( "`outputs` list contains duplicated `quantiles()` function, please combine multiple phi* like `quantiles(0.5, 0.9)`" )
}
aggrStates [ "quantiles" ] = newQuantilesAggrState ( phis )
2023-01-04 06:19:18 +00:00
continue
}
switch output {
2024-07-14 15:23:59 +00:00
case "avg" :
aggrStates [ output ] = newAvgAggrState ( )
case "count_samples" :
aggrStates [ output ] = newCountSamplesAggrState ( )
case "count_series" :
aggrStates [ output ] = newCountSeriesAggrState ( )
case "histogram_bucket" :
aggrStates [ output ] = newHistogramBucketAggrState ( stalenessInterval )
2023-01-04 06:19:18 +00:00
case "increase" :
2024-07-12 08:56:07 +00:00
aggrStates [ output ] = newTotalAggrState ( stalenessInterval , true , true )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
case "increase_prometheus" :
2024-07-12 08:56:07 +00:00
aggrStates [ output ] = newTotalAggrState ( stalenessInterval , true , false )
2023-01-04 06:19:18 +00:00
case "last" :
2024-07-12 08:56:07 +00:00
aggrStates [ output ] = newLastAggrState ( )
2023-01-04 06:19:18 +00:00
case "max" :
2024-07-12 08:56:07 +00:00
aggrStates [ output ] = newMaxAggrState ( )
2024-07-14 15:23:59 +00:00
case "min" :
aggrStates [ output ] = newMinAggrState ( )
case "rate_avg" :
aggrStates [ output ] = newRateAggrState ( stalenessInterval , true )
case "rate_sum" :
aggrStates [ output ] = newRateAggrState ( stalenessInterval , false )
2023-01-04 06:19:18 +00:00
case "stddev" :
2024-07-12 08:56:07 +00:00
aggrStates [ output ] = newStddevAggrState ( )
2023-01-04 06:19:18 +00:00
case "stdvar" :
2024-07-12 08:56:07 +00:00
aggrStates [ output ] = newStdvarAggrState ( )
2024-07-14 15:23:59 +00:00
case "sum_samples" :
aggrStates [ output ] = newSumSamplesAggrState ( )
case "total" :
aggrStates [ output ] = newTotalAggrState ( stalenessInterval , false , true )
case "total_prometheus" :
aggrStates [ output ] = newTotalAggrState ( stalenessInterval , false , false )
case "unique_samples" :
aggrStates [ output ] = newUniqueSamplesAggrState ( )
2023-01-04 06:19:18 +00:00
default :
2024-07-12 08:56:07 +00:00
return nil , fmt . Errorf ( "unsupported output=%q; supported values: %s;" , output , supportedOutputs )
2023-01-04 06:19:18 +00:00
}
}
// initialize suffix to add to metric names after aggregation
suffix := ":" + cfg . Interval
2024-07-01 12:56:17 +00:00
group := "none"
2023-01-04 06:19:18 +00:00
if labels := removeUnderscoreName ( by ) ; len ( labels ) > 0 {
2024-07-01 12:56:17 +00:00
group = fmt . Sprintf ( "by: %s" , strings . Join ( labels , "," ) )
2023-01-04 06:19:18 +00:00
suffix += fmt . Sprintf ( "_by_%s" , strings . Join ( labels , "_" ) )
}
if labels := removeUnderscoreName ( without ) ; len ( labels ) > 0 {
2024-07-01 12:56:17 +00:00
group = fmt . Sprintf ( "without: %s" , strings . Join ( labels , "," ) )
2023-01-04 06:19:18 +00:00
suffix += fmt . Sprintf ( "_without_%s" , strings . Join ( labels , "_" ) )
}
suffix += "_"
2024-07-01 12:56:17 +00:00
outputs := strings . Join ( cfg . Outputs , "," )
matchExpr := cfg . Match . String ( )
if len ( matchExpr ) > maxLabelValueLen {
matchExpr = matchExpr [ : maxLabelValueLen - 3 ] + "..."
}
metricLabels := fmt . Sprintf ( ` match=%q, group=%q, url=%q, position="%d" ` , matchExpr , group , opts . Alias , opts . aggrID )
2023-01-04 06:19:18 +00:00
// initialize the aggregator
a := & aggregator {
match : cfg . Match ,
2024-03-05 00:13:21 +00:00
dropInputLabels : dropInputLabels ,
2023-01-04 06:19:18 +00:00
inputRelabeling : inputRelabeling ,
outputRelabeling : outputRelabeling ,
2024-03-17 21:01:44 +00:00
keepMetricNames : keepMetricNames ,
ignoreOldSamples : ignoreOldSamples ,
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
2023-01-04 06:19:18 +00:00
by : by ,
without : without ,
aggregateOnlyByTime : aggregateOnlyByTime ,
aggrStates : aggrStates ,
2024-03-04 03:42:55 +00:00
suffix : suffix ,
2023-01-04 06:19:18 +00:00
stopCh : make ( chan struct { } ) ,
2024-07-01 12:56:17 +00:00
flushDuration : ms . GetOrCreateHistogram ( fmt . Sprintf ( ` vm_streamaggr_flush_duration_seconds { outputs=%q, %s} ` , outputs , metricLabels ) ) ,
dedupFlushDuration : ms . GetOrCreateHistogram ( fmt . Sprintf ( ` vm_streamaggr_dedup_flush_duration_seconds { outputs=%q, %s} ` , outputs , metricLabels ) ) ,
samplesLag : ms . GetOrCreateHistogram ( fmt . Sprintf ( ` vm_streamaggr_samples_lag_seconds { outputs=%q, %s} ` , outputs , metricLabels ) ) ,
matchedSamples : ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_matched_samples_total { outputs=%q, %s} ` , outputs , metricLabels ) ) ,
flushTimeouts : ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_flush_timeouts_total { outputs=%q, %s} ` , outputs , metricLabels ) ) ,
dedupFlushTimeouts : ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_dedup_flush_timeouts_total { outputs=%q, %s} ` , outputs , metricLabels ) ) ,
ignoredNanSamples : ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_ignored_samples_total { reason="nan", outputs=%q, %s} ` , outputs , metricLabels ) ) ,
ignoredOldSamples : ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_ignored_samples_total { reason="too_old", outputs=%q, %s} ` , outputs , metricLabels ) ) ,
staleInputSamples : make ( map [ string ] * metrics . Counter , len ( cfg . Outputs ) ) ,
staleOutputSamples : make ( map [ string ] * metrics . Counter , len ( cfg . Outputs ) ) ,
flushedSamples : make ( map [ string ] * metrics . Counter , len ( cfg . Outputs ) ) ,
}
for _ , output := range cfg . Outputs {
// Removing output args for metric label value in outputs like quantile(arg1, arg2)
if ri := strings . IndexRune ( output , '(' ) ; ri >= 0 {
output = output [ : ri ]
}
a . staleInputSamples [ output ] = ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_stale_samples_total { key="input", output=%q, %s} ` , output , metricLabels ) )
a . staleOutputSamples [ output ] = ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_stale_samples_total { key="output", output=%q, %s} ` , output , metricLabels ) )
a . flushedSamples [ output ] = ms . GetOrCreateCounter ( fmt . Sprintf ( ` vm_streamaggr_flushed_samples_total { output=%q, %s} ` , output , metricLabels ) )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
}
if dedupInterval > 0 {
a . da = newDedupAggr ( )
2023-01-25 17:14:49 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
2024-03-04 03:42:55 +00:00
alignFlushToInterval := ! opts . NoAlignFlushToInterval
if v := cfg . NoAlignFlushToInterval ; v != nil {
alignFlushToInterval = ! * v
}
skipIncompleteFlush := ! opts . FlushOnShutdown
if v := cfg . FlushOnShutdown ; v != nil {
skipIncompleteFlush = ! * v
}
2023-01-04 06:19:18 +00:00
a . wg . Add ( 1 )
go func ( ) {
2024-04-22 11:52:04 +00:00
a . runFlusher ( pushFunc , alignFlushToInterval , skipIncompleteFlush , interval , dedupInterval , ignoreFirstIntervals )
2023-01-25 17:14:49 +00:00
a . wg . Done ( )
2023-01-04 06:19:18 +00:00
} ( )
return a , nil
}
2024-04-22 11:52:04 +00:00
func ( a * aggregator ) runFlusher ( pushFunc PushFunc , alignFlushToInterval , skipIncompleteFlush bool , interval , dedupInterval time . Duration , ignoreFirstIntervals int ) {
2024-03-04 14:12:41 +00:00
alignedSleep := func ( d time . Duration ) {
2024-03-04 03:42:55 +00:00
if ! alignFlushToInterval {
return
}
2023-01-25 17:14:49 +00:00
2024-03-04 14:12:41 +00:00
ct := time . Duration ( time . Now ( ) . UnixNano ( ) )
dSleep := d - ( ct % d )
timer := timerpool . Get ( dSleep )
defer timer . Stop ( )
select {
case <- a . stopCh :
case <- timer . C :
2024-03-04 03:42:55 +00:00
}
2024-03-04 14:12:41 +00:00
}
tickerWait := func ( t * time . Ticker ) bool {
select {
case <- a . stopCh :
return false
case <- t . C :
return true
2024-03-04 03:42:55 +00:00
}
2024-03-04 14:12:41 +00:00
}
if dedupInterval <= 0 {
alignedSleep ( interval )
2024-03-04 12:50:46 +00:00
t := time . NewTicker ( interval )
2024-03-04 14:12:41 +00:00
defer t . Stop ( )
if alignFlushToInterval && skipIncompleteFlush {
2024-03-04 17:12:06 +00:00
a . flush ( nil , interval , true )
2024-07-02 19:21:35 +00:00
ignoreFirstIntervals --
2024-03-04 12:50:46 +00:00
}
2023-01-25 17:14:49 +00:00
2024-03-04 14:12:41 +00:00
for tickerWait ( t ) {
2024-04-22 11:52:04 +00:00
if ignoreFirstIntervals > 0 {
2024-07-02 19:21:35 +00:00
a . flush ( nil , interval , true )
2024-04-22 11:52:04 +00:00
ignoreFirstIntervals --
2024-07-02 19:21:35 +00:00
} else {
a . flush ( pushFunc , interval , true )
2024-04-22 11:52:04 +00:00
}
2024-03-04 14:12:41 +00:00
if alignFlushToInterval {
select {
case <- t . C :
default :
}
}
}
} else {
alignedSleep ( dedupInterval )
t := time . NewTicker ( dedupInterval )
defer t . Stop ( )
flushDeadline := time . Now ( ) . Add ( interval )
isSkippedFirstFlush := false
for tickerWait ( t ) {
2024-03-04 12:50:46 +00:00
a . dedupFlush ( dedupInterval )
2024-03-04 14:12:41 +00:00
ct := time . Now ( )
if ct . After ( flushDeadline ) {
// It is time to flush the aggregated state
if alignFlushToInterval && skipIncompleteFlush && ! isSkippedFirstFlush {
2024-07-02 19:21:35 +00:00
a . flush ( nil , interval , true )
ignoreFirstIntervals --
2024-03-04 14:12:41 +00:00
isSkippedFirstFlush = true
2024-07-02 19:21:35 +00:00
} else if ignoreFirstIntervals > 0 {
a . flush ( nil , interval , true )
ignoreFirstIntervals --
} else {
a . flush ( pushFunc , interval , true )
2024-03-04 14:12:41 +00:00
}
for ct . After ( flushDeadline ) {
flushDeadline = flushDeadline . Add ( interval )
}
}
if alignFlushToInterval {
select {
case <- t . C :
default :
}
}
2023-01-04 06:19:18 +00:00
}
}
2024-03-04 14:12:41 +00:00
2024-07-02 19:21:35 +00:00
if ! skipIncompleteFlush && ignoreFirstIntervals <= 0 {
2024-03-04 14:12:41 +00:00
a . dedupFlush ( dedupInterval )
2024-03-04 17:12:06 +00:00
a . flush ( pushFunc , interval , true )
2024-03-04 14:12:41 +00:00
}
2023-01-04 06:19:18 +00:00
}
2024-03-04 12:50:46 +00:00
func ( a * aggregator ) dedupFlush ( dedupInterval time . Duration ) {
if dedupInterval <= 0 {
// The de-duplication is disabled.
return
}
startTime := time . Now ( )
2024-06-10 12:29:45 +00:00
a . da . flush ( a . pushSamples )
2024-03-04 12:50:46 +00:00
d := time . Since ( startTime )
a . dedupFlushDuration . Update ( d . Seconds ( ) )
if d > dedupInterval {
a . dedupFlushTimeouts . Inc ( )
2024-03-04 17:37:58 +00:00
logger . Warnf ( "deduplication couldn't be finished in the configured dedup_interval=%s; it took %.03fs; " +
2024-03-04 12:50:46 +00:00
"possible solutions: increase dedup_interval; use match filter matching smaller number of series; " +
"reduce samples' ingestion rate to stream aggregation" , dedupInterval , d . Seconds ( ) )
}
2023-01-25 17:14:49 +00:00
}
2024-03-04 17:12:06 +00:00
func ( a * aggregator ) flush ( pushFunc PushFunc , interval time . Duration , resetState bool ) {
2024-03-04 12:50:46 +00:00
startTime := time . Now ( )
2024-04-03 23:24:56 +00:00
// Update minTimestamp before flushing samples to the storage,
// since the flush durtion can be quite long.
// This should prevent from dropping samples with old timestamps when the flush takes long time.
a . minTimestamp . Store ( startTime . UnixMilli ( ) - 5_000 )
2024-03-03 23:21:39 +00:00
var wg sync . WaitGroup
2024-07-12 08:56:07 +00:00
for output , as := range a . aggrStates {
2024-03-03 23:21:39 +00:00
flushConcurrencyCh <- struct { } { }
wg . Add ( 1 )
go func ( as aggrState ) {
defer func ( ) {
<- flushConcurrencyCh
wg . Done ( )
} ( )
2024-03-04 03:42:55 +00:00
ctx := getFlushCtx ( a , pushFunc )
2024-03-04 17:12:06 +00:00
as . flushState ( ctx , resetState )
2024-07-12 08:56:07 +00:00
ctx . flushSeries ( output )
2024-03-04 04:01:14 +00:00
ctx . resetSeries ( )
2024-03-03 23:21:39 +00:00
putFlushCtx ( ctx )
} ( as )
}
wg . Wait ( )
2024-03-04 12:50:46 +00:00
d := time . Since ( startTime )
a . flushDuration . Update ( d . Seconds ( ) )
if d > interval {
a . flushTimeouts . Inc ( )
2024-03-04 17:37:58 +00:00
logger . Warnf ( "stream aggregation couldn't be finished in the configured interval=%s; it took %.03fs; " +
2024-03-04 12:50:46 +00:00
"possible solutions: increase interval; use match filter matching smaller number of series; " +
"reduce samples' ingestion rate to stream aggregation" , interval , d . Seconds ( ) )
}
2023-01-04 06:19:18 +00:00
}
2024-03-03 23:21:39 +00:00
var flushConcurrencyCh = make ( chan struct { } , cgroup . AvailableCPUs ( ) )
2023-01-04 06:19:18 +00:00
// MustStop stops the aggregator.
//
// The aggregator stops pushing the aggregated metrics after this call.
func ( a * aggregator ) MustStop ( ) {
close ( a . stopCh )
2023-04-01 04:27:45 +00:00
a . wg . Wait ( )
2023-01-04 06:19:18 +00:00
}
2023-01-25 17:14:49 +00:00
// Push pushes tss to a.
2023-07-24 23:44:09 +00:00
func ( a * aggregator ) Push ( tss [ ] prompbmarshal . TimeSeries , matchIdxs [ ] byte ) {
2024-06-06 12:06:11 +00:00
now := time . Now ( ) . UnixMilli ( )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
ctx := getPushCtx ( )
defer putPushCtx ( ctx )
samples := ctx . samples
2024-03-05 00:13:21 +00:00
buf := ctx . buf
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
labels := & ctx . labels
inputLabels := & ctx . inputLabels
outputLabels := & ctx . outputLabels
2023-01-25 17:14:49 +00:00
2024-03-05 00:13:21 +00:00
dropLabels := a . dropInputLabels
2024-03-17 21:01:44 +00:00
ignoreOldSamples := a . ignoreOldSamples
minTimestamp := a . minTimestamp . Load ( )
2024-07-01 12:56:17 +00:00
var maxLag int64
2023-07-24 23:33:30 +00:00
for idx , ts := range tss {
2023-07-22 23:41:43 +00:00
if ! a . match . Match ( ts . Labels ) {
continue
}
2023-07-24 23:44:09 +00:00
matchIdxs [ idx ] = 1
2024-03-05 00:13:21 +00:00
if len ( dropLabels ) > 0 {
labels . Labels = dropSeriesLabels ( labels . Labels [ : 0 ] , ts . Labels , dropLabels )
} else {
labels . Labels = append ( labels . Labels [ : 0 ] , ts . Labels ... )
}
2023-07-24 23:44:09 +00:00
labels . Labels = a . inputRelabeling . Apply ( labels . Labels , 0 )
if len ( labels . Labels ) == 0 {
// The metric has been deleted by the relabeling
continue
2023-07-22 23:41:43 +00:00
}
2023-07-24 23:44:09 +00:00
labels . Sort ( )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
inputLabels . Reset ( )
outputLabels . Reset ( )
if ! a . aggregateOnlyByTime {
inputLabels . Labels , outputLabels . Labels = getInputOutputLabels ( inputLabels . Labels , outputLabels . Labels , labels . Labels , a . by , a . without )
} else {
outputLabels . Labels = append ( outputLabels . Labels , labels . Labels ... )
2023-07-24 23:33:30 +00:00
}
2023-07-24 23:44:09 +00:00
2024-06-07 13:45:52 +00:00
bufLen := len ( buf )
buf = compressLabels ( buf , inputLabels . Labels , outputLabels . Labels )
// key remains valid only by the end of this function and can't be reused after
// do not intern key because number of unique keys could be too high
key := bytesutil . ToUnsafeString ( buf [ bufLen : ] )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
for _ , sample := range ts . Samples {
if math . IsNaN ( sample . Value ) {
2024-06-06 12:06:11 +00:00
a . ignoredNanSamples . Inc ( )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
// Skip NaN values
2023-07-22 23:41:43 +00:00
continue
}
2024-03-17 21:01:44 +00:00
if ignoreOldSamples && sample . Timestamp < minTimestamp {
2024-06-06 12:06:11 +00:00
a . ignoredOldSamples . Inc ( )
2024-03-17 21:01:44 +00:00
// Skip old samples outside the current aggregation interval
continue
}
2024-07-01 12:56:17 +00:00
if maxLag < now - sample . Timestamp {
maxLag = now - sample . Timestamp
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
samples = append ( samples , pushSample {
2024-03-12 21:47:29 +00:00
key : key ,
value : sample . Value ,
timestamp : sample . Timestamp ,
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
} )
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
}
2024-07-01 12:56:17 +00:00
if len ( samples ) > 0 {
a . matchedSamples . Add ( len ( samples ) )
a . samplesLag . Update ( float64 ( maxLag ) / 1_000 )
2024-06-06 12:06:11 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
ctx . samples = samples
ctx . buf = buf
2023-01-04 06:19:18 +00:00
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
if a . da != nil {
a . da . pushSamples ( samples )
} else {
a . pushSamples ( samples )
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
}
2024-05-08 11:10:53 +00:00
func compressLabels ( dst [ ] byte , inputLabels , outputLabels [ ] prompbmarshal . Label ) [ ] byte {
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
bb := bbPool . Get ( )
2024-03-04 15:57:36 +00:00
bb . B = lc . Compress ( bb . B , inputLabels )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
dst = encoding . MarshalVarUint64 ( dst , uint64 ( len ( bb . B ) ) )
dst = append ( dst , bb . B ... )
2023-01-04 06:19:18 +00:00
bbPool . Put ( bb )
2024-03-04 15:57:36 +00:00
dst = lc . Compress ( dst , outputLabels )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
return dst
2023-01-04 06:19:18 +00:00
}
2024-05-08 11:10:53 +00:00
func decompressLabels ( dst [ ] prompbmarshal . Label , key string ) [ ] prompbmarshal . Label {
2024-03-04 15:57:36 +00:00
return lc . Decompress ( dst , bytesutil . ToUnsafeBytes ( key ) )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
}
2023-01-04 06:19:18 +00:00
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func getOutputKey ( key string ) string {
src := bytesutil . ToUnsafeBytes ( key )
2024-05-13 23:23:44 +00:00
inputKeyLen , nSize := encoding . UnmarshalVarUint64 ( src )
if nSize <= 0 {
logger . Panicf ( "BUG: cannot unmarshal inputKeyLen from uvarint" )
2023-01-04 06:19:18 +00:00
}
2024-05-14 15:47:19 +00:00
src = src [ nSize : ]
2024-05-13 23:23:44 +00:00
outputKey := src [ inputKeyLen : ]
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
return bytesutil . ToUnsafeString ( outputKey )
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func getInputOutputKey ( key string ) ( string , string ) {
src := bytesutil . ToUnsafeBytes ( key )
2024-05-13 23:23:44 +00:00
inputKeyLen , nSize := encoding . UnmarshalVarUint64 ( src )
if nSize <= 0 {
logger . Panicf ( "BUG: cannot unmarshal inputKeyLen from uvarint" )
2023-01-04 06:19:18 +00:00
}
2024-05-13 23:23:44 +00:00
src = src [ nSize : ]
inputKey := src [ : inputKeyLen ]
outputKey := src [ inputKeyLen : ]
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
return bytesutil . ToUnsafeString ( inputKey ) , bytesutil . ToUnsafeString ( outputKey )
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func ( a * aggregator ) pushSamples ( samples [ ] pushSample ) {
for _ , as := range a . aggrStates {
as . pushSamples ( samples )
2023-01-04 06:19:18 +00:00
}
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
type pushCtx struct {
samples [ ] pushSample
labels promutils . Labels
inputLabels promutils . Labels
outputLabels promutils . Labels
buf [ ] byte
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func ( ctx * pushCtx ) reset ( ) {
clear ( ctx . samples )
ctx . samples = ctx . samples [ : 0 ]
ctx . labels . Reset ( )
ctx . inputLabels . Reset ( )
ctx . outputLabels . Reset ( )
ctx . buf = ctx . buf [ : 0 ]
}
type pushSample struct {
2024-06-07 13:45:52 +00:00
// key identifies a sample that belongs to unique series
// key value can't be re-used
2024-03-12 21:47:29 +00:00
key string
value float64
timestamp int64
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func getPushCtx ( ) * pushCtx {
v := pushCtxPool . Get ( )
if v == nil {
return & pushCtx { }
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
return v . ( * pushCtx )
}
func putPushCtx ( ctx * pushCtx ) {
ctx . reset ( )
pushCtxPool . Put ( ctx )
}
var pushCtxPool sync . Pool
2023-01-04 06:19:18 +00:00
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func getInputOutputLabels ( dstInput , dstOutput , labels [ ] prompbmarshal . Label , by , without [ ] string ) ( [ ] prompbmarshal . Label , [ ] prompbmarshal . Label ) {
if len ( without ) > 0 {
for _ , label := range labels {
if slices . Contains ( without , label . Name ) {
dstInput = append ( dstInput , label )
} else {
dstOutput = append ( dstOutput , label )
}
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
} else {
for _ , label := range labels {
if ! slices . Contains ( by , label . Name ) {
dstInput = append ( dstInput , label )
} else {
dstOutput = append ( dstOutput , label )
}
2023-01-04 06:19:18 +00:00
}
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
return dstInput , dstOutput
2023-01-04 06:19:18 +00:00
}
2024-03-04 03:42:55 +00:00
func getFlushCtx ( a * aggregator , pushFunc PushFunc ) * flushCtx {
2024-03-03 23:21:39 +00:00
v := flushCtxPool . Get ( )
if v == nil {
v = & flushCtx { }
}
ctx := v . ( * flushCtx )
ctx . a = a
2024-03-04 03:42:55 +00:00
ctx . pushFunc = pushFunc
2024-03-03 23:21:39 +00:00
return ctx
}
func putFlushCtx ( ctx * flushCtx ) {
ctx . reset ( )
flushCtxPool . Put ( ctx )
}
var flushCtxPool sync . Pool
2023-01-04 06:19:18 +00:00
type flushCtx struct {
2024-03-04 03:42:55 +00:00
a * aggregator
pushFunc PushFunc
2023-01-04 06:19:18 +00:00
tss [ ] prompbmarshal . TimeSeries
labels [ ] prompbmarshal . Label
samples [ ] prompbmarshal . Sample
}
func ( ctx * flushCtx ) reset ( ) {
2024-03-03 23:21:39 +00:00
ctx . a = nil
2024-03-04 03:42:55 +00:00
ctx . pushFunc = nil
2024-03-03 23:21:39 +00:00
ctx . resetSeries ( )
}
func ( ctx * flushCtx ) resetSeries ( ) {
2024-03-04 22:45:22 +00:00
clear ( ctx . tss )
ctx . tss = ctx . tss [ : 0 ]
2024-03-03 23:21:39 +00:00
clear ( ctx . labels )
2023-01-04 06:19:18 +00:00
ctx . labels = ctx . labels [ : 0 ]
2024-03-03 23:21:39 +00:00
2023-01-04 06:19:18 +00:00
ctx . samples = ctx . samples [ : 0 ]
}
2024-07-12 08:56:07 +00:00
func ( ctx * flushCtx ) flushSeries ( aggrStateSuffix string ) {
2024-03-03 23:21:39 +00:00
tss := ctx . tss
if len ( tss ) == 0 {
// nothing to flush
return
}
outputRelabeling := ctx . a . outputRelabeling
if outputRelabeling == nil {
// Fast path - push the output metrics.
2024-03-04 03:42:55 +00:00
if ctx . pushFunc != nil {
ctx . pushFunc ( tss )
2024-07-12 08:56:07 +00:00
ctx . a . flushedSamples [ aggrStateSuffix ] . Add ( len ( tss ) )
2024-03-04 03:42:55 +00:00
}
2024-03-03 23:21:39 +00:00
return
}
// Slow path - apply output relabeling and then push the output metrics.
auxLabels := promutils . GetLabels ( )
dstLabels := auxLabels . Labels [ : 0 ]
dst := tss [ : 0 ]
for _ , ts := range tss {
dstLabelsLen := len ( dstLabels )
dstLabels = append ( dstLabels , ts . Labels ... )
dstLabels = outputRelabeling . Apply ( dstLabels , dstLabelsLen )
if len ( dstLabels ) == dstLabelsLen {
// The metric has been deleted by the relabeling
continue
}
ts . Labels = dstLabels [ dstLabelsLen : ]
dst = append ( dst , ts )
}
2024-03-04 03:42:55 +00:00
if ctx . pushFunc != nil {
ctx . pushFunc ( dst )
2024-07-12 08:56:07 +00:00
ctx . a . flushedSamples [ aggrStateSuffix ] . Add ( len ( dst ) )
2024-03-04 03:42:55 +00:00
}
2024-03-03 23:21:39 +00:00
auxLabels . Labels = dstLabels
promutils . PutLabels ( auxLabels )
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func ( ctx * flushCtx ) appendSeries ( key , suffix string , timestamp int64 , value float64 ) {
2023-01-04 06:19:18 +00:00
labelsLen := len ( ctx . labels )
samplesLen := len ( ctx . samples )
2024-05-08 11:10:53 +00:00
ctx . labels = decompressLabels ( ctx . labels , key )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
if ! ctx . a . keepMetricNames {
ctx . labels = addMetricSuffix ( ctx . labels , labelsLen , ctx . a . suffix , suffix )
2023-01-25 17:14:49 +00:00
}
2023-01-04 06:19:18 +00:00
ctx . samples = append ( ctx . samples , prompbmarshal . Sample {
Timestamp : timestamp ,
Value : value ,
} )
ctx . tss = append ( ctx . tss , prompbmarshal . TimeSeries {
Labels : ctx . labels [ labelsLen : ] ,
Samples : ctx . samples [ samplesLen : ] ,
} )
2024-03-03 23:21:39 +00:00
// Limit the maximum length of ctx.tss in order to limit memory usage.
if len ( ctx . tss ) >= 10_000 {
2024-07-12 08:56:07 +00:00
ctx . flushSeries ( suffix )
2024-03-04 04:01:14 +00:00
ctx . resetSeries ( )
2024-03-03 23:21:39 +00:00
}
2023-01-04 06:19:18 +00:00
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
func ( ctx * flushCtx ) appendSeriesWithExtraLabel ( key , suffix string , timestamp int64 , value float64 , extraName , extraValue string ) {
2023-01-04 06:19:18 +00:00
labelsLen := len ( ctx . labels )
samplesLen := len ( ctx . samples )
2024-05-08 11:10:53 +00:00
ctx . labels = decompressLabels ( ctx . labels , key )
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
if ! ctx . a . keepMetricNames {
ctx . labels = addMetricSuffix ( ctx . labels , labelsLen , ctx . a . suffix , suffix )
2023-01-04 06:19:18 +00:00
}
ctx . labels = append ( ctx . labels , prompbmarshal . Label {
Name : extraName ,
Value : extraValue ,
} )
ctx . samples = append ( ctx . samples , prompbmarshal . Sample {
Timestamp : timestamp ,
Value : value ,
} )
ctx . tss = append ( ctx . tss , prompbmarshal . TimeSeries {
Labels : ctx . labels [ labelsLen : ] ,
Samples : ctx . samples [ samplesLen : ] ,
} )
2024-07-01 12:56:17 +00:00
ctx . a . flushedSamples [ suffix ] . Add ( len ( ctx . tss ) )
2023-01-04 06:19:18 +00:00
}
func addMetricSuffix ( labels [ ] prompbmarshal . Label , offset int , firstSuffix , lastSuffix string ) [ ] prompbmarshal . Label {
src := labels [ offset : ]
for i := range src {
label := & src [ i ]
if label . Name != "__name__" {
continue
}
bb := bbPool . Get ( )
bb . B = append ( bb . B , label . Value ... )
bb . B = append ( bb . B , firstSuffix ... )
bb . B = append ( bb . B , lastSuffix ... )
label . Value = bytesutil . InternBytes ( bb . B )
bbPool . Put ( bb )
return labels
}
// The __name__ isn't found. Add it
bb := bbPool . Get ( )
bb . B = append ( bb . B , firstSuffix ... )
bb . B = append ( bb . B , lastSuffix ... )
labelValue := bytesutil . InternBytes ( bb . B )
labels = append ( labels , prompbmarshal . Label {
Name : "__name__" ,
Value : labelValue ,
} )
return labels
}
func addMissingUnderscoreName ( labels [ ] string ) [ ] string {
result := [ ] string { "__name__" }
for _ , s := range labels {
if s == "__name__" {
continue
}
result = append ( result , s )
}
return result
}
func removeUnderscoreName ( labels [ ] string ) [ ] string {
var result [ ] string
for _ , s := range labels {
if s == "__name__" {
continue
}
result = append ( result , s )
}
return result
}
2023-01-05 10:08:24 +00:00
func sortAndRemoveDuplicates ( a [ ] string ) [ ] string {
if len ( a ) == 0 {
return nil
}
a = append ( [ ] string { } , a ... )
sort . Strings ( a )
dst := a [ : 1 ]
for _ , v := range a [ 1 : ] {
if v != dst [ len ( dst ) - 1 ] {
dst = append ( dst , v )
}
}
return dst
}
lib/streamaggr: huge pile of changes
- Reduce memory usage by up to 5x when de-duplicating samples across big number of time series.
- Reduce memory usage by up to 5x when aggregating across big number of output time series.
- Add lib/promutils.LabelsCompressor, which is going to be used by other VictoriaMetrics components
for reducing memory usage for marshaled []prompbmarshal.Label.
- Add `dedup_interval` option at aggregation config, which allows setting individual
deduplication intervals per each aggregation.
- Add `keep_metric_names` option at aggregation config, which allows keeping the original
metric names in the output samples.
- Add `unique_samples` output, which counts the number of unique sample values.
- Add `increase_prometheus` and `total_prometheus` outputs, which ignore the first sample
per each newly encountered time series.
- Use 64-bit hashes instead of marshaled labels as map keys when calculating `count_series` output.
This makes obsolete https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5579
- Expose various metrics, which may help debugging stream aggregation:
- vm_streamaggr_dedup_state_size_bytes - the size of data structures responsible for deduplication
- vm_streamaggr_dedup_state_items_count - the number of items in the deduplication data structures
- vm_streamaggr_labels_compressor_size_bytes - the size of labels compressor data structures
- vm_streamaggr_labels_compressor_items_count - the number of entries in the labels compressor
- vm_streamaggr_flush_duration_seconds - a histogram, which shows the duration of stream aggregation flushes
- vm_streamaggr_dedup_flush_duration_seconds - a histogram, which shows the duration of deduplication flushes
- vm_streamaggr_flush_timeouts_total - counter for timed out stream aggregation flushes,
which took longer than the configured interval
- vm_streamaggr_dedup_flush_timeouts_total - counter for timed out deduplication flushes,
which took longer than the configured dedup_interval
- Actualize docs/stream-aggregation.md
The memory usage reduction increases CPU usage during stream aggregation by up to 30%.
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5850
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5898
2024-03-02 00:42:26 +00:00
var bbPool bytesutil . ByteBufferPool