M3 Aggregator, a service for streaming aggregation of ingested time series data

Overview

m3aggregator is a distributed system written in Go. It performs streaming aggregation of ingested time series data before the data is persisted in m3db. The goal is to reduce the volume of stored time series data (especially for longer retentions) by reducing its cardinality, its datapoint resolution, or both. m3aggregator is sharded for horizontal scalability and replicated (leader/follower mode) for high availability. Incoming data is mapped to a predefined number of shards based on a hash of the time series ID.
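The shard mapping described above can be sketched as follows. This is a minimal illustration, not m3aggregator's implementation: the function name `shardFor`, the shard count of 64, and the use of FNV-1a from the Go standard library are all assumptions made for the example, and the hash function m3aggregator actually uses may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a time series ID to one of numShards shards.
// FNV-1a is an illustrative stand-in for the real hash function.
// numShards must be greater than zero.
func shardFor(id string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(id)) // Write on a hash.Hash never returns an error
	return h.Sum32() % numShards
}

func main() {
	const numShards = 64 // hypothetical predefined shard count

	// Hypothetical time series IDs, used only for illustration.
	ids := []string{"cpu.user{host=a}", "cpu.user{host=b}", "mem.free{host=a}"}
	for _, id := range ids {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id, numShards))
	}
}
```

Because the shard depends only on the ID and the fixed shard count, every datapoint of a given time series lands on the same shard, which is what allows one node to aggregate that series.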

Flushing

Overview

Flushing is the process by which m3aggregator instances output the aggregated time series data, using the m3msg protocol. The data is flushed to two targets:

Persistence: the aggregated data is flushed to m3coordinator, which then persists it in m3db.

Forwarding: intermediate aggregation data is flushed to other m3aggregator nodes (in a multi-node setup) for further processing. This is necessary for rollup rule processing - eg.

Leader & Follower

For every shardset, a single m3aggregator node is elected leader. Both leader and follower nodes receive writes and perform aggregation. The main difference between them is that the leader is responsible for flushing (persisting) the data it has aggregated (see Flushing for more details). The follower stands by, ready to take over flushing if the current leader fails.
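The division of work between leader and follower can be sketched as below. This is a toy model under stated assumptions, not m3aggregator's code: the `node` type, its `write`, `flush`, and `promote` methods, and the use of a running sum as the aggregation are all hypothetical. Leader election, failure detection, and how a newly promoted node learns what was already flushed are left out.

```go
package main

import "fmt"

type role int

const (
	follower role = iota
	leader
)

// node is a hypothetical stand-in for one m3aggregator instance
// that owns a single shardset.
type node struct {
	name string
	role role
	sum  float64 // aggregated value; a sum is used purely for illustration
}

// write aggregates an incoming value. Leader and follower both do this.
func (n *node) write(v float64) { n.sum += v }

// flush returns the aggregated value and true only when the node is the
// leader. A follower keeps its state but emits nothing.
func (n *node) flush() (float64, bool) {
	if n.role != leader {
		return 0, false
	}
	return n.sum, true
}

// promote turns a follower into the leader, for example after the
// previous leader has failed.
func (n *node) promote() { n.role = leader }

func main() {
	a := &node{name: "a", role: leader}
	b := &node{name: "b", role: follower}

	// Both replicas receive the same writes and aggregate them.
	for _, v := range []float64{1, 2, 3} {
		a.write(v)
		b.write(v)
	}

	for _, n := range []*node{a, b} {
		v, ok := n.flush()
		fmt.Printf("node %s flushed=%v value=%v\n", n.name, ok, v)
	}

	// Simulate the leader failing: the follower takes over flushing
	// using the state it has been aggregating all along.
	b.promote()
	v, ok := b.flush()
	fmt.Printf("node %s flushed=%v value=%v\n", b.name, ok, v)
}
```

Since the follower aggregates the same writes as the leader, it already holds the state it needs when it is promoted.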