Graphite

This document is a getting-started guide for integrating the M3 stack with Graphite.

Overview

M3 supports ingesting Graphite metrics using the Carbon plaintext protocol. We also support a variety of aggregation and storage policies for the ingestion pathway (similar to storage-schemas.conf when using Graphite Carbon), which are documented below. Finally, on the query side, we support the majority of Graphite query functions.

Ingestion

Setting up the M3 stack to ingest carbon metrics is straightforward. First, make sure you’ve followed our other documentation to get m3coordinator and M3DB set up. Also, familiarize yourself with how M3 handles aggregation.

Once you have both of those services running properly, modify your m3coordinator configuration to add the following lines and restart it:

carbon:
  ingester:
    listenAddress: "0.0.0.0:7204"

This will enable a line-based TCP carbon ingestion server on the specified port. By default, the server will write all carbon metrics to every aggregated namespace specified in the m3coordinator configuration file and aggregate them using a default strategy of mean (equivalent to Graphite’s Average).
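To check that the ingester is accepting data, you can send it a few lines by hand. The following is a minimal sketch rather than an official client: it assumes the coordinator is reachable on localhost at the port configured above, and the helper names (format_carbon_line, send_metrics) and the metric path are made up for illustration. Each line follows the Carbon plaintext format of metric path, value, and Unix timestamp separated by spaces and terminated by a newline.

```python
import socket
import time


def format_carbon_line(path, value, timestamp=None):
    """Format one metric as a newline-terminated Carbon plaintext line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=7204):
    """Send pre-formatted lines over a single TCP connection.

    host and port are assumptions; use the listenAddress from your config.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

With a coordinator running, calling send_metrics([format_carbon_line("foo.bar.baz", 42)]) would write a single datapoint stamped with the current time.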

This default setup makes sense if your carbon metrics are unaggregated. However, if you’ve already aggregated your data using something like statsite, you may want to disable M3 aggregation. In that case, you can do something like the following:

carbon:
  ingester:
    listenAddress: "0.0.0.0:7204"
    rules:
      - pattern: .*
        aggregation:
          enabled: false
        policies:
          - resolution: 1m
            retention: 48h

This replaces M3’s default behavior with a single rule which states that all metrics (since .* will match any string) should be written to whichever aggregated M3DB namespace has been configured with a resolution of 1 minute and a retention of 48 hours, bypassing aggregation / downsampling altogether. Note that there must be a configured M3DB namespace with the specified resolution/retention or the coordinator will fail to start.

If you choose to use M3’s aggregation functionality, there are a variety of aggregation types to choose from. For example:

carbon:
  ingester:
    listenAddress: "0.0.0.0:7204"
    rules:
      - pattern: .*
        aggregation:
          type: last
        policies:
          - resolution: 1m
            retention: 48h

The config above will aggregate ingested carbon metrics into 1 minute tiles, but instead of taking the mean of every datapoint, it will emit the last datapoint that was received within a given tile’s window.
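The difference between mean and last can be illustrated with a small sketch. This is not M3's aggregator, only a toy model of the idea: it assumes tiles are aligned to multiples of the resolution, keys each tile by the start of its window, and treats "last" as the datapoint with the latest timestamp in the window. The function name aggregate_tiles and the sample data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_tiles(datapoints, resolution_s, how):
    """Group (timestamp, value) pairs into fixed windows and reduce each one.

    Each tile is keyed by the start of its window (an assumption of this sketch).
    """
    tiles = defaultdict(list)
    for ts, value in sorted(datapoints):
        tiles[ts - ts % resolution_s].append(value)
    reducers = {"mean": mean, "last": lambda vs: vs[-1], "max": max}
    return {start: reducers[how](values) for start, values in sorted(tiles.items())}


# Five datapoints spanning two 1 minute windows: [0, 60) and [60, 120).
points = [(0, 1.0), (20, 3.0), (40, 8.0), (60, 5.0), (90, 7.0)]
print(aggregate_tiles(points, 60, "mean"))  # {0: 4.0, 60: 6.0}
print(aggregate_tiles(points, 60, "last"))  # {0: 8.0, 60: 7.0}
```

With mean, the first tile averages 1.0, 3.0, and 8.0 into 4.0; with last, it keeps only 8.0, the final datapoint in that window.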

Similar to Graphite’s storage-schemas.conf, M3 carbon ingestion rules are applied in order and only the first pattern that matches is applied. In addition, the rules can be as simple or as complex as you like. For example:

carbon:
  ingester:
    listenAddress: "0.0.0.0:7204"
    rules:
      - pattern: stats.internal.financial-service.*
        aggregation:
          type: max
        policies:
          - resolution: 1m
            retention: 4320h
          - resolution: 10s
            retention: 24h
      - pattern: stats.internal.rest-proxy.*
        aggregation:
          type: mean
        policies:
          - resolution: 10s
            retention: 2h
      - pattern: stats.cloud.*
        aggregation:
          enabled: false
        policies:
          - resolution: 1m
            retention: 2h
      - pattern: .*
        aggregation:
          type: mean
        policies:
          - resolution: 1m
            retention: 48h

Let’s break that down.

The first rule states that any metric matching the pattern stats.internal.financial-service.* should be aggregated using the max function (meaning the datapoint with the highest value received in a given window is retained). This generates two sets of tiles: one with 1 minute resolution, written to an M3DB namespace with 4320 hour (180 day) retention, and another with 10 second resolution, written to an M3DB namespace with 24 hour retention.

The second rule will aggregate all the metrics coming from our rest-proxy service using a mean type aggregation (all datapoints within a given window will be averaged) to generate 10 second tiles and write them out to an M3DB namespace that stores data for two hours.

The third will match any metrics coming from our cloud environment. In this hypothetical example, our cloud metrics are already aggregated using an application like statsite, so instead of aggregating them again, we just write them directly to an M3DB namespace that retains data for two hours. Note that while we’re not aggregating the data in M3 here, we still need to provide a resolution so that the ingester can match the storage policy to a known M3DB namespace, and so that, when we fan out queries to multiple namespaces, we know the resolution of the data contained in each namespace.

Finally, our last rule uses a “catch-all” pattern to capture any metrics that don’t match any of our other rules and aggregate them using the mean function into 1 minute tiles which we store for 48 hours.
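The first-match behaviour described above can be sketched in a few lines. This is an illustration of the ordering semantics only, not the coordinator's code: it assumes the patterns are regular expressions matched from the start of the metric name, and the RULES table, its "resolution:retention" strings, and the first_matching_rule helper are hypothetical stand-ins for the example config above.

```python
import re

# Hypothetical in-memory mirror of the rules in the example config above.
# Each entry: (pattern, aggregation type or None if disabled, policies).
RULES = [
    (r"stats.internal.financial-service.*", "max", ["1m:4320h", "10s:24h"]),
    (r"stats.internal.rest-proxy.*", "mean", ["10s:2h"]),
    (r"stats.cloud.*", None, ["1m:2h"]),
    (r".*", "mean", ["1m:48h"]),
]


def first_matching_rule(metric):
    """Return the first rule whose pattern matches; later rules are ignored."""
    for pattern, aggregation, policies in RULES:
        if re.match(pattern, metric):
            return pattern, aggregation, policies
    return None


print(first_matching_rule("stats.internal.rest-proxy.requests"))
print(first_matching_rule("servers.host1.load"))
```

Because the catch-all .* rule comes last, it only applies to metrics that none of the earlier patterns matched; moving it to the top would shadow every other rule. Note also that, under the regular-expression assumption, an unescaped "." matches any character, so a stricter pattern would escape the dots.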

Debug mode

If at any time you’re not sure which metrics are being matched by which patterns, or want more visibility into how the carbon ingestion rules are being evaluated, modify the config to enable debug mode:

carbon:
  ingester:
    debug: true
    listenAddress: "0.0.0.0:7204"

This will make the carbon ingester emit logs for every step it takes. Note: If your coordinator is ingesting a lot of data, enabling this mode could bring the process to a halt due to the I/O overhead, so use this feature cautiously in production environments.

Supported Aggregation Functions

  • last
  • min
  • max
  • mean
  • median
  • count
  • sum
  • sumsq
  • stdev
  • p10
  • p20
  • p30
  • p40
  • p50
  • p60
  • p70
  • p80
  • p90
  • p95
  • p99
  • p999
  • p9999

Querying

M3 supports the majority of Graphite query functions and can be used to query metrics that were ingested via the ingestion pathway described above.

Grafana

M3Coordinator implements the Graphite source interface, so you can add it as a Graphite data source in Grafana by following these instructions.

Note that you’ll need to set the URL to: http://<M3_COORDINATOR_HOST_NAME>:7201/api/v1/graphite

Direct

You can query for metrics directly by issuing HTTP GET requests against the M3Coordinator /api/v1/graphite/render endpoint, which runs on port 7201 by default. For example:

(export now=$(date +%s) && curl "localhost:7201/api/v1/graphite/render?target=transformNull(foo.*.baz)&from=$(($now-300))" | jq .)

This queries for all metrics matching the foo.*.baz pattern, applies the transformNull function, and returns all datapoints for the last 5 minutes.
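The same request can be issued from a script. The sketch below uses only the Python standard library and assumes the coordinator is reachable at http://localhost:7201 and that the endpoint returns JSON (as the jq pipe in the curl example suggests). The helper names render_url and fetch are made up for illustration.

```python
import json
import time
import urllib.parse
import urllib.request


def render_url(target, seconds_back, base="http://localhost:7201", now=None):
    """Build a render URL covering the last `seconds_back` seconds.

    base is an assumption; point it at your own coordinator.
    """
    now = int(time.time()) if now is None else int(now)
    query = urllib.parse.urlencode({"target": target, "from": now - seconds_back})
    return f"{base}/api/v1/graphite/render?{query}"


def fetch(target, seconds_back=300):
    """Issue the GET request and decode the JSON response body."""
    with urllib.request.urlopen(render_url(target, seconds_back), timeout=10) as resp:
        return json.load(resp)
```

Calling fetch("transformNull(foo.*.baz)") against a running coordinator would mirror the curl command above. urlencode percent-encodes the parentheses and asterisk in the target, which avoids shell or proxy quoting issues.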