In this post, I walk you through setting up a Cloudflare Prometheus Exporter and configuring the necessary scrape config, an example dashboard, and some alerting rules. These examples demonstrate how to deploy the exporter on a Kubernetes cluster where Prometheus is managed by the Prometheus Operator. However, this is not a hard requirement, and you can adapt the examples to fit your infrastructure setup.
A word of caution: the APIs used by the exporter are only available to Cloudflare Enterprise customers, so make sure you double-check that before you put in the effort of setting things up.
Deploying the Cloudflare Exporter
First of all, big thanks to Wehkamp for open-sourcing their Cloudflare exporter for others to enjoy 🎉. Check out the exporter repository on GitHub, wehkamp/docker-prometheus-cloudflare-exporter, for more information on which metrics the exporter exposes.
Below you see a standard Kubernetes Deployment manifest for the exporter.
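Since the exact manifest depends on your setup, consider this a minimal sketch, assuming a zone called example.com; the image name, the container port, and the `AUTH_EMAIL`/`AUTH_KEY` environment variable names are assumptions, so double-check them against the exporter's README:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudflare-exporter-example-com
  labels:
    app.kubernetes.io/name: cloudflare-exporter
    cloudflare/zone: example.com
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudflare-exporter
      cloudflare/zone: example.com
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cloudflare-exporter
        cloudflare/zone: example.com
    spec:
      containers:
        - name: cloudflare-exporter
          # 1.1.1 is the latest tag at the time of writing, see the note below;
          # verify the image name on Docker Hub
          image: wehkamp/prometheus-cloudflare-exporter:1.1.1
          ports:
            - name: metrics
              containerPort: 9199 # assumed port, check the README
          env:
            # the exporter handles exactly one zone per instance
            - name: ZONE
              value: example.com
            - name: AUTH_EMAIL # assumed variable name, check the README
              value: robot@example.com
            - name: AUTH_KEY # assumed variable name, check the README
              valueFrom:
                secretKeyRef:
                  name: cloudflare-api-key-secret
                  key: api-key
```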
A couple of things I want to point out:
- Check Docker Hub to make sure you use the latest released image. The instructions in their README point to version `1.0`; however, `1.1.1` is the latest available tag.
- As you can see, we specify the `ZONE` environment variable. Unfortunately, the exporter can only handle one specific zone at a time, so if your company owns multiple zones, you will need to deploy a dedicated exporter per zone.
- Authentication! The exporter still uses Cloudflare's Global API Key to authenticate and not their API Tokens, which means you can't limit the API Key's scope. However, since Cloudflare doesn't support any notion of organization-wide API keys/tokens, I recommend configuring a "robot" account in Cloudflare with restricted access to the specific zone being monitored, limiting that account's access to achieve the same goal. The added benefit is that you don't need to worry about offboarding ex-colleagues and breaking your monitoring in the meanwhile.
PS: make sure you configure a Kubernetes secret with the name `cloudflare-api-key-secret` and a key `api-key` containing the Global API key.
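For completeness, a minimal sketch of such a secret; the key value is obviously a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-key-secret
type: Opaque
stringData:
  # placeholder value, substitute the Global API key of the "robot" account
  api-key: <your-global-api-key>
```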
Set up the scrape config a.k.a. Service Monitors
Now that the exporter is running, it's time to scrape it and get the metrics into Prometheus. To configure the scrape config, we are going to define a `ServiceMonitor` CRD resource. This CRD comes installed with the Prometheus Operator and gives you the ability to configure scrape configs using Kubernetes resources. The Prometheus Operator automatically gathers these `ServiceMonitor` objects and updates the Prometheus configuration accordingly. Let's see how that looks:
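A minimal sketch, assuming the Service's metrics port is named `metrics`; the scrape interval is an assumption as well:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cloudflare-exporter
  labels:
    app.kubernetes.io/name: cloudflare-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudflare-exporter
  endpoints:
    # port refers to the named port on the Service defined below
    - port: metrics
      interval: 60s # assumed scrape interval
```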
It looks like a normal Kubernetes manifest. However, this won't work just yet; we still need to define an actual service for it to monitor:
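Again a minimal sketch, with the port details assumed to line up with the Deployment sketch above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cloudflare-exporter
  labels:
    app.kubernetes.io/name: cloudflare-exporter
spec:
  selector:
    # deliberately no cloudflare/zone label here, see the note below
    app.kubernetes.io/name: cloudflare-exporter
  ports:
    - name: metrics
      port: 9199 # assumed exporter port
      targetPort: metrics
```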
This is a regular Kubernetes service manifest. There is one trick I want to point out. The careful reader might have noticed I only configured the `app.kubernetes.io/name: cloudflare-exporter` label as the selector, but didn't add the `cloudflare/zone` label. This little "trick" makes sure that when you add more exporter deployments to monitor different zones, they will automatically be picked up by the ServiceMonitor without any changes.
The exported metrics already include the zone they are watching, so you don't need any custom relabel configuration to figure out which zone the metrics belong to.
Configure a Grafana dashboard
The Cloudflare Prometheus Exporter already comes with an example dashboard. However, it's a plain JSON model, which makes it a rather long blob of text. As a shameless plug, I want to share one of my previous posts, where I explain how to use Grafonnet to generate dashboards instead. In this gist you can find an example Cloudflare Grafonnet dashboard, which we will use today; this dashboard borrows a few concepts from GitLab's runbook repository. We've abstracted away a couple of standardized components and helpers, so instead of a ~500-line JSON blob, you get a ~90-line Grafonnet definition, which is much easier to comprehend.
Alerting rules
As with the scrape config, the Prometheus Operator offers a `PrometheusRule` CRD resource to configure alerting rules (and recording rules). Below we discuss two examples of alerting rules and their purpose.
- The missing metric alert

```yaml
- alert: CloudflareExporterScrapeMissing
  expr: absent(sum by(zone) (cloudflare_pop_http_responses_sent))
```
This alert makes sure we are notified when we are missing any Cloudflare metrics. It's crucial to also alert on absent metrics; otherwise, a broken exporter would silently disable every other alert that depends on its data.
- The elevated error rate alert

```yaml
- alert: CloudflareElevatedErrorRate
  expr: |
    (
      sum by (zone) (
        rate(cloudflare_pop_http_responses_sent{http_status=~"5.."}[5m])
      )
      / on (zone)
      sum by (zone) (
        rate(cloudflare_pop_http_responses_sent[5m])
      )
    ) > 0.05
  for: 1m
```
The elevated error rate alert is why we went through all this trouble of setting up the exporter in the first place. It takes the rate of 5xx HTTP responses over the previous 5 minutes and divides it by the rate of all HTTP responses over the previous 5 minutes, to calculate the percentage of requests that are erroneous. If that percentage crosses 5%, we trigger a notification.
Finally, you wrap these alerts into a `PrometheusRule` manifest, like this:
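A sketch of what that could look like; the severity values, the `pagerduty_service` label, and the dashboard URL are illustrative assumptions for the routing setup described below:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cloudflare-alerts
  labels:
    app.kubernetes.io/name: cloudflare-exporter
spec:
  groups:
    - name: cloudflare
      rules:
        - alert: CloudflareExporterScrapeMissing
          expr: absent(sum by(zone) (cloudflare_pop_http_responses_sent))
          labels:
            severity: warning # assumed routing label
          annotations:
            summary: Cloudflare metrics are missing
            description: Prometheus no longer receives metrics from the Cloudflare exporter.
        - alert: CloudflareElevatedErrorRate
          expr: |
            (
              sum by (zone) (
                rate(cloudflare_pop_http_responses_sent{http_status=~"5.."}[5m])
              )
              / on (zone)
              sum by (zone) (
                rate(cloudflare_pop_http_responses_sent[5m])
              )
            ) > 0.05
          for: 1m
          labels:
            severity: critical # assumed routing label
            pagerduty_service: cloudflare # hypothetical PagerDuty service label
          annotations:
            summary: Elevated error rate for zone {{ $labels.zone }}
            dashboard: https://grafana.example.com/d/cloudflare # hypothetical dashboard link
```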
As you can see, we added a couple of labels and annotations to the alert rules themselves, which we use for several purposes. The annotations help us get a better picture of the specific alert and direct us to the relevant Grafana dashboard we need to investigate to resolve the situation. The labels are mostly used to route the alerts to the proper places according to their severity. We might, for example, decide to only send a notification to Slack, but in other cases, we need to send an alert to PagerDuty. We provide a specific PagerDuty service label to know which service to assign the alert to, ultimately notifying the right people on call.
If any of this was helpful or if you have any questions, feel free to reach out on Twitter ✌️.