Grafana and Grafonnet Dashboards

Recently I had to configure some dashboards and alerts on Grafana, but creating such repetitive dashboards by hand is a huge pain. Because Grafana doesn’t support a way to generate alerts dynamically, you need to set up a dashboard for alerting only, preferably with a single graph per alert.

I want a single server/instance per graph because the alternative is easy to overlook and can shoot you in the foot. For example, say you have one graph monitoring the storage of all your servers, and it triggers a warning alert for one instance. That situation can take some time to resolve: if the instance is a database server, you need to plan for extra capacity, which can take a while to execute. Meanwhile, if other servers pass the warning threshold, you won’t be notified since the alert is already active. So, in a nutshell, I like to use Grafonnet to generate comprehensible and repetitive dashboards quickly.

What is Grafonnet?

Grafonnet is a Jsonnet library for generating Grafana dashboards, developed by Grafana Labs.

Great, so what is Jsonnet 😅? Well, Jsonnet is a simple extension of JSON, and it defines itself as a data templating language for app and tool developers. Indeed, yet another templating language 🎉. Yes, I know, but bear with me, I promise it makes your life easier.
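To give a feel for what "an extension of JSON" means in practice, here is a minimal sketch in plain Jsonnet, with no Grafonnet involved. The server function and the field names are made up for illustration.

```jsonnet
// Plain Jsonnet, no Grafonnet required. All names here are illustrative.
local server(name, diskGiB) = {
  name: name,
  disk_bytes: diskGiB * 1024 * 1024 * 1024,
};

{
  // Valid JSON is also valid Jsonnet, so quoted literal fields work as-is.
  "environment": "production",
  // Locals and functions take care of the repetitive parts.
  servers: [server('db-1', 500), server('db-2', 750)],
}
```

Saving this to a file and running jsonnet on it should print a regular JSON document with the function calls expanded.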

How do I get started?

First of all, make sure you have both Jsonnet and Grafonnet installed. On macOS, you can install Jsonnet with Homebrew: brew install jsonnet. The most straightforward way to get started with Grafonnet is to clone the repo: git clone https://github.com/grafana/grafonnet-lib.git.

However, when you are ready to set up a project properly, I recommend you check out jsonnet-bundler. Like its Ruby counterpart Bundler, jsonnet-bundler is a way to manage Jsonnet dependencies like Grafonnet. After installing jsonnet-bundler, these two lines should be enough to get going:

jb init
jb install https://github.com/grafana/grafonnet-lib/grafonnet

Your first dashboard

Importing the Grafonnet library

In this example, we will generate a simple dashboard with a graph per server instance to visualize the disk usage and create an alert to notify us when we reach a certain threshold.

First of all, we are going to import a couple of dependencies from the Grafonnet library.

local grafana = import 'grafonnet/grafana.libsonnet';
local influxdb = grafana.influxdb;
local graphPanel = grafana.graphPanel;
local alertCondition = grafana.alertCondition;
local dashboard = grafana.dashboard;

In the first line, we define the main import of our Grafonnet library. All the others are helper variables, which make the code shorter and easier to work with, but it’s perfectly fine to use grafana.dashboard.new( ...) directly.

Generate the graph with alert

I’ve abstracted this part into a function so we can reuse it more easily later. Here is what that looks like, and I’ll explain it below the code snippet.

local postgresqlAlertTimeseries(
  db_instance,
  threshold,
) =
  graphPanel.new(
    db_instance
  ).addTarget(
    influxdb.target(
      measurement='disk-usage'
    )
    .where('server', '=', db_instance)
    .selectField('available')
    .addConverter('mean')
  )
  .resetYaxes()
  .addYaxis(
    decimals=1,
    format='bytes'
  )
  .addYaxis(
    format='short',
    max=1,
    min=0,
    show=false,
  ).addAlert(
    'Warning: ' + db_instance + ' disk usage alert',
    notifications=[{ id: 1 }]
  ).addCondition(
    alertCondition.new(
      evaluatorType='lt',
      evaluatorParams=[threshold],
      operatorType='and',
      reducerType='min',
      queryRefId='A',
      queryTimeStart='5m',
      queryTimeEnd='now',
    ),
  );

To start with, in lines 1-4 we define a function named postgresqlAlertTimeseries, which takes two input parameters: db_instance and threshold. Next, we define a new graphPanel with the db_instance name as its title. For this graph, we use an InfluxDB target, but Grafonnet also ships with support for Prometheus and a few others like Elasticsearch and raw SQL. We define the measurement disk-usage and pass a where clause to match our single DB instance.

In the next few lines (15-24), we tweak the axes a little, but then we come to the exciting part: defining the alert and the alert condition (lines 25-38). The tricky part is in line 27. You will need to look up the valid ID of the notification channels you have configured. If you leave this empty, it falls back to the default notification channels. I looked it up by checking the source of an existing dashboard 😉

PS: for brevity, I’ve omitted most of the other params we used to tweak the dashboard, for example, to hide the legend, but make sure to check the docs to configure the graph to your liking.
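To make the mapping a bit more concrete, here is roughly what the alert ends up looking like in the dashboard JSON, written as a plain Jsonnet object. This is a sketch based on my understanding of the legacy Grafana dashboard alert model, not output copied from Grafonnet. The threshold and the notification ID are placeholders, and fields such as the evaluation frequency are left out, so compare it against a dashboard exported from your own Grafana version.

```jsonnet
// Approximate shape of the `alert` block inside a graph panel (legacy
// dashboard alerting). Values are placeholders for illustration.
local threshold = 600 * 1024 * 1024 * 1024;

{
  alert: {
    name: 'Warning: db-1 disk usage alert',
    notifications: [{ id: 1 }],  // ID of an existing notification channel
    conditions: [
      {
        type: 'query',
        // queryRefId, queryTimeStart, queryTimeEnd
        query: { params: ['A', '5m', 'now'] },
        // reducerType: take the minimum value in the 5 minute window
        reducer: { type: 'min', params: [] },
        // evaluatorType / evaluatorParams: fire when the value drops below the threshold
        evaluator: { type: 'lt', params: [threshold] },
        // operatorType: how this condition combines with other conditions
        operator: { type: 'and' },
      },
    ],
  },
}
```

Read this way, the condition says: alert when the minimum of query A over the last five minutes is lower than the threshold.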

Defining the dashboard

At this point, it’s pretty straightforward. We define an array variable db_instances containing the list of DB instances we are generating a graph for (line 1).

Then we also define a small wrapper function generateDBPanel to add the grid position to the generated graph. The gridPos is a necessary argument for the addPanels method when adding a new graph to the dashboard.
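The wrapper relies on Jsonnet's object extension syntax. Here is a minimal sketch of that mechanism in isolation; fakePanel and withGridPos are hypothetical stand-ins, not part of Grafonnet.

```jsonnet
// Stand-in for a generated panel; the real one comes from graphPanel.new(...).
local fakePanel(title) = { title: title, type: 'graph' };

// `object { ... }` is shorthand for `object + { ... }`: fields on the right
// are added to, or override, the fields on the left.
local withGridPos(panel, y) = panel {
  gridPos: { h: 6, w: 24, x: 0, y: y },
};

withGridPos(fakePanel('db-1'), 0)
```

The result is the original panel object with an extra gridPos field, which is the shape the dashboard expects for each panel.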

Finally, we define the essential part, the actual Grafana dashboard, in lines 12-20. Most input parameters speak for themselves, but I will mention the uid field. If you want to have shareable dashboard links, it makes sense to provide a UID and keep it the same. Otherwise, Grafana will assign a random UID each time you import the dashboard, and you lose the reference to it.

And ultimately, we map over our db_instances array using the std.map function from the Jsonnet standard library, which generates the graph for each DB instance.

Here you can find a complete gist of this Grafonnet snippet:
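As a small illustration of the mapping step, the sketch below uses std.map next to std.mapWithIndex, which also hands the element's index to the function. That can be handy if you prefer to compute an explicit y position per panel. The panelAt function returns a plain object as a stand-in for a real Grafonnet panel, and the instance names are examples.

```jsonnet
local db_instances = ['db-1', 'db-2', 'db-3'];
local panelHeight = 6;

// Stand-in for generateDBPanel; returns a plain object instead of a Grafonnet panel.
local panelAt(index, db_instance) = {
  title: db_instance,
  gridPos: { h: panelHeight, w: 24, x: 0, y: index * panelHeight },
};

{
  // std.map calls a one-argument function for every element.
  titles: std.map(function(name) 'Warning: ' + name, db_instances),
  // std.mapWithIndex passes the element's index first, then the element.
  panels: std.mapWithIndex(panelAt, db_instances),
}
```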

local db_instances = [
  'db-1',
  'db-2',
  '...',
];

local generateDBPanel(db_instance) =
  postgresqlAlertTimeseries(db_instance, 644245094400) {
    gridPos: { h: 6, w: 24, x: 0, y: 0 },
  };

dashboard.new(
  'Postgresql Storage Warning Alerts',
  uid='90C6DB2A-946C-446B-9D2B-8B1E03673F03',
  description='Configures storage alerts for DB servers at a warning level',
  tags=['postgresql', 'alerting'],
  timezone='utc',
  time_from='now-7d/m',
  time_to='now/m',
).addPanels(std.map(generateDBPanel, db_instances))
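The literal 644245094400 passed to postgresqlAlertTimeseries is 600 GiB expressed in bytes. Since Jsonnet can do arithmetic, one option is to spell that out with named locals so the threshold stays readable. The names GiB and warningThresholdBytes below are my own, chosen for illustration.

```jsonnet
// 1 GiB in bytes.
local GiB = 1024 * 1024 * 1024;
// Warn when less than 600 GiB is available.
local warningThresholdBytes = 600 * GiB;

{
  warning_threshold_bytes: warningThresholdBytes,
  // Evaluates to true: both expressions are the same number.
  matches_literal: warningThresholdBytes == 644245094400,
}
```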

Generate the dashboard

Now that we have defined the dashboard, the only thing left to do is to generate the actual JSON model of the Grafana dashboard using Jsonnet. If you followed the instructions in “How do I get started?” properly, you should be able to run the following Jsonnet command and get a nice JSON output in your terminal.

jsonnet -J . -J vendor/grafonnet-lib example.dashboard.jsonnet

Import the dashboard into Grafana

Depending on your Grafana setup, there can be multiple ways to import dashboards, but I will only discuss the manual approach for now. Copy the output of the previous command, visit the https://grafana.example.com/dashboard/import page, and follow the step-by-step process.

There is one big caveat, though, when you are defining alerts this way. At the time of writing, there is a known bug, grafana/grafana#11419, which prevents alerts from being “activated” during the import process.

So the current workaround is to open the dashboard and save it manually to “activate” the alerts. You can verify the alerts are created by checking a dashboard with an alert list panel. But I’ll leave it as an exercise for you to define a Grafonnet dashboard with the alert list panel, since you are all experts in Grafonnet now 😜.

Some Tips:

  • What helped me when I wasn’t familiar with the options or parameters was to configure a dashboard manually, investigate its JSON model, and then iterate on my Grafonnet definition until I ended up with the same result 😅
  • Grafonnet is plain Jsonnet, which is a superset of JSON, so you can dive into the source code and figure out which options and functions are available. The most used Grafonnet features are also well documented, so it’s worth checking that out.
  • Check out the Jsonnet tutorial and standard library documentation to familiarize yourself with Jsonnet.

The “Future”

Fortunately, we are migrating to a monitoring stack with Grafana, Prometheus, and Alertmanager, so soon we won’t have to deal with those weird quirks when setting up alerts. However, we will still be leveraging Jsonnet and Grafonnet to generate our dashboards.

The goal is to have dashboards as code, so we can use our regular review process when creating or updating dashboards and track why changes happened in the first place. Compare that with the opaque approach of manually creating and updating dashboards in Grafana, which results in inconsistent dashboards and conventions.

Eventually, we want to reach a point where we have our own set of library components that we can easily use to compose a dashboard. That way we have a unified approach across our dashboards and can reuse existing library components without reinventing the wheel every time. I didn’t come up with this idea myself but stole it by investigating the incredible runbook repository of GitLab’s SRE team, which is open for everyone to read and full of lots of good stuff. But you have to make it fit your organization, of course. Make sure to check it out.

If you made it this far, thanks for reading! Feel free to reach out on Twitter if you have any questions or remarks 👋
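As a rough idea of what such a component library could look like, here is a sketch in plain Jsonnet. Everything in it is hypothetical: the lib object, its storagePanel helper, and the defaults are invented for illustration. In a real project the library would live in its own .libsonnet file, be pulled in with import, and build on Grafonnet panels rather than plain objects.

```jsonnet
// Shared defaults every component picks up.
local defaults = { datasource: 'influxdb', w: 24, h: 6 };

// Hypothetical component library.
local lib = {
  // Hidden (::) fields are callable helpers that never show up in the JSON output.
  storagePanel(instance, thresholdBytes):: {
    title: instance + ' storage',
    datasource: defaults.datasource,
    threshold: thresholdBytes,
    gridPos: { h: defaults.h, w: defaults.w, x: 0, y: 0 },
  },
};

// A dashboard composed from the shared component.
{
  title: 'Storage overview',
  panels: [
    lib.storagePanel(name, 600 * 1024 * 1024 * 1024)
    for name in ['db-1', 'db-2']
  ],
}
```

The appeal of this layout is that conventions such as panel size or datasource are decided once in the library, and each dashboard file only states what is specific to it.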