Cristian Marius Tiutiu, Bikram Gupta, and Easha Abid
Alerts and notifications are a critical part of your deployment workflow. When working with a Kubernetes cluster, you often need to be notified immediately about any critical issue in your cluster.
Alertmanager is part of the kube-prom-stack installed in your cluster in the Prometheus Stack tutorial. It allows you to receive alerts from various sources, such as Prometheus. Rules are created on the Prometheus side, which in turn can fire alerts. It is the responsibility of Alertmanager to intercept those alerts, group them (aggregation), apply other transformations, and finally route them to the configured receivers. Notification messages can be further formatted to include additional details if desired. You can use Slack, Gmail, and other services to receive real-time notifications.
In this section, you will learn how to inspect the existing alerts, create new ones, and then configure Alertmanager to send notifications via Slack using the same manifest file used for configuring Prometheus.
To complete this tutorial, you will need the kube-prom-stack and Loki stacks installed in the previous sections of the Starter Kit, the emojivoto sample application deployed to your cluster, and administrative access to a Slack workspace.
The kube-prom-stack chart has over a hundred rules already activated. To access the Prometheus console, first do a port-forward to your local machine:
kubectl --namespace monitoring port-forward svc/kube-prom-stack-kube-prome-prometheus 9091:9090
Open a web browser on localhost:9091 and access the Alerts menu item. You should see some predefined alerts, similar to the following:
Click on any of the alerts to expand it. You can see information about the expression it queries, the labels it has set up, and its annotations, which are very important from a templating perspective. Prometheus supports templating in the annotations and labels of alerts. For more information, check out the official documentation.
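As a quick illustration, the snippet below is a hypothetical annotations block (not part of the Starter Kit values) showing how label values and the measured value can be interpolated into the notification text:

```yaml
# Hypothetical example for illustration only; not part of the Starter Kit values file.
annotations:
  summary: 'Instance {{ $labels.instance }} is down'
  description: 'Job {{ $labels.job }} has been reporting a value of {{ $value }} for more than 5 minutes.'
```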
To create a new alert, you need to add a new definition in the additionalPrometheusRulesMap section of the kube-prom-stack Helm values file.
You will create a sample alert that is triggered if the emojivoto namespace does not have the expected number of instances. The expected number of pods for the emojivoto application is 4.
First, open the 04-setup-observability/assets/manifests/prom-stack-values-v35.5.1.yaml file provided in the Starter Kit repository, using a text editor of your choice (preferably with YAML lint support). Then, uncomment the additionalPrometheusRulesMap block:
additionalPrometheusRulesMap:
  rule-name:
    groups:
      - name: emojivoto-instance-down
        rules:
          - alert: EmojivotoInstanceDown
            expr: sum(kube_pod_owner{namespace="emojivoto"}) by (namespace) < 4
            for: 1m
            labels:
              severity: 'critical'
            annotations:
              description: 'The number of pods from the namespace {{ $labels.namespace }} is lower than the expected 4.'
              summary: 'Pod {{ $labels.pod }} down'
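If you want to preview what the alert expression returns before applying the rule, you can query it directly against the Prometheus HTTP API. This is a quick sketch that assumes the port-forward from earlier is still running on localhost:9091:

```bash
# Returns the current pod count per namespace for the emojivoto namespace;
# the alert fires when this value drops below 4.
curl -s 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=sum(kube_pod_owner{namespace="emojivoto"}) by (namespace)'
```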
Finally, apply the settings using Helm:
HELM_CHART_VERSION="35.5.1"
helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack --version "${HELM_CHART_VERSION}" \
--namespace monitoring \
-f "04-setup-observability/assets/manifests/prom-stack-values-v${HELM_CHART_VERSION}.yaml"
To check that the alert has been created successfully, navigate to the Prometheus console, click on the Alerts menu item, and identify the EmojivotoInstanceDown alert. It should be visible at the bottom of the list.
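You can also confirm from the command line that the chart rendered the new rule group into a PrometheusRule custom resource. The exact resource name depends on your Helm release, so the sketch below simply filters the list for the rule-name key used in the values file:

```bash
# List the PrometheusRule resources created by the chart and look for the "rule-name" entry.
kubectl --namespace monitoring get prometheusrules | grep rule-name
```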
To complete this section, you need to have administrative rights over a Slack workspace. This will enable you to create the incoming webhook you will need in the next steps. You will also need to create a channel where you would like to receive notifications from Alertmanager. You will configure Alertmanager to range over all of the received alerts, printing their respective summaries and descriptions on new lines.
To create the incoming webhook, navigate to https://api.slack.com/apps and click on the Create New App button. Follow the prompts to enable the Incoming Webhooks feature for your app, add a new webhook for the notifications channel you created, and copy the generated webhook URL.
Next, you will tell Alertmanager how to send Slack notifications. Open the 04-setup-observability/assets/manifests/prom-stack-values-v35.5.1.yaml file provided in the Starter Kit repository using a text editor of your choice. Uncomment the entire alertmanager.config block, and make sure to update the slack_api_url and channel values by replacing the <> placeholders accordingly.
alertmanager:
  enabled: true
  config:
    global:
      resolve_timeout: 5m
      slack_api_url: "<YOUR_SLACK_APP_INCOMING_WEBHOOK_URL_HERE>"
    route:
      receiver: "slack-notifications"
      repeat_interval: 12h
      routes:
        - receiver: "slack-notifications"
          # matchers:
          #   - alertname="EmojivotoInstanceDown"
          # continue: false
    receivers:
      - name: "slack-notifications"
        slack_configs:
          - channel: "#<YOUR_SLACK_CHANNEL_NAME_HERE>"
            send_resolved: true
            title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
            text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
In the above configuration:
- slack_api_url: the incoming Slack webhook URL created earlier.
- receivers.[].slack_configs: defines the Slack channel used to send notifications, the notification title, and the actual message. It is also possible to format the notification message (or body) based on your requirements.
- title and text: iterate over the firing alerts and print out the summary and description using the Prometheus templating system.
- send_resolved: a boolean indicating whether Alertmanager should send a notification when an alert is no longer firing.

The matchers and continue parameters are still commented out, as you will uncomment them later in this guide. For now, they should stay commented.
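Optionally, before applying the changes, you can verify that the incoming webhook itself works by sending a test message with curl. This is just a sketch; replace the placeholder with the webhook URL you created earlier:

```bash
# Sends a simple test message to the Slack channel associated with the webhook.
curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Test notification from the Alertmanager setup"}' \
  'https://hooks.slack.com/services/<YOUR_WEBHOOK_PATH_HERE>'
```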
Finally, upgrade the kube-prometheus-stack chart using Helm:
HELM_CHART_VERSION="35.5.1"
helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack --version "${HELM_CHART_VERSION}" \
--namespace monitoring \
-f "04-setup-observability/assets/manifests/prom-stack-values-v${HELM_CHART_VERSION}.yaml"
At this point, you should receive Slack notifications for all the firing alerts.
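If notifications do not show up, you can check that Alertmanager loaded the new configuration by port-forwarding its service and opening the Status page on localhost:9093. The service name below is an assumption based on the release name used in this guide; if it differs, list the services in the monitoring namespace first:

```bash
# Confirm the exact Alertmanager service name in your cluster.
kubectl --namespace monitoring get svc

# Forward the Alertmanager web port (9093) to your local machine.
kubectl --namespace monitoring port-forward svc/kube-prom-stack-kube-prome-alertmanager 9093:9093
```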
Next, you’re going to test whether the EmojivotoInstanceDown alert added previously works and sends a notification to Slack, by downscaling the number of replicas for the emoji deployment in the emojivoto namespace.
From your terminal, run the following command to bring the number of replicas of the emoji deployment to 0:
kubectl scale --replicas=0 deployment/emoji -n emojivoto
Open a web browser on localhost:9091 and access the Alerts menu item. Search for the EmojivotoInstanceDown alert created earlier. The status of the alert should say Firing about one minute after scaling down the deployment.
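You can also check the firing alerts from the terminal via the Prometheus HTTP API. The sketch below assumes the port-forward on localhost:9091 is still running and that jq is installed:

```bash
# Prints the state of the EmojivotoInstanceDown alert (expected: "firing").
curl -s http://localhost:9091/api/v1/alerts \
  | jq '.data.alerts[] | select(.labels.alertname == "EmojivotoInstanceDown") | .state'
```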
If everything goes well, a notification will be sent to the Slack channel you configured earlier. The message should contain the text "The number of pods from the namespace emojivoto is lower than the expected 4.", as configured in the annotations.description field of the additionalPrometheusRulesMap block.
Currently, all firing alerts are sent to the Slack channel, which can cause notification fatigue. To narrow down which notifications are sent, you can restrict Alertmanager to send notifications only for alerts that match a certain pattern. This is done using the matchers parameter.
Open the 04-setup-observability/assets/manifests/prom-stack-values-v35.5.1.yaml file provided in the Starter Kit repository using a text editor of your choice. In the alertmanager.config block, uncomment the matchers and continue parameters:
config:
  global:
    resolve_timeout: 5m
    slack_api_url: "<YOUR_SLACK_APP_INCOMING_WEBHOOK_URL_HERE>"
  route:
    receiver: "slack-notifications"
    repeat_interval: 12h
    routes:
      - receiver: "slack-notifications"
        matchers:
          - alertname="EmojivotoInstanceDown"
        continue: false
  receivers:
    - name: "slack-notifications"
      slack_configs:
        - channel: "#<YOUR_SLACK_CHANNEL_NAME_HERE>"
          send_resolved: true
          title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
          text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
Finally, upgrade the kube-prometheus-stack chart using Helm:
HELM_CHART_VERSION="35.5.1"
helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack --version "${HELM_CHART_VERSION}" \
--namespace monitoring \
-f "04-setup-observability/assets/manifests/prom-stack-values-v${HELM_CHART_VERSION}.yaml"
Now, you should only receive notifications for the EmojivotoInstanceDown alert. Since continue is set to false, Alertmanager stops evaluating subsequent routes after this one matches and does not send the notification to any other receivers.
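For comparison, the hypothetical routing snippet below (not part of the Starter Kit values; the email-notifications receiver is not defined in this guide) shows how setting continue: true would let Alertmanager keep evaluating the routes that follow, so the same alert could reach more than one receiver:

```yaml
# Illustrative example only: the "email-notifications" receiver does not exist in this guide.
route:
  receiver: "slack-notifications"
  routes:
    - receiver: "slack-notifications"
      matchers:
        - alertname="EmojivotoInstanceDown"
      continue: true          # keep evaluating the routes below after a match
    - receiver: "email-notifications"
      matchers:
        - severity="critical"
```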
Clicking on the notification title in Slack will open a web browser to an unreachable page containing the internal Kubernetes DNS name of the Alertmanager pod. This is expected. For more information, you can check out DNS pod service, Configuration parameters for Alertmanager, and some Notification examples.
When an alert fires and sends a notification to Slack, it's important that you can debug the problem easily and find the root cause promptly. To do this, you can make use of Grafana, which was already installed in the Prometheus Stack and Loki Stack tutorials.
Create a port forward for Grafana on port 3000:
kubectl --namespace monitoring port-forward svc/kube-prom-stack-grafana 3000:80
Open a web browser on localhost:3000 and log in using the default credentials admin/prom-operator.
Navigate to the Alerting section. From the State filter, click on the Firing option. Identify the emojivoto-instance-down alert defined in the Creating a New Alert section and expand it. You should see the following:
Click on the See graph button. On the next page, you can observe the pod count for the emojivoto namespace displayed as a metric. Note that Grafana filters results using a time range of Last 1 hour by default. Adjust this to the time interval when the alert fired, either by using the From and To fields for a more granular result or a quick range such as Last 30 minutes.
From the Explore tab, select the Loki data source. In the Log browser, input the following query: {namespace="emojivoto"}, then click on the Run query button at the top right side of the page. You should see the following:
Make sure you adjust the time interval accordingly.
From this page, you can filter the log results further. For example, to filter the logs for the web-svc container of the emojivoto namespace, you can enter the following query: {namespace="emojivoto", container="web-svc"}
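You can narrow the results even further with LogQL line filters. The query below is an illustrative example (not from the original guide) that keeps only the log lines containing the word error:

```
{namespace="emojivoto", container="web-svc"} |= "error"
```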
More explanations about using LogQL can be found in Step 3 - Using LogQL.
You can also make use of the Kubernetes events exporter installed previously and filter for events related to the emojivoto namespace. Enter the following query in the log browser: {app="event-exporter"} |= "emojivoto". This will return the Kubernetes events related to the emojivoto namespace.
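Once you have finished investigating, scale the emoji deployment back up so that the EmojivotoInstanceDown alert resolves. This assumes the single replica that the emojivoto application normally runs per deployment; because send_resolved is set to true, Alertmanager should also send a resolved notification to Slack:

```bash
# Restore the emoji deployment to its usual single replica.
kubectl scale --replicas=1 deployment/emoji -n emojivoto
```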
In this tutorial, you learned how to inspect existing alerts, create new ones, and configure Alertmanager to send notifications to Slack.
The next step is to set up Backup and Restore using Velero or TrilioVault in your DOKS cluster.