I have deployed Prometheus from the kube-prometheus-stack Helm chart (2.42.0) with Alertmanager enabled, and we have configured it with an AlertmanagerConfig CRD:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  labels:
    alertmanager: kube-prometheus-stack-alertmanager
  name: alertmanager-config
spec:
  route:
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 4h
    groupBy:
      - job
    receiver: default
  receivers:
    - name: default
      slackConfigs:
        - apiURL:
            key: slackApiUrl
            name: slack-secret
          channel: "${slack_default_channel}"
          sendResolved: true
          text: >-
            :chart_with_upwards_trend: *<https://monitoring-{{ .CommonLabels.location }}.johndoe.net/d/cluster/kubernetes-cluster-deep-dive?orgId=1&from=now-6h&to=now&var-cluster={{ .CommonLabels.cluster }}&var-location={{ .CommonLabels.location }}&refresh=1m | Grafana >*
            {{ range .Alerts -}}
            *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
            {{- if .Annotations.description }} *Description:* {{ .Annotations.description }} {{ end }}
            {{- if .Annotations.message }} *Message:* {{ .Annotations.message }} {{ end }}
            *Labels*:
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
            {{ end }}
          title: "{{ .Status | toUpper }} | {{ .CommonLabels.alertname }} | {{ .CommonLabels.cluster }} | {{- .CommonLabels.namespace }}"
      opsgenieConfigs:
        - sendResolved: true
          apiKey:
            key: opsgenieApiKey
            name: opsgenie-secret
          apiURL: "${opsgenie_default_apiurl}"
          message: "{{ .CommonAnnotations.message }} {{ .CommonAnnotations.summary }}"
          tags: "cluster: {{ .CommonLabels.cluster }}, region: {{ .CommonLabels.location }}, {{ if .CommonLabels.component }} component: {{.CommonLabels.component}} {{ end }}, {{ if .CommonLabels.namespace }}, tenant: {{.CommonLabels.namespace}} {{ end }} {{ if .CommonLabels.environment }}, environment: {{ .CommonLabels.environment }} {{ end }}"
          description: >-
            {{ range .Alerts -}}
            *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
            {{- if .Annotations.description }} *Description:* {{ .Annotations.description }} {{ end }}
            {{- if .Annotations.message }} *Message:* {{ .Annotations.message }} {{ end }}
            *Labels*:
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
            {{ end }}
          responders:
            - type: "team"
              name: "${opsgenie_default_team}"
After triggering custom alerts, the Alertmanager pod logs show this for the Opsgenie integration:
level=error component=dispatcher msg="Notify for alerts failed" num_alerts=9 err="cpl-prometheus-dev/default-infrastructure-config-dev/default/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.0,\"requestId\":\"0398150c-85f5-4354-8860-981950f591ae\"}"
The thing is that I triggered an alert which showed up fine in Opsgenie, but the next time I trigger the same alert, it produces the error above. This also happens from time to time with different alerts. By the way, the Slack integration works fine for all alerts.
Does anybody know anything about this issue?
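One thing I notice is that the error complains about an empty message, and the Opsgenie message field in the config above is rendered only from {{ .CommonAnnotations.message }} and {{ .CommonAnnotations.summary }}; as far as I understand, .CommonAnnotations only contains annotations that every alert in the notification group shares with identical values, so it can come out empty when differing alerts are grouped together (the route groups by job). A quick way to inspect what the firing alerts actually carry (assuming the port-forward from the Actions below) is:

# Assumes the port-forward to the Alertmanager pod from the Actions section below.
# Shows each firing alert's name and annotations, to check whether every alert in a
# group carries identical "message"/"summary" values (what .CommonAnnotations is built from).
curl -s localhost:9093/api/v2/alerts \
  | jq '.[] | {alertname: .labels.alertname, annotations: .annotations}'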
Actions:
Restarted the Alertmanager pod, then ran the following commands in separate terminals:
kubectl port-forward -n monitoring alertmanager-kube-prometheus-stack-alertmanager-0 9093
curl -Ss localhost:9093/metrics | grep 'alertmanager_notifications.*opsgenie'
The output of the second command is:
alertmanager_notifications_failed_total{integration="opsgenie"} 0 #failed Opsgenie Integration Alerts
alertmanager_notifications_total{integration="opsgenie"} 0 #Total Opsgenie Integration Alerts
Triggered a custom alert with a test message; the alert showed up in Opsgenie. (One way such a test alert can be injected is sketched right below.)
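For reference, this is a minimal sketch of injecting a test alert directly through the Alertmanager v2 API over the port-forward from the first step; the labels and annotations are illustrative placeholders, not the exact ones I used:

# Sketch: fire a test alert via the Alertmanager v2 API (uses the port-forward from
# the first step). Labels and annotations are illustrative placeholders.
curl -s -X POST localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{
        "labels": {
          "alertname": "TestAlert",
          "job": "test-job",
          "severity": "warning",
          "cluster": "dev",
          "location": "eu-west",
          "namespace": "default"
        },
        "annotations": {
          "title": "Test alert",
          "message": "Test message for the Opsgenie integration",
          "summary": "Test summary"
        }
      }]'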
The output of the metrics query from the first step is now:
alertmanager_notifications_failed_total{integration="opsgenie"} 0 #failed Opsgenie Integration Alerts
alertmanager_notifications_total{integration="opsgenie"} 1 #Total Opsgenie Integration Alerts
However, after triggering a second alert with a different name, this error appears in the Alertmanager logs:
level=error component=dispatcher msg="Notify for alerts failed" num_alerts=9 err="cpl-prometheus-dev/default-infrastructure-config-dev/default/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.0,\"requestId\":\"0398150c-85f5-4354-8860-981950f591ae\"}"
Also, the output of the metrics query from the first step is now:
alertmanager_notifications_failed_total{integration="opsgenie"} 1 #failed Opsgenie Integration Alerts
alertmanager_notifications_total{integration="opsgenie"} 2 #Total Opsgenie Integration Alerts
This means that the second alert failed to be sent to Opsgenie, even though it is visible in Alertmanager.
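If it helps with reproducing, I believe the same 422 can be triggered by calling the Opsgenie Alert API directly with an empty message field (a sketch; OPSGENIE_API_KEY is a placeholder, and the base URL may need to be the EU endpoint, matching whatever ${opsgenie_default_apiurl} points at):

# Sketch: an empty "message" sent straight to the Opsgenie Alert API should return the
# same 422 "Message can not be empty." error seen in the Alertmanager logs.
# OPSGENIE_API_KEY is a placeholder; use https://api.eu.opsgenie.com for EU instances.
curl -s -X POST "https://api.opsgenie.com/v2/alerts" \
  -H "Content-Type: application/json" \
  -H "Authorization: GenieKey ${OPSGENIE_API_KEY}" \
  -d '{"message": ""}'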