AlertmanagerConfig CRD Opsgenie Integration 422 Error


I have deployed Prometheus from the Helm chart kube-prometheus-stack (2.42.0) with Alertmanager enabled, and configured it with an AlertmanagerConfig CRD:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  labels:
    alertmanager: kube-prometheus-stack-alertmanager
  name: alertmanager-config
spec:
  route:
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 4h
    groupBy:
    - job
    receiver: default
  receivers:
  - name: default
    slackConfigs:
    - apiURL:
        key: slackApiUrl
        name: slack-secret
      channel: "${slack_default_channel}"
      sendResolved: true
      text: >-
        :chart_with_upwards_trend: *<https://monitoring-{{ .CommonLabels.location }}.johndoe.net/d/cluster/kubernetes-cluster-deep-dive?orgId=1&from=now-6h&to=now&var-cluster={{ .CommonLabels.cluster }}&var-location={{ .CommonLabels.location }}&refresh=1m | Grafana >*

        {{ range .Alerts -}}
          *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

        {{- if .Annotations.description }}  *Description:* {{ .Annotations.description }} {{ end }}

        {{- if .Annotations.message }}  *Message:* {{ .Annotations.message }} {{ end }}

        *Labels*:
         {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
         {{ end }}
        {{ end }}
      title: "{{ .Status | toUpper }} | {{ .CommonLabels.alertname }} | {{ .CommonLabels.cluster }} | {{- .CommonLabels.namespace }}"
    opsgenieConfigs:
    - sendResolved: true
      apiKey:
        key: opsgenieApiKey
        name: opsgenie-secret
      apiURL: "${opsgenie_default_apiurl}"
      message: "{{ .CommonAnnotations.message }} {{ .CommonAnnotations.summary }}"
      tags: "cluster: {{ .CommonLabels.cluster }}, region: {{ .CommonLabels.location }}{{ if .CommonLabels.component }}, component: {{ .CommonLabels.component }}{{ end }}{{ if .CommonLabels.namespace }}, tenant: {{ .CommonLabels.namespace }}{{ end }}{{ if .CommonLabels.environment }}, environment: {{ .CommonLabels.environment }}{{ end }}"
      description: >-
        {{ range .Alerts -}}
        *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

        {{- if .Annotations.description }}  *Description:* {{ .Annotations.description }} {{ end }}

        {{- if .Annotations.message }}  *Message:* {{ .Annotations.message }} {{ end }}

        *Labels*:
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
        {{ end }}
      responders:
      - type: "team"
        name: "${opsgenie_default_team}"

After triggering custom alerts, the Alertmanager pod logs show the following error for the Opsgenie integration:

level=error component=dispatcher msg="Notify for alerts failed" num_alerts=9 err="cpl-prometheus-dev/default-infrastructure-config-dev/default/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.0,\"requestId\":\"0398150c-85f5-4354-8860-981950f591ae\"}"

The odd thing is that I triggered an alert which showed up fine in Opsgenie, but when I triggered the same alert again, it produced the error above. This also happens from time to time with other alerts. The Slack integration, by the way, works fine for all alerts.
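One thing I notice is that the Opsgenie message field is built only from {{ .CommonAnnotations.message }} and {{ .CommonAnnotations.summary }}, and Alertmanager only puts an annotation into CommonAnnotations when every alert in the group carries it with the same value. If the grouped alerts differ, the rendered message could come out empty, which would match the "Message can not be empty." error. A minimal fallback I am considering (untested; field names as in the config above):

message: >-
  {{ if .CommonAnnotations.message }}{{ .CommonAnnotations.message }}
  {{- else if .CommonAnnotations.summary }}{{ .CommonAnnotations.summary }}
  {{- else }}Alert group {{ .GroupLabels.job }}{{ end }}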

Does anybody know anything about this issue?

Actions taken so far:

Restarted the Alertmanager pod, then ran the following commands in separate terminals:

kubectl port-forward -n monitoring alertmanager-kube-prometheus-stack-alertmanager-0 9093
curl -Ss localhost:9093/metrics | grep 'alertmanager_notifications.*opsgenie'

The output of the second command is:

alertmanager_notifications_failed_total{integration="opsgenie"} 0 # failed Opsgenie notifications
alertmanager_notifications_total{integration="opsgenie"} 0 # total Opsgenie notifications

Triggered a custom alert with a test message. The alert showed up in Opsgenie.
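For reference, this is roughly how such a test alert can be pushed to Alertmanager through the port-forward; the alert name, labels, and annotation values below are made up:

curl -XPOST localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning","job":"test"},"annotations":{"message":"test message"}}]'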

The output of the metrics command from the first step is now:

alertmanager_notifications_failed_total{integration="opsgenie"} 0 # failed Opsgenie notifications
alertmanager_notifications_total{integration="opsgenie"} 1 # total Opsgenie notifications

However, after triggering a second alert with a different name, this error message appears in the Alertmanager logs:

level=error component=dispatcher msg="Notify for alerts failed" num_alerts=9 err="cpl-prometheus-dev/default-infrastructure-config-dev/default/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 422: {\"message\":\"Request body is not processable. Please check the errors.\",\"errors\":{\"message\":\"Message can not be empty.\"},\"took\":0.0,\"requestId\":\"0398150c-85f5-4354-8860-981950f591ae\"}"

And the output of the metrics command is now:

alertmanager_notifications_failed_total{integration="opsgenie"} 1 # failed Opsgenie notifications
alertmanager_notifications_total{integration="opsgenie"} 2 # total Opsgenie notifications

This means the second alert failed to be sent to Opsgenie, even though it is visible in Alertmanager.
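Note that the failing log line reports num_alerts=9, so Alertmanager is sending several alerts as one group (the route groups by job). If that is relevant, something like the following should reproduce it: two made-up alerts that share a job label (so they land in one group) but carry different annotations:

curl -XPOST localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlertA","job":"test"},"annotations":{"message":"message A"}},
       {"labels":{"alertname":"TestAlertB","job":"test"},"annotations":{"summary":"summary B"}}]'

With annotations differing like this, neither message nor summary is common to the whole group, so CommonAnnotations would carry neither and the Opsgenie message template above would render empty.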


There are 0 answers