How to configure Prometheus/Alertmanager on OCP 3
With OpenShift 3.11, Prometheus Cluster Monitoring is fully supported.
Below I explain how to customize the alertmanager.yaml with your own receivers.
Please keep in mind that this solution is primarily meant for the platform and not for the applications themselves.
Prerequisites
Before you start, let me explain what setup and assumptions the solution below expects.
- a [bastion] host for your OCP environment where you can run the playbook
- three [infrastructure] nodes where the Prometheus/Alertmanager pods are running
- a prepared alertmanager.yaml
- a user/SA that has the permissions to modify the openshift-monitoring project
- the oc tool is installed and a user with the right permissions is logged in.
I personally prefer to use the oc tool, as it is compatible out of the box with the OCP version that was set up.
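As a quick sanity check before running anything, you can verify that the logged-in user is allowed to touch the monitoring secrets. This is only a minimal sketch; the exact RBAC you need may be granted through a different role in your environment.
# Show who is logged in and against which cluster
oc whoami
oc whoami --show-server
# Check that the current user may read and replace secrets in openshift-monitoring
oc auth can-i get secrets -n openshift-monitoring
oc auth can-i update secrets -n openshift-monitoring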
You should read and understand how the setup works in OpenShift, as described in the Prometheus Cluster Monitoring documentation, what the default configuration of the OCP Alertmanager looks like, and how the Alertmanager can be configured.
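To see what you are starting from, you can dump the currently deployed alertmanager.yaml from the alertmanager-main secret. This is the same command the playbook below uses to create its backup.
oc get secrets -n openshift-monitoring alertmanager-main \
  -o go-template='{{ index .data "alertmanager.yaml" }}' | base64 -d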
Ansible solution
Run on bastion host
The following command executes the playbook that creates the new Alertmanager configuration. The new configuration is deployed automatically once the secret has been replaced.
ANSIBLE_LOG_PATH=ansible_log_$(date +%Y_%m_%d-%H_%M) ansible-playbook \
-e ocp_env=dev \
-e webhook_endpoint=http://127.0.0.1:5001/ \
-e email_receivers=operators@MYDomain.com \
alertman-conf.yaml
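The playbook expects an inventory that provides a bastion group (the host the playbook runs on) and an infranodes group (used for the webhook reachability check, referenced as groups['infranodes'] in the playbook). A minimal sketch of such an inventory; the host names are placeholders and have to match your environment:
[bastion]
bastion.ocpdev.cloud.internal

[infranodes]
infra-node-1.ocpdev.cloud.internal
infra-node-2.ocpdev.cloud.internal
infra-node-3.ocpdev.cloud.internal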
Playbook
The playbook that handles the modification of the alertmanager.yaml.
---
- name: Add webhook receiver to alertmanager
  hosts: bastion
  tasks:
    - name: Check that webhook receiver is reachable
      uri:
        body: '{"NodeAlias":"nodetest","Identifier":"MYID"}'
        body_format: json
        method: POST
        url: "{{ webhook_endpoint }}"
      with_items: "{{ groups['infranodes'] }}"
      delegate_to: "{{ item }}"
      changed_when: False

    - name: ALERTS | Create alertman backup tmpfile
      tempfile:
        prefix: "ocp{{ ocp_env }}_alertman_backup"
        suffix: ".tmp"
      register: alertman_back_tmp

    - name: ALERTS | Create alertman all backup tmpfile
      tempfile:
        prefix: "ocp{{ ocp_env }}_alertman_all_backup"
        suffix: ".tmp"
      register: alertman_all_back_tmp

    # Create backup from current setup
    - name: ALERTS | Get alertman secret
      shell: |
        {% raw %}
        oc get secrets -n openshift-monitoring \
          -o go-template='{{ index .data "alertmanager.yaml" }}' alertmanager-main \
          | base64 -d > {% endraw %}{{ alertman_back_tmp.path }}

    - name: ALERTS | Create receiver snippet
      template:
        dest: /tmp/alert-man-snippet
        src: templates/alert-receiver.j2

    - name: ALERTS | Get alertman secret as yaml
      shell: |
        oc get secrets -n openshift-monitoring -o yaml alertmanager-main > {{ alertman_all_back_tmp.path }}

    - name: ALERTS | Replace alertman config with new value
      replace:
        path: "{{ alertman_all_back_tmp.path }}"
        regexp: "^  alertmanager.yaml:.*$"
        replace: "  alertmanager.yaml: {{ lookup('file', '/tmp/alert-man-snippet') | b64encode }}"

    - name: ALERTS | Replace alertman secret
      shell: |
        oc replace -n openshift-monitoring -f {{ alertman_all_back_tmp.path }}

    - name: ALERTS | Remove receiver snippet tmpfile
      file:
        path: /tmp/alert-man-snippet
        state: absent

    - name: ALERTS | Remove alertman backup tmpfile
      file:
        path: "{{ alertman_all_back_tmp.path }}"
        state: absent
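Once the playbook has finished, you can verify that the secret really contains the new configuration and watch Alertmanager pick it up. The commands below assume the default alertmanager-main-0 replica of the cluster monitoring stack and that the main container is named alertmanager.
# Decode the replaced secret and confirm your receivers are present
oc get secrets -n openshift-monitoring alertmanager-main \
  -o go-template='{{ index .data "alertmanager.yaml" }}' | base64 -d

# Follow the Alertmanager log to see the configuration being reloaded
oc logs -n openshift-monitoring alertmanager-main-0 -c alertmanager -f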
Template
The Alertmanager template alert-receiver.j2.
global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
    - receiver: myconf
    - match:
        alertname: DeadMansSwitch
      repeat_interval: 5m
      receiver: deadmansswitch
receivers:
  - name: default
  - name: deadmansswitch
  - name: myconf
    email_configs:
      - to: '{{ email_receivers }}'
        from: 'admin@ocp{{ ocp_env }}.cloud.internal'
        smarthost: 'SMTPRelay.MyDomain:25'
        send_resolved: true
        # require_tls: false
    webhook_configs:
      - url: "{{ webhook_endpoint }}"
        send_resolved: true
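If you have the Alertmanager amtool binary available, you can validate the rendered configuration before it is deployed. This is a sketch and assumes you validate the already rendered file, for example the /tmp/alert-man-snippet that the playbook writes on the bastion host, since the raw Jinja2 template is not valid Alertmanager YAML.
# Validate the rendered Alertmanager configuration
amtool check-config /tmp/alert-man-snippet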
Update
- 17.05.2019 - added catch-all receiver and require_tls