Simple OpenShift Monitoring

This guide explains how to set up a simple system to monitor your OpenShift cluster.

Assumptions and Prerequisites

This guide assumes familiarity with the Linux command line and with using the “oc” command to manage an OpenShift cluster.

To complete the steps in this guide, you must have the “oc” command installed and a suitable account on your OpenShift cluster. Specifically, it is assumed that you know the authentication credentials to supply to “oc login”.

Generating an Authentication Token

The first step is to create a service account within your OpenShift cluster that can be used by your monitoring system, and to create an authentication token for it.

Tip: The service account and its token may already exist, so it is worth checking first.

Log in to OpenShift on the command line:

oc login ...
oc project openshift-infra

Check if a monitoring account already exists:

oc get serviceaccount --all-namespaces | grep infra | grep monitoring

If the account exists, the command above prints its name and you can skip to the section “Reading the Authentication Token”. If it prints nothing, create the account as follows.
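The grep pipeline matches any line that contains both “infra” and “monitoring”, so a similarly named account could produce a false positive. If you want to script this check, the minimal Python sketch below looks for an exact namespace and account name instead. It assumes the first two columns of the “oc get serviceaccount --all-namespaces” output are NAMESPACE and NAME; the function name account_exists and the sample text are hypothetical.

```python
def account_exists(oc_output, namespace="openshift-infra", name="monitoring"):
    """Return True if `oc get serviceaccount --all-namespaces` output lists
    the given account. Hypothetical helper: assumes the first two columns
    are NAMESPACE and NAME."""
    for line in oc_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) >= 2 and columns[0] == namespace and columns[1] == name:
            return True
    return False


# Illustrative output; a grep for "infra" and "monitoring" would match two lines.
sample = """NAMESPACE         NAME                 SECRETS   AGE
default           default              2         90d
openshift-infra   monitoring           2         30d
openshift-infra   monitoring-agent     2         10d
"""
print(account_exists(sample))                  # True
print(account_exists(sample, name="metrics"))  # False
```

To check a live cluster you could pass in the text captured from running that oc command, for example via subprocess.run(..., capture_output=True, text=True).stdout.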

Create the monitoring account if it does not exist:

oc create serviceaccount monitoring

Add the role “cluster-reader” to the account:

oc adm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:monitoring
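The last argument, system:serviceaccount:openshift-infra:monitoring, is the user name OpenShift uses for the service account, in the form system:serviceaccount:<namespace>:<name>. If you prefer to script the two setup steps, here is a minimal Python sketch that builds the same oc commands and only runs them when asked. The function names are hypothetical, it assumes “oc” is on your PATH and you are already logged in, and it passes “-n” explicitly rather than relying on “oc project”.

```python
import subprocess


def service_account_user(namespace, name):
    """Return the user name OpenShift uses for a service account."""
    return "system:serviceaccount:{}:{}".format(namespace, name)


def setup_commands(namespace="openshift-infra", name="monitoring"):
    """Build the oc commands that create the account and grant cluster-reader."""
    return [
        ["oc", "create", "serviceaccount", name, "-n", namespace],
        ["oc", "adm", "policy", "add-cluster-role-to-user", "cluster-reader",
         service_account_user(namespace, name)],
    ]


def run_setup(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in setup_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if oc reports an error


run_setup()  # dry run: prints the commands without executing them
```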

Reading the Authentication Token

Next, retrieve the authentication token for the monitoring account.

If you haven’t already done so, ensure you are logged in to OpenShift and connected to the right project:

oc login ...
oc project openshift-infra

Find the names of the monitoring account’s token secrets:

oc describe serviceaccount monitoring

This lists the names of one or more tokens, e.g. “monitoring-token-1abc2”.
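The token names appear in the “Tokens:” field of that output, with any further names indented on the lines underneath. A minimal Python sketch for extracting them is shown below. The field layout is an assumption based on typical “oc describe serviceaccount” output, the sample text is illustrative, and the function name token_names is hypothetical, so check it against your own cluster.

```python
def token_names(describe_output):
    """Collect the secret names listed under the 'Tokens:' field."""
    names = []
    in_tokens = False
    for line in describe_output.splitlines():
        if line.startswith("Tokens:"):
            in_tokens = True
            value = line[len("Tokens:"):].strip()
        elif in_tokens and line.startswith((" ", "\t")):
            # Continuation line: further names are indented under the field.
            value = line.strip()
        else:
            in_tokens = False  # any other field ends the Tokens list
            continue
        if value and value != "<none>":
            names.append(value)
    return names


sample = """Name:                monitoring
Namespace:           openshift-infra
Labels:              <none>
Annotations:         <none>
Image pull secrets:  monitoring-dockercfg-9xyz8
Mountable secrets:   monitoring-token-1abc2
                     monitoring-dockercfg-9xyz8
Tokens:              monitoring-token-1abc2
                     monitoring-token-4def5
Events:              <none>
"""
print(token_names(sample))  # ['monitoring-token-1abc2', 'monitoring-token-4def5']
```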

Now display the authentication token itself by describing the secret with that name, for example:

oc describe secret monitoring-token-1abc2

(replace “monitoring-token-1abc2” with the token name you found in the previous step)

This will list a few values, including the 179-character authentication token.
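In that output the token is the value on the line beginning “token:”, under the “Data” heading. The minimal sketch below extracts it, assuming that layout; the function name token_from_secret is hypothetical and the sample token is a truncated placeholder, not a real value.

```python
def token_from_secret(describe_output):
    """Return the value on the 'token:' line of `oc describe secret` output."""
    for line in describe_output.splitlines():
        # Split on the first colon only; the token itself contains none.
        key, sep, value = line.strip().partition(":")
        if sep and key == "token":
            return value.strip()
    raise ValueError("no 'token:' line found")


sample = """Name:         monitoring-token-1abc2
Namespace:    openshift-infra
Type:         kubernetes.io/service-account-token

Data
====
ca.crt:     1070 bytes
namespace:  15 bytes
token:      eyJhbGciOiJSUzI1NiJ9.example.truncated
"""
print(token_from_secret(sample))  # eyJhbGciOiJSUzI1NiJ9.example.truncated
```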

Using the Authentication Token

The OpenShift API provides a way to determine the health of the cluster: the required data can be obtained by requesting the API path “/api/v1/nodes”. The authentication token must be supplied in an HTTP header called “Authorization”, in the form “Bearer <token>”.

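You can query this data from the shell with curl, as in the example below (the -k option tells curl to skip TLS certificate verification; omit it if your cluster uses a trusted certificate):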

 $ token="eyJhb...the.179.char.token"
 $ endpoint=https://ocp.my-cluster-url.ukcloud.com:8443
 $ curl -k -H "Authorization: Bearer ${token}" -H 'Accept: application/json' "${endpoint}/api/v1/nodes"

However, this API call returns a fairly large amount of nested JSON data, which is hard to parse with shell commands alone.

Determining node health requires extracting values from several places within this data. The Python program in the following section does this.
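To see which parts of the response matter, here is a heavily trimmed, illustrative sketch of the JSON returned by /api/v1/nodes, keeping only the fields a health check needs: each entry in “items” carries a list of “addresses” and a list of “conditions”. The hostnames, addresses and condition list are made up; a real response contains many more fields and usually more conditions.

```python
import json

# Trimmed, made-up example of an /api/v1/nodes response.
response = json.loads("""
{
  "kind": "NodeList",
  "items": [
    {
      "metadata": {"name": "node1.example.com"},
      "status": {
        "addresses": [
          {"type": "InternalIP", "address": "10.0.0.11"},
          {"type": "Hostname", "address": "node1.example.com"}
        ],
        "conditions": [
          {"type": "DiskPressure", "status": "False"},
          {"type": "Ready", "status": "True"}
        ]
      }
    }
  ]
}
""")

for node in response["items"]:
    # Pick the address whose type is "Hostname".
    hostname = next(a["address"] for a in node["status"]["addresses"]
                    if a["type"] == "Hostname")
    # Map each condition type to its status string.
    conditions = {c["type"]: c["status"] for c in node["status"]["conditions"]}
    print(hostname, conditions)
# node1.example.com {'DiskPressure': 'False', 'Ready': 'True'}
```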

Using the OpenShift API to Obtain Cluster Status

The code example below uses the OpenShift API to obtain information about the health of the cluster, and prints a summary showing whether each node is healthy.

The program accepts the endpoint and token on the command line (see the parse_args function):

#!/usr/bin/env python3
"""Print a one-line health summary for each node in an OpenShift cluster."""
import argparse
import json
import ssl
import urllib.request


def parse_args():
    """Read the cluster endpoint and authentication token from the command line."""
    parser = argparse.ArgumentParser(description="Report OpenShift node health")
    parser.add_argument("--token", "-t", required=True,
                        help="service account authentication token")
    parser.add_argument("--endpoint", "-e", required=True,
                        help="API URL, e.g. https://ocp.my-cluster-url.ukcloud.com:8443")
    parser.add_argument("--insecure", "-k", action="store_true",
                        help="skip TLS certificate verification (like curl -k)")
    return parser.parse_args()


def get_all_nodes(endpoint, token, insecure=False):
    """Fetch the nodes data (raw JSON bytes) from the OpenShift cluster."""
    request = urllib.request.Request(endpoint.rstrip("/") + "/api/v1/nodes")
    request.add_header("Authorization", "Bearer " + token)
    request.add_header("Accept", "application/json")
    ssl_context = ssl.create_default_context()
    if insecure:
        # Equivalent to curl -k: accept self-signed or mismatched certificates.
        ssl_context.check_hostname = False
        ssl_context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=ssl_context) as result:
        return result.read()


def get_status_from_node(data_item):
    """Extract the hostname and status conditions for one node."""
    # Fall back to the node's object name if no Hostname address is listed.
    address = data_item["metadata"]["name"]
    for addr in data_item["status"].get("addresses", []):
        if addr["type"] == "Hostname":
            address = addr["address"]
    return {"hostname": address,
            "conditions": data_item["status"]["conditions"]}


def find_faults(cond_data):
    """Return one 'OK'/'FAIL' line per node.

    A node is healthy when its "Ready" condition is "True" and every other
    condition (e.g. MemoryPressure, DiskPressure) is "False".
    """
    cells = []
    for node in cond_data:
        state = "OK  "
        for cond in node["conditions"]:
            healthy_status = "True" if cond["type"] == "Ready" else "False"
            if cond["status"] != healthy_status:
                state = "FAIL"
        cells.append("{} {}".format(state, node["hostname"]))
    return cells


def main():
    args = parse_args()
    all_nodes = json.loads(get_all_nodes(args.endpoint, args.token, args.insecure))
    all_conditions = [get_status_from_node(node) for node in all_nodes["items"]]
    print("Status of Cluster at {}".format(args.endpoint))
    for line in find_faults(all_conditions):
        print(line)


if __name__ == "__main__":
    main()
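Most monitoring tools decide whether to raise an alert from a check's exit status rather than from its printed text. As a sketch, assuming your monitoring system follows the Nagios plugin convention (0 = OK, 2 = CRITICAL), the summary lines produced by find_faults could be mapped to an exit status as follows. The function name exit_status is hypothetical and is not part of the program above.

```python
# Nagios plugin convention (assumed here): 0 = OK, 2 = CRITICAL.
OK, CRITICAL = 0, 2


def exit_status(summary_lines):
    """Map find_faults()-style lines ('OK   host' / 'FAIL host') to an exit status."""
    if any(line.startswith("FAIL") for line in summary_lines):
        return CRITICAL
    return OK


print(exit_status(["OK   node1.example.com", "OK   node2.example.com"]))  # 0
print(exit_status(["OK   node1.example.com", "FAIL node2.example.com"]))  # 2
```

To use it, main() could import sys and end with sys.exit(exit_status(find_faults(all_conditions))) after printing the summary.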