About Nagios Plugins and How to Write Them
Nagios and compatible monitoring systems, such as Icinga, take a very basic approach to plugins. A plugin is simply a command, run either on the monitoring server (checking the target remotely) or on the target server itself, that returns an exit code and a status message, and optionally performance data. This is different from systems like Prometheus or CloudWatch, which collect metrics and alert on them.
A Nagios check can be written in any programming language that can set a process exit code, including Python, Bash, Perl, and C++. The exit code determines the status shown in the monitoring system.
The exit codes are as follows:
- 0: OK
- 1: WARNING
- 2: CRITICAL
- 3: UNKNOWN
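To see this convention from the monitoring system's side, here is a minimal Python sketch that runs a plugin command and maps its exit code to a status. It is a simplified imitation, not how Nagios or Icinga work internally: the `Status` enum and `run_plugin` helper are hypothetical names, and treating any exit code outside 0-3 as UNKNOWN is an assumption of this sketch.

```python
import subprocess
import sys
from enum import IntEnum
from typing import List, Tuple


class Status(IntEnum):
    """Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def run_plugin(cmd: List[str], timeout: float = 10.0) -> Tuple[Status, str, str]:
    """Run a plugin and return (status, status text, performance data).

    Simplified sketch of how a scheduler might interpret a plugin result.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as e:
        return Status.UNKNOWN, f"UNKNOWN: could not run plugin: {e}", ""
    # Assumption of this sketch: any exit code outside 0-3 counts as UNKNOWN.
    try:
        status = Status(proc.returncode)
    except ValueError:
        status = Status.UNKNOWN
    # Only the first line of output is used here: text before "|" is the
    # status message, text after it is performance data.
    first_line = proc.stdout.splitlines()[0] if proc.stdout else ""
    text, _, perfdata = first_line.partition("|")
    return status, text.strip(), perfdata.strip()


# Demo: a tiny inline "plugin" that prints a WARNING line and exits with 1.
status, text, perf = run_plugin(
    [sys.executable, "-c", "print('WARNING: disk 85% full|disk=85%'); raise SystemExit(1)"]
)
print(status.name, text, perf)
```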
Before exiting, the plugin also prints a status message to standard output. For example:
OK: No errors found in log
Optionally, performance data can be included after a pipe (|) character:
CRITICAL: Errors found in log.|errors=57
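A plugin can assemble that line with a small helper. The sketch below uses a hypothetical `format_plugin_output` function; it only covers plain `label=value` pairs (a unit such as `%` can be passed as part of a string value) and single-quotes labels that contain spaces. The full performance data format, which also allows warning, critical, min and max fields, is described in the Nagios plugin development guidelines.

```python
from typing import Dict, Optional, Union

# Values may be numbers, or strings carrying a unit of measure such as "85%".
PerfValue = Union[int, float, str]


def format_plugin_output(status: str, message: str,
                         perfdata: Optional[Dict[str, PerfValue]] = None) -> str:
    """Build a 'STATUS: message|label=value ...' line for a plugin to print."""
    line = f"{status}: {message}"
    if perfdata:
        items = []
        for label, value in perfdata.items():
            # Labels containing spaces must be wrapped in single quotes.
            if " " in label:
                label = f"'{label}'"
            items.append(f"{label}={value}")
        line += "|" + " ".join(items)
    return line


print(format_plugin_output("OK", "No errors found in log"))
print(format_plugin_output("CRITICAL", "Errors found in log.", {"errors": 57}))
```

The two calls reproduce the example lines shown above.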
To try this out, let’s write a check that verifies the contents of a text file called hello_world.txt. We want the following conditions:
- OK if the contents of the file exactly match “Hello, World!”
- WARNING if the contents include “hello” (case-insensitive) but are not an exact match.
- CRITICAL if the file does not contain “hello” at all.
- UNKNOWN if the check cannot read the file to verify its contents.
Set up a file for testing:
$ echo "Hello, World!" > /tmp/hello_world.txt
Now let’s draft a Nagios check in Python that implements these conditions.
#!/usr/bin/python3
# Hello world check for Nagios-compatible monitoring systems.
import sys  # Required for proper system exit codes
try:
    with open("/tmp/hello_world.txt", "r") as f:
        hello = f.read().strip()  # Removes the trailing newline and surrounding whitespace.
    if hello == "Hello, World!":
        print('OK: "Hello, World!" found in hello_world.txt.')
        sys.exit(0)
    elif "hello" in hello.lower():  # Case-insensitive match
        print('WARNING: "hello", but not "Hello, World!" found in hello_world.txt')
        sys.exit(1)
    else:
        print('CRITICAL: "hello" not found in hello_world.txt')
        sys.exit(2)
# In case we couldn't check at all (missing file, permissions, etc.).
# sys.exit() raises SystemExit, which is not a subclass of Exception,
# so the exits above are not caught here.
except Exception as e:
    print("UNKNOWN: could not determine content of hello_world.txt")
    print(e)  # Print the exception to help troubleshooting
    sys.exit(3)
Run this check and you should get the following:
OK: "Hello, World!" found in hello_world.txt.
In Nagios or Icinga, this would show up green with an OK status.
Okay, now try updating hello_world.txt to say “Hello, Somebody!” You should see the following:
WARNING: "hello", but not "Hello, World!" found in hello_world.txt
Next, echo any random string into the file that does not include “hello” in any case:
CRITICAL: "hello" not found in hello_world.txt
Finally, just remove hello_world.txt altogether:
UNKNOWN: could not determine content of hello_world.txt
[Errno 2] No such file or directory: '/tmp/hello_world.txt'
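Rather than editing the file by hand for each case, the four scenarios can be exercised in one run. This sketch assumes the check logic is refactored into a hypothetical `check_hello(path)` function that returns the exit code and message instead of calling `sys.exit()`, and drives it against a temporary file so /tmp/hello_world.txt is left alone.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def check_hello(path: str) -> Tuple[int, str]:
    """Hypothetical refactor of the hello_world check: returns (exit code, message)."""
    name = os.path.basename(path)
    try:
        with open(path, "r") as f:
            hello = f.read().strip()
    except (OSError, UnicodeDecodeError) as e:
        return 3, f"UNKNOWN: could not determine content of {name}: {e}"
    if hello == "Hello, World!":
        return 0, f'OK: "Hello, World!" found in {name}.'
    if "hello" in hello.lower():  # case-insensitive
        return 1, f'WARNING: "hello", but not "Hello, World!" found in {name}'
    return 2, f'CRITICAL: "hello" not found in {name}'


# Each entry is the file content to write; None means "delete the file".
scenarios: List[Optional[str]] = ["Hello, World!\n", "Hello, Somebody!\n", "Goodbye\n", None]
results: List[Tuple[int, str]] = []

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello_world.txt")
    for contents in scenarios:
        if contents is None:
            os.remove(path)  # last scenario: the file no longer exists
        else:
            with open(path, "w") as f:
                f.write(contents)
        code, message = check_hello(path)
        results.append((code, message))
        print(code, message)
```

Keeping the logic in a function that returns a code, and calling `sys.exit()` only at the very top level, makes a check much easier to test.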
Here we have monitored the contents of a file. If Icinga or Nagios were running this check every five minutes and we altered the file in the background, the alerts would change accordingly. This goes beyond simple metrics: virtually anything can be checked, including logs, JSON data, running processes, HTTP status codes, and more.
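As an example of a log check, the following sketch counts lines containing `ERROR` and reports the count as performance data, in the style of the `errors=57` example earlier. The function names, the `ERROR` pattern, and the default thresholds are assumptions for illustration. The thresholds here are simple "greater than or equal to" comparisons, not the threshold range syntax defined in the plugin guidelines.

```python
import sys
from typing import Iterable, Tuple


def check_log_errors(lines: Iterable[str], warn: int, crit: int,
                     pattern: str = "ERROR") -> Tuple[int, str]:
    """Count lines containing `pattern` and map the count to a Nagios status."""
    count = sum(1 for line in lines if pattern in line)
    if count >= crit:
        code, label = 2, "CRITICAL"
    elif count >= warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    # Performance data in the form label=value;warn;crit;min
    return code, f"{label}: {count} errors found in log|errors={count};{warn};{crit};0"


def main(path: str, warn: int = 1, crit: int = 10) -> None:
    """Plugin entry point: print one status line and exit with its code."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            code, output = check_log_errors(f, warn, crit)
    except OSError as e:
        code, output = 3, f"UNKNOWN: could not read {path}: {e}"
    print(output)
    sys.exit(code)
```

To run it as a plugin, call `main()` with the log path, for example `main("/var/log/myapp.log")` (a hypothetical path); it prints one status line and exits with the matching code.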
The Nagios plugin development guidelines are available here: http://nagios-plugins.org/doc/guidelines.html