Windows Monitoring with Nagios and NSClient
Today I received a nice request from one of our customers: they have a Windows server with some kind of web application that is supposed to send mail in response to various events. I say "supposed to" because it seems that sometimes things go pear-shaped in the mail process and the mail is not sent.
Of course the "check your application" line of thinking wasn't well received by the customer in question, so they asked us (that is, me) to add some "monitoring" to the server to be sure that the mail is sent.
I tried to explain that if the process doesn't send the mail because it is broken, and the developer can't fix it, there isn't much we can monitor, but that too was lost on them.
So the problem of the day was: how do you check whether the mail process on Windows (that excuse for an SMTP server in IIS) is working, considering that the thing is behind a firewall and that checking whether the SMTP process alone is running (that is, it responds on port 25) is not a full check?
We decided that the best option was to test, from the local machine itself, whether the mail server responds (that is, connecting on port 25 gives a '220' answer) and whether there are files in the mail queue directory older than X minutes, with X reasonably small.
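As a rough illustration of the first half of that test, here is a minimal Python sketch (not the author's script, which was written in Perl) that connects to port 25 and checks that the first line of the server's greeting starts with 220. The function name smtp_greeting_ok and the 10-second timeout are assumptions made for the example.

```python
import socket
from typing import Tuple


def smtp_greeting_ok(host: str = "127.0.0.1", port: int = 25,
                     timeout: float = 10.0) -> Tuple[bool, str]:
    """Connect to an SMTP server and check that its greeting starts with 220.

    Returns (ok, text): text is the first greeting line, or an error message
    if the connection could not be made or nothing arrived in time.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Read only the first line of the greeting (e.g. "220 host ESMTP ...").
            with sock.makefile("rb") as reader:
                greeting = reader.readline().decode("ascii", errors="replace").strip()
            try:
                sock.sendall(b"QUIT\r\n")  # be polite; a failure here is not the point
            except OSError:
                pass
    except OSError as exc:  # refused, unreachable, timed out, ...
        return False, f"cannot talk to {host}:{port}: {exc}"
    return greeting.startswith("220"), greeting
```

Run on the Windows box itself with the defaults (127.0.0.1, port 25); a False result means either no connection at all or a greeting other than 220.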
Moreover, we wanted to add the check to our Nagios system to receive alerts if the server goes down or the process is not running anymore. Well, actually we didn't "want" to, but the customer was very pushy about it.
Anyhoo... how the heck do you check things on Windows?
This is what NRPE is for. NRPE stands for Nagios Remote Plugin Executor and is a client-server mechanism that lets the Nagios server invoke a 'client' on the monitored machine: the client performs the check and returns the result to Nagios. This is useful for this kind of custom test on systems that don't have an extensible SNMP daemon or don't give enough information through SNMP, which basically means Windows. The problem with NRPE is that you need to install a client on the machine that you want to check. For Windows there are a couple of clients available; one of them is NSClient.
NSClient (or better: NSClient++) is a nifty Windows program that acts as a client for Nagios, answering 'questions' sent by the Nagios server and providing answers about the local system, as explained before.
One of the options of NSClient is the ability to 'call' a local program on the machine itself and report the results to the Nagios server. Unfortunately, I couldn't find any documentation about how to use that option; most of the documentation I found was about the 'default' checks and options of NSClient. So after a few hours spent rummaging around, I decided to write down this piece of documentation myself.
As for installing NSClient, there isn't much to say: you just run the installer. During the installation it also asks whether you want to restrict connections (to one or more IP addresses) and which 'options' you want to activate. Anything you don't activate now can be activated later by editing the .ini file, which you have to edit anyway if you want to add your scripts.
I didn't install the 'tray' option because nobody logs into the Windows machine, so it would have been totally useless. Once installed and started, the service responds to the Nagios server and allows the Windows machine to be 'monitored' for standard things (like CPU load, memory usage and so on).
Now for the nifty bit: adding your own checks.
First of all you have to decide what kind of check to do. As said before, I decided that I wanted to see whether the SMTP server returned a 220 as its response and whether there were any files in the queue directory older than 10 minutes.
Once you have decided what to do, you have to write a script to do it.
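The queue test boils down to comparing each file's modification time with "now minus 10 minutes". A minimal Python sketch, assuming the queue is a single flat directory and that a file's last-modified time is a good enough proxy for how long a message has been waiting. old_queue_files is a hypothetical name, and the queue path itself (on a default IIS install often something like C:\Inetpub\mailroot\Queue) is an assumption to verify on your own server.

```python
import time
from pathlib import Path
from typing import List, Optional


def old_queue_files(queue_dir: str, max_age_minutes: float = 10,
                    now: Optional[float] = None) -> List[Path]:
    """Return the regular files directly inside queue_dir whose modification
    time is more than max_age_minutes in the past, sorted by path."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_minutes * 60  # minutes -> seconds
    return sorted(
        entry for entry in Path(queue_dir).iterdir()
        if entry.is_file() and entry.stat().st_mtime < cutoff
    )
```

A non-empty result means some mail has been sitting in the queue for too long; subdirectories are ignored on purpose.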
Since I already had ActivePerl on the machine for other stuff, I decided to write a simple Perl script to do the job. The script, like all Nagios plugins, has to do a check and then exit with a return code depending on the result: 0 signals "OK, no problem", 1 "warning" and 2 "error" (or "critical"), while 3 is reserved for "unknown". A line of text printed on standard output can give more information about what went wrong.
You can see my simple script here. Note that the 'queue' directory is hard-coded in it; this is not really nice, but hey! it's just one machine and I didn't want to over-engineer the thing. Also, error checking is basically non-existent.
Put the script on the server and try it a couple of times, just to be sure that it does its tricks. It's not a bad idea to alter it to check different directories, to see whether it really picks up files older than 10 minutes and so on. I also found it very helpful to have a "utility" .bat file that calls perl with the script, so the check can be invoked through it.
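When one script runs two tests (the SMTP greeting and the queue age), it still has to end with a single exit code. This Python sketch, with hypothetical names State, combine and report, shows one way to fold several results into one status. Ranking UNKNOWN above OK but below WARNING is a design choice made for the example, not a Nagios rule.

```python
from enum import IntEnum


class State(IntEnum):
    """Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def combine(states):
    """Pick the overall state of several sub-checks: CRITICAL beats WARNING,
    WARNING beats UNKNOWN, UNKNOWN beats OK. No sub-checks means OK."""
    severity = [State.OK, State.UNKNOWN, State.WARNING, State.CRITICAL]
    return max(states, key=severity.index, default=State.OK)


def report(state, message):
    """Print the one-line status text and return the matching exit code."""
    print(f"{state.name}: {message}")
    return int(state)
```

A script would typically finish with something like sys.exit(report(combine(results), message)), where results and message come from its own tests.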
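For that kind of testing it helps to have a throw-away queue directory with files of known ages. A small Python sketch, assuming only that the check looks at file modification times: make_fake_queue (a hypothetical helper) creates empty files and backdates each one with os.utime.

```python
import os
import tempfile
import time


def make_fake_queue(ages_minutes, base_dir=None):
    """Create a scratch directory containing one empty file per entry in
    ages_minutes, each with its modification time set that many minutes in
    the past. Returns the directory path; delete it when done."""
    queue_dir = tempfile.mkdtemp(prefix="fakequeue-", dir=base_dir)
    now = time.time()
    for i, age in enumerate(ages_minutes):
        path = os.path.join(queue_dir, f"msg{i:03d}.eml")
        with open(path, "wb"):
            pass  # an empty file is enough; only the timestamp matters
        stamp = now - age * 60
        os.utime(path, (stamp, stamp))  # (access time, modification time)
    return queue_dir
```

For example, a directory made with make_fake_queue([1, 5, 15, 30]) should make a 10-minute check report exactly two stale files, the 15- and 30-minute ones.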
Now comes the scary part: attaching it to NSClient and Nagios.
You'll have to change the configuration of NSClient (doh!). Make sure that the following bits are in your nsc.ini:
[modules]
CheckExternalScripts.dll
NRPEListener.dll

[NRPE]
port=5666
command_timeout=60
allow_arguments=1
allow_nasty_meta_chars=0
use_ssl=0
allowed_hosts=...

[External Script]
command_timeout=60
allow_arguments=1
allow_nasty_meta_chars=0

[External Scripts]
check_iis_smtp=perl scripts\yourscript.pl
The port number of course has to be chosen and allowed through the firewall (if you have one between the Nagios server and your Windows client). SSL is disabled in my case because the machine sits in an internal DMZ, so there is no need for extra security. The 'allowed_hosts' setting restricts the IPs to which NSClient will respond; this is not a firewall, but it's better than nothing. 'check_iis_smtp' is the name I gave to my check, and in this case I call the perl interpreter directly with my script. As said before, I also found it useful to have a .bat script to call directly, without the 'perl' bit in the configuration.
Note the 'External Script' and 'External Scripts' sections: they are two different blocks. The 'External Script' block lets you specify options that apply only to external script processing; if they are not specified, the defaults apply. The 'External Scripts' block is where the checks themselves are listed.
Once you've done this, check that it works correctly. To do so, run 'nsclient++.exe /test': this opens a sort of 'command line interface' where you can 'feed' commands to the client and see whether you get what you expect. To test your command, just call your 'check'. You should see something like the following:
If this works, you can move on to the next step. If it doesn't... heee, well, you'd better figure out what the problem is before moving on. Once it works, remember to restart NSClient.
To work with Nagios you need the NRPE plugin (check_nrpe), which can be downloaded from the Nagios web site and added to your configuration. Once that's done, you can test from your Nagios server whether it can reach the client and get an answer:
/where/is/your/nagios/plugins/check_nrpe -n -H your.windows.client.ip -p 5666 -c check_iis_smtp
Note that you need to specify the correct IP address and port; the -n flag tells check_nrpe not to use SSL, matching use_ssl=0 on the client. If everything works fine, you should see the same response as in your test on the machine itself, that is, an 'OK: no problem' or whatever your script prints.
Now it's a matter of adding a command and a service that call the check_nrpe plugin to do the work. What I did was add a dedicated 'command' in the command.cfg file:
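check_nrpe passes back both the text and the exit code of the remote script, but only the text shows up on the terminal, while Nagios acts on the exit code. A small Python sketch for seeing both at once; run_plugin is a hypothetical helper that maps codes with the usual 0/1/2/3 (OK/WARNING/CRITICAL/UNKNOWN) convention, and the 60-second timeout is an arbitrary assumption.

```python
import subprocess

STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=60):
    """Run a Nagios plugin command line and return (state name, first line
    of its standard output). Anything that cannot be run maps to UNKNOWN."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", f"could not run plugin: {exc}"
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    return STATE_NAMES.get(proc.returncode, "UNKNOWN"), first_line
```

Pass it the same check_nrpe command line as above, split into a list of arguments. From a shell, echo $? right after running check_nrpe gives you the exit code too.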
# 'check_iis_smtp' command definition
define command {
    command_name  check_iis_smtp
    command_line  $USER1$/check_nrpe -n -H $HOSTADDRESS$ -p 5666 -c check_iis_smtp $ARG1$
}
Then in the service definition:
define service {
    host_name            your-host-name-or-ip
    service_description  IIS SMTP
    check_command        check_iis_smtp
    use                  ...
}
The 'use' directive lets you inherit from a service template, so you can specify things like default check intervals, whom to notify and so on. This is nothing specific to NRPE or NSClient but a general Nagios thing. That means: go read the documentation!
Once this is done, run a 'preflight' check of the configuration with nagios -v (pointing it at your nagios.cfg) and, if it's OK, restart Nagios and watch your new check at work.
Adding custom checks on Windows using Nagios and NRPE is not difficult; I only wish the documentation for doing it were a bit clearer!
Davide Bianchi works as a Unix/Linux administrator for a hosting provider in The Netherlands.