Monitoring important remote system metrics

The Nagios plugin check_multi is a convenient tool for running multiple checks within a single check command, which returns one overall state and a combined output. Here, we will show you how to set it up and use it to quickly monitor a list of important system metrics on your clients.

To Start With: What Do You Need?

It is assumed that you have worked through the preceding processes one by one, so by now you should have a Nagios server running and a client computer that you want to monitor, whose NRPE service can already be reached by the Nagios server. The client needs an installation of the CentOS 7 operating system with root privileges, a console-based text editor of your choice, and a connection to the Internet to download additional packages. In this example, the client has the IP address 192.168.1.8.

The Process

The check_multi Nagios plugin is available from GitHub, so we begin by installing the git program, which we need to download it:

Log in as root on your client computer and install Git if not done already:
yum install git
Now, download and install the check_multi plugin by compiling it from the source:
cd /tmp
git clone https://github.com/flackem/check_multi
cd /tmp/check_multi
./configure --with-nagios-name=nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-plugin-path=/usr/lib64/nagios/plugins --libexecdir=/usr/lib64/nagios/plugins/
make all
make install
make install-config
Next, we install another very useful plugin called check_mem, which is not available in the CentOS 7 Nagios plugin rpms:
cd /tmp
git clone https://github.com/justintime/nagios-plugins.git
cp /tmp/nagios-plugins/check_mem/check_mem.pl /usr/lib64/nagios/plugins/
Next, let's create a check_multi command file that contains all the client checks you want to combine in a single run; open the following file:
vi /usr/local/nagios/etc/check_multi/check_multi.cmd
Put in the following content:
command[ sys_load::check_load ] = check_load -w 5,4,3 -c 10,8,6
command[ sys_mem::check_mem ] = check_mem.pl -w 10 -c 5 -f -C
command[ sys_users::check_users ] = check_users -w 5 -c 10
command[ sys_disks::check_disk ] = check_disk -w 5% -c 2% -X nfs
command[ sys_procs::check_procs ] = check_procs
Next, test the command file that we just created using the following command line:
/usr/lib64/nagios/plugins/check_multi -f /usr/local/nagios/etc/check_multi/check_multi.cmd
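Besides the printed text, the exit status of the command carries the overall result, and this is what Nagios evaluates. The following is a minimal sketch for reading that status in a shell. The function name nagios_state_name is hypothetical and not part of check_multi; the mapping follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and this sketch treats any other value as UNKNOWN as well.

```shell
# nagios_state_name is a hypothetical helper (not shipped with Nagios or
# check_multi): it translates a plugin exit status into its state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        # 3 is the conventional UNKNOWN code; anything else is
        # treated the same way in this sketch.
        *) echo "UNKNOWN" ;;
    esac
}
```

After defining the function, you could run the check_multi command shown above and then type nagios_state_name $? to see the state name that corresponds to its exit status.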
If everything is correct, it should print the results of your five plugin checks and an overall result, for example, OK - 5 plugins checked. Next, we will install this new command in the NRPE service on our client so that the Nagios server can execute it remotely by calling its name. Open the NRPE configuration file:
vi /etc/nagios/nrpe.cfg
Add the following line to the end of the file right below the last # command line to expose a new command called check_multicmd to our Nagios server:
command[check_multicmd]=/usr/lib64/nagios/plugins/check_multi -f /usr/local/nagios/etc/check_multi/check_multi.cmd
Finally, let's restart NRPE so that it picks up the new command:
systemctl restart nrpe
Now, let’s check whether we can execute our new check_multicmd command that we defined in the last step from our Nagios server. Log in as root and type the following command (change the IP address of your client, 192.168.1.8, appropriately):
/usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.8 -c "check_multicmd"
If the output is the same as when running the command locally on the client (see the earlier test step), we can successfully execute remote NRPE commands on our client from our server. Now let's define the command on the Nagios server itself so that we can start using it within the Nagios system. Open the following file:
vi /etc/nagios/objects/commands.cfg
Put in the following content at the end of the file to define a new command called check_nrpe_multi, which we can use in any service definition:
define command {
    command_name check_nrpe_multi
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c "check_multicmd"
}
Next, we will define a new server definition for the client that we want to monitor on our Nagios server (give the config file an appropriate name, for example, its domain name or IP address):
vi /etc/nagios/servers/192.168.1.8.cfg
Put in the following content, which will define a new host with its service, using our new Nagios command that we just created:
define host {
    use linux-server
    host_name host1
    address 192.168.1.8
    contact_groups unix-admins
}

define service {
    use generic-service
    host_name host1
    check_command check_nrpe_multi
    normal_check_interval 15
    service_description check_nrpe_multi service
}
Finally, we need to configure everyone who should receive notification e-mails for our new service in case of errors. Open the following file:
vi /etc/nagios/objects/contacts.cfg
Put in the following content at the end of the file:
define contactgroup {
    contactgroup_name unix-admins
    alias Unix Administrators
}

define contact {
    contact_name pelz
    use generic-contact
    alias Oliver Pelz
    contactgroups unix-admins
    email oliverpelz@mymailhost.com
}
Now, restart the Nagios service:
systemctl restart nagios
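A mistake in any of the object files edited in this process can keep Nagios from starting, so it can help to guard the restart with a configuration check. The sketch below relies on the nagios -v option, which verifies a main configuration file. It assumes that the main configuration file is /etc/nagios/nagios.cfg and that the nagios binary is on root's PATH. The function name restart_if_valid and the NAGIOS_BIN and RESTART_CMD variables are hypothetical and exist only to make the sketch easy to adapt.

```shell
# restart_if_valid is a hypothetical wrapper: it runs the Nagios
# configuration check and restarts the service only when the check passes.
# Assumptions: the main config is /etc/nagios/nagios.cfg; NAGIOS_BIN and
# RESTART_CMD are illustrative overrides, not standard Nagios variables.
restart_if_valid() {
    cfg="${1:-/etc/nagios/nagios.cfg}"
    if "${NAGIOS_BIN:-nagios}" -v "$cfg"; then
        # Left unquoted on purpose so the command and its arguments split.
        ${RESTART_CMD:-systemctl restart nagios}
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

Calling restart_if_valid without arguments uses the assumed default path; pass a different path as the first argument if your main configuration file lives elsewhere.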

How Does It Work?

We started this process by installing the check_multi and check_mem plugins from their authors' GitHub repositories; they are plain command-line tools. Nagios performs checks by running such external commands, and it uses the return code along with the output of the command to decide whether the check was successful. Nagios has a very flexible architecture that can be easily extended using plugins, add-ons, and extensions. A central place to search for all kinds of extensions is https://exchange.nagios.org/ .

Next, we added a new command file for check_multi, in which we put five different system check_ commands. These checks act as a starting point for customizing your monitoring needs and check system load, memory consumption, logged-in users, free disk space, and processes. All available check_ commands can be found at /usr/lib64/nagios/plugins/check_*. As you can see in our command file, the parameters of those check_ commands can be very different, and explaining them all is out of the scope of this process. Most of them set the threshold values at which a certain state is reached, for example, the CRITICAL state. To get more information about a specific command, use the --help parameter with the command. For example, to find out what the parameters in the check_load -w 5,4,3 -c 10,8,6 command are doing, run /usr/lib64/nagios/plugins/check_load --help.

You can easily add any number of new check commands to our command file from existing plugins, or you can download and install new ones if you like. There are also a number of command file examples shipped with the check_multi plugin, which are very useful for learning, so please have a look at the directory: /usr/local/nagios/etc/check_multi/*.cmd.
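To make the role of return codes and thresholds more concrete, here is a minimal sketch of a plugin-style check written as a shell function. It is not taken from any real plugin: the name check_value and its three positional arguments are hypothetical, and it only handles non-negative whole numbers where a higher value is worse. It prints one status line and returns the conventional Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# check_value is a hypothetical plugin-style function: it compares a number
# against warning and critical thresholds, prints one status line, and
# returns the matching Nagios exit code.
check_value() {
    # Exactly three non-negative integer arguments are required.
    if [ "$#" -ne 3 ]; then
        echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
        return 3
    fi
    for arg in "$1" "$2" "$3"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "UNKNOWN - arguments must be whole numbers"
                return 3
                ;;
        esac
    done

    value="$1"; warn="$2"; crit="$3"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    else
        echo "OK - value is $value"
        return 0
    fi
}
```

For example, check_value 7 5 10 prints WARNING - value is 7 and returns 1, because 7 has reached the warning threshold of 5 but not the critical threshold of 10.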

Afterwards, we checked the correctness of our new command file by passing it to the check_multi command with the -f parameter locally on the client. In its output, you will find all the single outputs as if you had run these five commands individually. If a single check fails, the overall check_multi result fails as well. Next, we defined a new NRPE command called check_multicmd in the NRPE config file so that it can be executed from the Nagios server, which we tested in the next step from our Nagios server. For the test to be successful, we expect the same results as we got when calling the command on the client itself. Afterwards, we defined this command in our commands.cfg on the Nagios server so that we can reuse it as often as we like in any service definition by referencing the command's name, check_nrpe_multi. Next, we created a new server file named after the IP address of the client we want to monitor: 192.168.1.8.cfg (you can name it anything you like as long as it has the .cfg extension and is in that directory). It contains exactly one host definition and one or more service definitions, which are linked to the host by using the host's host_name value in each service definition.
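The way several single results collapse into one overall state can be sketched as a simple "worst state wins" rule. This is an illustration only, not check_multi's actual code: the function name worst_state is hypothetical, and the precedence used here (CRITICAL over WARNING over UNKNOWN over OK) is an assumption, since check_multi's state evaluation is configurable and may differ on your system.

```shell
# worst_state is a hypothetical helper: given the exit codes of several
# checks, it prints the overall exit code.
# Assumed precedence: CRITICAL (2) > WARNING (1) > UNKNOWN (3) > OK (0).
worst_state() {
    overall=0
    for rc in "$@"; do
        case "$rc" in
            2)
                overall=2
                ;;
            1)
                # WARNING overrides OK and UNKNOWN, but not CRITICAL.
                if [ "$overall" -ne 2 ]; then overall=1; fi
                ;;
            3)
                # UNKNOWN only overrides OK.
                if [ "$overall" -eq 0 ]; then overall=3; fi
                ;;
        esac
    done
    echo "$overall"
}
```

With this rule, worst_state 0 0 2 0 0 prints 2: one CRITICAL result among five checks is enough to make the combined result CRITICAL.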

In the host definition, we set contact_groups to unix-admins, which links the host to the contact group and contact entry defined in the contacts.cfg file. These will be used to send notification e-mails if the checked service has any errors. The most important value in the service definition is the check_command check_nrpe_multi line, which executes the command that we created before as our one and only check. The normal_check_interval is also important, as it defines how often the service is checked under normal conditions. Here, it gets checked every 15 minutes. You can add as many service definitions to a host as you like.
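The 15 in normal_check_interval is counted in Nagios time units, not directly in minutes. The small sketch below shows the arithmetic, assuming the interval_length setting in the main Nagios configuration is left at its default of 60 seconds per unit. The function name interval_seconds is hypothetical.

```shell
# interval_seconds is a hypothetical helper: it converts a check interval
# given in Nagios time units into seconds.
# Assumption: interval_length is 60 seconds per unit unless a second
# argument says otherwise.
interval_seconds() {
    units="$1"
    interval_length="${2:-60}"
    echo $(( $units * $interval_length ))
}
```

Running interval_seconds 15 prints 900, that is, 15 minutes. If interval_length had been changed to 30, interval_seconds 15 30 would print 450 instead.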

Now, go to your Nagios web frontend to inspect your new host and service. Here, go to the Hosts tab, where you will see the new host, host1, that you defined in this process, and it should give you information about its status. If you click on the Services tab, you will see the check_nrpe_multi service. It should show the Status as Pending, OK, or CRITICAL, depending on the success of the single checks. If you click on its check_nrpe_multi link, you will see details about the checks.

We could only show you the very basics of Nagios, and there is always more to learn, so please read the official Nagios Core documentation at https://www.nagios.org , or check out the book Learning Nagios 4, Packt Publishing, by Wojciech Kocjan.

Help Category:

CentOS
