I’ve been using Nagios for system and network monitoring both at home and in an enterprise environment. At home I monitor the services and hardware of two Linux servers and an IBM AS/400. Professionally I’ve been monitoring over 400 Linux, Unix, Mac and Windows servers together with network devices such as switches, routers and load balancers.
Nagios is designed for Linux but should run fine on any *NIX system, and it requires a web server, preferably Apache. Nagios monitors network services such as TCP, SMTP, POP, IMAP, HTTP, DNS, SSH and PING, and host resources such as processor load, disk usage and the number of logged-in users. Nagios also supports hierarchies of hosts and network resources, which helps it detect network outages. The latest stable version of Nagios as I write this is Nagios 2.1.0 and the latest development release is Nagios 3.0rc3.
Nagios is a scalable monitoring system: many nodes can feed monitoring information to a central Nagios server, whose web interface presents the collected information. In this setup the nodes collect information through active and passive checks from the hosts and network resources in your network and then send the results to the central Nagios server as passive checks. Besides the web interface, Nagios provides notifications of failing checks through e-mail, SMS or Jabber with the help of some third-party software, e.g. an SMTP server, SMS gateway or Jabber gateway.
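To make the passive check idea a bit more concrete, here is a minimal Python sketch of how a node might format a service result as a Nagios external command (PROCESS_SERVICE_CHECK_RESULT). The function names, the host and service names and the command file path are assumptions for illustration only, and how the line actually travels from the node to the central server is left out.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one passive service check result as a Nagios external command."""
    if timestamp is None:
        timestamp = int(time.time())
    # The command must fit on a single line, so flatten any newlines in the output.
    flat_output = " ".join(output.splitlines())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, flat_output)


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the Nagios command file.

    The default path is an assumption; it depends on how Nagios was installed.
    """
    with open(command_file, "w") as handle:
        handle.write(command_line)
```

The return code follows the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so a call such as passive_service_result("web01", "HTTP", 2, "Connection refused") describes a critical HTTP service on a hypothetical host named web01.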
The only inconvenience with Nagios is that the configuration is stored in text files. In a large environment you end up with giant configuration files unless you create your own configuration file structure. Using agents such as NRPE doesn’t really make the configuration file problem any smaller. To Nagios’s credit, I have to say that it’s fairly easy to write your own plugins, either for better functionality or to monitor your company’s internally developed services.
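To show how little a plugin needs, here is a rough Python skeleton. A plugin prints one line of status text and reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The "QUEUE" service, the measured value and the thresholds are all hypothetical; a real plugin would query the service at that point.

```python
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a Nagios status (higher values are worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(service, status, message):
    """Build the single line of text that Nagios displays for the check."""
    return "%s %s - %s" % (service, STATUS_NAMES[status], message)


def main():
    # Hypothetical measurement; a real plugin would query the service here.
    queue_length = 12
    status = evaluate(queue_length, warn=50, crit=100)
    print(format_output("QUEUE", status, "%d jobs waiting" % queue_length))
    return status

# In a real plugin file, finish with: sys.exit(main())
```

Keeping the threshold logic in its own function makes it easy to test without running the whole plugin.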
During the time I’ve been using Nagios I have written a few plugins for it in Perl. Two of them are remote checks of MySQL: the first runs a SELECT NOW() to check that the server responds, and the second checks MySQL replication, verifying that replication is running and how many seconds it is behind the master. There are also two hard drive checks: a local one that uses the regular shell command df, and a remote one that uses SNMP through the command snmpdf. These plugins can be found at: http://amelia.linuxchick.se/code/nagios-plugins/
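As an illustration of the replication check idea (this is a Python sketch, not the Perl plugin linked above), the decision logic could look roughly like this. It assumes the caller has already fetched one row of SHOW SLAVE STATUS as a dictionary; the function name and the 60 and 300 second thresholds are made up for the example.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_replication(status_row, warn_seconds=60, crit_seconds=300):
    """Turn one row of SHOW SLAVE STATUS into a (status, message) pair.

    status_row is a dict keyed by column name, or None/empty when the
    server returned no row. The default thresholds are arbitrary examples.
    """
    if not status_row:
        return UNKNOWN, "REPLICATION UNKNOWN - no slave status returned"

    io_running = status_row.get("Slave_IO_Running")
    sql_running = status_row.get("Slave_SQL_Running")
    if io_running != "Yes" or sql_running != "Yes":
        return CRITICAL, "REPLICATION CRITICAL - IO thread: %s, SQL thread: %s" % (
            io_running, sql_running)

    lag = status_row.get("Seconds_Behind_Master")
    if lag is None:
        return UNKNOWN, "REPLICATION UNKNOWN - lag not reported"

    lag = int(lag)
    message = "%d seconds behind master" % lag
    if lag >= crit_seconds:
        return CRITICAL, "REPLICATION CRITICAL - " + message
    if lag >= warn_seconds:
        return WARNING, "REPLICATION WARNING - " + message
    return OK, "REPLICATION OK - " + message
```

A plugin built around this would print the message and exit with the status code, so Nagios can alert both when replication has stopped and when it is merely lagging.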
For more information on Nagios and to download it visit http://www.nagios.org.
A screenshot of my Nagios running at home: http://amelia.linuxchick.se/screenshots/nagios.jpg