Monday, January 31, 2011

Learning collectd

Rightscale uses collectd; it's very cool, and it was pretty well baked before I had to start working with it.  One of the things I want to monitor is log file age: if a log file from a particular daemon on a system hasn't been modified in the last five minutes, I want to know.

Since I don't see a plugin that checks file freshness, I took Rightscale's example (written in Ruby) and modified it for my own purposes.  Collectd expects a string in the following format:

PUTVAL localhost/filefreshness-_tmp_watchme/absolute-1_minute_file_freshness 1296498856:3953
 
The identifier in this string follows the pattern instance-id/plugin-plugin_instance/type-type_instance. The meaning of each field is:
  • instance-id: the AWS ID of the instance, so the data can be filed away correctly on the server
    • Obviously the AWS ID isn't localhost.  I have yet to see whether this changes when I run the script from collectd.
  • plugin: identifies the plugin which is typically associated with an application or a resource, examples are apache, mysql, squid, cpu, memory, etc.
  • plugin_instance: identifies the instance of an application/resource when there are multiple, examples are cpu-0, cpu-1 on dual-core servers, or df-mnt and df-root for the two filesystems on small instances.
  • type: identifies the type of data being collected.  A custom type must be defined in /usr/lib/collectd/types.db.
  • type-instance: the name of the variable being collected, or the instance of the variable of the given type being collected, examples are: (for the cpu type) idle, wait, busy; (for the 'mysql_command' type) selects, updates, executes.
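
To make the field layout concrete, here is a minimal Ruby sketch that splits the sample string above back into its parts. The helper name parse_putval is made up for illustration and is not part of Rightscale's example; it also assumes the value is an integer, as it is in the sample.

```ruby
# Hypothetical helper (not from the original script): split a PUTVAL line
# into the fields described in the list above.
def parse_putval(line)
  command, identifier, values = line.split(' ', 3)
  raise ArgumentError, 'not a PUTVAL line' unless command == 'PUTVAL'

  host, plugin_part, type_part = identifier.split('/', 3)
  # Only the first hyphen separates a name from its instance.
  plugin, plugin_instance = plugin_part.split('-', 2)
  type, type_instance     = type_part.split('-', 2)
  timestamp, value        = values.split(':', 2)

  {
    host: host,
    plugin: plugin,
    plugin_instance: plugin_instance,
    type: type,
    type_instance: type_instance,
    time: Integer(timestamp),
    value: Integer(value) # assumption: integer values only
  }
end

fields = parse_putval('PUTVAL localhost/filefreshness-_tmp_watchme/absolute-1_minute_file_freshness 1296498856:3953')
puts fields[:plugin_instance] # prints "_tmp_watchme"
```
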

(Again, this is shamelessly stolen from Rightscale.)  I chose the nginx log file, called "/mnt/log/nginx_access.log".  As the file gets older, the counter should increase (and it does).  The Ruby script takes three arguments:

-h (--hostname) which in this case is "localhost"
-i (--interval) which will be set to 60 (seconds)
-f (--filename) which is /mnt/log/nginx_access.log.  A simple Ruby statement, ffilename=filename.gsub(/\//,"_"), converts slashes to underscores so the path is safe to use as the plugin_instance.

The numbers at the end of the string are the UNIX time when the plugin ran and the age of the file in seconds, separated by a colon.  My idea is to send an alert when the monitored file's age grows beyond 300 seconds.
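
Putting the pieces together, the script could look roughly like the sketch below. This is not Rightscale's code or my exact script: the method names, the option defaults, and the once-per-interval loop are assumptions for illustration, and the identifier simply copies the shape of the sample string shown earlier.

```ruby
require 'optparse'

# Same substitution as the gsub quoted above: slashes become underscores.
def friendly_instance(filename)
  filename.gsub(/\//, '_')
end

# Age of the file in whole seconds, measured from its modification time.
def file_age(filename, now)
  now - File.mtime(filename).to_i
end

# Build one PUTVAL line in the shape of the sample string shown earlier.
def putval_line(hostname, filename, now, age)
  "PUTVAL #{hostname}/filefreshness-#{friendly_instance(filename)}/" \
    "absolute-1_minute_file_freshness #{now}:#{age}"
end

# Parse the three arguments; the defaults here are assumptions.
def parse_args(argv)
  options = { hostname: 'localhost', interval: 60 }
  OptionParser.new do |opts|
    opts.on('-h', '--hostname HOST') { |v| options[:hostname] = v }
    opts.on('-i', '--interval SECONDS', Integer) { |v| options[:interval] = v }
    opts.on('-f', '--filename FILE') { |v| options[:filename] = v }
  end.parse!(argv)
  options
end

# Report the file's age once per interval until collectd stops the script.
def run(options)
  $stdout.sync = true # collectd reads from a pipe, so avoid buffering
  loop do
    now = Time.now.to_i
    puts putval_line(options[:hostname], options[:filename], now,
                     file_age(options[:filename], now))
    sleep options[:interval]
  end
end

# Fixed numbers instead of a real file, to show the output format:
puts putval_line('localhost', '/tmp/watchme', 1296498856, 3953)
```

To run it under collectd, the script would end with run(parse_args(ARGV)) instead of the demonstration line.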

After creating the plugin script and testing it, you need to upload it to /usr/lib/collectd.  Then you have to edit /usr/lib/collectd/types.db and add a new type:

filefreshness           seconds:ABSOLUTE:U:U

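For now I am using the ABSOLUTE data source type.  I'm not yet sure how GAUGE and ABSOLUTE differ in practice, but it probably does matter: as I understand it, a GAUGE is stored as reported, while an ABSOLUTE is divided by the time since the last update and stored as a rate.  If so, GAUGE would be the better fit for a file age.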
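
A types.db entry has the shape "name ds-name:ds-type:min:max", where U means no bound is given. As a rough illustration, this Ruby sketch pulls the entry above apart; parse_types_db_line is a hypothetical helper, not something collectd ships.

```ruby
# Hypothetical helper: split one types.db line into its data-source fields.
def parse_types_db_line(line)
  name, *sources = line.split
  data_sources = sources.map do |spec|
    # Entries with several data sources separate them with commas.
    ds_name, ds_type, min, max = spec.chomp(',').split(':')
    { name: ds_name, type: ds_type, min: min, max: max }
  end
  { type: name, data_sources: data_sources }
end

entry = parse_types_db_line('filefreshness           seconds:ABSOLUTE:U:U')
puts entry[:type]                      # prints "filefreshness"
puts entry[:data_sources].first[:type] # prints "ABSOLUTE"
```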

Finally, I need to exec my custom plugin.  By adding a new file called "/etc/collectd/conf/sitelocal.conf", I can customize plugins and load extra plugins that aren't loaded by the default Rightscale script (rightscript):
LoadPlugin exec
<Plugin exec>
  #     userid    plugin executable            plugin        args
  Exec "xxx" "/usr/lib/collectd/filefreshness.rb" "-h" "i-xxxxxx" "-i" "60" "-f" "/mnt/log/nginx_access.log"
</Plugin>

Now that I have configured collectd to load the exec plugin and run my custom plugin, I should be able to SIGHUP collectd and see new graphs in Rightscale's collectd interface.
