check_naf - Monitoring NetApp™ Filer/FAS systems

Command line options

% ./check_naf.py -h
Usage: check_naf.py [options]

Monitoring NetApp™ FAS systems

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOST               Host to check
  -P 1                  SNMP protocol version
  -C public             SNMP v1/v2c community OR SNMP v3 quadruple
  --snmpcmdlinepath=/usr/bin
                        Path to "snmpget" and "snmpwalk" 
  --nonetsnmp           Do not use NET-SNMP python bindings
  --separator=,         Separator for check/target/warn/crit
  --subseparator=+      Separator for multiple checks or targets
  -v, --verbose         Verbosity, more for more ;-)

Quickstart

Run 6 tests ("filer global cpu disk disk,spare nvram version") at once:

% ./check_naf.py -H filer global cpu disk disk,spare nvram version
NAF OK - 6 OK: global/cpu/disk:failed/disk:spare/nvram/version
OK global - FAS3140: The system's global status is normal. 
OK cpu - CPU 4% busy, CPU architecture: amd64
OK disk:failed - No failed disks
OK disk:spare - 1 spare disk
OK nvram - NVRAM battery status is "ok" 
OK version - FAS3140: NetApp Release 7.3.2P6: Sat Mar 20 10:21:30 PDT 2010

Run 4 tests ("vol_data+vol_snap+vol_inode+vol_files"), each for 2 volumes ("/vol/vol0/+/vol/vol1/") at once:

% ./check_naf.py -H filer vol_data+vol_snap+vol_inode+vol_files,/vol/vol0/+/vol/vol1/,50%,75%
NAF CRITICAL - 1 CRITICAL: vol_snap:vol0, 1 WARNING: vol_data:vol1, 6 OK: vol_data:vol0/vol_inode:vol0/vol_files:vol0/vol_snap:vol1/vol_inode:vol1/vol_files:vol1
OK vol_data:vol0 - /vol/vol0/: Used 3.6GiB (27.8%) out of 12.8GiB
CRITICAL vol_snap:vol0 - /vol/vol0/.snapshot: Used 5.3GiB (166.3%) out of 3.2GiB
OK vol_inode:vol0 - /vol/vol0/: Used inodes 11.7k (0.1%) out of 7.8M
OK vol_files:vol0 - /vol/vol0/: Used files 11.7k (0.1%) out of 7.9M, may raised to 4.0M
WARNING vol_data:vol1 - /vol/vol1/: Used 451.9GiB (66.5%) out of 680.0GiB
OK vol_snap:vol1 - /vol/vol1/.snapshot: Used 6.1GiB (3.6%) out of 170.0GiB
OK vol_inode:vol1 - /vol/vol1/: Used inodes 4.8M (18.0%) out of 26.5M
OK vol_files:vol1 - /vol/vol1/: Used files 4.8M (15.2%) out of 31.2M, may raised to 211.7M

Performance data has been stripped from all example output.

Command line format

% ./check_naf.py [options] -H hostname [checktuple [checktuple [...]]]

Format of (multiple) "checktuple"

Format: checkname[,[target][,[warn][,[crit]]]]

  • Multiple checktuples can be given as a space-separated list, e.g. "cpu nvram global"
  • Options are separated from the check name and from each other with ",", e.g. "disk,spare,2,1"
    • Trailing "," can be omitted, e.g. "disk,spare" is the same as "disk,spare,,"
    • The 2nd option is the target
      • Some checks have a default target, e.g. "disk" is the same as "disk,failed"
    • The 3rd (warning) and 4th (critical) options are thresholds, e.g. "disk,failed,1,2"
    • To use thresholds without a target, leave the target option empty, e.g. "disk,,1,2"
  • Some checks may or must have a "target", e.g. "disk,failed" (same as "disk")
  • Multiple checks and/or targets can be combined in one checktuple
    • They all share the same warning and critical thresholds!
    • Multiple targets for the same check(s) are given as a "+"-separated list, e.g. "vol_data,/vol/vol0/+/vol/vol1/,50%,75%"
    • Multiple checks for the same target(s) are given as a "+"-separated list, e.g. "vol_data+vol_snap,/vol/vol0/,50%,75%"
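
The expansion rules above can be illustrated with a short Python sketch. This is not the plugin's actual parser: the function name parse_checktuple is hypothetical, default targets (such as "disk" meaning "disk,failed") are not filled in, and the ordering (all checks for the first target, then all checks for the next) is only inferred from the Quickstart output.

```python
from itertools import product


def parse_checktuple(text, separator=",", subseparator="+"):
    """Expand one checktuple into a list of (check, target, warn, crit) tuples.

    Hypothetical helper for illustration; empty fields are returned as None.
    """
    # Pad with empty fields so that trailing separators may be omitted.
    fields = text.split(separator) + [""] * 3
    checks, targets, warn, crit = fields[:4]
    # Every check is paired with every target; all pairs share the thresholds.
    return [
        (check, target or None, warn or None, crit or None)
        for target, check in product(
            targets.split(subseparator), checks.split(subseparator)
        )
    ]


if __name__ == "__main__":
    for item in parse_checktuple("vol_data+vol_snap,/vol/vol0/+/vol/vol1/,50%,75%"):
        print(item)
```

With this sketch, "disk,spare" and "disk,spare,," expand to the same single tuple, and "disk,,1,2" yields a tuple whose target is None.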

Implemented checks - Overview

Check name     Target              Thresholds  Description
global         --                  --          Check internal monitoring, includes fans, power supplies, disk shelves, ...
cluster        --                  --          Check cluster state
cifs           stats               --          Show CIFS statistics
cifs           users               #           Monitor connected CIFS users
cpu            --                  %           Monitor busy CPU time
disk           --                  #           Same as "disk,failed"
disk           failed              #           Check for failed disks
disk           spare               #           Check for spare disks
environment    --                  --          Same as "global"
extcache       --                              Show extcache performance (needs SNMP v2c/v3)
extcache_info  --                              Show extcache hardware information
io             --                  --          Show I/O traffic for net, disk, tape, FCP and iSCSI
nvram          --                  --          Categorize NVRAM status code
ops            --                  --          Show total ops for net, CIFS, HTTP, FCP and iSCSI
snapmirror     SM Source/Dest.     s           Check if target is snapmirrored and, if thresholds are specified, is not older than X seconds
snapvault      SV Source/Dest.     s           Check if target is snapvaulted and, if thresholds are specified, is not older than X seconds
version        --                  --          Show ONTAP version
vol_data       Volume(s), Aggr(s)  B,%         Check "data" part of volume/aggr for used/free [1] space
vol_snap       Volume(s), Aggr(s)  B,%         Check "snapshot" part of volume/aggr for used/free [1] space
vol_inode      Volume(s), Aggr(s)  #,%         Check used/free [1] inode count of volume/aggr
vol_files      Volume(s), Aggr(s)  #,%         Check maximum number of files used/free [1] on volume/aggr

Thresholds:

  • %: Percent
  • #: Count
  • B: Bytes
    • k, M, G, ... with base 1000
    • Ki, Mi, Gi, ... with base 1024
  • s: Seconds
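
As a rough illustration of the unit rules above, the following Python sketch converts a threshold string into a number. It is an assumption-laden example, not the plugin's own code: the name parse_threshold is hypothetical, and whether check_naf accepts an uppercase "K", a trailing "B", or prefixes beyond those shown is not stated here.

```python
import re

# Order of the prefixes determines the exponent: k=1, M=2, G=3, ...
_PREFIXES = "KMGTPE"
_PATTERN = re.compile(r"(\d+(?:\.\d+)?)(?:([kKMGTPE]i?)|(%))?")


def parse_threshold(text):
    """Return (value, unit); unit is "%" for percentages, "" otherwise.

    Hypothetical helper: "k, M, G, ..." scale by powers of 1000,
    "Ki, Mi, Gi, ..." scale by powers of 1024.
    """
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised threshold: {text!r}")
    number, prefix, percent = match.groups()
    value = float(number)
    if prefix:
        base = 1024 if prefix.endswith("i") else 1000
        value *= base ** (_PREFIXES.index(prefix[0].upper()) + 1)
    return value, "%" if percent else ""


if __name__ == "__main__":
    for sample in ("50%", "2k", "2Ki", "1G", "1Gi", "3600"):
        print(sample, parse_threshold(sample))
```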

[1] Whether "used" or "free" is checked depends on the thresholds: if warn<crit, "used" is checked; if warn>crit, "free" is checked (not implemented yet).
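
Since the footnote marks this rule as not yet implemented, the following is only a sketch of how it could work for percentage thresholds. The function name evaluate is hypothetical, and two details are assumptions: a value exactly equal to a threshold counts as a breach, and warn equal to crit is treated as a "used" check.

```python
def evaluate(used_pct, warn, crit):
    """Classify a usage percentage as "OK", "WARNING" or "CRITICAL".

    Hypothetical helper: warn < crit checks the used share,
    warn > crit checks the free share (100 - used).
    """
    if warn <= crit:
        # "used" mode: higher values are worse (assumption: inclusive limits).
        value = used_pct
        breached = lambda v, limit: v >= limit
    else:
        # "free" mode: lower values are worse (assumption: inclusive limits).
        value = 100.0 - used_pct
        breached = lambda v, limit: v <= limit
    if breached(value, crit):
        return "CRITICAL"
    if breached(value, warn):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Values taken from the Quickstart output, thresholds 50% / 75%.
    for used in (27.8, 66.5, 166.3):
        print(used, evaluate(used, 50, 75))
```

In "used" mode this reproduces the states shown in the Quickstart example (27.8% is OK, 66.5% is WARNING, 166.3% is CRITICAL with thresholds 50%,75%).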