check_naf - Monitoring NetApp™ Filer/FAS systems
Command line options
% ./check_naf.py -h
Usage: check_naf.py [options]
Monitoring NetApp™ FAS systems
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOST               Host to check
  -P 1                  SNMP protocol version
  -C public             SNMP v1/v2c community OR SNMP v3 quadruple
  --snmpcmdlinepath=/usr/bin
                        Path to "snmpget" and "snmpwalk"
  --nonetsnmp           Do not use NET-SNMP python bindings
  --separator=,         Separator for check/target/warn/crit
  --subseparator=+      Separator for multiple checks or targets
  -v, --verbose         Verbosity, more for more ;-)
Quickstart
Run 6 tests ("global cpu disk disk,spare nvram version") against the host "filer" at once:
% ./check_naf.py -H filer global cpu disk disk,spare nvram version
NAF OK - 6 OK: global/cpu/disk:failed/disk:spare/nvram/version
OK global - FAS3140: The system's global status is normal.
OK cpu - CPU 4% busy, CPU architecture: amd64
OK disk:failed - No failed disks
OK disk:spare - 1 spare disk
OK nvram - NVRAM battery status is "ok"
OK version - FAS3140: NetApp Release 7.3.2P6: Sat Mar 20 10:21:30 PDT 2010
Run 4 tests ("vol_data+vol_snap+vol_inode+vol_files"), each for 2 volumes ("/vol/vol0/+/vol/vol1/") at once:
% ./check_naf.py -H filer vol_data+vol_snap+vol_inode+vol_files,/vol/vol0/+/vol/vol1/,50%,75%
NAF CRITICAL - 1 CRITICAL: vol_snap:vol0, 1 WARNING: vol_data:vol1, 6 OK: vol_data:vol0/vol_inode:vol0/vol_files:vol0/vol_snap:vol1/vol_inode:vol1/vol_files:vol1
OK vol_data:vol0 - /vol/vol0/: Used 3.6GiB (27.8%) out of 12.8GiB
CRITICAL vol_snap:vol0 - /vol/vol0/.snapshot: Used 5.3GiB (166.3%) out of 3.2GiB
OK vol_inode:vol0 - /vol/vol0/: Used inodes 11.7k (0.1%) out of 7.8M
OK vol_files:vol0 - /vol/vol0/: Used files 11.7k (0.1%) out of 7.9M, may raised to 4.0M
WARNING vol_data:vol1 - /vol/vol1/: Used 451.9GiB (66.5%) out of 680.0GiB
OK vol_snap:vol1 - /vol/vol1/.snapshot: Used 6.1GiB (3.6%) out of 170.0GiB
OK vol_inode:vol1 - /vol/vol1/: Used inodes 4.8M (18.0%) out of 26.5M
OK vol_files:vol1 - /vol/vol1/: Used files 4.8M (15.2%) out of 31.2M, may raised to 211.7M
Performance data has been stripped from all examples.
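The example output above follows a regular shape: one summary line, then one line per check in the form "STATUS check[:target] - message". A minimal sketch of splitting that text into per-check results is shown below. The function name `parse_results` is hypothetical and not part of check_naf.py, and the `UNKNOWN` status is an assumption taken from general Nagios conventions, since only OK, WARNING and CRITICAL appear in the examples.

```python
import re

# One result line: "STATUS check[:target] - message".
# UNKNOWN is an assumption; the examples only show OK, WARNING and CRITICAL.
_RESULT = re.compile(r"^(OK|WARNING|CRITICAL|UNKNOWN)\s+(\S+)\s+-\s+(.*)$")


def parse_results(output):
    """Map (check, target) to (status, message) for every result line."""
    results = {}
    for line in output.splitlines():
        match = _RESULT.match(line.strip())
        if match is None:
            # The "NAF ..." summary line and blank lines do not match.
            continue
        status, name, message = match.groups()
        check, _, target = name.partition(":")
        results[(check, target or None)] = (status, message)
    return results


sample = (
    "NAF OK - 2 OK: cpu/disk:failed\n"
    "OK cpu - CPU 4% busy, CPU architecture: amd64\n"
    "OK disk:failed - No failed disks\n"
)
for key, value in parse_results(sample).items():
    print(key, value)
```

Keying the results by (check, target) keeps checks such as "disk:failed" and "disk:spare" apart while checks without a target get `None`.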
Command line format
% ./check_naf.py [options] -H hostname [checktuple [checktuple [...]]]
Format of (multiple) "checktuple"
Format: checkname[,[target][,[warn][,[crit]]]]
- Multiple checktuples can be specified as a space-separated list, e.g. "cpu nvram global"
- Options are separated with "," from the check name and from other options, e.g. "disk,spare,2,1"
  - Trailing "," can be omitted, e.g. "disk,spare" is the same as "disk,spare,,"
- 2nd option is the target
  - Some checks have a default target, e.g. "disk" is the same as "disk,failed"
- 3rd (warning) and 4th (critical) options are thresholds, e.g. "disk,failed,1,2"
  - If thresholds should be used without a target, just leave the target option empty, e.g. "disk,,1,2"
- Some checks may/must have a "target", e.g. "disk,failed" (same as "disk")
- Multiple checks and/or targets are possible
  - They all have the same warning and critical thresholds!
  - Multiple targets for the same check(s) are specified as a "plus sign" separated list, e.g. "vol_data,/vol/vol0/+/vol/vol1/,50%,75%"
  - Multiple checks for the same target(s) are specified as a "plus sign" separated list, e.g. "vol_data+vol_snap,/vol/vol0/,50%,75%"
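The rules above can be sketched as a small parser. This is an illustration of the documented format only, not the code check_naf.py actually uses. The function name `parse_checktuple` is hypothetical, and the defaults mirror the `--separator` and `--subseparator` options. Default targets (such as "disk" meaning "disk,failed") are deliberately not resolved here.

```python
from itertools import product


def parse_checktuple(checktuple, separator=",", subseparator="+"):
    """Expand one checktuple into a list of (check, target, warn, crit)."""
    fields = checktuple.split(separator)
    # Pad omitted trailing options: "disk,spare" behaves like "disk,spare,,".
    fields += [""] * (4 - len(fields))
    checks, targets, warn, crit = fields[:4]
    # Every check/target combination shares the same thresholds.
    # Empty options are reported as None.
    return [
        (check, target or None, warn or None, crit or None)
        for target, check in product(
            targets.split(subseparator), checks.split(subseparator)
        )
    ]


for item in parse_checktuple("vol_data+vol_snap,/vol/vol0/+/vol/vol1/,50%,75%"):
    print(item)
```

The expansion iterates over targets first and checks second, so all checks for one volume are grouped together, which is the order the detail lines take in the Quickstart output.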
Implemented checks - Overview
| "check name" | "target" | Thresholds | Description |
| global | -- | -- | Check internal monitoring, includes fans, power supplies, disk shelves, ... |
| cluster | -- | -- | Check cluster state |
| cifs | stats | -- | Show CIFS statistics |
| cifs | users | # | Monitor connected CIFS users |
| cpu | -- | % | Monitor busy CPU time |
| disk | -- | # | Same as "disk,failed" |
| disk | failed | # | Check for failed disks |
| disk | spare | # | Check for spare disks |
| environment | -- | -- | Same as "global" |
| extcache | -- | | Show extcache performance (needs SNMP v2c/v3) |
| extcache_info | -- | | Show extcache hardware information |
| io | -- | -- | Show I/O traffic for net, disk, tape, FCP and iSCSI |
| nvram | -- | -- | Categorize NVRAM status code |
| ops | -- | -- | Show total ops for net, CIFS, HTTP, FCP and iSCSI |
| snapmirror | SM Source/Dest. | s | Check if target is snapmirrored and, if thresholds are specified, is not older than X seconds |
| snapvault | SV Source/Dest. | s | Check if target is snapvaulted and, if thresholds are specified, is not older than X seconds |
| version | -- | -- | Show ONTAP version |
| vol_data | Volume(s), Aggr(s) | B,% | Check "data" part of volume/aggr for used/free [1] space |
| vol_snap | Volume(s), Aggr(s) | B,% | Check "snapshot" part of volume/aggr for used/free [1] space |
| vol_inode | Volume(s), Aggr(s) | #,% | Check used/free [1] inode count of volume/aggr |
| vol_files | Volume(s), Aggr(s) | #,% | Check maximum number of files used/free [1] on volume/aggr |
Thresholds:
- %: Percent
- #: Count
- B: Bytes
- k, M, G, ... with base 1000
- Ki, Mi, Gi, ... with base 1024
- s: Seconds
[1] Whether "used" or "free" is checked depends on the thresholds: if warn<crit, "used" is checked; if warn>crit, "free" is checked (not implemented yet).
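The threshold units and the used/free rule can be illustrated with a short sketch. The exact threshold syntax accepted by check_naf.py is not spelled out beyond examples such as "50%" and "2", so the prefixed forms handled here (for instance "3M" or "1Ki") are an assumption based on the unit list. Both function names are hypothetical. The plugin itself does not yet implement the "free" direction, and treating equal thresholds as "used" is also an assumption.

```python
import re

# number, then either "%" or a prefix with an optional "i" (binary) marker.
_THRESHOLD = re.compile(r"(\d+(?:\.\d+)?)(?:(%)|([kKMGTPE])(i)?)?")
_PREFIXES = "KMGTPE"


def parse_threshold(text):
    """Return (value, kind), where kind is '%' or 'abs' (count/bytes/seconds)."""
    match = _THRESHOLD.fullmatch(text.strip())
    if match is None:
        raise ValueError("unsupported threshold: %r" % text)
    number, percent, prefix, binary = match.groups()
    value = float(number)
    if percent:
        return value, "%"
    if prefix:
        # k, M, G, ... use base 1000; Ki, Mi, Gi, ... use base 1024.
        base = 1024 if binary else 1000
        value *= base ** (_PREFIXES.index(prefix.upper()) + 1)
    return value, "abs"


def check_direction(warn, crit):
    """Apply the footnote rule: warn<crit means 'used', warn>crit means 'free'."""
    # Assumption: equal thresholds fall back to 'used'.
    return "free" if warn > crit else "used"


print(parse_threshold("50%"), parse_threshold("1Ki"), parse_threshold("3M"))
print(check_direction(50, 75), check_direction(20, 10))
```

Returning the kind alongside the value lets a caller decide whether to compare against a percentage or an absolute figure without re-parsing the original string.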