Description:
The alert-monitor compares the current value of the variables specified in
the alerts file in the configuration directory with threshold values and
sets the status of those variables accordingly. It saves the current status
of the variables in /var/lib/remstats/data/ALERTS.
Which value corresponds to which status level is set in the rrd definition
or, sometimes, in the host definition. This way an rrd definition specifies
generally reasonable levels, and they can be overridden for hosts where
they aren't reasonable.
For an rrd definition, an alert line looks like:
alert varname relation oklevel [warnlevel [errorlevel]]
or
alert varname nodata status
[The latter says that missing data for the variable varname will cause its
status to be set to the level status.]
For a host-specified alert level, the line looks like:
alert rrdname varname relation oklevel [warnlevel [errorlevel]]
or
alert rrdname varname nodata status
and the interpretation is the same, except that you also have to say
which rrd the alert refers to.
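The two line forms differ only in the leading rrdname field. As an illustration, and not the alert-monitor's own code, here is a minimal Python sketch that splits either form into its parts. The names Alert and parse_alert and the host_level flag are hypothetical, and writing the status as the literal word WARN is an assumption about how status levels are spelled in a definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    rrd: Optional[str]   # None when the line comes from an rrd definition
    var: str             # variable name
    relation: str        # e.g. "<", "delta>", or "nodata"
    levels: List[str]    # thresholds, or the single status for nodata (kept as text)


def parse_alert(line: str, host_level: bool = False) -> Alert:
    """Split an alert line into fields; host_level selects the rrdname form."""
    fields = line.split()
    if not fields or fields[0] != "alert":
        raise ValueError(f"not an alert line: {line!r}")
    rest = fields[1:]
    # A host-level line carries one extra leading field, the rrd name.
    needed = 4 if host_level else 3
    if len(rest) < needed:
        raise ValueError(f"too few fields: {line!r}")
    rrd = rest.pop(0) if host_level else None
    var, relation, *levels = rest
    if relation == "nodata" and len(levels) != 1:
        raise ValueError("nodata takes exactly one status")
    if relation != "nodata" and len(levels) > 3:
        raise ValueError("at most three levels: ok, warn, error")
    return Alert(rrd, var, relation, levels)


print(parse_alert("alert load5 < 3 7 10"))
print(parse_alert("alert load load5 nodata WARN", host_level=True))
```

The thresholds are left as text here, since this sketch only separates the fields and does not interpret them.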
The available relations are:
< (value is less than threshold)
> (value is greater than threshold)
= (value is equal to threshold)
|< (absolute value of value is less than threshold)
|> (absolute value of value is greater than threshold)
delta< (difference between last two values is less than threshold)
delta> (difference between last two values is greater than threshold)
<daystddev (value is outside threshold * the past day's standard-deviation)
<weekstddev (value is outside threshold * the past week's standard-deviation)
<monthstddev (value is outside threshold * the past month's standard-deviation)
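To make the simpler relations concrete, the following Python sketch expresses each one as a predicate. It is an illustration only, not the alert-monitor's implementation: the names RELATIONS and holds are hypothetical, the delta relations are assumed to mean the current value minus the previous one (signed), and the standard-deviation relations are left out because they need stored history that is not described here.

```python
from typing import Callable, Dict, Optional

# Each predicate receives (current value, previous value, threshold).
Relation = Callable[[float, Optional[float], float], bool]


def _delta(value: float, previous: Optional[float]) -> float:
    """Change since the previous sample (assumed: current minus previous)."""
    if previous is None:
        raise ValueError("delta relations need a previous value")
    return value - previous


RELATIONS: Dict[str, Relation] = {
    "<": lambda v, p, t: v < t,
    ">": lambda v, p, t: v > t,
    "=": lambda v, p, t: v == t,
    "|<": lambda v, p, t: abs(v) < t,
    "|>": lambda v, p, t: abs(v) > t,
    "delta<": lambda v, p, t: _delta(v, p) < t,
    "delta>": lambda v, p, t: _delta(v, p) > t,
}


def holds(relation: str, value: float, threshold: float,
          previous: Optional[float] = None) -> bool:
    """Return True if the named relation holds between value and threshold."""
    try:
        predicate = RELATIONS[relation]
    except KeyError:
        raise ValueError(f"unsupported relation: {relation!r}") from None
    return predicate(value, previous, threshold)


print(holds("<", 2.5, 3))                 # True
print(holds("|>", -8, 7))                 # True
print(holds("delta>", 12, 5, previous=4)) # True
```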
Example
To make things more concrete for the first (normal) case, here's a real example,
from the load rrd supplied in config-base:
alert load5 < 3 7 10
This means that if the load5 variable is less than 3, the status is set to OK.
If it's less than 7, it's WARN; if it's less than 10, it's ERROR; otherwise it's
CRITICAL.
Since the first match is taken, it's possible to leave out the upper levels if
you don't want them to occur. For example, if you only wanted load5 to ever
go to WARN level, never above, you could use:
alert load5 < 3
and then the only possible status levels are OK and WARN.
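The first-match rule described in this example can be sketched in a few lines of Python. This is an illustration, not the alert-monitor's own code: the function name status_for is hypothetical, and it covers only the < relation used in the load5 example.

```python
from typing import List

# Status levels in increasing order of severity, as named in the text.
STATUSES = ["OK", "WARN", "ERROR", "CRITICAL"]


def status_for(value: float, thresholds: List[float]) -> str:
    """Status of value under 'alert var < oklevel [warnlevel [errorlevel]]'."""
    if not 1 <= len(thresholds) <= 3:
        raise ValueError("expected one to three thresholds")
    # The first threshold that the value is below decides the status.
    for status, threshold in zip(STATUSES, thresholds):
        if value < threshold:
            return status
    # No threshold matched: fall through to the next level up.
    return STATUSES[len(thresholds)]


# alert load5 < 3 7 10
print(status_for(2.0, [3, 7, 10]))   # OK
print(status_for(5.0, [3, 7, 10]))   # WARN
print(status_for(9.0, [3, 7, 10]))   # ERROR
print(status_for(12.0, [3, 7, 10]))  # CRITICAL

# alert load5 < 3
print(status_for(50.0, [3]))         # WARN
```

With a single threshold the fall-through level is WARN, which is why leaving out the upper levels caps the status there.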
Apart from the standard-deviation relations listed above, the possible
relations are: <, =, >, |<, |>, delta<, delta>. The first three should be
obvious. The next two allow comparisons to the absolute value of the
variable's current value. The last two allow comparisons to the change in
value.