Gateway Monitoring (Advanced - Down)

NOYB

System: Gateways: Edit Gateway
(System: Routing: Edit Gateway)

Advanced:
Down:
"This defines the number of bad probes before the alarm will fire. Default is 10."

In practice the value seems to be used as a number of seconds, not a probe count.

With Frequency Probe at 30 and Down at 10 (the default) or 15, the interface keeps getting marked down and then up again, cycling every 15 seconds or so.

The only way to prevent this seems to be setting Down greater than Frequency Probe, say 35.
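
The rule suggested above could be written as a small check. This is only a sketch in Python, not the actual pfSense code: the function name and parameter names are made up for illustration, and both values are assumed to be in seconds, as the observed behaviour suggests.

```python
def down_exceeds_interval(interval_s: int, down_s: int) -> bool:
    """Return True when Down is strictly greater than the probe interval.

    Assumption: both values are in seconds. The names here are
    hypothetical and do not come from the pfSense source.
    """
    return down_s > interval_s


# Values from the report: Frequency Probe 30 with Down 10, 15 and 35.
for down in (10, 15, 35):
    print(down, down_exceeds_interval(30, down))
```

Of the three Down values tried here, only 35 passes the check.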

databeestje

Sounds good, we'll add some validation.

phil.davis

I'll have a play with this at home this evening, as I use these settings on sites with slow links that get easily swamped by user downloads - in those cases I put the latency limits high and use less frequent probes.
The apinger conf doc does say:

"Down" alarm definition.

This alarm will be fired when target doesn't respond for 30 seconds.

alarm down "down" {
time 30s
}

So it is a down time in seconds, rather than a probe failure count.
I also noticed the following on the UI:

Frequency Probe ('interval' var in the code) is not validated - I can enter 'xyz' - I wasn't game to see what exploded on a production system if I hit 'Apply Changes'.
Down validation - it is validated correctly, but the error text is a cut-and-paste of the low latency validation error text.
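
To illustrate the two points above, here is a rough Python sketch of per-field numeric validation in which each field gets its own error text. It is not the code in system_gateways_edit.php: the function name, the "interval" and "down" dictionary keys, the error wording, and the choice to treat an empty field as "use the default" are all assumptions made for the example.

```python
def validate_advanced(fields: dict) -> list:
    """Return a list of error strings; an empty list means the input passed.

    `fields` maps hypothetical form names to the raw strings typed into
    the form. An empty or missing value is assumed to mean "use the default".
    """
    labels = {"interval": "Frequency Probe", "down": "Down"}
    errors = []
    for name, label in labels.items():
        raw = fields.get(name, "").strip()
        if raw == "":
            continue  # assumed: blank falls back to the default
        # Reject anything that is not a plain positive integer, e.g. 'xyz'.
        if not (raw.isascii() and raw.isdigit()) or int(raw) < 1:
            # The label is part of the message, so the text cannot be
            # a copy of another field's error.
            errors.append(f"The {label} value needs to be a positive integer.")
    return errors


print(validate_advanced({"interval": "xyz", "down": "10"}))
```

With this shape, the 'xyz' case is rejected before anything is applied, and the message names the field that failed.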

phil.davis

I just submitted a pull request with a new system_gateways_edit.php - it implements lots of validation of the parameters in the advanced section. Also the explanations and error messages have been made correct and standardised. Previously there were lots of weird things that could be entered. Now I believe I have checked for all the obviously wrong conditions.