Apinger giving optimistic RTTs in 2.2.
-
I had the same issue: every time a MONITOR reached high latency, the reported RTT turned to garbage.
In my case, this is a new behavior in 2.2 and it is definitely a NO GO as I badly need load balancing with apinger.
So, I ended up restarting apinger when the monitoring alarm goes off:
1. A PHP script that relaunches apinger, located in "/root/restart_apinger.php":
#!/usr/local/bin/php -f
<?php
require_once("service-utils.inc");
service_control_restart("apinger", "");
?>
Don't forget to make the PHP script executable:
chmod +x /root/restart_apinger.php
2. A patch to "/etc/inc/gwlb.inc", which is where the contents of "/var/etc/apinger.conf" come from.
Always back up first:
cp -p /etc/inc/gwlb.inc /etc/inc/gwlb.inc~
Patch the alarm "command off" in "/etc/inc/gwlb.inc" so that it runs the "restart_apinger.php" script on a MONITOR alarm-off event, after a short delay:
alarm default {
	command on "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
	command off "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' ; sleep 10 && /root/restart_apinger.php &"
	combine 10s
}
Then run the apinger restart script to regenerate the configuration file:
/root/restart_apinger.php
and verify that "apinger.conf" was updated correctly:
less /var/etc/apinger.conf
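If you'd rather not eyeball the file in less, the same check can be scripted. This is only a rough sketch: the function name has_restart_hook is made up for illustration, and it assumes the generated file keeps the "command off" line on a single line, as in the patch above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense or apinger).
# Succeeds only if the given config file has a "command off" line
# that mentions the restart script.
has_restart_hook() {
    conf="$1"
    grep 'command off' "$conf" 2>/dev/null | grep -q 'restart_apinger\.php'
}
```

Usage would be something like: has_restart_hook /var/etc/apinger.conf && echo "hook present"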
Anyone with a smarter way is welcome.
-
Justin, that's a pretty cool workaround (thanks for posting it), but I am concerned: is apinger, such a core feature of the system, that badly broken? Maybe it would help to create a bounty to fix the underlying behavior as well as add some much-needed features, like monitoring multiple target IPs before declaring a gateway "down"? I for one would happily donate towards such an effort.
-
The problem is recreating the issues for the people who are able to fix it.
-
Pretty sure it's not that hard to re-create the issue (high latency / host down), using ipfw for example:
http://info.iet.unipi.it/~luigi/dummynet/
Not sure apinger is that badly broken: while it is/was a show stopper for me in version 2.2, it used to work just fine before. Don't blame the devs so hard; once they/we find the real cause of the problem, it should probably be easy to fix.
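For what it's worth, here is roughly what the dummynet idea could look like. It is an untested sketch that only prints the commands instead of running them, and it assumes a plain FreeBSD test box sitting between pfSense and the monitor IP (not the pfSense box itself, and loading ipfw can lock you out if its default rule is deny). The function name, rule number 100 and pipe number 1 are arbitrary choices for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints ipfw/dummynet commands, does not execute them.
# Rule number 100 and pipe number 1 are arbitrary, hypothetical choices.
dummynet_cmds() {
    target="$1"    # monitor IP whose pings should be degraded
    delay_ms="$2"  # delay added to matching packets, in milliseconds
    loss="$3"      # packet loss rate between 0 and 1
    echo "kldload dummynet"
    echo "ipfw pipe 1 config delay ${delay_ms}ms plr ${loss}"
    echo "ipfw add 100 pipe 1 icmp from any to ${target}"
    echo "# undo with:"
    echo "ipfw delete 100"
    echo "ipfw pipe delete 1"
}
```

For example, dummynet_cmds 192.0.2.1 500 0.2 prints the commands for 500 ms of extra delay and 20% loss towards 192.0.2.1, so they can be reviewed before running anything as root.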
EDIT:
Humm OK
https://github.com/Jajcus/apinger/blob/master/BUGS
-
Yes, it seems, unless I am reading it wrong, that the last commit with an actual code change (76b1470) was 9 years ago. Maybe that needs some dusting off? :o
-
This is not what's used on pfSense.
-
Oh I see. Where is the source code for the apinger.c that is used in pfSense?
-
https://forum.pfsense.org/index.php?topic=76132.0
-
Ok was not aware. Reading up now, thank you
-
So the fix I posted earlier works, but apinger still crashes sometimes, so I wrote a watchdog script to put in the crontab.
The "watch_apinger.sh" shell script looks up the apinger PID with ps, reads the PID from apinger's PID file (which can be stale after a SIGKILL), compares both values, and runs the "restart_apinger.php" script if needed:
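#!/bin/sh
# Restart apinger if it is not running or if its PID file is stale.
RELAUNCH=0
# PID of the running apinger process, according to ps
CUR=$(ps xcopid,command | awk '/apinger/ {print $1}')
# PID recorded in the PID file (empty if the file is missing)
PID=$(cat /var/run/apinger.pid 2>/dev/null)
[ -z "$CUR" ] && RELAUNCH=1
[ -z "$PID" ] && RELAUNCH=1
[ "$CUR" != "$PID" ] && RELAUNCH=1
[ "$RELAUNCH" = 1 ] && ( killall apinger ; /root/restart_apinger.php )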
Save it to "/root/watch_apinger.sh" and make it executable:
chmod +x /root/watch_apinger.sh
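Then add it to root's crontab. Piping a single line into "crontab -" replaces any entries already there, so append to the current list instead:
(crontab -l 2>/dev/null; echo "* * * * * /root/watch_apinger.sh") | crontab -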
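A possible variation on the watchdog check, in case parsing ps output feels fragile: ask the kernel whether the PID in the PID file is still alive. This is just a sketch with a made-up function name (pidfile_alive), it assumes it runs as root, and it does not confirm that the live process really is apinger, so a reused PID would fool it.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense or apinger).
# Returns 0 only if the PID file is readable, holds a number,
# and a process with that PID currently exists.
pidfile_alive() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely numeric
    esac
    kill -0 "$pid" 2>/dev/null     # signal 0 only tests for existence
}
```

In the watchdog it could be used as: pidfile_alive /var/run/apinger.pid || /root/restart_apinger.php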