Apinger giving optimistic RTTs in 2.2.
I had the same issue: every time a MONITOR reached high latency, the reported RTT turned to crap.
In my case, this is a new behavior in 2.2 and it is definitely a NO GO as I badly need load balancing with apinger.
So I ended up restarting apinger whenever the monitoring alarm switches off:
1. A PHP script that relaunches apinger, located in "/root/restart_apinger.php":
#!/usr/local/bin/php -f
<?php
require_once("service-utils.inc");
service_control_restart("apinger", "");
?>
Don't forget to make the PHP script executable:
chmod +x /root/restart_apinger.php
2. A patch in "/etc/inc/gwlb.inc", which holds the template used to generate "/var/etc/apinger.conf".
Always back up first:
cp -p /etc/inc/gwlb.inc /etc/inc/gwlb.inc~
Patch the alarm "command off" line in "/etc/inc/gwlb.inc" so that it runs the "restart_apinger.php" script shortly after a MONITOR alarm-off event:
alarm default {
	command on "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
	command off "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' ; sleep 10 && /root/restart_apinger.php &"
	combine 10s
}
Then run the apinger restart script to regenerate the configuration file:
/root/restart_apinger.php
and verify that "apinger.conf" was updated correctly:
less /var/etc/apinger.conf
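A quicker check than paging through the file is to grep for the hook. This is a minimal sketch, assuming the patched "command off" line contains the string "restart_apinger.php"; the function name has_restart_hook is made up for illustration:

```shell
#!/bin/sh
# has_restart_hook [FILE]
# Succeeds (exit 0) if FILE has a "command off" line that calls the restart
# script. The function name is hypothetical; the default path is the one
# mentioned in the post above.
has_restart_hook() {
    conf="${1:-/var/etc/apinger.conf}"
    # Keep only "command off" lines, then look for the script name in them.
    grep 'command off' "$conf" 2>/dev/null | grep -q 'restart_apinger\.php'
}
```

Usage: has_restart_hook && echo "hook present" || echo "hook missing"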
Anyone with a smarter way is welcome.
Justin, that's a pretty cool workaround (thanks for posting it), but I am concerned: is apinger, such a core feature of the system, that badly broken? Maybe it would help to create a bounty to fix the underlying behavior as well as add some much-needed features, like monitoring multiple target IPs before declaring a gateway "down"? I for one would happily donate towards such an effort.
The problem is recreating the issues for the people who are able to fix it.
Pretty sure it's not that hard to recreate the issue (high latency / host down), using ipfw for example:
http://info.iet.unipi.it/~luigi/dummynet/
Not sure apinger is that badly broken: while it is/was a show stopper for me in version 2.2, it used to work just fine before. Don't blame the devs so hard; once they/we find the real cause of the problem, it should probably be easy to fix.
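For anyone who wants to try the ipfw/dummynet route mentioned above, here is a rough sketch of what injecting latency and loss towards a monitor IP could look like. It assumes ipfw and dummynet are already loaded on a test box (not a production firewall); the function name, pipe number, rule number and the 192.0.2.1 address are placeholders, not anything taken from pfSense. By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# degrade_monitor [IP] [DELAY_MS] [LOSS_RATE]
# Sketch only: print (or, with RUN=1, execute) ipfw/dummynet commands that
# delay and drop ICMP sent to a monitor IP. All names and defaults here are
# hypothetical; 192.0.2.1 is a documentation address, not a real monitor.
degrade_monitor() {
    ip="${1:-192.0.2.1}"    # monitor IP to degrade
    delay="${2:-500}"       # added delay in milliseconds
    plr="${3:-0.2}"         # packet loss rate between 0 and 1
    for cmd in \
        "ipfw pipe 1 config delay ${delay}ms plr ${plr}" \
        "ipfw add 100 pipe 1 icmp from any to ${ip}"
    do
        # Dry run unless RUN=1 is set in the environment.
        if [ "${RUN:-0}" = 1 ]; then $cmd; else echo "$cmd"; fi
    done
}
```

Usage: degrade_monitor 192.0.2.1 300 0.1 prints the two commands for review. To undo after a real run, "ipfw delete 100" and "ipfw pipe 1 delete" should remove the rule and the pipe.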
Hmm, OK:
https://github.com/Jajcus/apinger/blob/master/BUGS
Yes, it seems, unless I am reading it wrong, that the last actual code change commit (76b1470) was 9 years ago. Maybe that needs some dusting off? :o
This is not what's used on pfSense.
Oh I see- where is the source code for the apinger.c that is used in pfsense?
Ok was not aware. Reading up now, thank you
So the fix I posted earlier works, but apinger still crashes sometimes, so I wrote a watchdog script to put in the crontab.
The "watch_apinger.sh" shell script looks up the apinger PID with ps, reads the apinger PID from its PID file (in case of a SIGKILL), compares both values, and runs the "restart_apinger.php" script if needed:
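#!/bin/sh
# Restart apinger if it is not running or its PID differs from the PID file.
RELAUNCH=0
# PID of the running apinger process, if any
CUR=$(ps xcopid,command | awk '/apinger/ {print $1}')
# PID recorded in the PID file (empty if the file is missing)
PID=$(cat /var/run/apinger.pid 2>/dev/null)
[ -z "$CUR" ] && RELAUNCH=1
[ -z "$PID" ] && RELAUNCH=1
[ "$CUR" != "$PID" ] && RELAUNCH=1
[ "$RELAUNCH" = 1 ] && ( killall apinger ; /root/restart_apinger.php )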
Save it to "/root/watch_apinger.sh" and make it executable:
chmod +x /root/watch_apinger.sh
Then add it to root's crontab, keeping any entries that are already there:
( crontab -l 2>/dev/null ; echo "* * * * * /root/watch_apinger.sh" ) | crontab -
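The decision the watchdog makes (relaunch when there is no running process, no PID on file, or the two disagree) can also be pulled out into a small function, which makes it easy to test without touching the live apinger. A minimal sketch; the name needs_restart is hypothetical:

```shell
#!/bin/sh
# needs_restart RUNNING_PID PIDFILE_PID
# Succeeds (exit 0) when apinger should be relaunched: the running PID is
# empty, the PID-file value is empty, or the two values differ.
# The function name is made up for illustration.
needs_restart() {
    [ -z "$1" ] || [ -z "$2" ] || [ "$1" != "$2" ]
}
```

Usage inside the watchdog, with CUR and PID set as in the script above: needs_restart "$CUR" "$PID" && ( killall apinger ; /root/restart_apinger.php )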