Apinger giving optimistic RTTs in 2.2.



  • I started noticing on the later 2.2-RC releases that apinger would report anomalously low round-trip times starting some indeterminate time (usually > 24 hr) after startup. Restarting apinger clears it up, so I don't believe this is an externally caused anomaly.

    By "anomalously low" I mean that one link which normally reports an RTT of 7-8 ms will report 1-2 ms instead; the other link, which normally shows >22 ms, will report 1-2 ms as well.

    Is anyone else seeing this?
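
    One way to cross-check the numbers apinger reports is to ping the monitor IP by hand and pull the average out of the summary line. Below is a minimal sketch, assuming the usual ping(8) summary format ("... min/avg/max/stddev = a/b/c/d ms"); the function name avg_rtt is made up for illustration.

```shell
#!/bin/sh
# avg_rtt: read ping(8) output on stdin and print the average RTT in ms.
# Assumes the summary line looks like "... min/avg/max/<dev> = a/b/c/d ms";
# splitting that line on "/" puts the average in the fifth field.
avg_rtt() {
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}
```

    Something like "ping -c 10 <monitor IP> | avg_rtt" then prints a single number to compare against what the dashboard shows for that gateway.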



  • You're not the only one; see the thread below.

    https://forum.pfsense.org/index.php?topic=87819.0

    I hadn't tried to restart apinger until I saw your post, but that seems to clear up the issue for me as well.  Thanks.



  • I'll throw my hat into this as well. I had problems with apinger on 2.1.x and am still seeing issues on 2.2. Restarting apinger every couple of hours gets it going until it just breaks down a couple of hours later, requiring another restart. Repeat this for eternity. I'm still of the opinion that apinger needs to be replaced with something more reliable (or fixed once and for all).

    I will post logs later, after it generates a new set and then promptly craps itself.

    edit: that didn't take long, it's being even more useless now:

    
    Feb 4 09:54:22	apinger: ALARM: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdown ***
    Feb 4 09:58:42	apinger: Exiting on signal 15.
    Feb 4 09:58:43	apinger: Starting Alarm Pinger, apinger(13709)
    Feb 4 09:59:07	apinger: SIGHUP received, reloading configuration.
    Feb 4 09:59:13	apinger: SIGHUP received, reloading configuration.
    Feb 4 09:59:38	apinger: SIGHUP received, reloading configuration.
    Feb 4 10:03:55	apinger: ALARM: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdelay ***
    Feb 4 10:05:51	apinger: Exiting on signal 15.
    Feb 4 10:05:52	apinger: Starting Alarm Pinger, apinger(20184)
    Feb 4 10:07:19	apinger: SIGHUP received, reloading configuration.
    Feb 4 10:08:03	apinger: ALARM: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdelay ***
    Feb 4 10:08:13	apinger: ALARM: WAN2_PPPOE(4.2.2.2) *** WAN2_PPPOEdelay ***
    Feb 4 10:10:03	apinger: alarm canceled: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdelay ***
    Feb 4 10:10:24	apinger: alarm canceled: WAN2_PPPOE(4.2.2.2) *** WAN2_PPPOEdelay ***
    Feb 4 10:11:40	apinger: Exiting on signal 15.
    Feb 4 10:11:56	apinger: Starting Alarm Pinger, apinger(25269)
    Feb 4 10:13:27	apinger: ALARM: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdelay ***
    Feb 4 10:13:47	apinger: ALARM: WAN2_PPPOE(4.2.2.2) *** WAN2_PPPOEdelay ***
    Feb 4 10:15:07	apinger: alarm canceled: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdelay ***
    Feb 4 10:15:27	apinger: alarm canceled: WAN2_PPPOE(4.2.2.2) *** WAN2_PPPOEdelay ***
    Feb 4 10:23:20	apinger: ALARM: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdelay ***
    Feb 4 10:24:59	apinger: alarm canceled: WAN_PPPOE(4.2.2.1) *** WAN_PPPOEdelay ***
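
    To see at a glance which gateway is flapping in a log like the one above, the alarm lines can be tallied per gateway. This is only a sketch: the function name count_alarms is made up, and it assumes the log lines keep the "apinger: ALARM: <gateway>(<ip>)" layout shown here.

```shell
#!/bin/sh
# count_alarms: read apinger log lines on stdin and print, per gateway,
# how many alarms were raised. Only "ALARM:" lines are counted;
# "alarm canceled:" lines are ignored.
count_alarms() {
    awk '/apinger: ALARM: / {
             # The field right after "ALARM:" looks like WAN_PPPOE(4.2.2.1)
             for (i = 1; i <= NF; i++)
                 if ($i == "ALARM:") { n[$(i + 1)]++; break }
         }
         END { for (g in n) print n[g], g }' | sort -rn
}
```

    On 2.2 the gateway log should be a circular clog file, so "clog /var/log/gateways.log | count_alarms" is probably the way to feed it, but treat that path as an assumption for your install.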
    
    




  • Amusingly, mine's the opposite of the problem in that thread: my pings should be 20-40 ms and they drop to crap (as much as I wish my ping was 1 ms, that's impossible on ADSL).

    If they don't plan to fix or change anything, perhaps someone should develop a replacement for apinger. Given what it does, it shouldn't take more than a week or two to write something new and get it into pfSense as an additional package.



  • I had the same issue: every time a MONITOR reached high latency, the reported RTT went bad.

    In my case, this is a new behavior in 2.2 and it is definitely a NO GO as I badly need load balancing with apinger.

    So I ended up restarting apinger whenever a monitoring alarm is cancelled:

    1. A PHP script that relaunches apinger, located in "/root/restart_apinger.php":

    
    #!/usr/local/bin/php -f
    <?php
    // Restart the apinger service through pfSense's service helpers.
    require_once("service-utils.inc");
    service_control_restart("apinger", "");
    ?>
    
    

    Don't forget to make the PHP script executable:

    
    chmod +x /root/restart_apinger.php
    
    

    2. A patch to "/etc/inc/gwlb.inc", which holds the template that "/var/etc/apinger.conf" is generated from.

    Always back up first:

    
    cp -p /etc/inc/gwlb.inc /etc/inc/gwlb.inc~
    
    

    Patch the "command off" line of the alarm block in "/etc/inc/gwlb.inc" so that it runs the "restart_apinger.php" script 10 seconds after a MONITOR "alarm off" event:

    
    alarm default {
            command on "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
            command off "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' ; sleep 10 && /root/restart_apinger.php &"
            combine 10s
    }
    
    

    Then run the apinger restart script to regenerate the configuration file:

    
    /root/restart_apinger.php
    
    

    and verify that "apinger.conf" was updated correctly:

    
    less /var/etc/apinger.conf
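
    Instead of eyeballing the file, the check can be scripted. A rough sketch, with a made-up function name (has_restart_hook) and the config path from this post as the default argument:

```shell
#!/bin/sh
# has_restart_hook: succeed if the given apinger config has a
# "command off" line that mentions the restart script.
# The function name and default path are illustrative only.
has_restart_hook() {
    conf="${1:-/var/etc/apinger.conf}"
    grep -q 'command off .*restart_apinger\.php' "$conf"
}
```

    Usage would be along the lines of: has_restart_hook && echo "hook present" || echo "hook missing".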
    
    

    Anyone with a smarter way is welcome to share it.



  • Justin that's a pretty cool workaround (thanks for posting it) but I am concerned – is apinger, such a core feature of the system, that badly broken? Maybe it would help to create a bounty to fix the underlying behavior as well as add some much needed features like monitoring multiple target IPs before declaring a gateway "down" ? I for one would happily donate towards such an effort.



  • The problem is recreating the issues for the people who are able to fix it.



  • Pretty sure it's not that hard to re-create the issue (high latency / host down), using ipfw with dummynet for example:
    http://info.iet.unipi.it/~luigi/dummynet/
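
    As a rough idea of what that could look like, the sketch below only prints the ipfw/dummynet commands rather than running them, so nothing changes until you paste them yourself. It assumes a test box where ipfw and dummynet are available; loading ipfw on a pf-based firewall can affect traffic, so don't try this on a production router. Pipe number 1, rule number 100 and the function names are arbitrary picks for the example.

```shell
#!/bin/sh
# print_delay_rules: print (not run) ipfw/dummynet commands that would add
# artificial latency to ICMP traffic towards one monitor IP.
# Pipe number 1 and rule number 100 are arbitrary picks for this example.
print_delay_rules() {
    target="$1"      # monitor IP to slow down
    delay_ms="$2"    # delay to add, in milliseconds
    echo "ipfw pipe 1 config delay ${delay_ms}ms"
    echo "ipfw add 100 pipe 1 icmp from any to ${target}"
}

# print_cleanup_rules: print the commands that undo the rules above.
print_cleanup_rules() {
    echo "ipfw delete 100"
    echo "ipfw pipe delete 1"
}
```

    For instance, "print_delay_rules 4.2.2.1 200" prints the two commands needed to delay pings to that monitor by 200 ms, which should be enough to trip a delay alarm and watch what apinger does afterwards.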

    I'm not sure apinger is that badly broken: while it is/was a show stopper for me in 2.2, it used to work just fine before. Don't blame the devs so hard; once they/we find the real cause, it should probably be easy to fix.

    EDIT:
    Humm OK https://github.com/Jajcus/apinger/blob/master/BUGS



  • Yes, unless I am reading it wrong, it seems the last actual code change commit (76b1470) was 9 years ago. Maybe that needs some dusting off? :o


  • This is not what's used on pfSense.



  • Oh, I see. Where is the source code for the apinger.c that is used in pfSense?


  • Banned



  • Ok was not aware. Reading up now, thank you



  • So the fix I posted earlier works, but apinger still crashes sometimes, so I wrote a watchdog script to put in the crontab.

    The "watch_apinger.sh" shell script looks up the apinger PID with ps, reads the PID from apinger's PID file (in case of a SIGKILL), compares both values, and runs the "restart_apinger.php" script if needed:

    
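    #!/bin/sh
    # Restart apinger when the running PID and the PID file disagree.
    
    RELAUNCH=0
    # PID of the running apinger as seen by ps (empty if not running)
    CUR=$(ps xcopid,command | awk '/apinger/ {print $1}')
    # PID recorded in the PID file (empty if the file is missing)
    PID=$(cat /var/run/apinger.pid 2>/dev/null)
    
    [ -z "$CUR" ] && RELAUNCH=1
    [ -z "$PID" ] && RELAUNCH=1
    [ "$CUR" != "$PID" ] && RELAUNCH=1
    
    # "=" is the portable string comparison for test(1); "==" is not POSIX
    [ "$RELAUNCH" = 1 ] && ( killall apinger ; /root/restart_apinger.php )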
    
    

    Save it to "/root/watch_apinger.sh" and make it executable:

    
    chmod +x /root/watch_apinger.sh
    
    

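    Then append it to root's crontab. Piping a bare echo into "crontab -" would replace every existing entry, so keep the current ones:

    
    (crontab -l 2>/dev/null; echo "* * * * * /root/watch_apinger.sh") | crontab -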