Pfsense 2.1-release - Gateway aPinger broken?



  • Been running pfsense 2.1-release and found that Gateway monitor (apinger service) starts well after a reboot, but gradually the latency count (ms) climbs until it finally reports PENDING status.  1 WAN and 1 LAN interface on a home network.  50 mbit down/5 mbit up.  I'd hate to turn off the gateway monitor but not sure how to get it working reliably.  Have even created floating rules in traffic shaper (HFSC) to prioritize ICMP packets during heavy traffic times.  Frustrating.

    Has anyone else faced a similar issue?  Also I wasn't sure whether maybe this issue has been addressed in one of the newer 2.1-release daily builds.  Pfsense is reporting I'm on the latest version back from the release date, but is no longer reporting that a daily build is available.

    EDIT:
    I've since followed these instructions and it appears to have eliminated DROPS
    http://forum.pfsense.org/index.php?topic=50901.0



  • Yes, I get the same.  See the following link to other issues I've encountered.

    http://forum.pfsense.org/index.php/topic,69261.0.html

    I just can't trust it.  Unfortunately, I have several machines that I upgraded that I can't easily get back to 2.0.3.    I just had to restart the apinger service on another this morning.

    I'll have to look into setting up a cron job to restart apinger on a nightly basis, I guess.

    What's a bit more troubling is the lack of discussion on this by the developers.  That's only my perception and may not be accurate.



  • Brutal, appreciate your insights.  Fortunately or not, I'm glad to hear I'm not alone.  Figured this thread would be lighting up with similar experiences unless others just haven't realized they're experiencing the same issue.  I've since rebooted and disabled the Gateway Monitor completely.  Since then, everything appears stable.  Despite not seeing the ping response, it doesn't appear to impact my connection as it's always been stable and responsive.  Considering the apinger service bugs, I'm not entirely convinced what I was seeing (green, 0% loss, etc.) was even accurate.



  • I can confirm this problem.
    I run a system with 2 WANs. apinger degrades my second interface (WAN2) first. After enough time it affects the first WAN interface too. I need to restart the apinger service daily for the moment.
    If necessary, I will provide the required information.

    Yours
    VV



  • Thanks Zillo.  My experience is similar to yours only w/ a single WAN interface. I hope that this issue will be resolved soon.

    BTW, are there still daily snapshots available with 2.1-release?  If so, what is the URL for those to enter into the firmware update interface?



  • Hi,

    I'm a new member of the forum, but I've been using pfSense for half a year, and I have similar issues.

    I have two Internet connections. Everything was working fine, but for the past week (probably a little more) I have had a lot of loss on my 'primary link'. One good part of that trouble is that my backup, which didn't work before, now does  :D

    But still, I swapped my 'primary' and my secondary access, and sure enough, I started to see my line going offline. The strangest thing is seeing messages in the monitor dashboard like: RTT 5.5 ms, loss 100 % ???

    I'll see if restarting the apinger process is also a solution here.

    BTW, can I easily go back to pfsense 2.0.3 ?

    Thanks,
    Pascal



  • @miles267:

    Have even created floating rules in traffic shaper (HFSC) to prioritize ICMP packets during heavy traffic times.  Frustrating.

    Floating rules are broken, so try to avoid them… perhaps then your problem will be temporarily fixed?



  • Hi,

    Even after restarting apinger, no luck: I still lost my gateway (at least pfSense 'senses' it!).

    As my pfSense is actually a VM on my little ESXi host, I just installed pfSense 2.0.3 in a brand new VM, restored the configuration from the 2.1 install (with a big warning about my silly action), and so far it seems that everything is OK.

    The plan is to wait a week, and if everything is OK, I'll try the 2.1 update again (but with an active snapshot, to be able to go back a little faster!).

    I'll let you know.

    For the record: this pfSense install is nearly stock, with only three additional packages: ntop, Open VM Tools and tftp (to let my SIP phone boot).

    Thanks,
    Pascal



  • I have the exact same problem. apinger is broken in 2.1; I had to roll back 4 installs to 2.0.3.



  • For those who were encountering this issue, it appears to have been corrected in the pfsense 2.1.1 PRERELEASE.  Can any of the ops confirm?



  • Issue reproduced in my environment too. I have 2 installations, each with 2 WANs. Sometimes I have to restart apinger manually to get a WAN back online.

    Maybe someone could share a cron script to do this?



  • I've found a temporary solution: restarting apinger every 5 minutes.

    Open a shell via PuTTY.

    Go to /usr/local/www
    and create a file apinger.php with the following content:
    <?php
    require_once("/etc/inc/service-utils.inc");
    require_once("/etc/inc/globals.inc");
    require_once("/etc/inc/vslb.inc");
    require_once("/etc/inc/gwlb.inc");

    // Restart the apinger gateway-monitoring service
    service_control_restart("apinger", "restartservice");
    ?>

    chmod 755 apinger.php

    To run it manually you can just open http://pfsense/apinger.php

    Then install cron package via pfsense UI and add new job:
    */5 * * * * /usr/local/bin/php /usr/local/www/apinger.php



  • Same issue as everyone else in this thread.

    Different platforms (2 running as VMs under ESXi, and another on an ALIX) with different types of interfaces (some Ethernet, some OpenVPN). I've been restarting apinger in order to 'clear' the packet loss % so the gateway can be detected as up again…

    I hope the pfSense devs fix this in the next release, since it's a bummer for pfSense users who rely on multi-WAN.



  • @miles267:

    For those who were encountering this issue, it appears to have been corrected in the pfsense 2.1.1 PRERELEASE.  Can any of the ops confirm?

    I'm still having the issue with 2.1.1 release; pfsense reports 2% packet loss, but when I ping from the command line, it's all clean?
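
    To make that comparison less anecdotal, the loss figure from a manual ping run can be pulled out and set against what the dashboard shows. A minimal sketch, assuming the usual "X% packet loss" summary line that ping prints; ping_loss is a hypothetical helper name and the target address is a placeholder:

```shell
#!/bin/sh
# Hypothetical helper: extract the packet-loss percentage from ping's summary.
# Assumes a summary line of the form
#   "N packets transmitted, M packets received, X% packet loss"
ping_loss() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Usage (192.0.2.1 is a documentation-range placeholder, not a real monitor IP):
#   ping -c 100 192.0.2.1 | ping_loss
```

    Running it against the same monitor IP that the gateway uses, over a similar number of probes, gives a number that can be compared directly with the loss shown in the GUI.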



  • Similar issues here.

    How do you connect to your WAN(s)?
    Do you use a modem or a router in front of pfSense?

    It seems that a (DSL/cable) router causes more problems in my installations.
    Unfortunately it is not always possible to replace the router with a modem.

    Are there any upcoming solutions to these issues that don't involve hacking around in pfSense?

    Cheers,
    Harry



  • I am using 2.1.2, and I can confirm there is definitely a problem. In my case, though, it does not look like a pure apinger problem.

    The issue is that I am using PPPoE for my WAN connection and the script responsible for configuring apinger (gwlb.inc) gets it wrong.

    Basically, PPPoE marks the gateway as "dynamic", but the script tries to check whether that is an IP address (which it clearly isn't!).

    This then leads aPinger to fail and the gateway report page stays stuck on "Pending"…

    I am going to try and see if I can hack the thing on my test machine for the time being but I would welcome any recommendations on how I should report this.

    Thanks.



  • Yes, I can confirm I got it to work by changing the gwlb.inc script.

    Part of the problem is that the GUI checks (which only accept an IP address) are also applied here. So when PPPoE sets the gateway to "dynamic" it all goes wrong (the script says: not an IPv4 or an IPv6 address => do not configure apinger => apinger then fails).

    Ideally the PPPoE system should be improved to provide something more useful than just "dynamic", but this is probably a much bigger scope of work.
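
    The failure chain described above can be illustrated outside pfSense. This is only a simplified shell sketch: looks_like_ipv4 and classify_gateway are hypothetical names (not pfSense functions), the pattern is a rough dotted-quad test that does not validate octet ranges, and the IPv6 branch is left out:

```shell
#!/bin/sh
# Rough dotted-quad test, standing in for the address check in gwlb.inc.
# Illustration only: it does not reject octets above 255.
looks_like_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Mimics the decision described in the post: a gateway value that is not
# an address is skipped, so no monitor target is ever configured for it.
classify_gateway() {
    if looks_like_ipv4 "$1"; then
        echo "monitor"
    else
        echo "skip"
    fi
}
```

    With this check, classify_gateway dynamic prints "skip", which corresponds to the gateway staying in "Pending" because nothing is monitoring it.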

    In the meantime, replacing the block (in the setup_gateways_monitor function):

    
    if (is_ipaddrv4($gateway['gateway'])) {
    			...
    } else if (is_ipaddrv6($gateway['gateway'])) {
    			...
    } else
    			continue;
    

    with

    		if ($gateway['ipprotocol'] == "inet") {
    			$gwifip = find_interface_ip($gateway['interface'], true);
    			if (!is_ipaddrv4($gwifip))
    				continue; //Skip this target
    
    			/*
    			 * If the gateway is the same as the monitor we do not add a
    			 * route as this will break the routing table.
    			 * Add static routes for each gateway with their monitor IP
    			 * not strictly necessary but is a added level of protection.
    			 */
    			if (is_ipaddrv4($gateway['gateway']) && $gateway['monitor'] != $gateway['gateway']) {
    				log_error("Removing static route for monitor {$gateway['monitor']} and adding a new route through {$gateway['gateway']}");
    				mwexec("/sbin/route change -host " . escapeshellarg($gateway['monitor']) .
    					" " . escapeshellarg($gateway['gateway']), true);
    			}
    		} else if ($gateway['ipprotocol'] == "inet6") {
    			if ($gateway['monitor'] == $gateway['gateway']) {
    				/* link locals really need a different src ip */
    				if (is_linklocal($gateway['gateway'])) {
    					$gwifip = find_interface_ipv6_ll($gateway['interface'], true);
    				} else {
    					$gwifip = find_interface_ipv6($gateway['interface'], true);
    				}
    			} else {
    				$gwifip = find_interface_ipv6($gateway['interface'], true);
    				if (is_linklocal($gateway['monitor'])) {
    					if (!strstr($gateway['monitor'], '%')) {
    						$gateway['monitor'] .= "%{$gateway['interface']}";
    					}
    				} else {
    					// Monitor is a routable address, so use a routable address for the "src" part
    					$gwifip = find_interface_ipv6($gateway['interface'], true);
    				}
    			}
    			if (!is_ipaddrv6($gwifip))
    				continue; //Skip this target
    
    			/*
    			 * If the gateway is the same as the monitor we do not add a
    			 * route as this will break the routing table.
    			 * Add static routes for each gateway with their monitor IP
    			 * not strictly necessary but is a added level of protection.
    			 */
    			if (is_ipaddrv6($gateway['gateway']) && $gateway['monitor'] != $gateway['gateway']) {
    				log_error("Removing static route for monitor {$gateway['monitor']} and adding a new route through {$gateway['gateway']}");
    				mwexec("/sbin/route change -host -inet6 " . escapeshellarg($gateway['monitor']) .
    					" " . escapeshellarg($gateway['gateway']), true);
    			}
    		} else { 
    			continue;
    		}
    

    Please note that in my case I also had to change the logic that determines the source address: I was trying to force monitoring of a routable IPv6 address, but the script was forcing a link-local address as the source address because the default monitor was a link-local address (so I changed the order so that it first considers whether the monitor address has been manually specified in the config).

    HTH



  • Hi Guys,
    I am experiencing exactly the same issue. I have 2 WANs configured for load balancing, but since the update I have noticed that load balancing is no longer working, and on top of that the 2nd WAN is marked as offline.
    I also have a tier 2 H3G connection set up in the load-balancing group, and I noticed that pfSense keeps it up even though both WANs are up and running…
    My broadband connections are both PPPoE.

    I am planning to roll back to 2.0.3.

    Bye



  • Yes, that does not surprise me (quite clearly part of the load balancing work depends on properly determining which gateway is working fine).

    For info, I have submitted my changes to GitHub. They improve things a bit, but I still see issues after link loss where the monitoring of my IPv6 gateway is not properly reset. I suspect this is caused by timing issues where pfSense tries to set up apinger while not everything is ready yet…



  • @mukidu:

    Hi Guys,
    I am experiencing exactly the same issue. I have 2 WANs configured for load balancing, but since the update I have noticed that load balancing is no longer working, and on top of that the 2nd WAN is marked as offline.
    I also have a tier 2 H3G connection set up in the load-balancing group, and I noticed that pfSense keeps it up even though both WANs are up and running…
    My broadband connections are both PPPoE.

    I am planning to roll back to 2.0.3.

    Bye

    Which version are you using? Has anyone tried version 2.1.3?



  • The fix I submitted was accepted 13 days ago, however it was too late for it to make it in 2.1.3…

    So I suppose we will only see it on 2.1.4 and later.

    My fix improves the situation: before, link monitoring would just not work at all with PPPoE (and probably any kind of dynamically established link, so that would include VPNs). This means you can now see packet loss and latency on PPPoE links (and I suspect VPN links too, although I haven't tried).

    Now, my fix still needs some work admittedly. I am still seeing some issues sometimes after a complete link loss. When the link comes back up it looks like pfSense calls the script way too early (at a stage where we don't even know the IP to monitor). It seems to be a timing issue as this only happens from time to time.

    It is easy to "fix" as once the link is established you just wait a few seconds and then have to stop/start the "apinger" service and it will then get the proper addresses.
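
    That manual step could be scripted roughly as follows. A sketch only: restart_apinger_after_delay is a hypothetical helper, the 10-second default is an arbitrary guess rather than a measured value, and the restart command is the pfSsh.php one posted later in this thread. How to trigger it when the link comes back up is left open here.

```shell
#!/bin/sh
# Restart command as posted later in this thread; override RESTART_CMD for a dry run.
RESTART_CMD=${RESTART_CMD:-"/usr/local/sbin/pfSsh.php playback svc restart apinger"}

# Hypothetical helper: give the link a moment to settle, then restart apinger.
# $1 = seconds to wait (defaults to 10, an arbitrary choice).
restart_apinger_after_delay() {
    sleep "${1:-10}"
    # Intentionally unquoted so the command string splits into its arguments.
    $RESTART_CMD
}

# Usage: restart_apinger_after_delay 15
```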

    But it still means the solution won't be good enough to allow proper automated link failover...

    I am probably going to try and have a look at some point, but I suspect it is an issue that will be caused by something deep in the internals of pfSense. So could well be out of my league...



  • I have a multi-WAN setup with cable as WAN1 and PPPoE DSL as WAN2. My cable gateway is up and running, but PPPoE is showing as Pending, and restarting apinger doesn't bring it online. I am still able to ping using the DSL gateway, though. Is this the problem that apinger is causing?

    I'm on 2.1.3.



  • I ran into this problem for the first time.  I have WAN failover configured but not load balancing.  I rebooted one WAN modem and now the GW monitor just died and will not come back.

    It is configured to ping every 2 seconds




  • @krylou:

    @mukidu:

    Hi Guys,
    I am experiencing exactly the same issue. I have 2 WANs configured for load balancing, but since the update I have noticed that load balancing is no longer working, and on top of that the 2nd WAN is marked as offline.
    I also have a tier 2 H3G connection set up in the load-balancing group, and I noticed that pfSense keeps it up even though both WANs are up and running…
    My broadband connections are both PPPoE.

    I am planning to roll back to 2.0.3.

    Bye

    Which version are you using? Has anyone tried version 2.1.3?

    I upgraded from 2.0.1 to 2.1.2, I also tried 2.1.3 but the issue is still there…



  • I am on 2.1.3 and I can confirm huge issue with apinger. My otherwise stable WANs are shown as "Pending" no matter what monitor IP I set. Even setting the gateway IP address one hop away causes the apinger to show the WAN as Pending and therefore exclude it from the gateway group, causing interruptions all the time.

    BTW, all of you who are experiencing this issue, is it the case that this is due to having traffic shaper configured? Or does it happen even without traffic shaper?

    This is a very core, basic function and must be fixed asap.



  • Can anyone specify if apinger has been fixed yet… or if there is a good work around?  I've got two WANs, both static, and since upgrading to 2.1p1 it's needed the apinger service reset frequently.  I've got multiple other devices at other locations, DHCP and Static that have had to reset the apinger since the 2.1 upgrades started.

    Thanks for any help.

    Bob



  • Hi Bob,

    Please try the workaround we used, which gave us good results:

    https://forum.pfsense.org/index.php?topic=78502.0

    Thanks,
    msu



  • There's definitely still a problem in 2.1.4 and since 2.1.5 just released and I know that they are still trying to figure out the problem, I'm assuming it is still an issue in 2.1.5.

    I have this problem on ONE pfSense box with a flakey T1 (yes, they DO exist, but apparently only in Detroit).  I have never seen this on any other of my many pfSense installations.

    They are aware of the issue and trying to figure it out.  I'm in communication with them and have given them access to the ONE firewall where I have this problem.

    In the meantime, I'm going to create a CRON job to restart apinger service at given intervals.  BTW, this command will restart it:

    /usr/local/sbin/pfSsh.php playback svc restart apinger
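
    For anyone wanting to wire that command into cron, here is one possible wrapper. It is a sketch under assumptions: restart_and_log, the script path in the crontab comment, the hourly schedule and the syslog tag are all made up for illustration; only the restart command itself comes from this post.

```shell
#!/bin/sh
# Restart command from the post above; both variables can be overridden for testing.
RESTART_CMD=${RESTART_CMD:-"/usr/local/sbin/pfSsh.php playback svc restart apinger"}
LOG_CMD=${LOG_CMD:-"logger -t apinger-cron"}

# Hypothetical wrapper: run the restart and record the outcome in syslog.
restart_and_log() {
    if $RESTART_CMD >/dev/null 2>&1; then
        $LOG_CMD "apinger restarted"
    else
        $LOG_CMD "apinger restart failed"
        return 1
    fi
}

# Only act when called with --run, so sourcing the file has no side effects.
if [ "${1:-}" = "--run" ]; then
    restart_and_log
fi

# Example crontab entry (hourly; /root/restart_apinger.sh is a placeholder path):
#   0 * * * * /bin/sh /root/restart_apinger.sh --run
```

    Logging each restart makes it easier to tell later whether the restarts line up with the periods when the gateway was reported as down.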



  • Yes, it's still there, hope it will disappear in 2.2.



  • Please report it and discuss in the following 2.2 thread:

    https://forum.pfsense.org/index.php?topic=78502



  • @pubmsu:

    Please report it and discuss in the following 2.2 thread:

    https://forum.pfsense.org/index.php?topic=78502

    I already did.


  • I have this cron job running every 5 minutes, and so far it has been running with no issues.

    pkill /usr/local/sbin/apinger -c /var/etc/apinger.conf

    It seems some sort of buffer fills up on WAN DHCP interfaces and it begins to "suffer".

    I haven't had any issues on static WAN interfaces.


  • I just had to disable gateway monitoring on a pfSense in a datacenter because apinger started seeing 6% loss no matter what was going on.  Steady - every sample. 2.1.5 i386.



  • Please report it and discuss in the following 2.2 thread:

    https://forum.pfsense.org/index.php?topic=78502



  • @krylou:

    I've found a temporary solution: restarting apinger every 5 minutes.

    Open a shell via PuTTY.

    Go to /usr/local/www
    and create a file apinger.php with the following content:
    <?php
    require_once("/etc/inc/service-utils.inc");
    require_once("/etc/inc/globals.inc");
    require_once("/etc/inc/vslb.inc");
    require_once("/etc/inc/gwlb.inc");

    // Restart the apinger gateway-monitoring service
    service_control_restart("apinger", "restartservice");
    ?>

    chmod 755 apinger.php

    To run it manually you can just open http://pfsense/apinger.php

    Then install cron package via pfsense UI and add new job:
    */5 * * * * /usr/local/bin/php /usr/local/www/apinger.php

    Little extended script:

    
    <?php
    require_once("/etc/inc/service-utils.inc");
    require_once("/etc/inc/globals.inc");
    require_once("/etc/inc/gwlb.inc");
    
    // Number of gateways whose reported delay exceeds the threshold
    $counter = 0;
    $a_gateways = return_gateways_array();
    $gateways_status = return_gateways_status(true);
    
    foreach ($a_gateways as $gname => $gateway) {
    	if (!empty($gateways_status[$gname])) {
    		// Delay is reported as a string such as "12.345ms"; keep the numeric part
    		$str_data = $gateways_status[$gname]['delay'];
    		$delay_ms = floatval(substr($str_data, 0, strpos($str_data, "ms")));
    		if ($delay_ms > 500) {
    			$counter++;
    		}
    	}
    }
    // Restart apinger only if at least one gateway reports a delay above 500 ms
    if ($counter > 0) {
    	service_control_restart("apinger", "restartservice");
    }
    ?>
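
    In words, the script above asks pfSense for the status of each gateway, strips the "ms" suffix from the reported delay, counts the gateways whose delay is above 500 ms, and restarts apinger only if at least one is found. The same threshold test can be sketched in shell on plain text input; any_delay_over is a hypothetical name, and the delay strings would still have to come from pfSense itself:

```shell
#!/bin/sh
# Reads one delay string per line on stdin (for example "612.4ms") and
# exits 0 if any value is above the limit given in milliseconds, 1 otherwise.
any_delay_over() {
    awk -v limit="$1" '
        { sub(/ms.*/, "", $0); if (($0 + 0) > limit) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage:
#   printf '12.3ms\n612.4ms\n' | any_delay_over 500 && echo "restart needed"
```

    Compared with restarting blindly every 5 minutes, this only intervenes once the latency figure has already drifted, which matches the symptom described at the start of the thread.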
    
    


  • @hmh:

    Little extended script:

    
    <?php
    require_once("/etc/inc/service-utils.inc");
    require_once("/etc/inc/globals.inc");
    require_once("/etc/inc/gwlb.inc");
    
    // Number of gateways whose reported delay exceeds the threshold
    $counter = 0;
    $a_gateways = return_gateways_array();
    $gateways_status = return_gateways_status(true);
    
    foreach ($a_gateways as $gname => $gateway) {
    	if (!empty($gateways_status[$gname])) {
    		// Delay is reported as a string such as "12.345ms"; keep the numeric part
    		$str_data = $gateways_status[$gname]['delay'];
    		$delay_ms = floatval(substr($str_data, 0, strpos($str_data, "ms")));
    		if ($delay_ms > 500) {
    			$counter++;
    		}
    	}
    }
    // Restart apinger only if at least one gateway reports a delay above 500 ms
    if ($counter > 0) {
    	service_control_restart("apinger", "restartservice");
    }
    ?>
    
    

    What does this do ?



  • btw. This:

    @krylou:

    I've found a temporary solution: restarting apinger every 5 minutes.

    Open a shell via PuTTY.

    Go to /usr/local/www
    and create a file apinger.php with the following content:
    <?php
    require_once("/etc/inc/service-utils.inc");
    require_once("/etc/inc/globals.inc");
    require_once("/etc/inc/vslb.inc");
    require_once("/etc/inc/gwlb.inc");

    // Restart the apinger gateway-monitoring service
    service_control_restart("apinger", "restartservice");
    ?>

    chmod 755 apinger.php

    To run it manually you can just open http://pfsense/apinger.php

    Then install cron package via pfsense UI and add new job:
    */5 * * * * /usr/local/bin/php /usr/local/www/apinger.php

    and this :

    @TechSavvySam:

    In the meantime, I'm going to create a CRON job to restart apinger service at given intervals.  BTW, this command will restart it:

    /usr/local/sbin/pfSsh.php playback svc restart apinger

    Both kill my RRD graphs. Is that just me?



  • **Just disable the NTPd time sync daemon.

    apinger will become stable.**



  • @kamran:

    **Just disable the NTPd time sync daemon.

    apinger will become stable.**

    That might help if you're running pfSense in a VM, but the problems with apinger are nowhere near that simple to solve.

    apinger has been replaced by dpinger in pfSense 2.3. Though 2.3 is still in beta, the core functionality is now very stable and it has a lot of worthwhile enhancements and fixes over 2.2.x.

    Packages are still a mixed bag on 2.3 - some have been converted, some are available but haven't had a proper GUI conversion and others are not available at this point.

    If you want to try 2.3, I strongly recommend backing up your configuration and making sure you have working 2.2.6 install media to hand before upgrading. That way, you can revert easily to 2.2.6 if necessary.