After updating to 2.1.3-RELEASE (amd64) I am getting strange WAN connection drops



  • 2.1.3-RELEASE (amd64)
    built on Thu May 01 15:52:13 EDT 2014

    FreeBSD 8.3-RELEASE-p16

    DNS server(s) - I use different DNS servers for my DHCP clients
    (I used DNS Benchmark to find the best and fastest-responding DNS servers)
    156.154.70.22
    156.154.70.1
    198.153.194.1
    69.164.196.21

    Logs, most recent at the top, oldest at the bottom. Can anyone see something I am missing? It usually happens when I start to have high bandwidth usage, but not always.
    em6 is my WAN interface (DHCP in this case)
    em7 is my LAN interface (192.168.1.XXX)

    
    May 7 12:10:48   check_reload_status: Syncing firewall
    May 7 12:10:45   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:45   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:45   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:45   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:43   php: rc.start_packages: Restarting/Starting all packages.
    May 7 12:10:41   check_reload_status: Reloading filter
    May 7 12:10:41   check_reload_status: Starting packages
    May 7 12:10:41   php: rc.newwanip: pfSense package system has detected an ip change 0.0.0.0 -> XXX.XXX.XXX.XXX ... Restarting packages.
    May 7 12:10:39   php: rc.newwanip: Creating rrd update script
    May 7 12:10:39   php: rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    May 7 12:10:37   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:10:36   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:10:36   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:10:32   php: rc.newwanip: Removing static route for monitor XXX.XXX.XXX.XXX and adding a new route through XXX.XXX.XXX.XXX
    May 7 12:10:32   php: rc.newwanip: ROUTING: setting default route to XXX.XXX.XXX.XXX
    May 7 12:10:32   php: rc.newwanip: rc.newwanip: on (IP address: XXX.XXX.XXX.XXX) (interface: WAN[wan]) (real interface: em6).
    May 7 12:10:32   php: rc.newwanip: rc.newwanip: Informational is starting em6.
    May 7 12:10:30   check_reload_status: rc.newwanip starting em6
    May 7 12:10:13   check_reload_status: Syncing firewall
    May 7 12:10:10   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:10   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:10   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:10   php: rc.start_packages: No pfBlocker action during boot process.
    May 7 12:10:08   php: rc.filter_configure_sync: Could not find IPv4 gateway for interface (wan).
    May 7 12:10:08   php: rc.start_packages: Restarting/Starting all packages.
    May 7 12:10:06   check_reload_status: Reloading filter
    May 7 12:10:06   check_reload_status: Starting packages
    May 7 12:10:06   php: rc.newwanip: pfSense package system has detected an ip change 0.0.0.0 -> 192.168.100.2 ... Restarting packages.
    May 7 12:10:05   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:10:04   php: rc.newwanip: Creating rrd update script
    May 7 12:10:04   php: rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    May 7 12:10:04   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:10:01   check_reload_status: updating dyndns wan
    May 7 12:09:59   php: rc.newwanip: rc.newwanip: on (IP address: 192.168.100.2) (interface: WAN[wan]) (real interface: em6).
    May 7 12:09:59   php: rc.newwanip: rc.newwanip: Informational is starting em6.
    May 7 12:09:59   kernel: arpresolve: can't allocate llinfo for XXX.XXX.XXX.XXX
    May 7 12:09:58   kernel: arpresolve: can't allocate llinfo for XXX.XXX.XXX.XXX
    May 7 12:09:57   check_reload_status: rc.newwanip starting em6
    May 7 12:09:57   kernel: arpresolve: can't allocate llinfo for XXX.XXX.XXX.XXX
    May 7 12:09:56   kernel: arpresolve: can't allocate llinfo for XXX.XXX.XXX.XXX
    May 7 12:09:55   php: rc.linkup: HOTPLUG: Configuring interface wan
    May 7 12:09:55   php: rc.linkup: DEVD Ethernet attached event for wan
    May 7 12:09:55   kernel: arpresolve: can't allocate llinfo for XXX.XXX.XXX.XXX
    May 7 12:09:54   kernel: arpresolve: can't allocate llinfo for XXX.XXX.XXX.XXX
    May 7 12:09:53   php: rc.linkup: Clearing states to old gateway XXX.XXX.XXX.XXX.
    May 7 12:09:53   kernel: em6: link state changed to UP
    May 7 12:09:53   check_reload_status: Linkup starting em6
    May 7 12:09:52   php: rc.linkup: DEVD Ethernet detached event for wan
    May 7 12:09:50   kernel: em6: link state changed to DOWN
    May 7 12:09:50   check_reload_status: Linkup starting em6
    May 7 12:09:39   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:09:39   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:09:39   lighttpd[28231]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted
    May 7 12:09:32   check_reload_status: Reloading filter
    May 7 12:09:32   check_reload_status: Restarting OpenVPN tunnels/interfaces
    May 7 12:09:32   check_reload_status: Restarting ipsec tunnels
    May 7 12:09:32   check_reload_status: updating dyndns WAN_DHCP
    May 7 12:07:11   check_reload_status: Reloading filter
    May 7 12:07:11   check_reload_status: Restarting OpenVPN tunnels/interfaces
    May 7 12:07:11   check_reload_status: Restarting ipsec tunnels
    May 7 12:07:11   check_reload_status: updating dyndns WAN_DHCP
    
    

    What the crap... I do not even use DynDNS?
    bueller? bueller? bueller?
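
One way to see how often the WAN link itself is flapping is to pull only the kernel "link state changed" lines out of the system log and pair each DOWN with the following UP. The sketch below is a minimal illustration in Python, assuming the log line format shown above; the function names and the `year` default are made up for this example, and the sample lines are copied from the log in this post.

```python
import re
from datetime import datetime

# Matches kernel link-state lines such as:
#   May 7 12:09:50   kernel: em6: link state changed to DOWN
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+kernel:\s+"
    r"(?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)$"
)

def link_events(lines, year=2014):
    """Return (timestamp, interface, state) tuples sorted oldest first.

    The log has no year field, so one is assumed (hypothetical default).
    """
    events = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["ifname"], m["state"]))
    return sorted(events)

def outages(events):
    """Pair each DOWN with the next UP on the same interface.

    Returns a list of (interface, seconds_down) tuples.
    """
    down_at = {}
    result = []
    for ts, ifname, state in events:
        if state == "DOWN":
            down_at[ifname] = ts
        elif ifname in down_at:
            result.append((ifname, (ts - down_at.pop(ifname)).total_seconds()))
    return result

# Lines copied from the log above (newest first, as posted).
sample = [
    "May 7 12:09:53   kernel: em6: link state changed to UP",
    "May 7 12:09:52   php: rc.linkup: DEVD Ethernet detached event for wan",
    "May 7 12:09:50   kernel: em6: link state changed to DOWN",
]
print(outages(link_events(sample)))
```

Run against a longer log, many short outages on em6 would point at the physical link (cable, NIC, modem) rather than at DNS or the DynDNS script.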



  • This is getting to be crap... I am having to reboot every couple of hours; resolution just completely drops.

    Everything is set to default now, and it's still going tits up.

    The only thing I see in the logs is when it updates DynDNS, which it shouldn't. There is not even a package installed on this box.


  • Netgate Administrator

    The DynDNS reference is there just from a script that notes if your WAN IP has changed in case you are using DynDNS. I don't think it's relevant here.

    So the problem you're having is that the pfSense box loses DNS resolution and is only fixed by rebooting?
    Your clients on LAN, using different DNS servers directly, still have DNS?

    Steve



  • @stephenw10:

    Well, I found that the interface keeps going up and down every two seconds. I am installing 2.1.3-RELEASE (amd64) from scratch (instead of upgrading) to see if that fixes the issue. One thing I believe may have been a problem was using the UltraDNS servers, so I will be using a different set of DNS servers (not sure if I will use OpenDNS or not; I like DNS servers that filter crap out but do not track, like the Google Public DNS servers do).

    It all came down to DNS resolution and that particular interface (which I think was a little bunk'd).

    I'll update with results soon.



  • I think I'm having the same problem. After a few hours to a day, the WAN becomes very, very slow. CPU and memory usage are very low. A reboot resolves the issue. I upgraded from 2.1.2 to 2.1.3. My next step may be a fresh install too.

    Did a clean install resolve your issue?


  • Netgate Administrator

    How are you testing that WAN becomes slow? What are the symptoms?

    Probably not related to this post so better to start your own thread.

    Steve



  • I think my issue has been resolved by disabling hardware checksum offload. Strange that I didn't have this issue with 2.1.2. Did the Intel NIC drivers change in the new release? I'm going to continue to monitor this, and I'll start a new thread if the issue returns.
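
To confirm whether checksum offload is actually active on an interface, the `options=` line that `ifconfig` prints on FreeBSD lists the enabled capabilities (for example `RXCSUM` and `TXCSUM`). The following is a minimal Python sketch that parses such a line; the helper names are hypothetical and the sample output is made up for illustration, not taken from any box in this thread.

```python
import re

def enabled_capabilities(ifconfig_output):
    """Return the set of capability flags from the 'options=' line."""
    m = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    return set(m.group(1).split(",")) if m else set()

def checksum_offload_enabled(ifconfig_output):
    """True if receive or transmit checksum offload is listed as enabled."""
    caps = enabled_capabilities(ifconfig_output)
    return bool(caps & {"RXCSUM", "TXCSUM"})

# Hypothetical ifconfig output, for illustration only.
sample = """em6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\toptions=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
"""
print(checksum_offload_enabled(sample))
```

Comparing the result before and after changing the setting is a quick way to check that the change really reached the NIC.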


  • Netgate Administrator

    As far as I know, the drivers were updated between 2.1 and 2.1.1, not more recently.
    2.1.2 and 2.1.3 were mostly security patches.

    Steve



  • Hi,

    I am experiencing the same problem.

    Here is a description:

    • 2.0.3 was a rock-stable configuration: 6 months of uptime without having to touch the firewall.
    • Upgraded from the interface to 2.1.3 yesterday. A few minutes after the first boot I lost the WAN interface (red cross on the status page).
    • Rebooting solved the problem for a couple of minutes.
    • This morning the WAN interface was down, yet PPPoE showed "UP" in the status (which seems weird without WAN up).
    • As Speedy3k said before, I have checked "disabling hardware checksum offload" and it seems to solve the problem.

    As this was done only 1 hour ago, I'll post the result of the "disabling hardware checksum offload" change in a few hours.

    I use a 32-bit pfSense release on an Intel Celeron NUC box, with a USB WAN interface that was proven to work in the 2.0.3 release (my internet connection is 7MB, so it's enough).

    How can we help the developers solve this issue? Can you please advise which logs and traces to upload here?
    It's a home firewall, so I can play with it.

    Cheers,

    Phil



  • I've been participating in a thread over in OpenVPN: https://forum.pfsense.org/index.php?topic=76735.0

    They seem to have uncovered a small bug which may inadvertently trigger WAN address resets as if the associated gateway has dropped out (a typo in /etc/rc.newwanip).

    It might have nothing to do with your issues, but it's a simple fix and it might be worth a try to see if it helps.



  • Very interesting: I have a poor ADSL line, with a simple 768kb upload link and lots of latency issues, because it is always overloaded!

    I'll do some tests this evening when I'm back home, even though the previous fix seems to work: I haven't lost the WAN interface yet, and my port-forwarded server has stayed up.

    Thanks for pointing this out.



  • For the gateway monitoring I have set 750ms/950ms, and it has been stable for a few hours.

    And I have set the hardware checksum offload option back, because of the rule "one mod only at a time :)".

    More to come later …
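
As an illustration of what a pair of values like 750ms/950ms can mean, here is a rough Python sketch. It assumes the first value is a warning level and the second is the level at which the gateway is treated as down; that mapping is an assumption for illustration only, and the monitoring daemon in pfSense works on averaged probes and may apply its own logic. The function name and the sample values are hypothetical.

```python
def classify_latency(avg_ms, low_ms=750, high_ms=950):
    """Classify an average round-trip time against two thresholds.

    Assumption: below low_ms is fine, between the two is a warning,
    and at or above high_ms the gateway is considered down.
    """
    if avg_ms >= high_ms:
        return "down"
    if avg_ms >= low_ms:
        return "warning"
    return "ok"

for rtt in (120, 800, 1000):  # hypothetical samples in milliseconds
    print(rtt, classify_latency(rtt))
```

On an overloaded ADSL uplink, raising both values above the typical loaded latency keeps normal congestion from being read as a gateway failure.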

