Wan cable disconnect/reconnect causes interface drop no recover



  • not sure if this is a bug or my config/setup..

    I am running latest snapshot:
    2.1-BETA1  (amd64)
    built on Mon Apr 15 16:43:17 EDT 2013
    FreeBSD 8.3-RELEASE-p7

    My setup:
    ADSL router–---WAN/pfsense/LAN------switch-----accesspoint----clients
    192.168.1.1  192.168.1.2 /  192.168.182.1                                192.169.182.10 (pfSense DHCP)

    2x DNS setup in General with ADSL router as gateway for both
    Interface WAN: static IPv4, no IPv6, ADSL router is the gateway
    Interface LAN:  static IPv4, no IPv6, no gateway, DHCP enabled, CP enabled, DNS forwarder enabled

    disconnecting/reconnecting the wan cable whilst pfSense running resulted in loss of internet at clients until manual intervention in pfsense.

    whilst the problem is apparent;
    pfSense status page shows "Unable to check for updates."
    ping 192.168.1.1 from pfSense/diagnostics - ok
    traceroute 192.168.1.1 from pfSense/diagnostics - fails
    However both the ping and traceroute to 192.168.1.1 from any of the clients works ok, even though no internet.

    to get it working again, either reboot or;
    pfSense gui->interfaces->wan->save->apply changes and all is well.
    yes it says apply changes even though nothing has been changed.

    I found this by accident after tripping over the cable from the ADSL router to eth0 wan pfSense.
    Now I found this out I can reproduce situation every time by unplugging/reconnecting the wan cable.

    Any ideas please?



  • @clart:

    traceroute 192.168.1.1 from pfSense/diagnostics - fails

    What is the report you have summarised as "fails"?

    Is your pfSense WAN interface static IP or DHCP? (The private IP address suggests it is probably not PPPoE.)



  • Traceroute output: (fails)

    1  * * *

    18  * * *

    after reboot or save/apply WAN interface settings;

    Traceroute output: (ok)

    1  192.168.1.1 (192.168.1.1)  0.461 ms *  0.398 ms

    pfSense WAN interface is static IP: 192.168.1.2
    it is cabled/connected to ADSL router (Lan port)



  • I  can't explain that ping gets a response while traceroute to the same target doesn't.

    To get more information about what is going on, please:
    1. Remove cable,note time;
    2. wait about a minute
    3. restore cable
    4. wait about a minute
    5. Click save button to restore link,note time
    and post the pfSense system log from about a minute before event 1 to about a minute after event 5.



  • I updated to the latest snapshot to see if it made a difference, but still the same problem;

    2.1-BETA1  (amd64)
    built on Wed Apr 17 07:44:02 EDT 2013
    FreeBSD 8.3-RELEASE-p7

    You are on the latest version.

    –---------------------------

    system log;

    step 1 - remove cable:
    Apr 18 00:59:59 kernel: bge0: link state changed to DOWN
    Apr 18 01:00:01 php: : DEVD Ethernet detached event for wan
    Apr 18 01:00:03 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Apr 18 01:00:07 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Apr 18 01:00:12 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Apr 18 01:00:19 check_reload_status: updating dyndns GW_WAN
    Apr 18 01:00:19 check_reload_status: Restarting ipsec tunnels
    Apr 18 01:00:19 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Apr 18 01:00:19 check_reload_status: Reloading filter
    Apr 18 01:00:23 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    step 3 - restore cable:
    Apr 18 01:01:04 check_reload_status: Linkup starting bge0
    Apr 18 01:01:04 kernel: bge0: link state changed to UP
    Apr 18 01:01:07 php: : DEVD Ethernet attached event for wan
    Apr 18 01:01:07 php: : HOTPLUG: Configuring interface wan
    Apr 18 01:01:17 check_reload_status: Updating all dyndns
    Apr 18 01:01:17 check_reload_status: Restarting ipsec tunnels
    Apr 18 01:01:17 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Apr 18 01:01:17 check_reload_status: Reloading filter
    step 5 - click save button:
    Apr 18 01:02:02 check_reload_status: Syncing firewall
    Apr 18 01:02:06 php: /interfaces.php: ROUTING: setting default route to 192.168.1.1
    Apr 18 01:02:11 check_reload_status: updating dyndns wan
    Apr 18 01:02:14 check_reload_status: Reloading filter
    Apr 18 01:02:14 php: /interfaces.php: Creating rrd update script

    confirmed same problem, no connection/route/traffic after step 3, all ok after step 5

    out of interest, I set the WAN interface to DHCP and found the problem is gone.

    • connection/route/traffic is ok straight after cable is restored.

    here is the system log with WAN interface set to DHCP;

    step 1 - remove cable:
    Apr 18 01:39:05 kernel: bge0: link state changed to DOWN
    Apr 18 01:39:08 php: : DEVD Ethernet detached event for wan
    Apr 18 01:39:08 dhclient[41265]: connection closed
    Apr 18 01:39:08 dhclient[41265]: exiting.
    Apr 18 01:39:09 php: : Clearing states to old gateway 192.168.1.1.
    Apr 18 01:39:10 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Apr 18 01:39:10 kernel: arpresolve: can't allocate llinfo for 1192.168.1.1
    Apr 18 01:39:13 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Apr 18 01:39:16 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Apr 18 01:39:20 kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Apr 18 01:39:23 kernel: arpresolve: can't allocate llinfo for 169.192.102.100
    Apr 18 01:39:25 check_reload_status: updating dyndns GW_WAN
    Apr 18 01:39:25 check_reload_status: Restarting ipsec tunnels
    Apr 18 01:39:25 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Apr 18 01:39:25 check_reload_status: Reloading filter
    Apr 18 01:39:28 php: : Could not find IPv4 gateway for interface (wan).
    step 3 - restore cable:
    Apr 18 01:40:13 kernel: bge0: link state changed to UP
    Apr 18 01:40:13 check_reload_status: Linkup starting bge0
    Apr 18 01:40:15 php: : DEVD Ethernet attached event for wan
    Apr 18 01:40:15 php: : HOTPLUG: Configuring interface wan
    Apr 18 01:40:15 dhclient: PREINIT
    Apr 18 01:40:15 dhclient: Starting delete_old_states()
    Apr 18 01:40:15 dhclient: Comparing IPs: Old: New:
    Apr 18 01:40:15 dhclient[41131]: DHCPREQUEST on bge0 to 255.255.255.255 port 67
    Apr 18 01:40:15 dhclient[41131]: DHCPACK from 192.168.1.1
    Apr 18 01:40:15 dhclient: REBOOT
    Apr 18 01:40:15 dhclient: Starting delete_old_states()
    Apr 18 01:40:15 dhclient: Comparing IPs: Old: New: 192.168.1.12
    Apr 18 01:40:15 dhclient: Starting add_new_address()
    Apr 18 01:40:15 dhclient: ifconfig bge0 inet 192.168.1.12 netmask 255.255.255.0 broadcast 192.168.1.255
    Apr 18 01:40:15 dhclient: New IP Address (bge0): 192.168.1.12
    Apr 18 01:40:15 dhclient: New Subnet Mask (bge0): 255.255.255.0
    Apr 18 01:40:15 dhclient: New Broadcast Address (bge0): 192.168.1.255
    Apr 18 01:40:15 dhclient: New Routers (bge0): 192.168.1.1
    Apr 18 01:40:15 dhclient: Adding new routes to interface: bge0
    Apr 18 01:40:15 dhclient: /sbin/route add default 192.168.1.1
    Apr 18 01:40:15 dhclient: Creating resolv.conf
    Apr 18 01:40:15 check_reload_status: rc.newwanip starting bge0
    Apr 18 01:40:15 dhclient[41131]: bound to 192.168.1.12 – renewal in 129600 seconds.
    Apr 18 01:40:18 php: : rc.newwanip: Informational is starting bge0.
    Apr 18 01:40:18 php: : rc.newwanip: on (IP address: 192.168.1.12) (interface: wan) (real interface: bge0).
    Apr 18 01:40:18 php: : ROUTING: setting default route to 192.168.1.1
    Apr 18 01:40:18 check_reload_status: Updating all dyndns
    Apr 18 01:40:18 check_reload_status: Restarting ipsec tunnels
    Apr 18 01:40:18 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Apr 18 01:40:18 check_reload_status: Reloading filter
    Apr 18 01:40:24 php: : Resyncing OpenVPN instances for interface WAN.
    Apr 18 01:40:24 php: : Creating rrd update script
    Apr 18 01:40:26 php: : pfSense package system has detected an ip change 192.168.1.12 -> 192.168.1.12 ... Restarting packages.
    Apr 18 01:40:26 check_reload_status: Starting packages
    Apr 18 01:40:28 php: : Restarting/Starting all packages.

    no need for step 5, all is ok after restoring the cable.

    then I changed WAN interface back to static IP and the problem is back again.



  • Thanks for the additional information.

    I wonder if the default route disappears when the link goes down and is not restored when a WAN interface with static IP comes back up.



  • yes, I agree - it appears the default route is lost when the link goes down and not restored automatically when the link comes back up (only when static IP is assigned to WAN interface).

    seems to be confirmed by the syslog line:
    Apr 18 01:02:06    php: /interfaces.php: ROUTING: setting default route to 192.168.1.1

    ..which appears only after the save button pressed (step 5), there is no routing message prior to that.

    is this a bug or maybe my system/config/setup?
    can you replicate the problem on your system?



  • I can confirm this as well. WAN interface is set to static IP and connected to comcast gateway with cable. When the cable is remove and then plugged back in, pfsense is no longer able to access the internet. Absolutely a bug and not a config problem seeing as I replicated the setup using 2.0.3 and after reconnecting pfsense WAN port to comcast business gateway, it is able to access the internet in milliseconds.



  • I'm glad someone else see the same problem, then it is not just my config/system.

    It seems when WAN interface has fixed IP and connection is restored these steps is missing:

    php: : rc.newwanip: Informational is starting bge0.
    php: : rc.newwanip: on (IP address: 192.168.1.12) (interface: wan) (real interface: bge0).
    php: : ROUTING: setting default route to 192.168.1.1

    maybe these should be called/set in the "check_reload_status" routine, or is bug elsewhere?



  • I can definitely confirm this… Every morning my WAN interface is down, with the link status bouncing between up and no carrier. It seems as though the driver goes berserk when there is no traffic.

    I thought it was a flaky NIC, changed it with a new one, same problem.. Changed the cable, I even went paranoid and changed the hard drive, reinstalling from scratch... Nothing!

    I am 100% positive the same NIC's with the same cables have been functioning perfectly with every snapshot before april... I need to stay on HEAD cause of the point to point IPSec NAT which STABLE does not have...

    Is there a way to install a precise snapshot from februrary/march? Has there been any change on the driver code?

    ..I went with a cron restart early morning, but this sucks ballz big time...



  • @namtab:

    I can definitely confirm this… Every morning my WAN interface is down, with the link status bouncing between up and no carrier. It seems as though the driver goes berserk when there is no traffic.

    I thought it was a flaky NIC, changed it with a new one, same problem.. Changed the cable, I even went paranoid and changed the hard drive, reinstalling from scratch... Nothing!

    Check this out: http://forum.pfsense.org/index.php/topic,60934.msg330964.html#msg330964
    Same problem with my Public interface… It is not WAN but still...



  • I have the same issue, and have for a long time, always figured it was something about my modem (this is a test site for my ISP, so lucky me gets to deal with multi-gateway setups…) but this is the second time something slightly different has happened and I decided to check here to see that I am not the only one with this issue.

    1: WAN Gateway goes down
    2: PFSense WAN side looses address
    3: Wan Gateway comes back up
    4: PFSense keeps rebooting the WAN NIC over and over for no reason at all.

    Now, I have a Dell 1550, not using any extra add-on nic, 4GB RAM, 73GB SCSI disk.   I also use SQUID on this install, eventually if I let this issue go on long enough something with one of the packages freaks out, and consumes disk space...  Remember, this is only the second time this issue in particular has happened to me in regards of disk space, the WAN down is almost a daily thing.  You may think, oh, a log file is consuming a GB or so of extra data, no, when the system finally gets to a point, it will go from ~15% of used disk space to 90% or more of used space in a matter of hours (only took my 8 hours of sleeping today).  At this point, packages fail to launch, and the only thing working anymore honestly is the GUI, I am essentially forced into a re-install as I can't find what has consumed all the disk space.

    One thing I have noticed, when my WAN goes down due to a link outage for any period of time, I tend to get a crash report, but can't send the crash report in, because the link is down, upon a reboot, well, I don't see options to send crash reports in...  Not even sure if those are checked on anyways. xD

    Sys Log

    May 22 00:52:07	kernel: arpresolve: can't allocate llinfo for 100.73.71.217
    May 22 00:52:06	dhcp6c[46100]: check_exit: exiting
    May 22 00:52:05	php: : Accept router advertisements on interface fxp0
    May 22 00:52:05	php: : The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf fxp0 > /tmp/fxp0_output 2> /tmp/fxp0_error_output' returned exit code '15', the output was ''
    May 22 00:52:05	php: : DEVD Ethernet detached event for wan
    May 22 00:52:05	check_reload_status: Linkup starting fxp0
    May 22 00:52:05	kernel: fxp0: link state changed to UP
    May 22 00:52:04	kernel: arpresolve: can't allocate llinfo for 100.73.71.217
    May 22 00:52:03	kernel: fxp0: link state changed to DOWN
    May 22 00:52:03	check_reload_status: Linkup starting fxp0
    May 22 00:52:03	dhcp6c[46058]: client6_init: skip opening control port
    May 22 00:52:03	dhcp6c[46058]: client6_init: failed initialize control message authentication
    May 22 00:52:03	dhcp6c[46058]: dhcp6_ctl_authinit: failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory
    May 22 00:52:03	dhcp6c[45512]: check_exit: exiting
    May 22 00:52:03	dhcp6c[45242]: client6_init: skip opening control port
    May 22 00:52:03	dhcp6c[45242]: client6_init: failed initialize control message authentication
    May 22 00:52:03	dhcp6c[45242]: dhcp6_ctl_authinit: failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory
    May 22 00:52:03	php: : HOTPLUG: Configuring interface wan
    May 22 00:52:03	kernel: arpresolve: can't allocate llinfo for 100.73.71.217
    May 22 00:52:03	php: : DEVD Ethernet attached event for wan
    May 22 00:52:01	dhcp6c[33259]: client6_recvadvert: XID mismatch
    May 22 00:52:01	php: : Accept router advertisements on interface fxp0
    May 22 00:52:01	php: : The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf fxp0 > /tmp/fxp0_output 2> /tmp/fxp0_error_output' returned exit code '15', the output was ''
    May 22 00:52:01	kernel: arpresolve: can't allocate llinfo for 100.73.71.217
    May 22 00:52:01	php: : DEVD Ethernet detached event for wan
    May 22 00:52:00	kernel: fxp0: link state changed to UP
    May 22 00:52:00	check_reload_status: Linkup starting fxp0
    May 22 00:52:00	dhcp6c[33259]: client6_recvadvert: XID mismatch
    

Locked