WAN down momentarily = Reboot to recover



  • Have found a lot of people reporting what are likely related issues, with no resolution proposed yet…

    While setting up new 2.1 pfSense install on a re-purposed IBM ThinkCentre, there was some plugging and unplugging of cables, and afterwards there was no connectivity to the internet.  The dashboard showed 100% CPU utilization and the WAN link state was alternating up, down, up, down, constantly.  After trying a number of things to recover, I rebooted the machine, and it came back up with a working WAN, all was well.  A little later I unplugged the WAN cable again temporarily, and the same thing happened, and the only solution was to reboot.  Here is what may be helpful information for others experiencing similar issues:

    -If WAN is DHCP and cable is momentarily unplugged (ISP device reboot), CPU utilization goes to 100%, WAN adaptor link state cycles up/down constantly, and all Internet connectivity is lost.  Rebooting the machine via Diagnostics or Terminal is the only fix.
    -If WAN is Static and cable is momentarily unplugged (ISP device reboot), CPU utilization goes to 100%, WAN adaptor link state cycles up/down constantly, and all Internet connectivity is lost.  Rebooting the machine via Diagnostics or Terminal is the only fix.

    I switched WAN from the onboard Intel NIC to a new Realtek Gigabit PCI card and repeated the above steps, with the same result.  The only packages currently installed are Unbound, Squid v3 and SquidGuard.

    The ISP is pretty reliable, unlike some of the reports from others having similar issues, but since the only solution is to reboot the machine, this seems to have the potential of being a big issue for anyone running critical services in-house.  Any suggestions?

    Similar posts:
    http://forum.pfsense.org/index.php/topic,53221.0.html
    http://forum.pfsense.org/index.php/topic,58819.0.html
    http://forum.pfsense.org/index.php/topic,59906.0.html



  • @niner:

    Any suggestions?

    Some more details would probably be helpful.

    @niner:

    While setting up new 2.1 pfSense install on a re-purposed IBM ThinkCentre,

    Which pfSense build?

    @niner:

    -If WAN is DHCP and cable is momentarily unplugged (ISP device reboot), CPU utilization goes to 100%, WAN adaptor link state cycles up/down constantly, and all Internet connectivity is lost.  Rebooting the machine via Diagnostics or Terminal is the only fix.

    While the dashboard shows 100% CPU utilization, what does the pfSense shell command```
    top -S -H

    
    @niner:
    
    > -If WAN is Static and cable is momentarily unplugged (ISP device reboot), CPU utilization goes to 100%, WAN adaptor link state cycles up/down constantly, and all Internet connectivity is lost.  Rebooting the machine via Diagnostics or Terminal is the only fix.
    
    While the dashboard shows 100% CPU utilization, what does the pfSense shell command```
    top -S -H 
    ```show as the top 3 or 4 CPU consumers and what proportion are they consuming?
    
    @niner:
    
    > I switched WAN from the onboard Intel NIC to a new Realtek Gigabit PCI card and repeated the above steps,
    
    What type of Intel NIC?
    
    I'm running pfSense build
    
    > 2.1-BETA1 (i386)
    > built on Sat Feb 23 16:39:07 EST 2013
    
    . Its WAN interface is vr0 and it uses DHCP. I disconnected the WAN link from the switch for a few minutes and plugged it back into the switch and there was no sign of the sort of meltdown you described in that I could connect to the GUI over the WAN link and the dashboard showed an uptime of over 25 days: the WAN recovered without a reboot.


  • @wallabybob:

    Some more details would probably be helpful…Which pfSense build?

    I am currently using 2.1-BETA1 (i386)
    built on Tue Mar 19 05:51:13 EDT 2013

    @wallabybob:

    While the dashboard shows 100% CPU utilization, what does the pfSense shell command```
    top -S -H

    Both DHCP and Static configurations yield basically the same results.  The stats fluctuate, as seen in the following 2 screenshots, but check_reload_status (40%-70%) is almost always at the top of the list, along with syslogd (5%-25%), php (5%-75%) and unbound (5%-10%) :

    Screenshot1:

    Screenshot2:

    @wallabybob:

    What type of Intel NIC?

    Onboard: Intel 82801BA (D865) Pro/100 VE Ethernet
    PCI: RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet

    @wallabybob:

    I'm running pfSense build 2.1-BETA1 (i386)
    built on Sat Feb 23 16:39:07 EST 2013. Its WAN interface is vr0 and it uses DHCP. I disconnected the WAN link from the switch for a few minutes and plugged it back into the switch and there was no sign of the sort of meltdown you described in that I could connect to the GUI over the WAN link and the dashboard showed an uptime of over 25 days: the WAN recovered without a reboot.

    My WAN interface is normally fxp0 and Static (or re0 and DHCP, if I am testing and using the Gigabit NIC) and the dashboard and LAN access remain available during the loss of WAN, so I can initiate a reboot locally…but if I were not onsite it would be another matter.

    If there is other information I can provide, or if there are suggestions for further troubleshooting, please let me know.  Thanks.






  • Thanks for the additional information.

    It would probably also be helpful to have the extract of the system log from a few lines BEFORE the WAN interface is reported down for  maybe the next few minutes. You will probably need to display the whole log file by pfSense shell command```
    clog /var/log/system.log

    
    A packet capture on the WAN interface would also be interesting to see - is anybody trying to start a conversation?


  • As requested, more info below.  After a reboot the log settled down and there was no activity until I logged into the Gui, so there is nothing interesting to show before the WAN interface goes down.  Here is what happens when it does go down:

    Mar 22 23:44:28 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:44:28 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:44:31 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:44:45 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:44:46 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:44:46 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:44:46 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:44:48 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:44:48 pfsense check_reload_status: Updating all dyndns
    Mar 22 23:44:48 pfsense check_reload_status: Reloading filter
    Mar 22 23:44:51 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:44:56 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:45:07 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:45:18 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:45:20 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:45:21 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:45:21 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:45:24 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:45:24 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:45:36 pfsense check_reload_status: Updating all dyndns
    Mar 22 23:45:36 pfsense check_reload_status: Restarting ipsec tunnels
    Mar 22 23:45:36 pfsense check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 22 23:45:36 pfsense check_reload_status: Reloading filter
    Mar 22 23:45:37 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:45:37 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:45:38 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:45:40 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:45:40 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:45:42 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:45:42 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:45:52 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:45:55 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:45:56 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:45:58 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:45:58 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:46:00 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:46:00 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:46:05 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:46:05 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:46:05 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:46:08 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:46:08 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:46:10 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:46:10 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:46:11 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:46:15 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:46:15 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:46:16 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:46:16 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:46:18 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:46:18 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:46:19 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:46:26 pfsense check_reload_status: Updating all dyndns
    Mar 22 23:46:26 pfsense check_reload_status: Restarting ipsec tunnels
    Mar 22 23:46:26 pfsense check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 22 23:46:27 pfsense check_reload_status: Reloading filter
    Mar 22 23:46:47 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:46:51 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:46:54 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:46:57 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:47:06 pfsense check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 22 23:47:06 pfsense check_reload_status: Reloading filter
    Mar 22 23:47:07 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:47:08 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:47:11 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:47:11 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:47:13 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:47:13 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:47:15 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:47:36 pfsense check_reload_status: Reloading filter
    Mar 22 23:47:42 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:47:43 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:47:46 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:47:46 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:47:48 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:47:48 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:47:51 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:47:54 pfsense kernel: arpresolve: can't allocate llinfo for 192.168.1.1
    Mar 22 23:48:00 pfsense check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 22 23:48:01 pfsense check_reload_status: Reloading filter
    Mar 22 23:48:10 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:48:10 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:48:11 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:48:11 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:48:11 pfsense check_reload_status: Updating all dyndns
    Mar 22 23:48:11 pfsense check_reload_status: Restarting ipsec tunnels
    Mar 22 23:48:11 pfsense check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 22 23:48:11 pfsense check_reload_status: Reloading filter
    Mar 22 23:48:13 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:48:13 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:48:14 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:48:23 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:48:23 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:48:23 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:48:23 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:48:25 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:48:25 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:48:26 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:48:28 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:48:28 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:48:28 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:48:28 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:48:30 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:48:30 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:48:36 pfsense check_reload_status: Updating all dyndns
    Mar 22 23:48:36 pfsense check_reload_status: Restarting ipsec tunnels
    Mar 22 23:48:36 pfsense check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 22 23:48:37 pfsense check_reload_status: Reloading filter
    Mar 22 23:48:45 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:48:48 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:48:49 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:48:51 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:48:51 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:48:54 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:48:54 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:49:03 pfsense php: : DEVD Ethernet detached event for wan
    Mar 22 23:49:09 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:49:10 pfsense php: : HOTPLUG: Configuring interface wan
    Mar 22 23:49:11 pfsense check_reload_status: Linkup starting fxp0
    Mar 22 23:49:11 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:49:14 pfsense kernel: fxp0: link state changed to UP
    Mar 22 23:49:14 pfsense check_reload_status: Linkup starting fxp0

    And it continues to repeat the same…  As for a packet capture, this is what happens after the link comes back up:

    23:49:49.425824 ARP, Request who-has 192.168.1.111 tell 192.168.1.111, length 28
    23:49:49.631470 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:49:50.631473 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:49:51.666721 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:49:56.579716 ARP, Request who-has 192.168.1.111 tell 192.168.1.111, length 28
    23:49:56.667402 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:49:57.667407 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:49:58.667458 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:49:59.667414 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:49:59.667733 ARP, Reply 192.168.1.1 is-at 00:27:22:d9:e7:02, length 46
    23:49:59.667763 IP 192.168.1.111 > 192.168.1.1: ICMP echo request, id 8516, seq 7969, length 44
    23:49:59.667987 IP 192.168.1.1 > 192.168.1.111: ICMP echo reply, id 8516, seq 7969, length 44
    23:50:01.808773 ARP, Request who-has 192.168.1.111 tell 192.168.1.111, length 28
    23:50:02.667430 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:50:03.667436 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:50:04.667435 ARP, Request who-has 192.168.1.1 tell 192.168.1.111, length 28
    23:50:04.667601 ARP, Reply 192.168.1.1 is-at 00:27:22:d9:e7:02, length 46
    23:50:04.667626 IP 192.168.1.111 > 192.168.1.1: ICMP echo request, id 8516, seq 9249, length 44
    23:50:04.667856 IP 192.168.1.1 > 192.168.1.111: ICMP echo reply, id 8516, seq 9249, length 44

    And it too continues to repeat the same…  (192.168.1.1 is ISP, 192.168.1.111 is WAN)

    Hope this, along with the process screen shot sent earlier, helps to clarify the situation.  Thanks again.



  • Thanks for the capture and the log extract.  I presume these were with the WAN interface configured with static IP (no sign of DHCP in the packet capture) BUT I notice /etc/rc.linkup appears to write those DEVD messages on interfaces which don't have static IPs.

    It would be more informative to have a packet capture covering the time span of the log extract but there are a number of things that seem strange to me:

    Mar 22 23:45:21 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:45:21 pfsense check_reload_status: Linkup starting fxp0

    Mar 22 23:45:40 pfsense kernel: fxp0: link state changed to DOWN
    Mar 22 23:45:40 pfsense check_reload_status: Linkup starting fxp0

    Perhaps check_reload_status is setting the interface down in order to reset things before attempting to restart the link, and the consequent kernel report is sneeking past the check_reload_status announcement to the kernel log. Otherwise it is hard to explain why check_reload_status is starting the interface immediately after the kernel reports it down.

    Mar 22 23:45:37 pfsense php: : DEVD Ethernet attached event for wan
    Mar 22 23:45:37 pfsense php: : DEVD Ethernet detached event for wan

    Perhaps the carrier signal from the modem is "flapping". Perhaps these messages are coming from independent processes that don't know what the other is doing.

    I wonder why

    Mar 22 23:45:55 pfsense php: : DEVD Ethernet attached event for wan

    is immediately followed by

    Mar  22 23:45:56 pfsense php: : HOTPLUG: Configuring interface wan

    but the attached event at Mar 22 23:45:18  isn't immediately followed by a

    HOTPLUG: Configuring interface wan

    I wonder why the

    Mar 22 23:45:38 pfsense php: : HOTPLUG: Configuring interface wan

    wasn't cancelled by the immediately preceding

    Mar 22 23:45:37 pfsense php: : DEVD Ethernet detached event for wan

    I haven't had a detailed look through the code reporting the messages that provoked my curiosity so its quite possible some or all them mean something a bit different to what I think they mean.



  • Yes, the log and packet capture were with WAN set to static IP.  And, I also noticed some inconsistency in the order of log events…

    @wallabybob:

    Perhaps the carrier signal from the modem is "flapping". Perhaps these messages are coming from independent processes that don't know what the other is doing.

    Actually, when first setting up the system and experiencing this issue the WAN was connected to an AP in bridge mode, and was only connected directly to the ISP modem later in the setup/troubleshooting process.  So, while it is possible that the modem and the AP are contributing to this behavior, it's not likely.

    Thanks again.


Locked