WAN DHCP Problems since 2.4
I've been having a major problem with DHCP/IP addressing on the WAN interface (igb0) of my pfSense device since I upgraded to 2.4. I'm seeing an odd condition where pfSense loses its WAN IP address but believes it still has one. I've been troubleshooting this on and off for a few months now and still can't figure it out. The WAN interface is connected to a cable modem. It all starts with a log entry reporting that igb0 has lost its link:
Dec 27 01:02:51 172.17.17.1 kernel: igb0: link state changed to DOWN
Dec 27 01:02:51 172.17.17.1 dhclient: igb0 link state up -> down
Dec 27 01:02:51 172.17.17.1 check_reload_status: Linkup starting igb0
pfSense then seems to cycle a bunch of processes (ipsec_starter, ntpd, dhcpd, dhclient, etc.). dhclient tries to reuse the old IP address but reports that no DHCPOFFERs were received. I also see some ignored offers from the cable modem (192.168.100.1). Then, all of a sudden, dhclient decides the interface is up, without logging any valid DHCP exchange:
Dec 27 01:04:31 172.17.17.1 dhclient: No DHCPOFFERS received.
Dec 27 01:04:31 172.17.17.1 dhclient: Trying recorded lease 24.93.xx.xx
Dec 27 01:04:31 172.17.17.1 dhclient: TIMEOUT
Dec 27 01:04:31 172.17.17.1 dhclient: Starting add_new_address()
Dec 27 01:04:31 172.17.17.1 dhclient: ifconfig igb0 inet 24.93.xx.xx netmask 255.255.240.0 broadcast 255.255.255.255
Dec 27 01:04:31 172.17.17.1 dhclient: New IP Address (igb0): 24.93.xx.xx
Dec 27 01:04:31 172.17.17.1 dhclient: New Subnet Mask (igb0): 255.255.240.0
Dec 27 01:04:31 172.17.17.1 dhclient: New Broadcast Address (igb0): 255.255.255.255
Dec 27 01:04:31 172.17.17.1 dhclient: New Routers (igb0): 24.93.xx.1
Dec 27 01:04:32 172.17.17.1 dhclient: New Routers (igb0): 24.93.xx.1
Dec 27 01:04:33 172.17.17.1 dhclient: Deleting old routes
Dec 27 01:04:33 172.17.17.1 dhclient: bound: renewal in 31414 seconds.
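One way to confirm this pattern across many outages is to scan the log for a "Trying recorded lease" line that is followed by a "bound:" line with no DHCPACK logged in between. The sketch below assumes the message text matches the excerpt above and that a genuine exchange would log a line containing "DHCPACK" (an assumption worth checking against a normal boot). The function name `fell_back_to_recorded_lease` is hypothetical.

```python
import re
from typing import Iterable

# Message patterns taken from the dhclient excerpt above.
RECORDED = re.compile(r"dhclient: Trying recorded lease")
# Assumption: a real DHCP exchange logs a line containing "DHCPACK".
ACK = re.compile(r"dhclient: DHCPACK")
BOUND = re.compile(r"dhclient: bound:")


def fell_back_to_recorded_lease(lines: Iterable[str]) -> bool:
    """Return True if dhclient reported 'bound' after falling back to a
    recorded lease, with no DHCPACK logged in between."""
    waiting = False  # set after 'Trying recorded lease', cleared by a DHCPACK
    for line in lines:
        if RECORDED.search(line):
            waiting = True
        elif ACK.search(line):
            waiting = False
        elif BOUND.search(line) and waiting:
            return True
    return False
```

The log lines quoted here carry a host prefix (172.17.17.1), which suggests they are collected on a remote syslog server; running the check there against the collected file avoids depending on Python being installed on the firewall itself.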
As shown above, dhclient claims it has an IP address, but it doesn't, and the address is never actually assigned to the interface. It does, however, install the previously known default gateway as the next hop. As soon as that happens, the log fills up with "network destination unreachable" messages from various services, along with an error about the interface itself:
Dec 27 01:04:36 172.17.17.1 kernel: arpresolve: can't allocate llinfo for 24.93.xx.1 on igb0
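An arpresolve "can't allocate llinfo" error is commonly associated with a gateway that does not sit on any subnet configured on the interface, which fits a router being installed without the address. A minimal sketch, assuming Python 3 and the FreeBSD `ifconfig` output format shown in the comment, that checks whether a given gateway is on-link. `ipv4_networks`, `gateway_on_link` and `read_ifconfig` are hypothetical helper names.

```python
import ipaddress
import re
import subprocess
from typing import List

# FreeBSD ifconfig prints IPv4 addresses with a hex netmask, e.g.
#   inet 198.51.100.20 netmask 0xfffff000 broadcast 255.255.255.255
INET = re.compile(r"^\s*inet (\d+\.\d+\.\d+\.\d+) netmask (0x[0-9a-fA-F]{8})", re.M)


def ipv4_networks(ifconfig_output: str) -> List[ipaddress.IPv4Interface]:
    """Parse every IPv4 address/netmask pair from `ifconfig <ifname>` output."""
    result = []
    for addr, hexmask in INET.findall(ifconfig_output):
        prefix = bin(int(hexmask, 16)).count("1")  # 0xfffff000 -> 20
        result.append(ipaddress.IPv4Interface(f"{addr}/{prefix}"))
    return result


def gateway_on_link(ifconfig_output: str, gateway: str) -> bool:
    """True if `gateway` falls inside a subnet configured on the interface."""
    gw = ipaddress.IPv4Address(gateway)
    return any(gw in iface.network for iface in ipv4_networks(ifconfig_output))


def read_ifconfig(ifname: str) -> str:
    """Return the raw `ifconfig <ifname>` output (FreeBSD/pfSense shell)."""
    return subprocess.run(["ifconfig", ifname], capture_output=True,
                          text=True, check=True).stdout
```

For example, `gateway_on_link(read_ifconfig("igb0"), "<gateway from the log>")` returning False at the moment dhclient reports "bound" would confirm that the interface has no address on the gateway's subnet.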
After that, the dhclient process for igb0 never does anything again. Interestingly, whatever is happening also prevents DHCPv6 from recovering, because dhcp6c never even tries to reacquire an address. I was out of town for a few days this week, and the pfSense box went almost 48 hours without ever trying to reacquire its IP address. However, if I go to the Status -> Interfaces screen and do a manual release/renew, everything comes right back up. I've even been online when this happened, and within 30 seconds I opened the admin console, observed the condition, and cycled the interface so it came right back up. One further note: enabling or disabling gateway monitoring has no effect on this condition. With gateway monitoring enabled, the log just fills up with "SentTo errors" because it can't send the traffic.
I fully believe that something in Spectrum/TWC's network may be causing the original hiccup, but it seems as if there's a bug somewhere in the code that manages the interface status/configuration. I would expect dhclient to keep trying to reacquire the address and succeed after a short time. I'm looking for any suggestions on how to troubleshoot this further.
Also, is there any way to call the Release and Renew functions (from Status -> Interfaces) from the shell? I'm going to try writing some sort of watchdog script to cycle the interface when this happens, because when I'm not local to the device, the outage lasts until I can get to it and cycle it myself.
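A watchdog along these lines could be sketched as follows, assuming Python 3 is available on the box. It pings a host you expect to answer, counts consecutive failures, runs a recovery command once a threshold is reached, and then stays quiet for a few checks so it does not cycle the interface in a tight loop. `RECOVERY_CMD` is a placeholder, not a verified pfSense command: the shell equivalent of Release/Renew is exactly the open question, so substitute whatever you confirm works. `LinkWatchdog`, `gateway_answers_ping` and `run` are hypothetical names.

```python
import subprocess
import time

# Placeholder (assumption): replace with a command you have verified performs
# the release/renew on your system. This is NOT a known pfSense command.
RECOVERY_CMD = ["/bin/echo", "replace-with-recovery-command"]


class LinkWatchdog:
    """Request recovery after `threshold` consecutive failed checks; the next
    `cooldown` checks after a recovery never trigger another one."""

    def __init__(self, threshold: int = 3, cooldown: int = 10) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.quiet = 0  # checks left before recovery may fire again

    def observe(self, healthy: bool) -> bool:
        """Record one check result; return True if recovery should run now."""
        self.failures = 0 if healthy else self.failures + 1
        if self.quiet > 0:
            self.quiet -= 1
            return False
        if self.failures >= self.threshold:
            self.failures = 0
            self.quiet = self.cooldown
            return True
        return False


def gateway_answers_ping(host: str) -> bool:
    # FreeBSD ping: -c = packet count, -t = overall timeout in seconds
    # (on Linux, -t sets the TTL instead).
    result = subprocess.run(["ping", "-c", "1", "-t", "2", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def run(host: str, interval: float = 30.0) -> None:
    """Check `host` every `interval` seconds forever, recovering as needed."""
    dog = LinkWatchdog()
    while True:
        if dog.observe(gateway_answers_ping(host)):
            subprocess.run(RECOVERY_CMD, check=False)
        time.sleep(interval)
```

Keeping the decision logic in `LinkWatchdog` separate from the ping and the recovery command means the thresholds can be tested without touching the network. Because failures keep counting during the cooldown, a link that is still down when the cooldown ends triggers another recovery immediately.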