DHCP on WAN suddenly started failing
-
Wow, I'm running out of things to change here.
New box, new USB (SiiG) Ethernet adapter on the WAN side, new cable modem (SB6141), new Ethernet cable.
I've tried 2.0.1, 2.0.2, 2.0.3, and 2.1-PRE, all with the same outcome. The latest iteration is 2.0.1, freshly installed onto a new USB stick.
Packet capture: http://www.tubas.net/pf/ue0.pcap
System Log excerpt: http://www.tubas.net/pf/ue0_fail_system.log
(these overlap, but aren't exact)
The only thing I can think of that might be polluting this at this point is my configuration backup. Is there anything in there that could be affecting this? I've restored it to the new system whenever I've had to start from scratch.
-
The packet capture shows bursts of DHCP requests going out at intervals of a few microseconds, but the system log shows at most two in a burst.
The packet capture shows DHCP replies and the system log shows dhclient exiting at the "same time" (but the granularity of the system log timestamp is not sufficiently fine). My suspicion is that dhclient is provoked into exiting by something in the DHCP reply. That dhclient sends some requests suggests to me the configuration file is probably OK UNLESS the problem is caused when dhclient tries to act on the reply using something from the configuration file.
dhclient has its standard output redirected to /tmp/ue0_output and its output redirected to /tmp/ue0_error_output. (Presumably stdout should be redirected to the first file and stderr to the second file.) Please post the contents of these files.
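As an illustration of the redirection described above (stdout to one file, stderr to another), here is a minimal Python sketch. It is not how pfSense itself launches dhclient; the function name run_and_capture is made up for this example, and it assumes Python 3 is available on the machine where you run it.

```python
import subprocess


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run cmd, writing stdout and stderr to two separate files.

    Returns the exit status of the command.
    """
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        # Passing two distinct file objects keeps the streams apart,
        # which is what "> out 2> err" does in a Bourne-style shell.
        return subprocess.run(cmd, stdout=out, stderr=err).returncode
```

Called with the dhclient command line and the two /tmp file names mentioned in this post, it would leave normal output in the first file and error messages in the second, so the two can be posted separately.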
In a further attempt to capture what could be causing the problem, I suggest you
1. Capture the contents of pfSense file /var/etc/dhclient_wan.conf
2. Change the WAN interface type from DHCP to None
3. Reboot
4. Start a packet capture on the WAN interface, highly detailed and filtering on port 67 (DHCP traffic only)
5. In an SSH session to your pfSense box, check that the file /var/etc/dhclient_wan.conf hasn't changed (if it has, restore it), then issue the shell command /sbin/dhclient -c /var/etc/dhclient_wan.conf ue0 and capture the output.
-
Noted.
I just did a 100% new build using the latest 2.1 snapshot. I didn't bring in any config from the old install; I only have a handful of NATs, so it's not a big deal to recreate them. We'll see if there's any change here.
-
Another failure…but this time different output.
And, I realized I've been crapping up multiple threads :-[ with this, so since I'm back on 2.1, I put the latest info over there:
http://forum.pfsense.org/index.php/topic,58819.0.html
-
In my opinion, this is not a DHCP issue.
My WAN connection failed in the same way with a static address too (WAN interface detached/connected again and again).
Only applying/resetting the filter works (my script does it for now). Could it be a kernel issue?
P.S. My LAN/WAN interfaces are built-in, so they can't be detached.
-
Have there been any updates to this issue?
I'm a newbie (less than a month) to pfSense and I'm having similar problems on 2.0.2. I lose my WAN connection at least once a day and the logs seem to indicate a DHCP failure. A simple "renew" restores connectivity. It doesn't appear that anyone has posted to this thread for the last month so I'm wondering if the issue was resolved in a "snap" or if the work-around script is the way to go (for now).
-
I ended up abandoning ship and replaced it with an AirPort Extreme. I miss all the features of pfSense, but got tired of fiddling with it. Perhaps I will try again in the future.
-
I had a similar problem that I resolved by changing the gateway monitor IP to Google's DNS servers (8.8.8.8). Haven't had an issue since.
-
Thanks for the tip! I made the same change and I've had no failures for 3 days now. Might be a little premature but seems to have cleared up the issue. I appreciate the help.
-
Of course! Day number 4 and I'm down again. The quality IP change definitely improved things but there's still no auto-recovery after a WAN failure. I am now trying one of the shell scripts above along with cron to reset my WAN interface. This has already worked once since last night and is the first auto-recovery I've had!
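For illustration only, a watchdog of the kind described here could be sketched in Python as below. This is not the script from the thread: gateway_reachable and run_once are names invented for the example, and the actual reset action is deliberately left as a callable you supply, because the right way to reset the WAN depends on your setup.

```python
import subprocess


def gateway_reachable(host, count=3, timeout_s=10):
    """Return True if ping reports at least one reply from host."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing ping is treated the same as no reply.
        return False
    return result.returncode == 0


def run_once(check, reset, attempts=3):
    """Call reset() only if check() fails `attempts` times in a row.

    Returns True if a reset was triggered, False otherwise.
    """
    for _ in range(attempts):
        if check():
            return False
    reset()
    return True
```

Run from cron every few minutes, something like run_once(lambda: gateway_reachable("8.8.8.8"), my_reset) would only trigger the reset after several consecutive failures (my_reset being whatever hypothetical function wraps your own reset command). Requiring repeated failures is a design choice to avoid resetting the interface over a single lost ping.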
-
I have this exact same issue. I use pfSense at about 18 different sites across the valley here and NONE of them have exhibited this kind of behavior before. I too have Comcast and this seems to happen about once every 3 or 4 days. My logs are essentially identical to those that are already on here. Thanks for any help.
-
I updated our 2.0.2 pfsense box to 2.0.3 yesterday. Since then, our WAN interface dies repeatedly.
Because I control its upstream connection, I have even attempted to set the WAN interface to STATIC, but it shows all traffic as blocked (on the Status -> Interfaces screen).
I switch back to DHCP and it reconnects and the connection cycles constantly: status up dhcp up, status no carrier dhcp up, status up dhcp down
Seems like the same kind of issue others have reported?
Additional info:
I set the WAN to static settings (same as DHCP was providing), AND manually added Gateway route to the upstream router AND manually input google public DNS into the pfsense settings…and now my pfsense is working normally. (Before update, I used DHCP on the WAN which pulled both the IP and the ISP DNS from the router and everything worked.)
-
I hate to dredge this 2 month old thread up, but I'm having what I believe to be a similar or identical issue here; also on a cable modem for WAN, but I'm with Time Warner.
What's interesting is that when the behavior is happening, I can watch the blinkenlights and the interface is flapping on both pfSense and the modem – the modem is NOT rebooting during this, I just get a constantly flapping WAN interface / modem ethernet.
Thinking maybe it was somehow hardware related (unlikely as the hardware is Soekris with Intel / em NICs), I tried moving the modem from em0 to em3. The link stopped flapping on em3… until I reconfigured pfSense to use em3 as the WAN, at which point it started flapping again.
Rebooting made it all work correctly on em3, and then I was able to move the connection back to em0 and re-set that as the WAN, and it runs fine on that interface again. I'm not sure what the reboot was doing that I wasn't able to do by clicking "renew lease" in the GUI (IIRC, that had no effect).
I have everything that pfSense was outputting on a syslog server if the logs would be helpful. I don't have any of the stuff from /tmp because of the reboot (and this is an embedded install).
I was thinking about trying 2.1RC, but I don't see the point if this is still in 2.1 as well.
-
@bradenmcg:
I just get a constantly flapping WAN interface / modem ethernet.
What is it that causes you to consider the interface to be "flapping"? The interface indicator lights blinking? A recurring sequence of events in the system log?
What is the system log reporting during the time you consider the interface is "flapping"?
-
I am running pfSense 2.0.3 and I have a problem with it.
My config is: ISP --> modem --> pfSense --> LAN
and my pfSense lost connection with modem gateway.
and suddenly drops all DHCP leases on LAN. Restart does not solve DHCP issues.
-
and my pfSense lost connection with modem gateway.
and suddenly drops all DHCP leases on LAN.
What do you observe that you describe as "drops all DHCP leases on LAN"? Do you lose access to the Internet? Do all LAN systems suddenly and almost simultaneously attempt to renew their DHCP leases and all fail to get a response?
Restart does not solve DHCP issues.
Restart of what: the modem? The pfSense box? The DHCP server on the pfSense LAN interface?
Please post an extract of pfSense system log from a few events before "lost connection with modem gateway" to a few minutes later.
-
I am running pfSense 2.0.3 and I have a problem with it.
It looks as if you started a new topic for this issue: http://forum.pfsense.org/index.php/topic,64107
Let's keep the discussion there.
-
@bradenmcg:
I just get a constantly flapping WAN interface / modem ethernet.
What is it that causes you to consider the interface to be "flapping"? The interface indicator lights blinking? A recurring sequence of events in the system log?
What is the system log reporting during the time you consider the interface is "flapping"?
Sorry for the super delayed reply to this, I have been busy. :)
By "flapping" I mean both log indications from pfSense showing interface up / interface down, as well as physical link lights on both the hardware and the cable modem it is connecting to turning off (indicating no link).
It just happened to me again today when the modem lost signal on the coax. pfSense would show the interface as "up" and then "no carrier" in Status -> Interfaces (if I kept refreshing the page). I can also SSH to the box and keep repeating "ifconfig em0" (em0 is the interface attached to the modem), and it alternates between "status: no carrier" and "status: active".
[2.0.3-RELEASE][admin@pf.bhm.ds]/root(9): ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
    ether 00:00:24:cf:ef:70
    inet6 fe80::200:24ff:fecf:ef70%em0 prefixlen 64 scopeid 0x1
    inet [public IP] netmask 0xffffe000 broadcast 255.255.255.255
    nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
[2.0.3-RELEASE][admin@pf.bhm.ds]/root(10): ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
    ether 00:00:24:cf:ef:70
    inet6 fe80::200:24ff:fecf:ef70%em0 prefixlen 64 scopeid 0x1
    inet [public ip] netmask 0xffffe000 broadcast 255.255.255.255
    nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
    media: Ethernet autoselect
    status: no carrier
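Rather than eyeballing repeated ifconfig runs, the "status:" line can be pulled out and the changes counted. Below is a minimal Python sketch of that idea; link_status and count_transitions are names made up for this example, and it assumes the status appears on its own line, as it does in normal ifconfig output.

```python
import re

# Matches a line such as "    status: no carrier" in ifconfig output.
STATUS_RE = re.compile(r"^\s*status:\s*(.+?)\s*$", re.MULTILINE)


def link_status(ifconfig_text):
    """Return the text after 'status:' in ifconfig output, or None if absent."""
    match = STATUS_RE.search(ifconfig_text)
    return match.group(1) if match else None


def count_transitions(statuses):
    """Count how many times consecutive status samples differ."""
    return sum(1 for prev, cur in zip(statuses, statuses[1:]) if prev != cur)
```

Feeding it the text of each ifconfig run, collected once a second or so, gives a list of statuses; a high transition count over a short window is what I mean by flapping.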
This appears to be logged in syslog with a tag of "DEVD Ethernet {detached,attached} event on [interfacename]" – this is nice, but it would be better if the interface name in BSD-speak was also included, since searching for "em0" doesn't turn up squat about the link state. "wan (em0)" would be far more helpful than just "wan." According to syslog (this unit dumps to a server), it seems to detach 2 seconds after it has gone to attached state, and then 3 seconds later it goes back to attached. This will repeat forever until I take action.
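To put numbers on the attach/detach timing, the syslog lines can be parsed and the gap between state changes computed. The Python sketch below assumes classic syslog timestamps ("Jun 12 10:00:02") and the message wording as I quoted it above; the regex, the function names, and the default year are all assumptions for illustration, so adjust them to match your actual log lines.

```python
import re
from datetime import datetime

# Assumed line shape: "<Mon DD HH:MM:SS> ... DEVD Ethernet attached event on wan"
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s.*"
    r"DEVD Ethernet (?P<state>attached|detached) event on (?P<ifname>\S+)"
)


def parse_events(lines, year=2013):
    """Yield (timestamp, state, interface) for each DEVD link event line."""
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            # Classic syslog timestamps omit the year, so one is supplied
            # (the default here is only a placeholder).
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            yield ts, m.group("state"), m.group("ifname")


def flap_intervals(events):
    """Return (previous_state, new_state, seconds) for each state change."""
    result = []
    prev = None
    for ts, state, _ifname in events:
        if prev is not None and state != prev[1]:
            result.append((prev[1], state, (ts - prev[0]).total_seconds()))
        prev = (ts, state)
    return result
```

With the pattern described above (detach 2 seconds after attach, re-attach 3 seconds later), flap_intervals(parse_events(lines)) would yield alternating entries of roughly 2 and 3 seconds for as long as the flapping continues.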
I've found that hard-setting the interface to 1000/full clears everything up, or I can reboot pfSense and it will be OK again (still at auto) for some indeterminate period of time until there is a link status change. It seems like when the modem is without signal on coax for long enough, it either self-reboots (causing Eth to drop) or it simply drops and re-ups the Eth port.
I hate hardcoding equipment, especially when I can't also hardcode the same on the other side (can't change anything on the modem). I hope that the modem is smart enough to go to 1000/full, but I could be causing a duplex mis-match by hardcoding and I have no way to know since the modem doesn't provide any Ethernet link status beyond a light that indicates speed (blue = 1000, orange = 100 or 10). I know that a 10/100 link with auto on one side and 100/full on the other is generally supposed to operate at 100/half, but I don't know what gigabit is supposed to do under similar auto-mismatches.
I'm wondering if this is some sort of "green ethernet" "feature" in the em driver or the chip itself, or possibly a very strange incompatibility between the chip and this Motorola modem? I don't have any trouble with systems on older Soekris hardware (with the same modem) which didn't use Intel Gigabit chips. I also have other systems on the net6501 but with different WAN devices (Ubee modem in one location and DSL in another) and haven't heard complaints.
-
Do you have gateway monitoring enabled on the WAN link? (Edit the appropriate gateway from the System -> Routing, Gateways tab and look at the Disable Gateway Monitoring attribute.)
-
Yes, gateway monitoring is enabled per the defaults for a dynamic WAN connection.