DHCP on WAN suddenly started failing
-
Thanks for the tip! I made the same change and I've had no failures for 3 days now. Might be a little premature but seems to have cleared up the issue. I appreciate the help.
-
Of course! Day number 4 and I'm down again. The quality IP change definitely improved things but there's still no auto-recovery after a WAN failure. I am now trying one of the shell scripts above along with cron to reset my WAN interface. This has already worked once since last night and is the first auto-recovery I've had!
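For anyone who wants to try the cron approach, here is a minimal sketch of a WAN watchdog. It is not the script posted earlier in this thread. The interface name `em0`, the ping target `8.8.8.8`, the 5 second pause and the script path are all assumptions to adjust, and whether bouncing the interface with `ifconfig` is enough to trigger a fresh DHCP request on your pfSense version is something you would need to verify.

```shell
#!/bin/sh
# Hypothetical WAN watchdog sketch; NOT the script referenced in the thread.
# Assumptions: em0 is the WAN NIC and 8.8.8.8 is a reasonable ping target.
WAN_IF="${WAN_IF:-em0}"
PING_TARGET="${PING_TARGET:-8.8.8.8}"
PING_CMD="${PING_CMD:-ping -c 3}"
RESET_CMD="${RESET_CMD:-reset_wan}"

# Succeeds if the ping target answers.
wan_is_up() {
    $PING_CMD "$PING_TARGET" >/dev/null 2>&1
}

# Bounce the interface; the 5 second pause is an arbitrary choice.
reset_wan() {
    ifconfig "$WAN_IF" down
    sleep 5
    ifconfig "$WAN_IF" up
}

# Print "ok" when the WAN answers, otherwise print "reset" and run the reset.
watchdog() {
    if wan_is_up; then
        echo "ok"
    else
        echo "reset"
        $RESET_CMD
    fi
}

# Only act when invoked with the "run" argument, e.g. from cron.
if [ "${1:-}" = "run" ]; then
    watchdog
fi
```

A crontab entry along the lines of `*/5 * * * * /root/wan_watchdog.sh run` (hypothetical path) would check every five minutes. Keeping the ping and reset commands in variables makes it possible to dry-run the logic without touching the interface.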
-
I have this exact same issue. I use pfSense at about 18 different sites across the valley here and NONE of them have exhibited this kind of behavior before. I too have Comcast and this seems to happen about once every 3 or 4 days. My logs are essentially identical to those that are already on here. Thanks for any help.
-
I updated our 2.0.2 pfsense box to 2.0.3 yesterday. Since then, our WAN interface dies repeatedly.
Because I control its upstream connection, I even tried setting the WAN interface to static, but then it shows all traffic as blocked (on the Status -> Interfaces screen).
When I switch back to DHCP it reconnects, and then the connection cycles constantly: status up / DHCP up, status no carrier / DHCP up, status up / DHCP down.
Seems like the same kind of issue others have reported?
Additional info:
I set the WAN to static settings (same as DHCP was providing), AND manually added Gateway route to the upstream router AND manually input google public DNS into the pfsense settings…and now my pfsense is working normally. (Before update, I used DHCP on the WAN which pulled both the IP and the ISP DNS from the router and everything worked.)
-
I hate to dredge this 2 month old thread up, but I'm having what I believe to be a similar or identical issue here; also on a cable modem for WAN, but I'm with Time Warner.
What's interesting is that when the behavior is happening, I can watch the blinkenlights and the interface is flapping on both pfSense and the modem – the modem is NOT rebooting during this, I just get a constantly flapping WAN interface / modem ethernet.
Thinking maybe it was somehow hardware related (unlikely as the hardware is Soekris with Intel / em NICs), I tried moving the modem from em0 to em3. The link stopped flapping on em3… until I reconfigured pfSense to use em3 as the WAN, at which point it started flapping again.
Rebooting made it all work correctly on em3, and then I was able to move the connection back to em0 and re-set that as the WAN, and it runs fine on that interface again. I'm not sure what the reboot was doing that I wasn't able to do by clicking "renew lease" in the GUI (IIRC, that had no effect).
I have everything that pfSense was outputting on a syslog server if the logs would be helpful. I don't have any of the stuff from /tmp because of the reboot (and this is an embedded install).
I was thinking about trying 2.1RC, but I don't see the point if this is still in 2.1 as well.
-
@bradenmcg:
I just get a constantly flapping WAN interface / modem ethernet.
What is it that causes you to consider the interface to be "flapping"? The interface indicator lights blinking? A recurring sequence of events in the system log?
What is the system log reporting during the time you consider the interface is "flapping"?
-
I am running pfsense 2.0.3 and I have problem with it.
my conf is: ISP --> modem --> pfSense --> LAN
and my pfsense lost connection with modem gateway.
and suddenly drops all dhcp leases on lan. restart does not solve dhcp issues.
-
and my pfsense lost connection with modem gateway.
and suddenly drops all dhcp leases on lan.
What do you observe that you describe as "drops all DHCP leases on LAN"? Do you lose access to the Internet? Do all LAN systems suddenly and almost simultaneously attempt to renew their DHCP leases and all fail to get a response?
restart does not solve dhcp issues.
Restart of what: the modem? The pfSense box? The DHCP server on the pfSense LAN interface?
Please post an extract of pfSense system log from a few events before "lost connection with modem gateway" to a few minutes later.
-
I am running pfsense 2.0.3 and I have problem with it.
It looks as if you started a new topic for this issue: http://forum.pfsense.org/index.php/topic,64107
Let's keep the discussion there.
-
@bradenmcg:
I just get a constantly flapping WAN interface / modem ethernet.
What is it that causes you to consider the interface to be "flapping"? The interface indicator lights blinking? A recurring sequence of events in the system log?
What is the system log reporting during the time you consider the interface is "flapping"?
Sorry for the super delayed reply to this, I have been busy. :)
By "flapping" I mean both log indications from pfSense showing interface up / interface down, as well as physical link lights on both the hardware and the cable modem it is connecting to turning off (indicating no link).
It just happened to me again today when the modem lost signal on the coax. pfSense would show the interface as "up" and then "no carrier" in Status -> Interfaces (if I kept refreshing the page). I can also SSH to the box and keep repeating an "ifconfig em0" (which is the link attached to the modem) and it alternates between "status: no carrier" and "status: active".
[2.0.3-RELEASE][admin@pf.bhm.ds]/root(9): ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
        ether 00:00:24:cf:ef:70
        inet6 fe80::200:24ff:fecf:ef70%em0 prefixlen 64 scopeid 0x1
        inet [public IP] netmask 0xffffe000 broadcast 255.255.255.255
        nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
[2.0.3-RELEASE][admin@pf.bhm.ds]/root(10): ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
        ether 00:00:24:cf:ef:70
        inet6 fe80::200:24ff:fecf:ef70%em0 prefixlen 64 scopeid 0x1
        inet [public IP] netmask 0xffffe000 broadcast 255.255.255.255
        nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
        media: Ethernet autoselect
        status: no carrier
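Rather than re-typing the command by hand, the same check can be sketched as a small polling loop. The function names `link_status` and `watch_link` are made up for illustration, and the one second interval and 30 sample default are arbitrary choices.

```shell
#!/bin/sh
# Print only the value of the "status:" line from ifconfig output on stdin.
link_status() {
    sed -n 's/^[[:space:]]*status:[[:space:]]*//p'
}

# Poll an interface once per second, printing a timestamped status.
# Arguments: interface name (default em0), number of samples (default 30).
watch_link() {
    ifname="${1:-em0}"
    count="${2:-30}"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date '+%H:%M:%S') $(ifconfig "$ifname" | link_status)"
        sleep 1
        i=$((i + 1))
    done
}
```

Running `watch_link em0 60` over SSH while the link is misbehaving should show the status alternating, which is easier to paste into a post than repeated full `ifconfig` dumps.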
The flapping appears to be logged in syslog with a tag of "DEVD Ethernet {detached,attached} event on [interfacename]". This is nice, but it would be better if the interface name in BSD-speak was also included, since searching for "em0" doesn't turn up squat about the link state. "wan (em0)" would be far more helpful than just "wan." According to syslog (this unit dumps to a server), the link detaches 2 seconds after it has gone to the attached state, and then 3 seconds later it goes back to attached. This will repeat forever until I take action.
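To put a number on the flapping, the events can be counted from a plain-text log on the syslog server. This is only a sketch: it assumes the log lines contain the phrases "Ethernet attached event" and "Ethernet detached event" as in the tag described above, and the exact wording in a given log may differ. The function name and the example file name are hypothetical.

```shell
#!/bin/sh
# Count link events in a syslog stream read from stdin.
# Assumption: lines contain "Ethernet attached event" or
# "Ethernet detached event"; adjust the patterns to match your logs.
count_link_events() {
    awk '
        /Ethernet attached event/ { up++ }
        /Ethernet detached event/ { down++ }
        END { printf "attached=%d detached=%d\n", up, down }
    '
}
```

For example, `count_link_events < pfsense-wan.log` (hypothetical file name) prints a single summary line. A steadily growing pair of counts matches the attach/detach cycle described above.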
I've found that hard-setting the interface to 1000/full clears everything up, or I can reboot pfSense and it will be OK again (still at auto) for some indeterminate period of time until there is a link status change. It seems like when the modem is without signal on coax for long enough, it either self-reboots (causing Eth to drop) or it simply drops and re-ups the Eth port.
I hate hardcoding equipment, especially when I can't also hardcode the same on the other side (can't change anything on the modem). I hope that the modem is smart enough to go to 1000/full, but I could be causing a duplex mis-match by hardcoding and I have no way to know since the modem doesn't provide any Ethernet link status beyond a light that indicates speed (blue = 1000, orange = 100 or 10). I know that a 10/100 link with auto on one side and 100/full on the other is generally supposed to operate at 100/half, but I don't know what gigabit is supposed to do under similar auto-mismatches.
I'm wondering if this is some sort of "green ethernet" "feature" in the em driver or the chip itself, or possibly a very strange incompatibility between the chip and this Motorola modem? I don't have any trouble with systems on older Soekris hardware (with the same modem) which didn't use Intel Gigabit chips. I also have other systems on the net6501 but with different WAN devices (Ubee modem in one location and DSL in another) and haven't heard complaints.
-
Do you have gateway monitoring enabled on the WAN link? (Edit the appropriate gateway from the System -> Routing, Gateways tab and look at the Disable Gateway Monitoring attribute.)
-
Yes, gateway monitoring is enabled per the defaults for a dynamic WAN connection.
-
Can you put a switch between the pfSense WAN interface and the modem? This should have the effect of leaving carrier to the modem permanently on (as long as the switch has power) and should stop each end confusing the other by dropping carrier.
-
I could insert a switch between them, yes… but why should I have to? I know the switch works around the problem, as does hardcoding the interface on pfSense, but this is a bug somewhere, either in the modem's ethernet controller or the em driver in FreeBSD.
I don't have another system handy to see if the behavior persists with other drivers, and every NIC in this Soekris box is Intel/em. I have tried more than one of this Soekris box (in case it was a problem with the board or controller itself) but the problem has persisted across the devices.
-
Actually, I went back and put in a switch for the heck of it… and the interface continues flapping as described previously. The only way I can get it to stop now is to hardcode the interface to 1000/full, or rebooting pfSense fixes the problem until it decides to start up again. Presumably having the switch between the modem and pfSense should extend the time before it starts flapping though.
I've even tried disabling the interface for a while and then re-enabling it to see if downing/re-upping would reset the hardware, but no help.
Again, I don't know if this is necessarily a pfSense bug and seems more likely to be a FreeBSD (driver?) bug... but I'm not directly using FreeBSD. ;)
[edit]
Ok, played around with it some more. It seems like there is something strange going on with the "autoselect" option. When I have the interface set to "Autoselect," it will flap the link at layer 1 (LED turns on and off) indefinitely. If I set the interface to 1000/full, it stays up.
If I change the interface to "Default / no preference", it seems to take on the settings it was at previously – if I had been at 1000/full and I set to Default/No Pref and Save+Apply, the link stays hardcoded at 1000/full (with unknown effects since the device on the other side can't be hardcoded).
However, if I set the link to autoselect, save+apply and allow it to flap, and while that is happening change it back to "default/no preference," and then save+apply that setting... the link stays at autoselect but it doesn't flap anymore. o_O
There is something Not Right (TM) with autoselect mediaopt on em(4) in 2.0.3 release at least. >_<
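For testing from the shell instead of the GUI, the media setting can be switched with FreeBSD's `ifconfig`. The wrapper below is a hedged sketch: `set_media` and the `APPLY` variable are invented for illustration, and it only prints the command unless `APPLY=1` is set. As far as I know, a change made this way is not written to the pfSense configuration, so it would not survive a reboot or a Save+Apply in the GUI.

```shell
#!/bin/sh
# Build (and optionally run) an ifconfig media command.
# Arguments: interface, media type, optional media option.
# With APPLY=1 the command is executed; otherwise it is only printed.
set_media() {
    ifname="$1"
    media="$2"
    opt="${3:-}"
    cmd="ifconfig $ifname media $media"
    if [ -n "$opt" ]; then
        cmd="$cmd mediaopt $opt"
    fi
    if [ "${APPLY:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}
```

`set_media em0 1000baseT full-duplex` shows the hardcoded variant and `set_media em0 autoselect` the autoselect one. `ifconfig -m em0` lists the media types the NIC reports as supported, which is worth checking before forcing anything.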
-
Uhm… frankly this sounds like faulty cable or the modem being garbage, rather than any driver bug.
-
Please post an extract of pfSense system log from a few events before "lost connection with modem gateway" to a few minutes later.
There have been a number of different reports of this sort of problem but (as far as I know) the problem has not been identified. It seems to be quite difficult to reproduce. Could you turn off gateway monitoring for a few days to see if that makes a difference? Also, please post the output of this pfSense shell command:
```
wc -c -l /tmp/rules.debug
```
-
Uhm… frankly this sounds like faulty cable or the modem being garbage, rather than any driver bug.
I would agree with you except the problem persists when I put a switch between the pfSense box and the modem, and the problem manifests on the pfSense box itself and not the modem in that case. (The link on pfSense is flapping and the modem doesn't change state at all if I have the switch between them.) It seems like when there ISN'T a switch between them, the entire process starts if the modem reboots or loses RF signal, which causes it to drop its Ethernet interface after some period of time. Once that link has dropped and reconnected, pfSense goes off the rails.
wallabybob:
That shouldn't be a problem, I think I'm able to reproduce it fairly easily now just by setting the link directly to autoselect then unplugging the NIC and plugging it back in. That seems to start the flapping anew (or at least it was last night).
wc -c -l /tmp/rules.debug
222 12310 /tmp/rules.debug
I have some stale log events from last night in my syslog server, or I can attempt to recreate again tonight and post fresh log details.
-
@bradenmcg:
I would agree with you except the problem persists when I put a switch between the pfSense box and the modem, and the problem manifests on the pfSense box itself and not the modem in that case. (The link on pfSense is flapping and the modem doesn't change state at all if I have the switch between them.)
Well, then it's probably time to consider the pfsense box HW itself is faulty.
-
@bradenmcg:
I would agree with you except the problem persists when I put a switch between the pfSense box and the modem, and the problem manifests on the pfSense box itself and not the modem in that case. (The link on pfSense is flapping and the modem doesn't change state at all if I have the switch between them.)
Well, then it's probably time to consider the pfsense box HW itself is faulty.
It is a Soekris net6501. It's been replaced once already due to this because I thought it was bad HW, and it does it on the new box as well (I doubt I got two bad units in a row). The same behavior happens on multiple ports on the unit too, each a separate ethernet controller (albeit all using the em driver). If I change my definition of WAN to be em4 instead of em0 and move the ethernet connection to the modem there, it is fine until Layer 1 gets dropped for whatever reason, and then the flapping starts again.
If it's not the driver, I wonder if gateway monitoring could be the cause, since I don't believe LAN would get monitored by default (and I've never had this problem with the LAN port connected to a gigabit HP switch).