DHCP on WAN suddenly started failing

ZPrime

I didn't want to paste the entire log (500+ lines for about a minute or two of flapping), so I put it on pastebin.

http://pastebin.com/CJ4rN1yV

Unfortunately, system.log doesn't seem to make a note of the time that any changes were actually enacted by a user, and I didn't look at a seconds-accurate clock to know when I "fixed" the interface.

This was all started by simply going to Interfaces > (WAN interface), then clicking "Advanced" under the "Speed and duplex" heading, and changing it to "autoselect", then clicking "Save" at the bottom of the page, and then "Apply" at the top.

The link was allowed to flap up and down for a minute or two, and then somewhere around 23:06 I reversed the above process, resetting "speed and duplex" back to "Default (no preference, typically autoselect)". The final DHCP / link up / etc should all be due to that change, and it has remained stable since then.

Gateway Monitoring was DISABLED during this test, by checking "Disable Gateway Monitoring" under System: Gateways: Edit gateway.

My LAN interface apparently has GW monitoring enabled by default, and it is set as "autoselect" for speed/duplex, and has never exhibited this flapping behavior (to my knowledge at least). Then again, the switch doesn't get restarted often… but I just reset it last night and didn't have any problems on the LAN side.

If there is more info that would be helpful I can reproduce again and gather whatever's needed.

wallabybob

@bradenmcg:

My LAN interface apparently has GW monitoring enabled by default,

What GUI setting leads you to that conclusion?

What do you have set as DNS Server and Use gateway on System -> General Setup?

Your log shows (I have deleted some entries :

Aug 7 23:05:36 pf check_reload_status: Linkup starting em0
Aug 7 23:05:36 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:36 pf dhclient[30956]: My address (76.189.121.xxx) was deleted, dhclient exiting
Aug 7 23:05:39 pf php: : DEVD Ethernet detached event for wan
Aug 7 23:05:40 pf check_reload_status: Linkup starting em0
Aug 7 23:05:40 pf kernel: em0: link state changed to UP
Aug 7 23:05:40 pf dhclient: PREINIT
Aug 7 23:05:40 pf dhclient[40076]: DHCPREQUEST on em0 to 255.255.255.255 port 67
Aug 7 23:05:40 pf dhclient[40076]: DHCPACK from 142.254.151.133
Aug 7 23:05:40 pf dhclient: REBOOT
Aug 7 23:05:40 pf dhclient: Starting add_new_address()
Aug 7 23:05:40 pf dhclient: ifconfig em0 inet 76.189.121.xxx netmask 255.255.224.0 broadcast 255.255.255.255
Aug 7 23:05:40 pf dhclient: New IP Address (em0): 76.189.121.xxx
Aug 7 23:05:40 pf dhclient: New Subnet Mask (em0): 255.255.224.0
Aug 7 23:05:40 pf dhclient: New Broadcast Address (em0): 255.255.255.255
Aug 7 23:05:40 pf dhclient: New Routers (em0): 76.189.96.1
Aug 7 23:05:40 pf dhclient: Adding new routes to interface: em0
Aug 7 23:05:40 pf dhclient: /sbin/route add default 76.189.96.1
Aug 7 23:05:40 pf dhclient: Creating resolv.conf
Aug 7 23:05:41 pf check_reload_status: rc.newwanip starting em0
Aug 7 23:05:41 pf dhclient[40076]: bound to 76.189.121.xxx – renewal in 21600 seconds.
Aug 7 23:05:41 pf php: /interfaces.php: ROUTING: setting default route to 76.189.96.1
Aug 7 23:05:42 pf php: : DEVD Ethernet attached event for wan
Aug 7 23:05:42 pf php: : HOTPLUG: Configuring interface wan
Aug 7 23:05:42 pf check_reload_status: Linkup starting em0
Aug 7 23:05:42 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:43 pf dnsmasq[35896]: exiting on receipt of SIGTERM
Aug 7 23:05:43 pf php: : rc.newwanip: Informational is starting em0.
Aug 7 23:05:43 pf php: : rc.newwanip: on (IP address: 76.189.121.xxx) (interface: wan) (real interface: em0).
Aug 7 23:05:43 pf php: : The command 'route change -host 76.189.96.1' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host 76.189.96.1: Invalid argument'

Notice the link state to DOWN transition at 23:05:42 before the completion of the processing of the preceding linkup - the error returned by route at 23:05:43 appears to be because of an attempt to set a route to a DNS server failing because part of the necessary information has gone missing because the link went down.

The log extract shows a number of instances of link up promptly followed by link down:

Aug 7 23:05:36 pf check_reload_status: Linkup starting em0
Aug 7 23:05:36 pf kernel: em0: link state changed to DOWN

Aug 7 23:05:40 pf kernel: em0: link state changed to UP

Aug 7 23:05:42 pf check_reload_status: Linkup starting em0
Aug 7 23:05:42 pf kernel: em0: link state changed to DOWN

Aug 7 23:05:45 pf check_reload_status: Linkup starting em0
Aug 7 23:05:45 pf kernel: em0: link state changed to UP

Aug 7 23:05:48 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:48 pf check_reload_status: Linkup starting em0
A
Aug 7 23:05:54 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:54 pf check_reload_status: Linkup starting em0

Aug 7 23:05:57 pf kernel: em0: link state changed to UP
Aug 7 23:05:57 pf check_reload_status: Linkup starting em0

Aug 7 23:06:00 pf check_reload_status: Linkup starting em0
Aug 7 23:06:00 pf kernel: em0: link state changed to DOWN

Aug 7 23:06:06 pf check_reload_status: Linkup starting em0
Aug 7 23:06:06 pf kernel: em0: link state changed to DOWN

Aug 7 23:06:09 pf check_reload_status: Linkup starting em0
Aug 7 23:06:09 pf kernel: em0: link state changed to UP

Aug 7 23:06:11 pf check_reload_status: Linkup starting em0
Aug 7 23:06:11 pf kernel: em0: link state changed to DOWN

Aug 7 23:06:15 pf check_reload_status: Linkup starting em0
Aug 7 23:06:15 pf kernel: em0: link state changed to UP

. I don't know why this is happenning. But with link up running concurrently with link down PERHAPS the link up processing is detecting an unexpected condition (e.g. the WAN interface doesn't have an IP address) and forcing link down to try to reset things

Does it make a difference to disable miniupnpd? The log suggests it isn't happy. I don't know if miniupnpd is doing anything to contribute to this unhappy state of affairs.

ZPrime

@wallabybob:

@bradenmcg:

My LAN interface apparently has GW monitoring enabled by default,

What GUI setting leads you to that conclusion?

I looked at Gateway Monitoring and the LAN interface was in there and configured already, I did not alter it. I thought that only WAN gateways were monitored but I guess all gateways are unless you change them.

What do you have set as DNS Server and Use gateway on System -> General Setup?

DNS servers are all blank, and "use gateway" is WAN. I have "allow DNS server list to be overridden" enabled.

I do have two DNS domain overrides setup on Services > DNS Forwarder that allow me to resolve hostnames from my office (across VPN), but otherwise DNS forwarder is just using Time Warner's DHCP-assigned pair of servers.

Your log shows (I have deleted some entries :

Aug 7 23:05:36 pf check_reload_status: Linkup starting em0
Aug 7 23:05:36 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:36 pf dhclient[30956]: My address (76.189.121.xxx) was deleted, dhclient exiting
Aug 7 23:05:39 pf php: : DEVD Ethernet detached event for wan
Aug 7 23:05:40 pf check_reload_status: Linkup starting em0
Aug 7 23:05:40 pf kernel: em0: link state changed to UP
Aug 7 23:05:40 pf dhclient: PREINIT
Aug 7 23:05:40 pf dhclient[40076]: DHCPREQUEST on em0 to 255.255.255.255 port 67
Aug 7 23:05:40 pf dhclient[40076]: DHCPACK from 142.254.151.133
Aug 7 23:05:40 pf dhclient: REBOOT
Aug 7 23:05:40 pf dhclient: Starting add_new_address()
Aug 7 23:05:40 pf dhclient: ifconfig em0 inet 76.189.121.xxx netmask 255.255.224.0 broadcast 255.255.255.255
Aug 7 23:05:40 pf dhclient: New IP Address (em0): 76.189.121.xxx
Aug 7 23:05:40 pf dhclient: New Subnet Mask (em0): 255.255.224.0
Aug 7 23:05:40 pf dhclient: New Broadcast Address (em0): 255.255.255.255
Aug 7 23:05:40 pf dhclient: New Routers (em0): 76.189.96.1
Aug 7 23:05:40 pf dhclient: Adding new routes to interface: em0
Aug 7 23:05:40 pf dhclient: /sbin/route add default 76.189.96.1
Aug 7 23:05:40 pf dhclient: Creating resolv.conf
Aug 7 23:05:41 pf check_reload_status: rc.newwanip starting em0
Aug 7 23:05:41 pf dhclient[40076]: bound to 76.189.121.xxx – renewal in 21600 seconds.
Aug 7 23:05:41 pf php: /interfaces.php: ROUTING: setting default route to 76.189.96.1
Aug 7 23:05:42 pf php: : DEVD Ethernet attached event for wan
Aug 7 23:05:42 pf php: : HOTPLUG: Configuring interface wan
Aug 7 23:05:42 pf check_reload_status: Linkup starting em0
Aug 7 23:05:42 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:43 pf dnsmasq[35896]: exiting on receipt of SIGTERM
Aug 7 23:05:43 pf php: : rc.newwanip: Informational is starting em0.
Aug 7 23:05:43 pf php: : rc.newwanip: on (IP address: 76.189.121.xxx) (interface: wan) (real interface: em0).
Aug 7 23:05:43 pf php: : The command 'route change -host 76.189.96.1' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host 76.189.96.1: Invalid argument'

Notice the link state to DOWN transition at 23:05:42 before the completion of the processing of the preceding linkup - the error returned by route at 23:05:43 appears to be because of an attempt to set a route to a DNS server failing because part of the necessary information has gone missing because the link went down.

The route that it is trying to set is the default route - 76.189.96.1 is the default gateway for the IP I am assigned. TWC is using a fairly large subnet here (/19).

The log extract shows a number of instances of link up promptly followed by link down:

Aug 7 23:05:36 pf check_reload_status: Linkup starting em0
Aug 7 23:05:36 pf kernel: em0: link state changed to DOWN

Aug 7 23:05:40 pf kernel: em0: link state changed to UP

Aug 7 23:05:42 pf check_reload_status: Linkup starting em0
Aug 7 23:05:42 pf kernel: em0: link state changed to DOWN

Aug 7 23:05:45 pf check_reload_status: Linkup starting em0
Aug 7 23:05:45 pf kernel: em0: link state changed to UP

Aug 7 23:05:48 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:48 pf check_reload_status: Linkup starting em0
A
Aug 7 23:05:54 pf kernel: em0: link state changed to DOWN
Aug 7 23:05:54 pf check_reload_status: Linkup starting em0

Aug 7 23:05:57 pf kernel: em0: link state changed to UP
Aug 7 23:05:57 pf check_reload_status: Linkup starting em0

Aug 7 23:06:00 pf check_reload_status: Linkup starting em0
Aug 7 23:06:00 pf kernel: em0: link state changed to DOWN

Aug 7 23:06:06 pf check_reload_status: Linkup starting em0
Aug 7 23:06:06 pf kernel: em0: link state changed to DOWN

Aug 7 23:06:09 pf check_reload_status: Linkup starting em0
Aug 7 23:06:09 pf kernel: em0: link state changed to UP

Aug 7 23:06:11 pf check_reload_status: Linkup starting em0
Aug 7 23:06:11 pf kernel: em0: link state changed to DOWN

Aug 7 23:06:15 pf check_reload_status: Linkup starting em0
Aug 7 23:06:15 pf kernel: em0: link state changed to UP

. I don't know why this is happening. But with link up running concurrently with link down PERHAPS the link up processing is detecting an unexpected condition (e.g. the WAN interface doesn't have an IP address) and forcing link down to try to reset things

Does it make a difference to disable miniupnpd? The log suggests it isn't happy. I don't know if miniupnpd is doing anything to contribute to this unhappy state of affairs.

To my knowledge miniupnpd doesn't actively do anything with the link, it is just complaining because it wants to know the WAN IP and can't find it. I can try disabling it though and then triggering the bug again to see if the link will still flap.

Should I open a bug for this in Redmine?

netmagi

Sorry to necro this, but I've been searching topics for the last hour, and this is the only topic I was able to find that is exactly, to the tee, what I am experiencing.

Was this ever resolved?

Running 2.0.1 release. This box has been running for over a year. no issues until this

Recently, I was experiencing issues with my cable modem and I made two changes to the WAN interface config:

I cloned the MAC to another system I was testing with temporarily (and then removed the clone settings when I was done testing)

. . . and

I manually set the interface to 100Mbps, and then set it to "autoselect" (which I thought was the default)

Since then, any interruption of the WAN carrier causes that interface to flap. For ex, if I unplug the WAN cable, and plug it back in. . flaps indefinitely. If the cable modem times out briefly due to service provider maintenance, and the cable modem resets, interface flaps forever.

Only a reboot of the system resolves it.

Before the above 2 changes, this never happened.

It seems like the OP here fixed by setting the interface back to the actual default setting- "Default (no preference, typically autoselect)". I set mine back to that as well now, but can't test it at the moment since it's production.

The OP asked about opening up a bug report? Is this something resolved in a newer version? Was the underlying cause ever discovered?

Nic uses em driver, IP config is DHCP, and connection to cable modem is gigabit.

-Rich H.

ZPrime

@netmagi:

Sorry to necro this, but I've been searching topics for the last hour, and this is the only topic I was able to find that is exactly, to the tee, what I am experiencing.

Was this ever resolved?

Running 2.0.1 release. This box has been running for over a year. no issues until this

Recently, I was experiencing issues with my cable modem and I made two changes to the WAN interface config:

I cloned the MAC to another system I was testing with temporarily (and then removed the clone settings when I was done testing)

. . . and

I manually set the interface to 100Mbps, and then set it to "autoselect" (which I thought was the default)

Since then, any interruption of the WAN carrier causes that interface to flap. For ex, if I unplug the WAN cable, and plug it back in. . flaps indefinitely. If the cable modem times out briefly due to service provider maintenance, and the cable modem resets, interface flaps forever.

Only a reboot of the system resolves it.

Before the above 2 changes, this never happened.

It seems like the OP here fixed by setting the interface back to the actual default setting- "Default (no preference, typically autoselect)". I set mine back to that as well now, but can't test it at the moment since it's production.

The OP asked about opening up a bug report? Is this something resolved in a newer version? Was the underlying cause ever discovered?

Nic uses em driver, IP config is DHCP, and connection to cable modem is gigabit.

-Rich H.

Rich, interestingly my system is also an em-driver interface (Soekris 65xx).

What is your cable modem here? I have a Motorola 6141, which has a gigabit port as well. The other work-around I've found is to hardcode the pfSense interface to 1000/full. Unlike 10/100 where an autonegotiation mismatch causes a link to go to 10/half, an auto-mismatch on gigabit will apparently allow both sides to link at 1000/full.

I was apprehensive about hardcoding a speed on the pfSense side because I knew I had no way to hardcode the modem and historically auto-negotiation mismatches are Very Bad(TM). Since this is at home and I can mess with things when the wife is asleep, I did it and then checked the link lights on the modem. They were still showing 1000/full so in this case at least, no duplex or speed mismatch.

This all said, my system is running 2.1-RELEASE now and I haven't seen any problems from it yet. The WAN if is still set to "Default" and nothing has flapped, and I know that the modem has been restarted a few times since the upgrade.

netmagi

Thanks so much for the reply.

My cable modem is a Motorola as well, the SB6121. After changing to "Default (no preference, typically autoselect)", and cycling the power on the cable modem last night, it's working correctly. No flap.

Seems easy to blame the cable modem here, but they are very widely used. I wonder if it's something very specific with the timing for the auto-negotiation between this modem and how pfsense brings the interface up and down . . .

-Rich H.

rdighe

I seem to be having same issue here.. Cable mode is Thomson DCM476. system is supermicro motherboard with two embedded em based nics. em1 is connected to Baystack 450-24t and is always up and rocksolid. em0 connected to cable modem is flapping. after landing on this thread I have just set wan interface speed to 1000baseT and will monitor for next couple of days.

–Rajiv