Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    DHCP on WAN suddenly started failing

    Scheduled Pinned Locked Moved DHCP and DNS
    67 Posts 17 Posters 28.6k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Z Offline
      ZPrime
      last edited by

      I didn't want to paste the entire log (500+ lines for about a minute or two of flapping), so I put it on pastebin.

      http://pastebin.com/CJ4rN1yV

      Unfortunately, system.log doesn't seem to make a note of the time that any changes were actually enacted by a user, and I didn't look at a seconds-accurate clock to know when I "fixed" the interface.

      This was all started by simply going to Interfaces > (WAN interface), then clicking "Advanced" under the "Speed and duplex" heading, and changing it to "autoselect", then clicking "Save" at the bottom of the page, and then "Apply" at the top.

      The link was allowed to flap up and down for a minute or two, and then somewhere around 23:06 I reversed the above process, resetting "speed and duplex" back to "Default (no preference, typically autoselect)".  The final DHCP / link up / etc should all be due to that change, and it has remained stable since then.

      Gateway Monitoring was DISABLED during this test, by checking "Disable Gateway Monitoring" under System: Gateways: Edit gateway.

      My LAN interface apparently has GW monitoring enabled by default, and it is set as "autoselect" for speed/duplex, and has never exhibited this flapping behavior (to my knowledge at least).  Then again, the switch doesn't get restarted often… but I just reset it last night and didn't have any problems on the LAN side.

      If there is more info that would be helpful I can reproduce again and gather whatever's needed.

      1 Reply Last reply Reply Quote 0
      • W Offline
        wallabybob
        last edited by

        @bradenmcg:

        My LAN interface apparently has GW monitoring enabled by default,

        What GUI setting leads you to that conclusion?

        What do you have set as DNS Server and Use gateway on System -> General Setup?

        Your log shows (I have deleted some entries :

        Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
        Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN
        Aug  7 23:05:36 pf dhclient[30956]: My address (76.189.121.xxx) was deleted, dhclient exiting
        Aug  7 23:05:39 pf php: : DEVD Ethernet detached event for wan
        Aug  7 23:05:40 pf check_reload_status: Linkup starting em0
        Aug  7 23:05:40 pf kernel: em0: link state changed to UP
        Aug  7 23:05:40 pf dhclient: PREINIT
        Aug  7 23:05:40 pf dhclient[40076]: DHCPREQUEST on em0 to 255.255.255.255 port 67
        Aug  7 23:05:40 pf dhclient[40076]: DHCPACK from 142.254.151.133
        Aug  7 23:05:40 pf dhclient: REBOOT
        Aug  7 23:05:40 pf dhclient: Starting add_new_address()
        Aug  7 23:05:40 pf dhclient: ifconfig em0 inet 76.189.121.xxx netmask 255.255.224.0 broadcast 255.255.255.255
        Aug  7 23:05:40 pf dhclient: New IP Address (em0): 76.189.121.xxx
        Aug  7 23:05:40 pf dhclient: New Subnet Mask (em0): 255.255.224.0
        Aug  7 23:05:40 pf dhclient: New Broadcast Address (em0): 255.255.255.255
        Aug  7 23:05:40 pf dhclient: New Routers (em0): 76.189.96.1
        Aug  7 23:05:40 pf dhclient: Adding new routes to interface: em0
        Aug  7 23:05:40 pf dhclient: /sbin/route add default 76.189.96.1
        Aug  7 23:05:40 pf dhclient: Creating resolv.conf
        Aug  7 23:05:41 pf check_reload_status: rc.newwanip starting em0
        Aug  7 23:05:41 pf dhclient[40076]: bound to 76.189.121.xxx – renewal in 21600 seconds.
        Aug  7 23:05:41 pf php: /interfaces.php: ROUTING: setting default route to 76.189.96.1
        Aug  7 23:05:42 pf php: : DEVD Ethernet attached event for wan
        Aug  7 23:05:42 pf php: : HOTPLUG: Configuring interface wan
        Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
        Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN
        Aug  7 23:05:43 pf dnsmasq[35896]: exiting on receipt of SIGTERM
        Aug  7 23:05:43 pf php: : rc.newwanip: Informational is starting em0.
        Aug  7 23:05:43 pf php: : rc.newwanip: on (IP address: 76.189.121.xxx) (interface: wan) (real interface: em0).
        Aug  7 23:05:43 pf php: : The command 'route change -host  76.189.96.1' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host 76.189.96.1: Invalid argument'

        Notice the link state to DOWN transition at 23:05:42 before the completion of the processing of the preceding linkup - the error returned  by route at 23:05:43 appears to be because of an attempt to set a route to a DNS server failing because part of the necessary information has gone missing because the link went down.

        The log extract shows a number of instances of link up promptly followed by link down:

        Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
        Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN

        Aug  7 23:05:40 pf kernel: em0: link state changed to UP

        Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
        Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN

        Aug  7 23:05:45 pf check_reload_status: Linkup starting em0
        Aug  7 23:05:45 pf kernel: em0: link state changed to UP

        Aug  7 23:05:48 pf kernel: em0: link state changed to DOWN
        Aug  7 23:05:48 pf check_reload_status: Linkup starting em0
        A
        Aug  7 23:05:54 pf kernel: em0: link state changed to DOWN
        Aug  7 23:05:54 pf check_reload_status: Linkup starting em0

        Aug  7 23:05:57 pf kernel: em0: link state changed to UP
        Aug  7 23:05:57 pf check_reload_status: Linkup starting em0

        Aug  7 23:06:00 pf check_reload_status: Linkup starting em0
        Aug  7 23:06:00 pf kernel: em0: link state changed to DOWN

        Aug  7 23:06:06 pf check_reload_status: Linkup starting em0
        Aug  7 23:06:06 pf kernel: em0: link state changed to DOWN

        Aug  7 23:06:09 pf check_reload_status: Linkup starting em0
        Aug  7 23:06:09 pf kernel: em0: link state changed to UP

        Aug  7 23:06:11 pf check_reload_status: Linkup starting em0
        Aug  7 23:06:11 pf kernel: em0: link state changed to DOWN

        Aug  7 23:06:15 pf check_reload_status: Linkup starting em0
        Aug  7 23:06:15 pf kernel: em0: link state changed to UP

        . I don't know why this is happenning. But with link up running concurrently with link down PERHAPS the link up processing is detecting an unexpected condition (e.g. the WAN interface doesn't have an IP address) and forcing link down to try to reset things

        Does it make a difference to disable miniupnpd? The log suggests it isn't happy. I don't know if miniupnpd is doing anything to contribute to this unhappy state of affairs.

        1 Reply Last reply Reply Quote 0
        • Z Offline
          ZPrime
          last edited by

          @wallabybob:

          @bradenmcg:

          My LAN interface apparently has GW monitoring enabled by default,

          What GUI setting leads you to that conclusion?

          I looked at Gateway Monitoring and the LAN interface was in there and configured already, I did not alter it.  I thought that only WAN gateways were monitored but I guess all gateways are unless you change them.

          What do you have set as DNS Server and Use gateway on System -> General Setup?

          DNS servers are all blank, and "use gateway" is WAN.  I have "allow DNS server list to be overridden" enabled.

          I do have two DNS domain overrides setup on Services > DNS Forwarder that allow me to resolve hostnames from my office (across VPN), but otherwise DNS forwarder is just using Time Warner's DHCP-assigned pair of servers.

          Your log shows (I have deleted some entries :

          Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
          Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN
          Aug  7 23:05:36 pf dhclient[30956]: My address (76.189.121.xxx) was deleted, dhclient exiting
          Aug  7 23:05:39 pf php: : DEVD Ethernet detached event for wan
          Aug  7 23:05:40 pf check_reload_status: Linkup starting em0
          Aug  7 23:05:40 pf kernel: em0: link state changed to UP
          Aug  7 23:05:40 pf dhclient: PREINIT
          Aug  7 23:05:40 pf dhclient[40076]: DHCPREQUEST on em0 to 255.255.255.255 port 67
          Aug  7 23:05:40 pf dhclient[40076]: DHCPACK from 142.254.151.133
          Aug  7 23:05:40 pf dhclient: REBOOT
          Aug  7 23:05:40 pf dhclient: Starting add_new_address()
          Aug  7 23:05:40 pf dhclient: ifconfig em0 inet 76.189.121.xxx netmask 255.255.224.0 broadcast 255.255.255.255
          Aug  7 23:05:40 pf dhclient: New IP Address (em0): 76.189.121.xxx
          Aug  7 23:05:40 pf dhclient: New Subnet Mask (em0): 255.255.224.0
          Aug  7 23:05:40 pf dhclient: New Broadcast Address (em0): 255.255.255.255
          Aug  7 23:05:40 pf dhclient: New Routers (em0): 76.189.96.1
          Aug  7 23:05:40 pf dhclient: Adding new routes to interface: em0
          Aug  7 23:05:40 pf dhclient: /sbin/route add default 76.189.96.1
          Aug  7 23:05:40 pf dhclient: Creating resolv.conf
          Aug  7 23:05:41 pf check_reload_status: rc.newwanip starting em0
          Aug  7 23:05:41 pf dhclient[40076]: bound to 76.189.121.xxx – renewal in 21600 seconds.
          Aug  7 23:05:41 pf php: /interfaces.php: ROUTING: setting default route to 76.189.96.1
          Aug  7 23:05:42 pf php: : DEVD Ethernet attached event for wan
          Aug  7 23:05:42 pf php: : HOTPLUG: Configuring interface wan
          Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
          Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN
          Aug  7 23:05:43 pf dnsmasq[35896]: exiting on receipt of SIGTERM
          Aug  7 23:05:43 pf php: : rc.newwanip: Informational is starting em0.
          Aug  7 23:05:43 pf php: : rc.newwanip: on (IP address: 76.189.121.xxx) (interface: wan) (real interface: em0).
          Aug  7 23:05:43 pf php: : The command 'route change -host  76.189.96.1' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host 76.189.96.1: Invalid argument'

          Notice the link state to DOWN transition at 23:05:42 before the completion of the processing of the preceding linkup - the error returned  by route at 23:05:43 appears to be because of an attempt to set a route to a DNS server failing because part of the necessary information has gone missing because the link went down.

          The route that it is trying to set is the default route - 76.189.96.1 is the default gateway for the IP I am assigned.  TWC is using a fairly large subnet here (/19).

          The log extract shows a number of instances of link up promptly followed by link down:

          Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
          Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN

          Aug  7 23:05:40 pf kernel: em0: link state changed to UP

          Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
          Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN

          Aug  7 23:05:45 pf check_reload_status: Linkup starting em0
          Aug  7 23:05:45 pf kernel: em0: link state changed to UP

          Aug  7 23:05:48 pf kernel: em0: link state changed to DOWN
          Aug  7 23:05:48 pf check_reload_status: Linkup starting em0
          A
          Aug  7 23:05:54 pf kernel: em0: link state changed to DOWN
          Aug  7 23:05:54 pf check_reload_status: Linkup starting em0

          Aug  7 23:05:57 pf kernel: em0: link state changed to UP
          Aug  7 23:05:57 pf check_reload_status: Linkup starting em0

          Aug  7 23:06:00 pf check_reload_status: Linkup starting em0
          Aug  7 23:06:00 pf kernel: em0: link state changed to DOWN

          Aug  7 23:06:06 pf check_reload_status: Linkup starting em0
          Aug  7 23:06:06 pf kernel: em0: link state changed to DOWN

          Aug  7 23:06:09 pf check_reload_status: Linkup starting em0
          Aug  7 23:06:09 pf kernel: em0: link state changed to UP

          Aug  7 23:06:11 pf check_reload_status: Linkup starting em0
          Aug  7 23:06:11 pf kernel: em0: link state changed to DOWN

          Aug  7 23:06:15 pf check_reload_status: Linkup starting em0
          Aug  7 23:06:15 pf kernel: em0: link state changed to UP

          . I don't know why this is happening. But with link up running concurrently with link down PERHAPS the link up processing is detecting an unexpected condition (e.g. the WAN interface doesn't have an IP address) and forcing link down to try to reset things

          Does it make a difference to disable miniupnpd? The log suggests it isn't happy. I don't know if miniupnpd is doing anything to contribute to this unhappy state of affairs.

          To my knowledge miniupnpd doesn't actively do anything with the link, it is just complaining because it wants to know the WAN IP and can't find it.  I can try disabling it though and then triggering the bug again to see if the link will still flap.

          Should I open a bug for this in Redmine?

          1 Reply Last reply Reply Quote 0
          • N Offline
            netmagi
            last edited by

            Sorry to necro this, but I've been searching topics for the last hour, and this is the only topic I was able to find that is exactly, to the tee, what I am experiencing.

            Was this ever resolved?

            Running 2.0.1 release.  This box has been running for over a year. no issues until this

            Recently, I was experiencing issues with my cable modem and I made two changes to the WAN interface config:

            1. I cloned the MAC to another system I was testing with temporarily (and then removed the clone settings when I was done testing)

            . . . and

            1. I manually set the interface to 100Mbps, and then set it to "autoselect" (which I thought was the default)

            Since then, any interruption of the WAN carrier causes that interface to flap.  For ex, if I unplug the WAN cable, and plug it back in. . flaps indefinitely.    If the cable modem times out briefly due to service provider maintenance, and the cable modem resets, interface flaps forever.

            Only a reboot of the system resolves it.

            Before the above 2 changes, this never happened.

            It seems like the OP here fixed by setting the interface back to the actual default setting- "Default (no preference, typically autoselect)".  I set mine back to that as well now, but can't test it at the moment since it's production.

            The OP asked about opening up a bug report?  Is this something resolved in a newer version?  Was the underlying cause ever discovered?

            Nic uses em driver, IP config is DHCP, and connection to cable modem is gigabit.

            -Rich H.

            1 Reply Last reply Reply Quote 0
            • Z Offline
              ZPrime
              last edited by

              @netmagi:

              Sorry to necro this, but I've been searching topics for the last hour, and this is the only topic I was able to find that is exactly, to the tee, what I am experiencing.

              Was this ever resolved?

              Running 2.0.1 release.  This box has been running for over a year. no issues until this

              Recently, I was experiencing issues with my cable modem and I made two changes to the WAN interface config:

              1. I cloned the MAC to another system I was testing with temporarily (and then removed the clone settings when I was done testing)

              . . . and

              1. I manually set the interface to 100Mbps, and then set it to "autoselect" (which I thought was the default)

              Since then, any interruption of the WAN carrier causes that interface to flap.  For ex, if I unplug the WAN cable, and plug it back in. . flaps indefinitely.    If the cable modem times out briefly due to service provider maintenance, and the cable modem resets, interface flaps forever.

              Only a reboot of the system resolves it.

              Before the above 2 changes, this never happened.

              It seems like the OP here fixed by setting the interface back to the actual default setting- "Default (no preference, typically autoselect)".  I set mine back to that as well now, but can't test it at the moment since it's production.

              The OP asked about opening up a bug report?  Is this something resolved in a newer version?  Was the underlying cause ever discovered?

              Nic uses em driver, IP config is DHCP, and connection to cable modem is gigabit.

              -Rich H.

              Rich, interestingly my system is also an em-driver interface (Soekris 65xx).

              What is your cable modem here?  I have a Motorola 6141, which has a gigabit port as well.  The other work-around I've found is to hardcode the pfSense interface to 1000/full.  Unlike 10/100 where an autonegotiation mismatch causes a link to go to 10/half, an auto-mismatch on gigabit will apparently allow both sides to link at 1000/full.

              I was apprehensive about hardcoding a speed on the pfSense side because I knew I had no way to hardcode the modem and historically auto-negotiation mismatches are Very Bad(TM).  Since this is at home and I can mess with things when the wife is asleep, I did it and then checked the link lights on the modem.  They were still showing 1000/full so in this case at least, no duplex or speed mismatch.

              This all said, my system is running 2.1-RELEASE now and I haven't seen any problems from it yet.  The WAN if is still set to "Default" and nothing has flapped, and I know that the modem has been restarted a few times since the upgrade.

              1 Reply Last reply Reply Quote 0
              • N Offline
                netmagi
                last edited by

                Thanks so much for the reply.

                My cable modem is a Motorola as well, the SB6121.  After changing to "Default (no preference, typically autoselect)", and cycling the power on the cable modem last night, it's working correctly.  No flap.

                Seems easy to blame the cable modem here, but they are very widely used.  I wonder if it's something very specific with the timing for the auto-negotiation between this modem and how pfsense brings the interface up and down . . .

                -Rich H.

                1 Reply Last reply Reply Quote 0
                • R Offline
                  rdighe
                  last edited by

                  I seem to be having same issue here.. Cable mode is Thomson DCM476. system is supermicro motherboard with two embedded em based nics. em1 is connected to Baystack 450-24t and is always up and rocksolid. em0 connected to cable modem is flapping. after landing on this thread I have just set wan interface speed to 1000baseT and will monitor for next couple of days.

                  –Rajiv

                  1 Reply Last reply Reply Quote 0
                  • First post
                    Last post
                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.