Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    DHCP on WAN suddenly started failing

    Scheduled Pinned Locked Moved DHCP and DNS
    67 Posts 17 Posters 28.6k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • D Offline
      doktornotor Banned
      last edited by

      Uhm… frankly this sounds like faulty cable or the modem being garbage, rather than any driver bug.

      1 Reply Last reply Reply Quote 0
      • W Offline
        wallabybob
        last edited by

        Please post an extract of pfSense system log from a few events before "lost connection with modem gateway" to a few minutes later.

        There have been a number of different reports of this sort of problem but (as far as I know) the problem has not been identified. It seems to be quite difficult to reproduce. Could you turn off gateway monitoring for a few days to see if that makes a difference? Also please post the output of pfSense shell command```
        wc -c -l /tmp/rules.debug

        1 Reply Last reply Reply Quote 0
        • Z Offline
          ZPrime
          last edited by

          @doktornotor:

          Uhm… frankly this sounds like faulty cable or the modem being garbage, rather than any driver bug.

          I would agree with you except the problem persists when I put a switch between the pfSense box and the modem, and the problem manifests on the pfSense box itself and not the modem in that case.  (The link on pfSense is flapping and the modem doesn't change state at all if I have the switch between them.)  It seems like when there ISN'T a switch between them, the entire process starts if the modem reboots or loses RF signal, which causes it to drop its Ethernet interface after some period of time.  Once that link has dropped and reconnected, pfSense goes off the rails.

          wallabybob:
          That shouldn't be a problem, I think I'm able to reproduce it fairly easily now just by setting the link directly to autoselect then unplugging the NIC and plugging it back in.  That seems to start the flapping anew (or at least it was last night).

           wc -c -l /tmp/rules.debug
               222   12310 /tmp/rules.debug
          

          I have some stale log events from last night in my syslog server, or I can attempt to recreate again tonight and post fresh log details.

          1 Reply Last reply Reply Quote 0
          • D Offline
            doktornotor Banned
            last edited by

            @bradenmcg:

            I would agree with you except the problem persists when I put a switch between the pfSense box and the modem, and the problem manifests on the pfSense box itself and not the modem in that case.  (The link on pfSense is flapping and the modem doesn't change state at all if I have the switch between them.)

            Well, then it's probably time to consider the pfsense box HW itself is faulty.

            1 Reply Last reply Reply Quote 0
            • Z Offline
              ZPrime
              last edited by

              @doktornotor:

              @bradenmcg:

              I would agree with you except the problem persists when I put a switch between the pfSense box and the modem, and the problem manifests on the pfSense box itself and not the modem in that case.  (The link on pfSense is flapping and the modem doesn't change state at all if I have the switch between them.)

              Well, then it's probably time to consider the pfsense box HW itself is faulty.

              It is a Soekris net6501.  It's been replaced once already due to this because I thought it was bad HW, and it does it on the new box as well (I doubt I got two bad units in a row).  The same behavior happens on multiple ports on the unit too, each a separate ethernet controller (albeit all using the em driver).  If I change my definition of WAN to be em4 instead of em0 and move the ethernet connection to the modem there, it is fine until Layer 1 gets dropped for whatever reason, and then the flapping starts again.

              If it's not the driver, I wonder if gateway monitoring could be the cause, since I don't believe LAN would get monitored by default (and I've never had this problem with the LAN port connected to a gigabit HP switch).

              1 Reply Last reply Reply Quote 0
              • Z Offline
                ZPrime
                last edited by

                I didn't want to paste the entire log (500+ lines for about a minute or two of flapping), so I put it on pastebin.

                http://pastebin.com/CJ4rN1yV

                Unfortunately, system.log doesn't seem to make a note of the time that any changes were actually enacted by a user, and I didn't look at a seconds-accurate clock to know when I "fixed" the interface.

                This was all started by simply going to Interfaces > (WAN interface), then clicking "Advanced" under the "Speed and duplex" heading, and changing it to "autoselect", then clicking "Save" at the bottom of the page, and then "Apply" at the top.

                The link was allowed to flap up and down for a minute or two, and then somewhere around 23:06 I reversed the above process, resetting "speed and duplex" back to "Default (no preference, typically autoselect)".  The final DHCP / link up / etc should all be due to that change, and it has remained stable since then.

                Gateway Monitoring was DISABLED during this test, by checking "Disable Gateway Monitoring" under System: Gateways: Edit gateway.

                My LAN interface apparently has GW monitoring enabled by default, and it is set as "autoselect" for speed/duplex, and has never exhibited this flapping behavior (to my knowledge at least).  Then again, the switch doesn't get restarted often… but I just reset it last night and didn't have any problems on the LAN side.

                If there is more info that would be helpful I can reproduce again and gather whatever's needed.

                1 Reply Last reply Reply Quote 0
                • W Offline
                  wallabybob
                  last edited by

                  @bradenmcg:

                  My LAN interface apparently has GW monitoring enabled by default,

                  What GUI setting leads you to that conclusion?

                  What do you have set as DNS Server and Use gateway on System -> General Setup?

                  Your log shows (I have deleted some entries :

                  Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
                  Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN
                  Aug  7 23:05:36 pf dhclient[30956]: My address (76.189.121.xxx) was deleted, dhclient exiting
                  Aug  7 23:05:39 pf php: : DEVD Ethernet detached event for wan
                  Aug  7 23:05:40 pf check_reload_status: Linkup starting em0
                  Aug  7 23:05:40 pf kernel: em0: link state changed to UP
                  Aug  7 23:05:40 pf dhclient: PREINIT
                  Aug  7 23:05:40 pf dhclient[40076]: DHCPREQUEST on em0 to 255.255.255.255 port 67
                  Aug  7 23:05:40 pf dhclient[40076]: DHCPACK from 142.254.151.133
                  Aug  7 23:05:40 pf dhclient: REBOOT
                  Aug  7 23:05:40 pf dhclient: Starting add_new_address()
                  Aug  7 23:05:40 pf dhclient: ifconfig em0 inet 76.189.121.xxx netmask 255.255.224.0 broadcast 255.255.255.255
                  Aug  7 23:05:40 pf dhclient: New IP Address (em0): 76.189.121.xxx
                  Aug  7 23:05:40 pf dhclient: New Subnet Mask (em0): 255.255.224.0
                  Aug  7 23:05:40 pf dhclient: New Broadcast Address (em0): 255.255.255.255
                  Aug  7 23:05:40 pf dhclient: New Routers (em0): 76.189.96.1
                  Aug  7 23:05:40 pf dhclient: Adding new routes to interface: em0
                  Aug  7 23:05:40 pf dhclient: /sbin/route add default 76.189.96.1
                  Aug  7 23:05:40 pf dhclient: Creating resolv.conf
                  Aug  7 23:05:41 pf check_reload_status: rc.newwanip starting em0
                  Aug  7 23:05:41 pf dhclient[40076]: bound to 76.189.121.xxx – renewal in 21600 seconds.
                  Aug  7 23:05:41 pf php: /interfaces.php: ROUTING: setting default route to 76.189.96.1
                  Aug  7 23:05:42 pf php: : DEVD Ethernet attached event for wan
                  Aug  7 23:05:42 pf php: : HOTPLUG: Configuring interface wan
                  Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
                  Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN
                  Aug  7 23:05:43 pf dnsmasq[35896]: exiting on receipt of SIGTERM
                  Aug  7 23:05:43 pf php: : rc.newwanip: Informational is starting em0.
                  Aug  7 23:05:43 pf php: : rc.newwanip: on (IP address: 76.189.121.xxx) (interface: wan) (real interface: em0).
                  Aug  7 23:05:43 pf php: : The command 'route change -host  76.189.96.1' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host 76.189.96.1: Invalid argument'

                  Notice the link state to DOWN transition at 23:05:42 before the completion of the processing of the preceding linkup - the error returned  by route at 23:05:43 appears to be because of an attempt to set a route to a DNS server failing because part of the necessary information has gone missing because the link went down.

                  The log extract shows a number of instances of link up promptly followed by link down:

                  Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
                  Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN

                  Aug  7 23:05:40 pf kernel: em0: link state changed to UP

                  Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
                  Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN

                  Aug  7 23:05:45 pf check_reload_status: Linkup starting em0
                  Aug  7 23:05:45 pf kernel: em0: link state changed to UP

                  Aug  7 23:05:48 pf kernel: em0: link state changed to DOWN
                  Aug  7 23:05:48 pf check_reload_status: Linkup starting em0
                  A
                  Aug  7 23:05:54 pf kernel: em0: link state changed to DOWN
                  Aug  7 23:05:54 pf check_reload_status: Linkup starting em0

                  Aug  7 23:05:57 pf kernel: em0: link state changed to UP
                  Aug  7 23:05:57 pf check_reload_status: Linkup starting em0

                  Aug  7 23:06:00 pf check_reload_status: Linkup starting em0
                  Aug  7 23:06:00 pf kernel: em0: link state changed to DOWN

                  Aug  7 23:06:06 pf check_reload_status: Linkup starting em0
                  Aug  7 23:06:06 pf kernel: em0: link state changed to DOWN

                  Aug  7 23:06:09 pf check_reload_status: Linkup starting em0
                  Aug  7 23:06:09 pf kernel: em0: link state changed to UP

                  Aug  7 23:06:11 pf check_reload_status: Linkup starting em0
                  Aug  7 23:06:11 pf kernel: em0: link state changed to DOWN

                  Aug  7 23:06:15 pf check_reload_status: Linkup starting em0
                  Aug  7 23:06:15 pf kernel: em0: link state changed to UP

                  . I don't know why this is happenning. But with link up running concurrently with link down PERHAPS the link up processing is detecting an unexpected condition (e.g. the WAN interface doesn't have an IP address) and forcing link down to try to reset things

                  Does it make a difference to disable miniupnpd? The log suggests it isn't happy. I don't know if miniupnpd is doing anything to contribute to this unhappy state of affairs.

                  1 Reply Last reply Reply Quote 0
                  • Z Offline
                    ZPrime
                    last edited by

                    @wallabybob:

                    @bradenmcg:

                    My LAN interface apparently has GW monitoring enabled by default,

                    What GUI setting leads you to that conclusion?

                    I looked at Gateway Monitoring and the LAN interface was in there and configured already, I did not alter it.  I thought that only WAN gateways were monitored but I guess all gateways are unless you change them.

                    What do you have set as DNS Server and Use gateway on System -> General Setup?

                    DNS servers are all blank, and "use gateway" is WAN.  I have "allow DNS server list to be overridden" enabled.

                    I do have two DNS domain overrides setup on Services > DNS Forwarder that allow me to resolve hostnames from my office (across VPN), but otherwise DNS forwarder is just using Time Warner's DHCP-assigned pair of servers.

                    Your log shows (I have deleted some entries :

                    Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
                    Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN
                    Aug  7 23:05:36 pf dhclient[30956]: My address (76.189.121.xxx) was deleted, dhclient exiting
                    Aug  7 23:05:39 pf php: : DEVD Ethernet detached event for wan
                    Aug  7 23:05:40 pf check_reload_status: Linkup starting em0
                    Aug  7 23:05:40 pf kernel: em0: link state changed to UP
                    Aug  7 23:05:40 pf dhclient: PREINIT
                    Aug  7 23:05:40 pf dhclient[40076]: DHCPREQUEST on em0 to 255.255.255.255 port 67
                    Aug  7 23:05:40 pf dhclient[40076]: DHCPACK from 142.254.151.133
                    Aug  7 23:05:40 pf dhclient: REBOOT
                    Aug  7 23:05:40 pf dhclient: Starting add_new_address()
                    Aug  7 23:05:40 pf dhclient: ifconfig em0 inet 76.189.121.xxx netmask 255.255.224.0 broadcast 255.255.255.255
                    Aug  7 23:05:40 pf dhclient: New IP Address (em0): 76.189.121.xxx
                    Aug  7 23:05:40 pf dhclient: New Subnet Mask (em0): 255.255.224.0
                    Aug  7 23:05:40 pf dhclient: New Broadcast Address (em0): 255.255.255.255
                    Aug  7 23:05:40 pf dhclient: New Routers (em0): 76.189.96.1
                    Aug  7 23:05:40 pf dhclient: Adding new routes to interface: em0
                    Aug  7 23:05:40 pf dhclient: /sbin/route add default 76.189.96.1
                    Aug  7 23:05:40 pf dhclient: Creating resolv.conf
                    Aug  7 23:05:41 pf check_reload_status: rc.newwanip starting em0
                    Aug  7 23:05:41 pf dhclient[40076]: bound to 76.189.121.xxx – renewal in 21600 seconds.
                    Aug  7 23:05:41 pf php: /interfaces.php: ROUTING: setting default route to 76.189.96.1
                    Aug  7 23:05:42 pf php: : DEVD Ethernet attached event for wan
                    Aug  7 23:05:42 pf php: : HOTPLUG: Configuring interface wan
                    Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
                    Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN
                    Aug  7 23:05:43 pf dnsmasq[35896]: exiting on receipt of SIGTERM
                    Aug  7 23:05:43 pf php: : rc.newwanip: Informational is starting em0.
                    Aug  7 23:05:43 pf php: : rc.newwanip: on (IP address: 76.189.121.xxx) (interface: wan) (real interface: em0).
                    Aug  7 23:05:43 pf php: : The command 'route change -host  76.189.96.1' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host 76.189.96.1: Invalid argument'

                    Notice the link state to DOWN transition at 23:05:42 before the completion of the processing of the preceding linkup - the error returned  by route at 23:05:43 appears to be because of an attempt to set a route to a DNS server failing because part of the necessary information has gone missing because the link went down.

                    The route that it is trying to set is the default route - 76.189.96.1 is the default gateway for the IP I am assigned.  TWC is using a fairly large subnet here (/19).

                    The log extract shows a number of instances of link up promptly followed by link down:

                    Aug  7 23:05:36 pf check_reload_status: Linkup starting em0
                    Aug  7 23:05:36 pf kernel: em0: link state changed to DOWN

                    Aug  7 23:05:40 pf kernel: em0: link state changed to UP

                    Aug  7 23:05:42 pf check_reload_status: Linkup starting em0
                    Aug  7 23:05:42 pf kernel: em0: link state changed to DOWN

                    Aug  7 23:05:45 pf check_reload_status: Linkup starting em0
                    Aug  7 23:05:45 pf kernel: em0: link state changed to UP

                    Aug  7 23:05:48 pf kernel: em0: link state changed to DOWN
                    Aug  7 23:05:48 pf check_reload_status: Linkup starting em0
                    A
                    Aug  7 23:05:54 pf kernel: em0: link state changed to DOWN
                    Aug  7 23:05:54 pf check_reload_status: Linkup starting em0

                    Aug  7 23:05:57 pf kernel: em0: link state changed to UP
                    Aug  7 23:05:57 pf check_reload_status: Linkup starting em0

                    Aug  7 23:06:00 pf check_reload_status: Linkup starting em0
                    Aug  7 23:06:00 pf kernel: em0: link state changed to DOWN

                    Aug  7 23:06:06 pf check_reload_status: Linkup starting em0
                    Aug  7 23:06:06 pf kernel: em0: link state changed to DOWN

                    Aug  7 23:06:09 pf check_reload_status: Linkup starting em0
                    Aug  7 23:06:09 pf kernel: em0: link state changed to UP

                    Aug  7 23:06:11 pf check_reload_status: Linkup starting em0
                    Aug  7 23:06:11 pf kernel: em0: link state changed to DOWN

                    Aug  7 23:06:15 pf check_reload_status: Linkup starting em0
                    Aug  7 23:06:15 pf kernel: em0: link state changed to UP

                    . I don't know why this is happening. But with link up running concurrently with link down PERHAPS the link up processing is detecting an unexpected condition (e.g. the WAN interface doesn't have an IP address) and forcing link down to try to reset things

                    Does it make a difference to disable miniupnpd? The log suggests it isn't happy. I don't know if miniupnpd is doing anything to contribute to this unhappy state of affairs.

                    To my knowledge miniupnpd doesn't actively do anything with the link, it is just complaining because it wants to know the WAN IP and can't find it.  I can try disabling it though and then triggering the bug again to see if the link will still flap.

                    Should I open a bug for this in Redmine?

                    1 Reply Last reply Reply Quote 0
                    • N Offline
                      netmagi
                      last edited by

                      Sorry to necro this, but I've been searching topics for the last hour, and this is the only topic I was able to find that is exactly, to the tee, what I am experiencing.

                      Was this ever resolved?

                      Running 2.0.1 release.  This box has been running for over a year. no issues until this

                      Recently, I was experiencing issues with my cable modem and I made two changes to the WAN interface config:

                      1. I cloned the MAC to another system I was testing with temporarily (and then removed the clone settings when I was done testing)

                      . . . and

                      1. I manually set the interface to 100Mbps, and then set it to "autoselect" (which I thought was the default)

                      Since then, any interruption of the WAN carrier causes that interface to flap.  For ex, if I unplug the WAN cable, and plug it back in. . flaps indefinitely.    If the cable modem times out briefly due to service provider maintenance, and the cable modem resets, interface flaps forever.

                      Only a reboot of the system resolves it.

                      Before the above 2 changes, this never happened.

                      It seems like the OP here fixed by setting the interface back to the actual default setting- "Default (no preference, typically autoselect)".  I set mine back to that as well now, but can't test it at the moment since it's production.

                      The OP asked about opening up a bug report?  Is this something resolved in a newer version?  Was the underlying cause ever discovered?

                      Nic uses em driver, IP config is DHCP, and connection to cable modem is gigabit.

                      -Rich H.

                      1 Reply Last reply Reply Quote 0
                      • Z Offline
                        ZPrime
                        last edited by

                        @netmagi:

                        Sorry to necro this, but I've been searching topics for the last hour, and this is the only topic I was able to find that is exactly, to the tee, what I am experiencing.

                        Was this ever resolved?

                        Running 2.0.1 release.  This box has been running for over a year. no issues until this

                        Recently, I was experiencing issues with my cable modem and I made two changes to the WAN interface config:

                        1. I cloned the MAC to another system I was testing with temporarily (and then removed the clone settings when I was done testing)

                        . . . and

                        1. I manually set the interface to 100Mbps, and then set it to "autoselect" (which I thought was the default)

                        Since then, any interruption of the WAN carrier causes that interface to flap.  For ex, if I unplug the WAN cable, and plug it back in. . flaps indefinitely.    If the cable modem times out briefly due to service provider maintenance, and the cable modem resets, interface flaps forever.

                        Only a reboot of the system resolves it.

                        Before the above 2 changes, this never happened.

                        It seems like the OP here fixed by setting the interface back to the actual default setting- "Default (no preference, typically autoselect)".  I set mine back to that as well now, but can't test it at the moment since it's production.

                        The OP asked about opening up a bug report?  Is this something resolved in a newer version?  Was the underlying cause ever discovered?

                        Nic uses em driver, IP config is DHCP, and connection to cable modem is gigabit.

                        -Rich H.

                        Rich, interestingly my system is also an em-driver interface (Soekris 65xx).

                        What is your cable modem here?  I have a Motorola 6141, which has a gigabit port as well.  The other work-around I've found is to hardcode the pfSense interface to 1000/full.  Unlike 10/100 where an autonegotiation mismatch causes a link to go to 10/half, an auto-mismatch on gigabit will apparently allow both sides to link at 1000/full.

                        I was apprehensive about hardcoding a speed on the pfSense side because I knew I had no way to hardcode the modem and historically auto-negotiation mismatches are Very Bad(TM).  Since this is at home and I can mess with things when the wife is asleep, I did it and then checked the link lights on the modem.  They were still showing 1000/full so in this case at least, no duplex or speed mismatch.

                        This all said, my system is running 2.1-RELEASE now and I haven't seen any problems from it yet.  The WAN if is still set to "Default" and nothing has flapped, and I know that the modem has been restarted a few times since the upgrade.

                        1 Reply Last reply Reply Quote 0
                        • N Offline
                          netmagi
                          last edited by

                          Thanks so much for the reply.

                          My cable modem is a Motorola as well, the SB6121.  After changing to "Default (no preference, typically autoselect)", and cycling the power on the cable modem last night, it's working correctly.  No flap.

                          Seems easy to blame the cable modem here, but they are very widely used.  I wonder if it's something very specific with the timing for the auto-negotiation between this modem and how pfsense brings the interface up and down . . .

                          -Rich H.

                          1 Reply Last reply Reply Quote 0
                          • R Offline
                            rdighe
                            last edited by

                            I seem to be having same issue here.. Cable mode is Thomson DCM476. system is supermicro motherboard with two embedded em based nics. em1 is connected to Baystack 450-24t and is always up and rocksolid. em0 connected to cable modem is flapping. after landing on this thread I have just set wan interface speed to 1000baseT and will monitor for next couple of days.

                            –Rajiv

                            1 Reply Last reply Reply Quote 0
                            • First post
                              Last post
                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.