Netgate Discussion Forum

    DHCP fails silently, but works on reboot of pfSense

    DHCP and DNS
    37 Posts 4 Posters 5.5k Views
    • obelsen

      I have an issue that has persisted across several pfSense installs: CE on a VM, CE on bare hardware, and even the Netgate image for the XG-2758.

      Network:
      Two physical interfaces on the pfSense machine: WAN and LAN. On the LAN interface there are 200 VLANs defined, which we use to segment our network and give the endpoints of each VLAN their own subnet. This lets us create firewall rules for each "VLAN interface" in pfSense.

      My problem is that ANY change that modifies an interface (whether it's the LAN interface, a VPN interface, or something else) results in a silent crash of the DHCP server, which then cannot be restarted.
      Nothing is present in the logs, other than the startup of the DHCP server on each interface, and maybe a DHCPDISCOVER or two until it's restarted by the service watchdog package.
      Restarting the DHCP service does not work; restarting pfSense sometimes does (but not always!). We have, however, identified a procedure that allows DHCP to start on every reboot of pfSense:

      1. Reboot pfSense
      2. Disconnect the interfaces from our backbone switch (there are 10 of these interfaces, each with 20 VLANs on it).
      3. Wait for pfSense to boot up and the DHCP service to start.
      4. Reconnect the interfaces, waiting a few seconds between each one.

      This procedure leads me to believe the service is more or less being DDoS'ed by our hosts, but I'm not sure how to verify whether that's what prevents the DHCP service from restarting.

      Anyone know what I can do here? It's quite annoying, as nothing can be done on the machine itself that modifies an interface in any way (such as creating a VPN server or even restarting one).
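
One way to check the flood hypothesis would be to count DHCPDISCOVER lines per interface in an exported copy of the DHCP log. The sketch below is Python and assumes the ISC dhcpd request format (`DHCPDISCOVER from <mac> via <interface>`). The sample lines, MAC addresses, subnets, and the `discovers_per_interface` name are made up for illustration and are not taken from the affected system.

```python
import re
from collections import Counter

# Assumed ISC dhcpd request line, for example:
#   "DHCPDISCOVER from 00:11:22:33:44:55 (host) via igb1.100"
DISCOVER_RE = re.compile(
    r"DHCPDISCOVER from (?P<mac>[0-9A-Fa-f:]{17})"
    r"(?: \([^)]*\))? via (?P<iface>[^\s:]+)"
)


def discovers_per_interface(lines):
    """Count DHCPDISCOVER log lines per receiving interface."""
    counts = Counter()
    for line in lines:
        match = DISCOVER_RE.search(line)
        if match:
            counts[match.group("iface")] += 1
    return counts


# Hypothetical sample; real input would be the exported DHCP log.
SAMPLE_LOG = [
    "Nov 6 08:10:55 dhcpd: DHCPDISCOVER from 00:11:22:33:44:55 via igb1.100",
    "Nov 6 08:10:55 dhcpd: DHCPDISCOVER from 00:11:22:33:44:66 (host-a) via igb1.100",
    "Nov 6 08:10:56 dhcpd: DHCPDISCOVER from 00:11:22:33:44:77 via igb1.200: "
    "network 10.0.200.0/24: no free leases",
    "Nov 6 08:10:56 dhcpd: DHCPOFFER on 10.0.100.10 to 00:11:22:33:44:55 via igb1.100",
]

# Busiest interfaces first.
for iface, count in discovers_per_interface(SAMPLE_LOG).most_common():
    print(iface, count)
```

A burst concentrated in the seconds right after the service starts would support the flood theory; an even trickle would point elsewhere.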

      • obelsen

        I have found a workaround so far by using the DHCP relay to an external DHCP server. Still very puzzled why the local DHCP daemon crashes on pfSense.

        • BJK

          I'm seeing a very similar issue. DHCP just stops working, and only a reboot brings it back online; restarting the DHCP service alone does not do the trick. Running an XG-1541 on 2.4.4.

          • jimp Rebel Alliance Developer Netgate

            There are no errors at all in the system log or DHCP log?

            The service shows as stopped under Status > Services?

            If you restart the service there, it doesn't work? (Or if it's running, stop and then start it again)

            Anything else unusual when it's down, like RAM usage?

            Are you using any other new/recent features like DNS over TLS?

            Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

            Need help fast? Netgate Global Support!

            Do not Chat/PM for help!

            • BJK @jimp

              @jimp

              1. The system log is littered with:
                Nov 6 08:10:55 check_reload_status updating dyndns optxx
                Nov 6 08:10:56 kernel vlanxx: changing name to 'igb4.xxxx'
                Nov 6 08:10:56 check_reload_status Restarting ipsec tunnels
              2. Big Red X next to DHCP Services
              3. Hitting the little blue "play" button gets the green gear spinning but then settles back at a red X with blue play button
              4. Memory usage is at 2%
              5. No DNS over TLS (using the default Resolver)

              I'm confused about the ipsec tunnels being constantly restarted as I have not started any VPN services. I am running 100+ VLANs.

              • jimp Rebel Alliance Developer Netgate

                It's sounding like there is some kind of interface event happening that triggers it. Something is causing the link to bounce on that interface. Are you sure those are the only three messages that happen? Maybe if you show more log lines you'd see the actual culprit.

                Are any of the VLANs (or the parent interface) set to obtain their own IP address from DHCP, like a cable WAN? If so you might be hitting https://redmine.pfsense.org/issues/8507 if the interface has advanced DHCP client options enabled.

                • BJK

                  @jimp
                  Every VLAN interface is set up with a static IP configuration.
                  I am not sure why "updating dyndns optxx" appears either, unless it's part of the default DNS Resolver.
                  Also, the "kernel vlanxx" lines show VLAN numbers I have not set up.

                  Nov 6 08:22:12 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:22:25 check_reload_status updating dyndns optxx
                  Nov 6 08:22:25 kernel vlanxx: changing name to 'igb2.xxxx'
                  Nov 6 08:22:25 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:22:37 check_reload_status updating dyndns optxx
                  Nov 6 08:22:37 php-fpm 27426 /interfaces.php: Gateway, none 'available' for inet6, use the first one configured. ''
                  Nov 6 08:22:37 kernel vlanxx: changing name to 'igb4.xxxx'
                  Nov 6 08:22:37 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:22:49 check_reload_status updating dyndns optxx
                  Nov 6 08:22:49 kernel vlanxx: changing name to 'igb4.xxxx'
                  Nov 6 08:22:49 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:23:01 check_reload_status updating dyndns optxx
                  Nov 6 08:23:02 kernel vlanxx: changing name to 'igb2.xxxx'
                  Nov 6 08:23:02 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:23:14 check_reload_status updating dyndns optxx
                  Nov 6 08:23:14 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:23:14 kernel vlanxx: changing name to 'igb2.xxxx'
                  Nov 6 08:23:26 check_reload_status updating dyndns optxx
                  Nov 6 08:23:27 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:23:27 kernel vlanxx: changing name to 'igb2.xxxx'
                  Nov 6 08:23:39 check_reload_status updating dyndns optxx
                  Nov 6 08:23:40 kernel vlanxx: changing name to 'igb2.xxxx'
                  Nov 6 08:23:40 php-fpm 9855 /interfaces.php: Gateway, none 'available' for inet6, use the first one configured. ''
                  Nov 6 08:23:40 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:23:52 check_reload_status updating dyndns optxx
                  Nov 6 08:23:52 check_reload_status Restarting ipsec tunnels
                  Nov 6 08:23:52 kernel vlanxx: changing name to 'igb2.xxxx'
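
To see whether these events recur on a fixed cadence, the gaps between matching lines can be computed. This is a minimal Python sketch, assuming syslog-style `Mon D HH:MM:SS` prefixes and that all lines fall on the same day. `event_gaps` is a hypothetical helper name, and the sample is a subset of the lines above.

```python
import re

# Leading timestamp such as "Nov 6 08:22:12 "; only the time of day is used.
TIME_RE = re.compile(r"^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s")


def event_gaps(lines, needle):
    """Seconds between consecutive log lines that contain `needle`.

    Assumes every line is from the same day, so midnight rollover is ignored.
    """
    stamps = []
    for line in lines:
        match = TIME_RE.match(line.lstrip())
        if match and needle in line:
            hours, minutes, seconds = (int(g) for g in match.groups())
            stamps.append(hours * 3600 + minutes * 60 + seconds)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Subset of the system log lines quoted in this post.
SAMPLE_LOG = [
    "Nov 6 08:22:12 check_reload_status Restarting ipsec tunnels",
    "Nov 6 08:22:25 check_reload_status updating dyndns optxx",
    "Nov 6 08:22:25 kernel vlanxx: changing name to 'igb2.xxxx'",
    "Nov 6 08:22:25 check_reload_status Restarting ipsec tunnels",
    "Nov 6 08:22:37 check_reload_status updating dyndns optxx",
    "Nov 6 08:22:37 kernel vlanxx: changing name to 'igb4.xxxx'",
    "Nov 6 08:22:37 check_reload_status Restarting ipsec tunnels",
    "Nov 6 08:22:49 check_reload_status updating dyndns optxx",
    "Nov 6 08:22:49 kernel vlanxx: changing name to 'igb4.xxxx'",
    "Nov 6 08:22:49 check_reload_status Restarting ipsec tunnels",
    "Nov 6 08:23:01 check_reload_status updating dyndns optxx",
]

print(event_gaps(SAMPLE_LOG, "updating dyndns"))
print(event_gaps(SAMPLE_LOG, "Restarting ipsec tunnels"))
```

On this subset both events repeat every 12 to 13 seconds, which is the kind of regular pattern worth tracing further back in the log.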

                  • jimp Rebel Alliance Developer Netgate

                    There must be something farther back in the logs that triggers all of that, though. Those are all the consequence of some kind of interface event (link up/down) or similar.

                    • akuma1x

                      This is WAY off topic, so I apologize, and maybe it can be a separate topic.

                      @obelsen - I'm very new to the use and tech, but why would there be a real-world use for so many VLANs? You state 200 in your original post. How/when would anybody need to use that many virtual networks? I could see a handful being useful, but why such a large number? I'm just curious...

                      Jeff

                      • obelsen @jimp

                        @jimp said in DHCP fails silently, but works on reboot of pfSense:

                        There are no errors at all in the system log or DHCP log?

                        The service shows as stopped under Status > Services?

                        If you restart the service there, it doesn't work? (Or if it's running, stop and then start it again)

                        Anything else unusual when it's down, like RAM usage?

                        There are no errors in either the system log or the DHCP log.
                        The service cannot be restarted on its own; only restarting the machine brings it back. The service shows as stopped and is not running. RAM usage is far from a problem on our XG-2758.

                        Are you using any other new/recent features like DNS over TLS?

                        No.

                        @jimp said in DHCP fails silently, but works on reboot of pfSense:

                        There must be something farther back in the logs that triggers all of that, though. Those are all the consequence of some kind of interface event (link up/down) or similar.

                        While this was a response to someone else, there is nothing in my machine's logs that indicates any problem at all. The service simply fails to start. When it does run, it has no issues unless something reconfigures an interface, which leads to the silent crash of the DHCP service.

                        @akuma1x said in DHCP fails silently, but works on reboot of pfSense:

                        This is WAY off topic, so I apologize, and maybe it can be a separate topic.

                        @obelsen - I'm very new to the use and tech, but why would there be a real-world use for so many VLANs? You state 200 in your original post. How/when would anybody need to use that many virtual networks? I could see a handful being useful, but why such a large number? I'm just curious...

                        Jeff

                        I would hardly consider 200 VLANs exceptional. We use them to ensure that all communication goes through our firewall, so we can filter traffic as we want per socket. Port isolation on our switches was another alternative, but gives less control.

                        • jimp Rebel Alliance Developer Netgate

                          So there are no log messages at all in your case? Not even any interface events or other logs that seem to repeat like the other person seeing this?

                          When you have dhcpd running, look at the output of ps uxaww | grep dhcpd and note the full command output. Next time it fails, try to run that by hand from an ssh or console shell and see if it produces any output. If it doesn't, try adding -d to the parameters before anything else, which should have it print output to the terminal.
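
The "add -d before anything else" step can be sketched in Python as a helper that turns a captured ps line into a command to run by hand. It assumes the `ps uxaww` layout of ten columns followed by the command. The sample line, its PID, paths, flags, and interface names are hypothetical; use the output captured from the running system instead. `debug_command` is an illustrative name.

```python
import shlex


def debug_command(ps_line):
    """Return the daemon's argv with -d inserted right after the executable."""
    # Assumed columns: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    fields = ps_line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("expected ten ps columns followed by the command")
    argv = shlex.split(fields[10])
    if "-d" in argv[1:]:
        return argv  # -d is already present, nothing to add
    return [argv[0], "-d"] + argv[1:]


# Hypothetical ps line; capture the real one while dhcpd is still running.
SAMPLE_PS = (
    "dhcpd 12345 0.0 0.1 25000 9000 - Ss 08:10 0:00.05 "
    "/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd "
    "-cf /etc/dhcpd.conf igb1 igb1.100"
)

# Print a shell-safe command line that can be pasted into an ssh session.
print(" ".join(shlex.quote(arg) for arg in debug_command(SAMPLE_PS)))
```

Saving the ps output to a file while the service is healthy means the exact arguments are still available after it has failed.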

                          • obelsen @jimp

                            @jimp said in DHCP fails silently, but works on reboot of pfSense:

                            So there are no log messages at all in your case? Not even any interface events or other logs that seem to repeat like the other person seeing this?

                            None. In the DHCP log, only events related to new leases are present. When I try to restart the service, it only logs the listening/sending interfaces. After every restart of the service, there is nothing until the same log spam of listening/sending interfaces, ending with the sending on socket/fallback.

                            When you have dhcpd running, look at the output of ps uxaww | grep dhcpd and note the full command output. Next time it fails, try to run that by hand from an ssh or console shell and see if it produces any output. If it doesn't, try adding -d to the parameters before anything else, which should have it print output to the terminal.

                            The DHCP server fails only when modifying interfaces. It does not crash when it is already running. I would expect this is because the daemon is restarted, and it is unable to start again.
                            I will try to replicate the conditions again in a VM and run ps; however, it did not enlighten me when I did my own troubleshooting earlier.

                            • jimp Rebel Alliance Developer Netgate @obelsen

                              @obelsen said in DHCP fails silently, but works on reboot of pfSense:

                              None. In the DHCP log, only events related to new leases are present.

                              What about in the main system log?

                              @obelsen said in DHCP fails silently, but works on reboot of pfSense:

                              The DHCP server fails only when modifying interfaces

                              Do you have any special settings on that interface? Maybe a spoofed MAC address, MTU, or other similar setting either on the parent interface (if assigned) or one of the VLANs?

                              • obelsen @jimp

                                @jimp said in DHCP fails silently, but works on reboot of pfSense:

                                @obelsen said in DHCP fails silently, but works on reboot of pfSense:

                                None. In the DHCP log, only events related to new leases are present.

                                What about in the main system log?

                                Negative. No logs at all. The only items present were standard login logs and other unrelated info.

                                @obelsen said in DHCP fails silently, but works on reboot of pfSense:

                                The DHCP server fails only when modifying interfaces

                                Do you have any special settings on that interface? Maybe a spoofed MAC address, MTU, or other similar setting either on the parent interface (if assigned) or one of the VLANs?

                                The interfaces all use standard settings with a static IP defined. No MAC spoofing or other non-standard configuration.
                                The parent interface (LAN) also had a static IP, like each VLAN interface.

                                • obelsen

                                  I would like to add that this has been an issue for around a year, and I previously asked for help on the pfSense subreddit, to no avail.

                                  • jimp Rebel Alliance Developer Netgate

                                    Can you answer the other questions I asked in my previous reply?

                                    • obelsen @jimp

                                      @jimp said in DHCP fails silently, but works on reboot of pfSense:

                                      Can you answer the other questions I asked in my previous reply?

                                      I did; I just forgot to put a newline :)

                                      • jimp Rebel Alliance Developer Netgate

                                        The symptoms all fit with something causing a link loop, which generally only happens on certain drivers in certain situations, such as changing specific settings that cause the link to drop and come back.

                                        That triggers the link up/down scripts, which reconfigure the interfaces, which triggers a new link event, and so on.

                                        But that scenario would log quite a lot of info in the main system log as it happens. It wouldn't happen silently.

                                        Are the affected NICs all igb interfaces?
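
A link loop of this kind should show up as repeated link-state lines in the system log. The Python sketch below counts DOWN transitions per interface, assuming the FreeBSD kernel wording `<ifname>: link state changed to UP|DOWN`. The sample lines are invented for illustration, and `link_down_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed kernel message, for example: "igb4: link state changed to DOWN".
# The optional ".<number>" suffix covers VLAN child interfaces such as igb4.100.
LINK_RE = re.compile(
    r"(?P<iface>[A-Za-z]+\d+(?:\.\d+)?): link state changed to (?P<state>UP|DOWN)"
)


def link_down_counts(lines):
    """Count how often each interface is reported as going DOWN."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match and match.group("state") == "DOWN":
            counts[match.group("iface")] += 1
    return counts


# Hypothetical sample; real input would be the exported system log.
SAMPLE_LOG = [
    "Nov 6 08:10:54 kernel: igb4: link state changed to DOWN",
    "Nov 6 08:10:55 kernel: igb4.100: link state changed to DOWN",
    "Nov 6 08:10:57 kernel: igb4: link state changed to UP",
    "Nov 6 08:10:57 kernel: igb4.100: link state changed to UP",
    "Nov 6 08:10:58 check_reload_status: Restarting ipsec tunnels",
]

for iface, drops in link_down_counts(SAMPLE_LOG).most_common():
    print(iface, drops)
```

If an exported log yields no matches at all, that would be consistent with the reports of a silent failure rather than a link loop.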

                                        • obelsen @jimp

                                          @jimp said in DHCP fails silently, but works on reboot of pfSense:

                                          The symptoms all fit with something causing a link loop, which generally only happens on certain drivers in certain situations, such as changing specific settings that cause the link to drop and come back.

                                          That triggers the link up/down scripts, which reconfigure the interfaces, which triggers a new link event, and so on.

                                          But that scenario would log quite a lot of info in the main system log as it happens. It wouldn't happen silently.

                                          Are the affected NICs all igb interfaces?

                                          It happens both when LAN is on the RJ45 port (igb) and when moved to one of the SFP+ ports (ix, I believe).

                                          • obelsen

                                            Additionally, I forgot to mention that even restarting an OpenVPN server caused the problem.
                                            OpenVPN itself is not the cause, however, as an install without OpenVPN showed the same issues.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.