Netgate Discussion Forum

    Subnet collapses periodically since 24.11-RELEASE

    DHCP and DNS
    38 Posts 5 Posters 1.5k Views
    • V
      vf1954 @johnpoz

      @johnpoz I got a bit further. It went down three times since we last talked, this being the third. Thankfully it's midnight so I can finally test around without having to immediately restart it. Here are some findings.

      Setup
      Internet -> netgate/pfsense -> {wifi_router1, aruba switch}
      -> wifi_router1 -> wifi_router2 -> wifi_router3
      (Easy-Mesh Tp-link which disables router2/router3)
      -> aruba switch -> 2nd tp-link switch

      Testing

      1. plugging in eth cable directly from netgate LAN -> laptop (running linux) does not produce a connection.
      2. therefore, no access to online GUI
      3. access to serial shows uptime of 6 days.
      4. I can ping 1.1.1.1 in pfsense shell, but I cannot ping a domain name (DNS server is pi-hole that is now on the 192.168.0.x network)
      5. wifi access gave me gateway of 192.168.0.1
      6. logging into 192.168.0.1 sent me to the second switch
      7. second switch was set to manual DHCP, IP 192.168.0.1 with 0.0.0.0 as gateway (not 100% sure if it went to static IP automatically but when pfsense is back up I'll create a rule for it)
      8. changed 2nd TP-switch to automatically get IP from DHCP server (i.e., netgate pfsense) and restarted ...
      9. no more access to tp-link switch ... -.-" still on 192.168.0.x connected to wifi (with no internet access)
      10. ran ip route | grep default and found new gateway at 192.168.0.254 (which is a TP-link router). TP-link router not accessible as I use 3 of them with Easy-Mesh and disabled DHCP... so likely using wifi_router2
      11. physically disconnected switch2 and router2 forcing me to go to wifi_router1 only
      12. 192.168.0.254 still gateway (surprised me). Still not accessible (now I'm only using wifi_router1 which I should be able to access...)
        (also, wifi_router1 is set to 192.168.3.3 in pfsense)
      13. not sure what other test I can run while under serial shell for pfsense...
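
A minimal sketch of the gateway check from step 10, assuming a Linux client with iproute2 and a POSIX shell. The function names are made up for illustration, and 192.168.3 is the pfSense LAN prefix mentioned in this thread:

```shell
# Print the default gateway found in `ip route` output read from stdin.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Succeed if the gateway ($1) lies under the expected /24 prefix ($2),
# where the prefix is written without the last octet, e.g. 192.168.3
gateway_matches_lan() {
    case "$1" in
        "$2".*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage on a live client:
#   gw=$(ip route | default_gateway)
#   gateway_matches_lan "$gw" 192.168.3 || echo "unexpected gateway: $gw"
#   ip neigh show "$gw"    # shows the MAC address currently behind that gateway
```

If the gateway is outside the expected prefix, the MAC reported by `ip neigh` is the most useful thing to write down before rebooting anything.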

      Will restart system and ensure 2nd switch is in DHCP rules and update wifi_router firmware.

      • GertjanG
        Gertjan @vf1954

        @vf1954

        Your LAN :

        83b1c00e-7630-44c6-98fb-c53b7aa8d9e3-image.png

        so 192.168.3.2/24
        Why not 192.168.3.1/24? .2 is ok of course, any address from .1 to .254 is ok - but 'strange'.

        @vf1954 said in Subnet collapses periodically since 24.11-RELEASE:

        192.168.0.1 sent ...

        Where does this network come from ? It's not a pfSense interface.
        You have a router-after-router setup? Why? Again, it can be done and it can work, but why make the network more complicated like that?
        What about the good old [ISP] <=> [pfSense WAN <-> pfSense LAN] <=> switch <=> (all your PCs, APs, all other devices)?
        Your PCs and all other devices will use DHCP by default, so they will connect.
        If you use APs:
        - set them up with static IPs like 192.168.3.3, 192.168.3.4, etc.
        - set their gateway to 192.168.3.2 (pfSense)
        - disable the DHCP server on all APs
        - set the DNS on all APs to 192.168.3.2 (pfSense)
        - if your APs have a port labeled "WAN", do not use it; use a LAN port
        After all, you use the APs as APs, you don't want to use them as a 'router'. pfSense is your one and only router.

        @vf1954 said in Subnet collapses periodically since 24.11-RELEASE:

        plugging in eth cable directly from netgate LAN -> laptop (running linux) does not produce a connection.
        

        Before plugging your laptop into the pfSense LAN port : check :
        Is the pfSense DHCP server up and running ?

        edit : on console, menu option 8, type

        ps aux | grep 'kea'
        

        If you use ISC :

        ps aux | grep 'kea'
        

        end edit.

        Is the laptop using DHCP client (default, it is) ?
        Now, console access pfSense, menu option 8 :

        tail -f /var/log/dhcpd.log
        

        and now connect your laptop.
        What shows up ?

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • V
          vf1954 @Gertjan

          @Gertjan Thank you for your wonderful reply.

          I have everything up and running since I reset the pfsense.

          I have router after router because I use them as an "Easy-Mesh" network so the company can traverse the entire property without dropping the signal. So the "routers" don't actually do any DHCP. If I make all 3 AP then I lose the Easy-Mesh functionality.

          The only problem is whenever I update firmware I have to start the entire process over again because these TP Archers are not connected via WAN but LAN.

          .2 was because .1 was problematic due to our ISP. Today I may revert back to .1 but meh.

          I suspect something strange is occurring with the routers. So I completely re-programmed them and updated the firmware. I also set a few key components to static (like the DNS and that second switch).

          If the network goes down again, I'll follow your advice with the shell prompts (I assume the second one was meant to say 'isc')? Thank you so much!

          • GertjanG
            Gertjan @vf1954

            @vf1954 said in Subnet collapses periodically since 24.11-RELEASE:

            .2 was because .1 was problematic due to our ISP

            Hummmmm
            You took .2 because .1 was already used? As in "192.168.3.1" is already occupied? Where: on LAN or on WAN? If it is on WAN, you can't use 192.168.3.x/24 on LAN.

            • V
              vf1954 @Gertjan

              @Gertjan This was many years ago.

              192.168.3.1 is not in use. But since so many clients have 192.168.3.2 hardcoded it's best to just use .2

              Clearly updating the firmware didn't solve the problem.

              It happens just randomly. Today at 3PM I suddenly lost wifi and ethernet access. More bizarrely, it started with only a few computers, but progressively spread to all of them.

              Uptime is currently 7 days.

              When I run

              ps aux | grep 'isc'
              

              I get

              root   1651   0.0   0.1   4672   2256   u0   S+   15:34        0:00.01 grep isc
              

              running

              tail -f /var/log/dhcpd.log
              

              Produces

              Sending to Solicit (multiple lines)
              

              The actual time it takes to even get a connection is a good 45 seconds, and then I just get a ? on the wired connection on ubuntu laptop and when I go to properties of the wired connection ... no IP shows up.

              When I ran the 'kea' command I got more dhcp6 output (DHCPv6 is turned off in the GUI)

              Screenshot from 2025-02-14 15-48-22.png

              What is happening?

              :(

              • V
                vf1954 @vf1954

                @vf1954 said in Subnet collapses periodically since 24.11-RELEASE:

                tail -f /var/log/dhcpd.log

                Doing it after I reboot the netgate produces some warnings
                18c5b96d-c1e8-49b4-b8ce-9f90bff01857-image.png

                • S
                  SteveITS Galactic Empire @vf1954

                  @vf1954 said in Subnet collapses periodically since 24.11-RELEASE:

                  192.168.0.254 (which is a TP-link router)

                  Where is this set on your TP-Link? How is it connected to your pfSense LAN network?

                  Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                  When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                  Upvote 👍 helpful posts!

                  • V
                    vf1954 @SteveITS

                    @SteveITS It is simply plugged in, gets assigned a lan address from pfsense at 192.168.3.3, and then that's it

                    • V
                      vf1954 @SteveITS

                      @SteveITS sorry I see what you mean.

                      It is set at 192.168.3.3 in the LAN settings in tplink

                      AND

                      it is set to 192.168.3.3 in pfsense dhcp static.

                      • S
                        SteveITS Galactic Empire @vf1954

                        @vf1954 So, what is the .254 you mentioned?

                        Screencap the change in pfSense when this happens.

                        If the fields in pfSense aren’t changing, I suspect what you’re seeing is another DHCP server. Windows, and I’m sure other clients, will show the DHCP server used, for example via “ipconfig /all”.
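
On a NetworkManager-based Linux client (an assumption; the laptop in this thread runs Ubuntu), the lease details reported by `nmcli` should include a `dhcp_server_identifier` option. A rough sketch for pulling it out; the function name and the interface name `eth0` are hypothetical:

```shell
# Print the DHCP server address from `nmcli -f DHCP4 device show <iface>`
# output read on stdin. Prints nothing if no lease information is present.
dhcp_server_from_nmcli() {
    awk '/dhcp_server_identifier/ { print $NF; exit }'
}

# Usage on a live client (eth0 is a placeholder interface name):
#   nmcli -f DHCP4 device show eth0 | dhcp_server_from_nmcli
```

If the address printed is not the pfSense LAN address, the lease came from some other box.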

                        • V
                          vf1954 @SteveITS

                          @SteveITS said in Subnet collapses periodically since 24.11-RELEASE:

                          Screencap the change in pfSense when this happens.

                          Not sure what you mean here. Does screencap mean screenshot? Screenshot what?

                          The address being circulated is 192.168.0.xx but the other DHCP router is the wifi which is turned off.

                          • S
                            SteveITS Galactic Empire @vf1954

                            @vf1954 yes, screenshot pfSense with the changed settings, or some evidence.

                            If you’re saying that nothing in pfSense actually changes, then it’s not pfSense. Unplug pfSense LAN, restart a client, and see what its IP and DHCP server are.

                            • V
                              vf1954 @SteveITS

                              @SteveITS Fair enough. But why would pfsense just give up its DHCP authority ... randomly ... after 6-14 days?

                              • johnpozJ
                                johnpoz LAYER 8 Global Moderator @vf1954

                                @vf1954 said in Subnet collapses periodically since 24.11-RELEASE:

                                DHCP authority ... randomly ... after 6-14 days?

                                what authority?? When a client does a discover - the first dhcp server that answers wins..

                                If there is more than 1 dhcp server on your network - it's a coin flip who will answer first.

                                You can run more than 1 on the same network.. But they need to hand out the same info.. This can be done as a failover scenario - where you split the scope between them..

                                Say dhcpd1 hands out 192.168.1.10-128
                                Where dhcpd2 hands out 192.168.1.129-244

                                Both point, say, to 192.168.1.1 for dns and gateway.. Leaving you .2-9 and .245-254 as IPs you can set statically on devices.

                                But if you're handing out a different IP range and a different gateway - yeah, you're going to have a bad day on clients that get an IP from that dhcp server.
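
The split-scope arithmetic above can be checked with a few lines of POSIX sh. This is only a sketch: the function names are hypothetical, and the pool bounds are the ones from the post.

```shell
# Succeed when pool A ($1-$2) and pool B ($3-$4) share no last octet.
pools_disjoint() {
    [ "$2" -lt "$3" ] || [ "$4" -lt "$1" ]
}

# Print, one per line, every last octet in 1..254 that is in neither pool.
# These are the addresses left over for the gateway and static devices.
free_octets() {
    a_lo=$1 a_hi=$2 b_lo=$3 b_hi=$4
    i=1
    while [ "$i" -le 254 ]; do
        if { [ "$i" -lt "$a_lo" ] || [ "$i" -gt "$a_hi" ]; } &&
           { [ "$i" -lt "$b_lo" ] || [ "$i" -gt "$b_hi" ]; }; then
            echo "$i"
        fi
        i=$((i + 1))
    done
}

# Usage with the ranges from the post:
#   pools_disjoint 10 128 129 244 && free_octets 10 128 129 244
```

With those bounds the leftover octets are 1 through 9 and 245 through 254; .1 is the gateway in the example, which leaves .2-9 and .245-254 for static assignments.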

                                An intelligent man is sometimes forced to be drunk to spend time with his fools
                                If you get confused: Listen to the Music Play
                                Please don't Chat/PM me for help, unless mod related
                                SG-4860 24.11 | Lab VMs 2.8, 24.11

                                • V
                                  vf1954 @johnpoz

                                  @johnpoz Hello John,

                                  Yes you taught me something new again. I thought DHCP holds authority.

                                  But regardless, even if two DHCP servers were vying for the same "authority" (to grant leases), I'd expect, statistically, that many of the clients would choose 192.168.0.x and lose network/internet access and that that would appear sporadically during the day/week. This is not the behaviour. It is perfectly stable with netgate "in charge", all the time, for all clients, until suddenly every client decides to pivot to 192.168.0.x (albeit at different times, but once one goes the rest will follow within an hour).

                                  You would think they all magically pick up netgate after a couple hours... but they don't either. The pfsense just becomes inaccessible until I console into it.

                                  my two switches are hardcoded to be on 192.168.3.x address.
                                  my 3 tp-link archer 5400 are set as 192.168.3.3 .4 .5 on easy-mesh with the primary dhcp = off.

                                  There is no other dhcp server afaik.

                                  • johnpozJ
                                    johnpoz LAYER 8 Global Moderator @vf1954

                                    @vf1954 clearly there is.. Here do this.. Look at your client currently.

                                    What does it list for the dhcp server?

                                    $ ipconfig /all                                                                       
                                                                                                                          
                                    Windows IP Configuration                                                              
                                                                                                                          
                                       Host Name . . . . . . . . . . . . : i9-win                                         
                                       Primary Dns Suffix  . . . . . . . : home.arpa                                      
                                       Node Type . . . . . . . . . . . . : Broadcast                                      
                                       IP Routing Enabled. . . . . . . . : No                                             
                                       WINS Proxy Enabled. . . . . . . . : No                                             
                                       DNS Suffix Search List. . . . . . : home.arpa                                      
                                                                                                                          
                                    Ethernet adapter Local:                                                               
                                                                                                                          
                                       Connection-specific DNS Suffix  . :                                                
                                       Description . . . . . . . . . . . : Killer E2600 Gigabit Ethernet Controller       
                                       Physical Address. . . . . . . . . : B0-4F-13-0B-FD-16                              
                                       DHCP Enabled. . . . . . . . . . . : Yes                                            
                                       Autoconfiguration Enabled . . . . : Yes                                            
                                       IPv4 Address. . . . . . . . . . . : 192.168.9.100(Preferred)                       
                                       Subnet Mask . . . . . . . . . . . : 255.255.255.0                                  
                                       Lease Obtained. . . . . . . . . . : Friday, February 14, 2025 2:01:59 PM           
                                       Lease Expires . . . . . . . . . . : Tuesday, February 18, 2025 2:02:00 PM          
                                       Default Gateway . . . . . . . . . : 192.168.9.253                                  
                                       DHCP Server . . . . . . . . . . . : 192.168.9.253                                  
                                       DNS Servers . . . . . . . . . . . : 192.168.3.10                                   
                                       NetBIOS over Tcpip. . . . . . . . : Enabled                                        
                                    

                                    192.168.9.253 is my pfsense.. now if I look at the mac address

                                    $ arp -a
                                    
                                    Interface: 192.168.9.100 --- 0x5
                                      Internet Address      Physical Address      Type
                                      192.168.9.10          00-11-32-7b-29-7d     dynamic
                                      192.168.9.253         00-08-a2-0c-e6-24     dynamic
                                      192.168.9.255         ff-ff-ff-ff-ff-ff     static
                                      224.0.0.22            01-00-5e-00-00-16     static
                                      239.255.255.250       01-00-5e-7f-ff-fa     static
                                      255.255.255.255       ff-ff-ff-ff-ff-ff     static
                                    

                                    So its mac is 00-08-a2-0c-e6-24. If pfsense was out of the blue changing its IP and dhcp scope, then that mac address would be the same.

                                    As to why you're not seeing a random distribution, maybe pfsense dhcp answers faster - but when it goes offline the only one to answer is your other dhcp server.

                                    Pfsense is just not going to randomly change its IP address.. You're either changing it, or you're loading a bad/old config? Looking at what mac address your dhcp server has will tell you for sure whether it's pfsense, or some other box.
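
The two lookups above can be chained. A hedged sketch, assuming a POSIX-style shell on the Windows client (Git Bash or similar, as the `$` prompts suggest) and the output layout shown in this post; the function names are hypothetical:

```shell
# Print the "DHCP Server" address from `ipconfig /all` output on stdin.
dhcp_server_from_ipconfig() {
    awk -F': ' '/DHCP Server/ { gsub(/[ \r]/, "", $2); print $2; exit }'
}

# Print the MAC listed for IP address $1 in `arp -a` output on stdin.
mac_for_ip() {
    awk -v ip="$1" '$1 == ip { print $2; exit }'
}

# Usage on a live client:
#   server=$(ipconfig /all | dhcp_server_from_ipconfig)
#   arp -a | mac_for_ip "$server"
```

Run it once while things are healthy and note the MAC, then again when clients land on 192.168.0.x. A different MAC means a different box is answering.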

                                    • V
                                      vf1954 @johnpoz

                                      @johnpoz I agree a second dhcp server is lurking somewhere, but I am at my wits' end trying to figure out where.

                                      TP-Link: unless the tp link is acting out, it's off. I updated the firmware but that didn't have any effect.
                                      Novell (OES2 server). It has dhcp disabled and the port to dhcp also blocked.
                                      Pi-Hole: turned off (and even if it was turned on, it would serve 192.168.3.x)
                                      Switches: no dhcp server capability (afaik)
                                      We have several unmanaged switches connecting various PCs in an office back to one of the switches

                                      ...

                                      that's it.

                                      • johnpozJ
                                        johnpoz LAYER 8 Global Moderator @vf1954

                                        @vf1954 well next time it happens, check the mac - that should help you track down what is doing it.

                                        Or turn off the dhcp server in pfsense.. Do a release and renew on some client that you were seeing this on before.. Does it get the 192.168.0 address? If so, what is the mac of the dhcp server? Hopefully you can track it down from that. The first 3 octets of the mac should tell you what brand of device it is, at least.
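
A small sketch for pulling the vendor prefix (OUI) out of a MAC in either dash or colon notation, so it can be pasted into a vendor lookup. The function name is hypothetical:

```shell
# Normalise a MAC address to upper case with colons and print its
# first three octets (the OUI, which identifies the manufacturer).
mac_oui() {
    printf '%s\n' "$1" | tr 'a-f-' 'A-F:' | cut -d: -f1-3
}

# Usage:
#   mac_oui 00-08-a2-0c-e6-24
```

For the MAC in the earlier example this prints 00:08:A2.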

                                        Unless your switches are all just dumb switches, managed and smart switches can provide dhcp.

                                        edit: I mean it could be possible if pfsense is rebooting to an old config or something.. When you console in, look to see what IPs are on the interfaces, etc. I just find that so highly improbable.. What makes more sense and quite possible to happen is something else serving dhcp..

                                        Checking the mac address of dhcp server IP when you get the wrong lease and IP should tell you for sure.. My money is on rogue dhcp and not pfsense just spontaneously changing its IP of an interface and handing out different dhcp info

                                        • w0wW
                                          w0w

                                          Any chance that there's some mess with flow control on the switches or client devices?
                                          Some USB and non-USB Realtek network adapters embedded into motherboards are known to cause similar issues, such as endless pauses on RX/TX, which can literally collapse the network. I've run into this twice, so it's likely not such an uncommon issue nowadays.
                                          I would start by disabling FC on pfSense (see Netgate Documentation - Flow Control), and also on the switches and routers you are using in your LAN, if it is enabled.
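
On Linux clients, `ethtool -a` reports the pause (flow control) settings of a NIC. A minimal sketch for spotting adapters that still have it on, assuming the usual `RX:` / `TX:` lines in that output; the function name and `eth0` are placeholders:

```shell
# Succeed if `ethtool -a <iface>` output on stdin shows RX or TX pause on.
pause_frames_enabled() {
    awk '($1 == "RX:" || $1 == "TX:") && $2 == "on" { found = 1 }
         END { exit !found }'
}

# Usage on a live client (needs root to change the setting):
#   ethtool -a eth0 | pause_frames_enabled && sudo ethtool -A eth0 rx off tx off
```

Not every driver allows the setting to be changed, so check the `ethtool -a` output again afterwards.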

                                          • V
                                            vf1954 @w0w

                                            @w0w I don't know. I never use flow control. I will look more deeply into this.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.