Netgate Discussion Forum

    DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208

    Plus 25.03 Development Snapshots
    81 Posts 3 Posters 5.1k Views
    • RobbieTTR
      RobbieTT @w0w
      last edited by

      @w0w said in DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208:

      Even the basic physics...

      Begad, even the laws of physics are against me!

      I need to tend to my infinite improbability drive.

      ☕️

      • w0wW
        w0w @RobbieTT
        last edited by

        @RobbieTT said in DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208:

        Begad, even the laws of physics are against me!

        😁

        What about the power-saving settings for the CPU and the network card? For the CPU, you can check and adjust it to maximum performance in the advanced settings of pfSense for testing purposes. As for the network card, it should be possible to set dev.ice.X.power_save=0 in the system tunables, where X is the port number. However, I'm not sure if this actually works. In theory, this should eliminate glitches with interrupts on the bus if they are, for example, related to power-saving features.

        • RobbieTTR
          RobbieTT @w0w
          last edited by

          @w0w

          A good thought, but there is no separate NIC involved for these particular links (SFP28); they are direct to the CPU.

          No PowerD in use either - core level Speed Shift capable and enabled in pfSense.

          ☕️

          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            Hmm, a change in the default EEE setting or similar might present like this, though I don't see anything likely in the driver history, especially since 24.11.

            • w0wW
              w0w @RobbieTT
              last edited by

              @RobbieTT
              I noticed a difference in how SpeedShift behaved between versions 24.11 and 25.03. Maybe it's just a coincidence and the reason lies elsewhere, but it could also be due to changes in the default SpeedShift settings between the versions, or to kernel changes that affect SpeedShift.

              You can list most of the relevant settings with this command:

              sysctl -a | grep -E 'hwp|epp|dev\.cpu\.0\.(freq|cx)'
              

              Ideally, do it on both systems and then you can compare it.
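
For the comparison step, a small helper along these lines might save some eyeballing. It is only a sketch: it assumes each system's output was saved to a text file first (the file names below are made up), and that every line has the usual single-line `name: value` form that sysctl prints. Multi-line values would confuse it.

```shell
#!/bin/sh
# Sketch only. Capture a dump on each version first, for example:
#   sysctl -a | grep -E 'hwp|epp|dev\.cpu\.0\.(freq|cx)' > /tmp/sysctl-24.11.txt
# (the /tmp file names are hypothetical)

# compare_sysctl OLD_FILE NEW_FILE
# Prints OIDs that were added, removed, or changed value between two dumps.
# Assumes OLD_FILE is not empty and each line looks like "name: value".
compare_sysctl() {
    awk -F': ' '
        # Split every line into the OID name and everything after ": ".
        { key = $1; val = substr($0, length(key) + 3) }

        # First file: remember the old values.
        NR == FNR { old[key] = val; next }

        # Second file: report additions and changes.
        {
            seen[key] = 1
            if (!(key in old))
                printf "added    %s = %s\n", key, val
            else if (old[key] != val)
                printf "changed  %s: %s -> %s\n", key, old[key], val
        }

        # Anything only present in the first file was removed.
        END {
            for (k in old)
                if (!(k in seen))
                    printf "removed  %s (was %s)\n", k, old[k]
        }
    ' "$1" "$2"
}
```

Usage would be something like `compare_sysctl /tmp/sysctl-24.11.txt /tmp/sysctl-25.03.txt`. Identical entries are not printed, so an empty result means the two dumps agree.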

              • w0wW
                w0w
                last edited by w0w

                On my system I've found that
                machdep.hwpstate_pkg_ctrl is 0 on 24.11 and 1 on 25.03

                I think this is it:
                [screenshot: 69197929-7a4c-4f3d-82c4-6e5d7f5bcc65-image.png]
                I don't remember if I changed this setting or not. Probably not, but I'm not sure.

                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  That default hasn't changed as far as I know. I really wouldn't expect switching it to make any difference to a driver though.

                  • w0wW
                    w0w @stephenw10
                    last edited by

                    @stephenw10
                    Yep, it is possible that I have changed it.
                    And regarding this "new" ice behaviour, I don't know. Did some kernel options change in between?

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Seems like it must be something like that. I can't see any driver changes that could present like this directly.

                      Though perhaps it could be something in the SFP module since I've nothing on our 8300 test boxes that also use ice(4) NICs.

                      • RobbieTTR
                        RobbieTT @w0w
                        last edited by

                        @w0w said in DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208:

                        @stephenw10
                        Yep, it is possible that I have changed it.

                        On mine:

                        machdep.hwpstate_pkg_ctrl: 0
                        

                        So no difference between versions on my systems.

                        ☕️

                        • RobbieTTR
                          RobbieTT @stephenw10
                          last edited by

                          @stephenw10 said in DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208:

                          Seems like it must be something like that. I can't see any driver changes that could present like this directly.

                          Though perhaps it could be something in the SFP module since I've nothing on our 8300 test boxes that also use ice(4) NICs.

                          Replaced this ipolex SFP+ DAC:

                          	drivername: ice0
                          	plugged: SFP/SFP+/SFP28 Unknown (Copper pigtail)
                          	vendor: ipolex PN: SFP-H10GB-CU1M SN: WTS11J72204 DATE: 2019-07-24
                          

                          With a freshly purchased 10Gtek one:

                          	drivername: ice0
                          	plugged: SFP/SFP+/SFP28 Unknown (Copper pigtail)
                          	vendor: OEM PN: CAB-10GSFP-P1M SN: CSC241010630178 DATE: 2024-10-22
                          

                          It didn't change anything and the issue remains. I wasn't really expecting a difference as I had been through my stock of SFP+ DACs but best to be sure I guess.

                          So far the only way to stop the issue is to revert to 24.11 and below. The problem only manifests itself with 25.03b.

                          • w0wW
                            w0w
                            last edited by

                            kern.ipc.tls.enable: 1
                            

                            on 25.03

                            I doubt that this option has any real effect, but for now it's the only difference I can see in the kernel. At the very least, it can be used by both the ice driver and the kernel itself.

                            • RobbieTTR
                              RobbieTT @stephenw10
                              last edited by

                              @stephenw10 said in DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208:

                              Seems like it must be something like that. I can't see any driver changes that could present like this directly.

                              Do we still need to add ice_ddp_load="YES" to the loader.conf.local file, or are we done with that tunable?

                              I don't have it added but I presumed all is well given that the system shows that it is loaded:

                              ice0: <Intel(R) Ethernet Connection E823-L for SFP - 1.43.2-k> mem 0xf0000000-0xf7ffffff,0xfa010000-0xfa01ffff at device 0.0 numa-domain 0 on pci11
                              ice0: Loading the iflib ice driver
                              ice0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.41.0, track id 0xc0000001.
                              ice0: fw 5.5.17 api 1.7 nvm 2.28 etid 80011e36 netlist 0.1.7000-1.25.0.f083a9d5 oem 1.3200.0
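
For anyone wanting to script the same check, a rough sketch follows. It only looks for the "DDP package was successfully loaded" line exactly as it appears in the boot messages above. It does not answer whether the loader.conf.local entry is still required, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: ddp_status is a hypothetical helper name.
# Reads kernel boot messages on stdin and prints every ice port's
# "DDP package was successfully loaded" line (wording taken from the
# boot log in this post). Returns non-zero if no such line is found.
ddp_status() {
    if grep -E '^ice[0-9]+: The DDP package was successfully loaded'; then
        return 0
    fi
    echo "no DDP load message found for any ice port" >&2
    return 1
}
```

On the firewall this would be fed from the boot messages, for example `dmesg | ddp_status`.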
                              

                              ☕️

                              • RobbieTTR
                                RobbieTT @RobbieTT
                                last edited by

                                I'm not sure if this unbound error is relevant, but it appears around the time of these events:

                                php-fpm	16215	/rc.newwanipv6: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1749405403] unbound[41019:0] error: bind: address already in use [1749405403] unbound[41019:0] fatal error: could not open ports'
                                

                                Looking at sockets:

                                IPv4 System Socket Information
                                USER	COMMAND	PID	FD	PROTO	LOCAL	FOREIGN
                                root	php-fpm	16215	4	udp4	*:*	*:*
                                
                                IPv6 System Socket Information
                                USER	COMMAND	PID	FD	PROTO	LOCAL	FOREIGN
                                root	php-fpm	16215	5	udp6	*:*	*:*
                                

                                It's not an area I am familiar with.

                                ☕️

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  No, that's ugly but shouldn't be an issue. It tries to start Unbound again too rapidly while it's already running. That should not stop the running instance.
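
One way to confirm that reading of "address already in use" is to look at which process holds port 53 at that moment. The sketch below filters `sockstat`-style listings. It assumes the FreeBSD column layout (USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS), and the helper name is invented here.

```shell
#!/bin/sh
# Sketch only: port_holders is a hypothetical helper name.
# port_holders PORT
# Reads a sockstat-style listing on stdin and prints "COMMAND PID LOCAL"
# for every socket whose local address ends in :PORT.
# Assumes the local address is the 6th whitespace-separated column.
port_holders() {
    awk -v port="$1" '
        {
            # The port is whatever follows the last ":" in the local
            # address, which also works for IPv6 forms such as ::1:53.
            n = split($6, parts, ":")
            if (n > 1 && parts[n] == port)
                print $2, $3, $6
        }
    '
}
```

On the firewall this might be run as `sockstat -4 -6 -l | port_holders 53`. If the only holder listed is an unbound process, the failed start was just a duplicate launch attempt.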

                                  • RobbieTTR
                                    RobbieTT @stephenw10
                                    last edited by

                                    @stephenw10

                                    Not grabbing the bunting just yet, but the short-lived 0606 beta booted OK and without all the noise about the interface allegedly having issues. It did have 1 cycle of reacting to a non-existent interface issue when running, but the improvement at boot was quite a change.

                                    Now running the 0610 beta and still no issues at boot and no false interface issues for 30 hours+.

                                    Did a bug get caught and shot?

                                    ☕️

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      No not as far as I know. Which is interesting! Hmmm

                                      • RobbieTTR
                                        RobbieTT @stephenw10
                                        last edited by

                                        @stephenw10 said in DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208:

                                        No not as far as I know. Which is interesting! Hmmm

                                        It could be a big coincidence of course but my last hotplug event:

                                        2025-06-11 15:35:26.596764+01:00	php-fpm	5667	/rc.linkup: HOTPLUG: Configuring interface opt3
                                        2025-06-11 15:35:26.596749+01:00	php-fpm	5667	/rc.linkup: DEVD Ethernet attached event for opt3
                                        2025-06-11 15:35:26.596686+01:00	php-fpm	5667	/rc.linkup: Hotplug event detected for LAN(opt3) dynamic IP address (4: 10.0.1.1, 6: track6)
                                        2025-06-11 15:35:26.596176+01:00	check_reload_status	648	Reloading filter
                                        

                                        ...was over 52 hours ago.

                                        I'll keep monitoring.
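
For the monitoring itself, a quick filter like the one below could count the hotplug reconfigurations and show the most recent one. It is a sketch that assumes log lines in the layout pasted above (date, then time with offset, then process, PID and message) and that all timestamps share the same UTC offset, since it compares them as plain strings. The function name is made up.

```shell
#!/bin/sh
# Sketch only: hotplug_summary is a hypothetical helper name.
# Reads log lines on stdin and prints how many
# "rc.linkup: HOTPLUG: Configuring interface" entries were seen,
# plus the latest timestamp among them.
hotplug_summary() {
    awk '
        /rc\.linkup: HOTPLUG: Configuring interface/ {
            count++
            ts = $1 " " $2            # date + time-with-offset columns
            if (ts > last) last = ts  # string compare, same offset assumed
        }
        END {
            printf "events=%d last=%s\n", count, (count ? last : "none")
        }
    '
}
```

The input would be an export of the system log in the same layout, saved to a file and piped in, for example `hotplug_summary < exported-system-log.txt` (the file name is hypothetical).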

                                        ☕️

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          The fix for VIPs on PPPoE went into that beta. But I'm not sure how that would affect LAN...

                                          • RobbieTTR
                                            RobbieTT @stephenw10
                                            last edited by RobbieTT

                                            Over 72 hrs since my last boot and zero issues with the false interface errors. Packages are happy and DNS resolver has a healthy cache again. Looking like a fix.

                                            @stephenw10 said in DNS resolver exiting when loading pfblocker 25.03.b.20250409.2208:

                                            The fix for VIPs on PPPoE went into that beta. But I'm not sure how that would affect LAN...

                                            Well it didn't really look like a genuine LAN issue from the start. Multiple DACs, vendors and different physical interfaces all showed the same issue.

                                            Regress to v24.11 and the issue went away. Simple DHCP on the WAN to another router and the issue went away again. Remove IPv6 and the issue went away. Use v25.03b with the old PPPoE and the issue went away.

                                            With v25.03b + PPPoE + IPv6, the problem exerted itself, perhaps with the odd unsolicited RA in the mix producing the loose periodicity noted. That and perhaps the new SFP28 driver.

                                            Anyway, I am just a sample size of one but so far it all looks good with beta 0610.

                                            I've got logs pre and post fix if required.

                                            ☕️

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.