Netgate Discussion Forum

    Outbound ping problem to DNS Filter servers

      njc

      This is the first pfSense problem in over a decade that I've been unable to solve by researching the forums and experimenting. I'll attempt to make this post as complete and concise as possible.

      Problem:
      At two locations, both Netgate 7100s running 25.07.1 (though the problem also existed on 24.11 at least), DNSFilter client applications continuously ping 103.247.36.36 and 103.247.37.37 to determine connectivity. These pings often do not get through, although sometimes they do and receive a reply. It seems to be related to the number of pings being sent (both locations have 100+ clients). Pings to other addresses always work. DNS lookups against those servers also always seem to work. Notably, pinging those servers from pfSense itself (GUI or ssh) always works too.

      At any given time pings work for many clients, and the failures are not isolated to particular clients. If I leave a ping going, eventually it will start working. Once it's working, it will continue working until it is stopped and some time elapses (presumably when the state expires).

      pfSense configuration:
      All IPv4. No interfaces have IPv6 configured. "Allow IPv6" is UNchecked in Advanced->Networking.
      No limiters/schedules/shapers at either location.

      • Location A: freeradius, openvpn ('aws-wizard', 'ipsec-profile-wizard', 'Netgate_Firmware_Upgrade', and 'Nexus' are also installed.)
      • Location B: No packages added (however 'aws-wizard', 'ipsec-profile-wizard', 'Netgate_Firmware_Upgrade', and 'Nexus' are installed; are these 'stock'?)

      Troubleshooting:
      I've tried a lot, but I'll attempt to list everything I've done.

      • Packet captures, both from GUI and from ssh (tcpdump). I always see the packet on the LAN side, but I do not see it on the WAN side (unless it happens to be working, of course).

      • Logging. I set up a rule to Pass ICMP/any traffic to those servers and set it to log. Interestingly, I do not see ALL the ping requests logged, only the first one of a series (indicated as a pass).

      • I tried an explicit Outbound NAT rule.

      • Toggled state policy between Interface Bound and Floating.

      • I've messed with the Advanced firewall settings ICMP timeouts.

      • I set net.inet.icmp.icmplim up to 2000.

      • I set net.route.netisr_maxqlen to 4096.

      • I set net.inet.ip.intr_queue_maxlen to 4000.

      • I've enabled/disabled NIC hardware checksum offloading.

      • I've tried increasing the state table size, though I have never seen the state table anywhere close to full. The search rate might be "high", often around 5000/s.

      • pfInfo shows a continuously growing number of "Blocked" packets out. I suspect this may be indicative of my problem, but I'm not sure where else to look.

      • It appears to be a NAT issue, but I do not know where to find any outbound NAT logs.

      • Of course I've searched and read countless forum posts here and elsewhere.

      Additional:
      We have other locations running 23.01 (also 7100s) that do not appear to have this problem, but they also have fewer clients generating these pings. On those, pfInfo shows far fewer blocked outbound packets on the WAN, and the count is not growing. All other settings/tunables are stock.

      Summary:
      I think the packets are being accepted on the local interface and passing the firewall rules, but are then not being NAT'd out the WAN (except when it's working).

      Where to go from here?
      Thanks in advance,
      Nick

        njc @njc

        tcpdump output below.
        lagg1.52 is the LAN (actually a separate VLAN from the regular LAN, so it's easier to capture). lagg1.4090 is the WAN side. It's tricky to separate all the traffic, so what I did was set the length of the ping to 60 so it's obvious which pings are from my test vs. the others.
        When I ping 9.9.9.9 -l 60, I can see this as length 102. When I ping 103.247.37.37 -l 60 I do not see it on the WAN side at all. It's also obvious on LAN side that there's no reply from 103.247.37.37.
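The 102-byte figure follows from header overhead. A rough check, assuming a 20-byte IPv4 header with no options and the 14-byte Ethernet header that tcpdump -e counts (the function name is only illustrative):

```python
ICMP_HEADER = 8       # ICMP echo header: type, code, checksum, id, seq
IPV4_HEADER = 20      # assumes no IP options
ETHERNET_HEADER = 14  # dst MAC + src MAC + ethertype, as counted by tcpdump -e


def frame_lengths(payload_bytes):
    """Return (icmp_length, frame_length) for an echo request with this payload size."""
    icmp_length = payload_bytes + ICMP_HEADER
    frame_length = icmp_length + IPV4_HEADER + ETHERNET_HEADER
    return icmp_length, frame_length


print(frame_lengths(60))  # (68, 102)
```

This matches the "length 102 ... length 68" pair in the captures, so a capture filter on frame length 102 isolates the test pings from the clients' default-size pings.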

        LAN side:

        [25.07.1-RELEASE][admin@***]/root: tcpdump -i lagg1.52 -n host 10.52.0.10 -e
        tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
        listening on lagg1.52, link-type EN10MB (Ethernet), snapshot length 262144 bytes
        15:33:38.147038 {WAN} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 10.52.0.10 > 103.247.37.37: ICMP echo request, id 1, seq 13345, length 68
        15:33:42.927090 {WAN} > 00:e0:ed:f0:5b:a2, ethertype ARP (0x0806), length 56: Request who-has 10.52.0.1 (00:e0:ed:f0:5b:a2) tell 10.52.0.10, length 42
        15:33:42.927098 00:e0:ed:f0:5b:a2 > {WAN}, ethertype ARP (0x0806), length 42: Reply 10.52.0.1 is-at 00:e0:ed:f0:5b:a2, length 28
        15:33:42.927183 {WAN} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 10.52.0.10 > 103.247.37.37: ICMP echo request, id 1, seq 13346, length 68
        15:33:47.941896 {WAN} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 10.52.0.10 > 103.247.37.37: ICMP echo request, id 1, seq 13347, length 68
        15:33:52.753096 {WAN} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 10.52.0.10 > 9.9.9.9: ICMP echo request, id 1, seq 13348, length 68
        15:33:52.767568 00:e0:ed:f0:5b:a2 > {WAN}, ethertype IPv4 (0x0800), length 102: 9.9.9.9 > 10.52.0.10: ICMP echo reply, id 1, seq 13348, length 68
        15:33:53.777293 {WAN} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 10.52.0.10 > 9.9.9.9: ICMP echo request, id 1, seq 13349, length 68
        15:33:53.791564 00:e0:ed:f0:5b:a2 > {WAN}, ethertype IPv4 (0x0800), length 102: 9.9.9.9 > 10.52.0.10: ICMP echo reply, id 1, seq 13349, length 68
        15:33:54.793003 {WAN} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 10.52.0.10 > 9.9.9.9: ICMP echo request, id 1, seq 13350, length 68
        15:33:54.807614 00:e0:ed:f0:5b:a2 > {WAN}, ethertype IPv4 (0x0800), length 102: 9.9.9.9 > 10.52.0.10: ICMP echo reply, id 1, seq 13350, length 68
        

        WAN side (only relevant traffic and obscured WAN IP/MAC):
        9.9.9.9:

        [25.07.1-RELEASE][admin@***]/root: tcpdump -i lagg1.4090 -n net 9.9.9.9/32 -e
        tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
        listening on lagg1.4090, link-type EN10MB (Ethernet), snapshot length 262144 bytes
        15:38:15.102894 00:e0:ed:f0:5b:a2 > {WAN GW}, ethertype IPv4 (0x0800), length 102: {WAN IP} > 9.9.9.9: ICMP echo request, id 1, seq 13351, length 68
        15:38:15.117499 {WAN GW} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 9.9.9.9 > {WAN IP}: ICMP echo reply, id 1, seq 13351, length 68
        15:38:16.114749 00:e0:ed:f0:5b:a2 > {WAN GW}, ethertype IPv4 (0x0800), length 102: {WAN IP} > 9.9.9.9: ICMP echo request, id 1, seq 13352, length 68
        15:38:16.129052 {WAN GW} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 9.9.9.9 > {WAN IP}: ICMP echo reply, id 1, seq 13352, length 68
        15:38:17.130690 00:e0:ed:f0:5b:a2 > {WAN GW}, ethertype IPv4 (0x0800), length 102: {WAN IP} > 9.9.9.9: ICMP echo request, id 1, seq 13353, length 68
        15:38:17.145017 {WAN GW} > 00:e0:ed:f0:5b:a2, ethertype IPv4 (0x0800), length 102: 9.9.9.9 > {WAN IP}: ICMP echo reply, id 1, seq 13353, length 68
        

        103.247.37.37:

        [25.07.1-RELEASE][admin@***]/root: tcpdump -i lagg1.4090 -n net 103.247.37.37/32 -e
        
        //nothing found with length 102
        
          njc @njc

          OK, new discovery. I ran this command:

          pfctl -x loud
          

          ...and the firewall promptly locked up (I'm remote, so the VPN dropped). A few minutes later it came back up (thankfully). A review of the system log shows thousands of messages like this:

          Nov 20 16:31:53 pfSense kernel: pf: wire key attach failed on lagg1.4090: :3ICMP out wire: 103.247.37.37:8 {WAN IP}:9 0:0 @49, existing: ICMP out wire: 103.247.37.37:8 {WAN IP}:9 stack: 103.247.37.37:8 10.100.80.70:9 0:0 @49
          Nov 20 16:31:53 pfSense kernel: {WAN IP}pf: BAD state: :3 stack: 103.247.37.37:8 10.100.80.120ICMP out wire: 103.247.36.36:8 {WAN IP}:1 0:0 @49, existing: ICMP out wire: 103.247.36.36:8 {WAN IP}:1 stack: 103.247.36.36ICMP out wire: 103.247.36.36:8 {WAN IP}:9 0:0 @49, existing: ICMP out wire: 103.247.36.36:8 {WAN IP}:9 stack: 103.247.36.36:8 10.100.80.70:9 0:0 @49
          Nov 20 16:31:53 pfSense kernel: pf: wire key attach failed on lagg1.4090: 0:0 @49, existing: ICMP out wire: 103.247.37.37:8 out wire: 103.247.37.37:8 {WAN IP}:4 0:0 @49, existing: ICMP out wire: 103.247.37.37:8 {WAN IP}:4 stack: 103.247.37.37:8 10.100.80.129:4 0:0 @49
          Nov 20 16:31:53 pfSense kernel: pf: wire key attach failed on lagg1.4090: pf: wire key attach failed on lagg1.4090: pf: wire key attach failed on lagg1.4090: pf: wire key attach failed on lagg1.4090: ICMPICMP out wire: 103.247.37.37:8 {WAN IP}:3ICMP out wire: 103.247.37.37:8 {WAN IP}:12 0:0 @49, existing: ICMP out wire: 103.247.37.37:8 {WAN IP}:12 stack: 103.247.37.37:8 10.100.80.221:12 0:0 @49
          

          I believe this is clearly related to my problem, and my guess is when I enabled "loud" debug, there were so many messages to be logged, the system got overwhelmed. The debug level is back to "urgent" and these messages are not being logged.
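To see which internal hosts already hold the conflicting states, the "existing:" half of each message can be pulled apart. This is a minimal sketch, assuming the fields read as destination, translated source and ICMP id (my interpretation of the lines above, not something confirmed from pf's source); the regex and function name are illustrative only:

```python
import re

# Matches the "existing:" half of a pf "wire key attach failed" message for ICMP.
# Field meanings (destination, translated source, ICMP id) are an assumption
# based on how the log lines in this thread read.
EXISTING = re.compile(
    r"existing: ICMP out wire: (?P<dst>[\d.]+):\d+ (?P<wan>[^:]+):(?P<wire_id>\d+)"
    r" stack: [\d.]+:\d+ (?P<lan>[\d.]+):(?P<stack_id>\d+)"
)


def existing_states(log_text):
    """Return (destination, icmp_id, lan_host) for each state that blocked a new one."""
    return [
        (m["dst"], int(m["wire_id"]), m["lan"])
        for m in EXISTING.finditer(log_text)
    ]


# First log line from above, with the WAN address still obscured.
sample = (
    "pf: wire key attach failed on lagg1.4090: :3ICMP out wire: 103.247.37.37:8 "
    "{WAN IP}:9 0:0 @49, existing: ICMP out wire: 103.247.37.37:8 {WAN IP}:9 "
    "stack: 103.247.37.37:8 10.100.80.70:9 0:0 @49"
)
for dst, icmp_id, lan in existing_states(sample):
    print(f"{lan} already holds the state for {dst} with ICMP id {icmp_id}")
```

Because the kernel interleaves these messages, some lines are too garbled to match; the sketch skips those rather than guessing.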

          So what does that error mean and what can I do about it...?

            SteveITS Galactic Empire @njc

            @njc said in Outbound ping problem to DNS Filter servers:

            I do not see ALL the ping requests logged, only the first one of a series

            After that the state is open so there is not a "new" connection being made.
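A toy illustration of that behaviour, not pf's actual code: the pass rule and its log entry only run for a packet that has no matching state, and later packets in the same flow ride the existing state. The class and method names are made up for the example.

```python
class ToyFirewall:
    """Toy stateful filter: the pass rule (and its log entry) only runs for new flows."""

    def __init__(self):
        self.states = set()
        self.log = []

    def echo_request(self, src, dst, icmp_id, seq):
        flow = (src, dst, icmp_id)
        if flow not in self.states:
            # No state yet: evaluate the rule, log the pass, create the state.
            self.states.add(flow)
            self.log.append(f"pass {src} > {dst} id {icmp_id} seq {seq}")
        # Packets matching an existing state skip rule evaluation and logging.
        return "pass"


fw = ToyFirewall()
for seq in range(13345, 13349):
    fw.echo_request("10.52.0.10", "9.9.9.9", 1, seq)
print(len(fw.log))  # 1
```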

            Overall, are these Windows PCs, and did you make NAT changes? There is an edge case bug in FreeBSD for pinging the same host:

            @stephenw10 said in Can't ping the same IP from multiple devices:

            if you have 1:1 NAT (or static ports outbound NAT) then only one internal system can open a unique state

            It is fixed so maybe it's in 25.11? pfSense release notes don't normally call out FreeBSD bugs IIRC.
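The collision can be pictured with a toy model. It assumes the translated ("wire") side of a state is keyed by destination, WAN address and ICMP id, as the log lines above suggest, and that the id is left unchanged on the way out. It does not reproduce pf's real code or the fix in the FreeBSD bug; the class name, the rewrite_id switch and the 198.51.100.2 placeholder WAN address are all hypothetical.

```python
class ToyNat:
    """Toy outbound NAT for ICMP echo, keyed by (destination, WAN IP, ICMP id)."""

    def __init__(self, wan_ip, rewrite_id):
        self.wan_ip = wan_ip
        self.rewrite_id = rewrite_id  # hypothetical switch: remap the id on collision
        self.stack_to_wire = {}       # (lan_host, dst, id) -> wire key
        self.wire_in_use = set()

    def echo_request(self, lan_host, dst, icmp_id):
        stack_key = (lan_host, dst, icmp_id)
        if stack_key in self.stack_to_wire:
            return True  # existing state: the packet goes out
        wire_id = icmp_id
        if self.rewrite_id:
            # Pick the next free id so the wire key stays unique.
            while (dst, self.wan_ip, wire_id) in self.wire_in_use:
                wire_id += 1
        wire_key = (dst, self.wan_ip, wire_id)
        if wire_key in self.wire_in_use:
            return False  # analogous to "wire key attach failed": packet dropped
        self.wire_in_use.add(wire_key)
        self.stack_to_wire[stack_key] = wire_key
        return True


nat = ToyNat("198.51.100.2", rewrite_id=False)
print(nat.echo_request("10.100.80.70", "103.247.37.37", 9))   # True
print(nat.echo_request("10.100.80.221", "103.247.37.37", 9))  # False
print(nat.echo_request("10.100.80.221", "9.9.9.9", 9))        # True
```

In this model the second client is blocked only while the first client's state exists, which would fit pings that start working after a while and then keep working. It also fits pings to less popular destinations always working. With rewrite_id=True both clients get through.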

            Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
            When upgrading, allow 10-15 minutes to reboot, or more depending on packages, CPU, and/or disk speed.
            Upvote 👍 helpful posts!

              njc @njc

              Update. I believe this is my issue.
              https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283795

              Next step is to load up a spare 7100 with 25.11.r.20251118.1708 (FreeBSD-16.0-CURRENT), apply our config, and swap the cables over. This way if that RC has other issues we can go back...

                njc @njc

                cross-post. Thank you @SteveITS !

                  SteveITS Galactic Empire @njc

                  @njc 👍 If you have ZFS on this 7100 you can revert to a 25.07 boot environment. But a spare works too.

                  This one drove me nuts for a while.

                    njc @SteveITS

                    @SteveITS Thanks. I'd upvote you but I don't have enough street cred yet :)

                      SteveITS Galactic Empire @njc

                      @njc :) here's a couple

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.