Netgate Discussion Forum

    Issues with an Intel x710 and pfsense 2.4.5-p1

    Hardware
    13 Posts 3 Posters 2.3k Views
    • E
      elbuit @stephenw10
      last edited by

      @stephenw10 said in Issues with an Intel x710 and pfsense 2.4.5-p1:

      Is that something that just started? It was running without loss for some time?

      Sorry, to be clear: it ran fine without packet loss for 10 days.
      Then the packet loss started suddenly.

      You have the same NIC in the other node and that is not doing that? Even if you fail over?

      Exactly, I have another node with the same hardware that has been running for 6 days.

      Steve

      I don't understand the state table behaviour: it shows a lot of searches, inserts and removals, even when the device was disconnected.
      Thanks Steve.

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Commonly that might be seen when there is an interface mismatch between the nodes causing a state sync loop between them.
        Check the interfaces are assigned identically in the config on both nodes.

        Steve

        • E
          elbuit @stephenw10
          last edited by

          @stephenw10 said in Issues with an Intel x710 and pfsense 2.4.5-p1:

          Commonly that might be seen when there is an interface mismatch between the nodes causing a state sync loop between them.
          Check the interfaces are assigned identically in the config on both nodes.

          Steve

          Hmm, pfsync was disabled on both nodes, so I guess no states were being synchronized.

          The WAN interface is the same on both nodes (ixl1), and the OpenVPN interfaces were created by config sync (XMLRPC Sync), so they should be identical.
          The other interfaces are down (no link).

          I also think an internal loop could be involved, but I don't know how to find where the loop comes from.
          I know that if I reboot this server everything will go back to normal, but first I would like to find where the bug comes from.

          For example, right now there is only an SNMP query and an SSH session (10 states in the state table), yet it reports
          682.0/s inserts and removals.

          /root: pfctl -s info
          Status: Enabled for 1 days 17:18:29           Debug: Urgent
          
          Interface Stats for ixl1              IPv4             IPv6
           Bytes In                  10792675853232           463536
           Bytes Out                 10912798045880              260
           Packets In
             Passed                     13165783082                0
             Blocked                      115984155             6601
           Packets Out
             Passed                     13117785234                0
             Blocked                         368850                3
          
          State Table                          Total             Rate
           current entries                       10               
           searches                     39525266221       265789.3/s
           inserts                        101422148          682.0/s
           removals                       101422138          682.0/s
          Counters
           match                          190610985         1281.8/s
           bad-offset                             0            0.0/s
           fragment                            2559            0.0/s
           short                                573            0.0/s
           normalize                            453            0.0/s
           memory                                 0            0.0/s
           bad-timestamp                          0            0.0/s
           congestion                             0            0.0/s
           ip-option                           2760            0.0/s
           proto-cksum                            0            0.0/s
           state-mismatch                    117091            0.8/s
           state-insert                           0            0.0/s
           state-limit                            0            0.0/s
           src-limit                              0            0.0/s
           synproxy                               0            0.0/s
           map-failed                             0            0.0/s
          
          

          That blew my mind ;-)
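A quick check on the figures above, offered as a sketch rather than a diagnosis: dividing each Total by the uptime on the Status line reproduces the Rate column to the printed precision. That suggests the rates in this `pfctl -s info` output are averages since pf was enabled, not instantaneous values. The helper names below are hypothetical; the numbers are copied from the output above.

```python
import re


def uptime_seconds(status_line: str) -> int:
    """Convert a status line such as 'Enabled for 1 days 17:18:29' to seconds."""
    m = re.search(r"for (\d+) days (\d+):(\d+):(\d+)", status_line)
    if m is None:
        raise ValueError("unrecognised status line")
    days, hours, minutes, seconds = map(int, m.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def lifetime_rate(total: int, uptime: int) -> float:
    """Average events per second over the whole uptime, rounded like the Rate column."""
    return round(total / uptime, 1)


# Values copied from the pfctl output above.
status = "Status: Enabled for 1 days 17:18:29           Debug: Urgent"
uptime = uptime_seconds(status)

searches_rate = lifetime_rate(39525266221, uptime)  # reported as 265789.3/s
inserts_rate = lifetime_rate(101422148, uptime)     # reported as 682.0/s
match_rate = lifetime_rate(190610985, uptime)       # reported as 1281.8/s

print(uptime, searches_rate, inserts_rate, match_rate)
```

If that reading is right, a high Rate can persist long after the traffic that produced it has stopped, and comparing the Totals of two snapshots taken some time apart gives a better picture of current activity.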

          • E
            elbuit
            last edited by

            @elbuit said in Issues with an Intel x710 and pfsense 2.4.5-p1:

            Additionally, I've checked "Disable hardware checksum offload"

            -RXCSUM,TXCSUM.

            ixl1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            	options=400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
            

            When I disabled all hardware acceleration, the kernel showed a WARNING about queue 3:

            kernel: ixl1: WARNING: queue 3 appears to be hung!
            rc.gateway_alarm[7790]: >>> Gateway alarm: WANGW (Addr:xxx.xxx.xxx.xxx Alarm:1 RTT:10.504ms RTTsd:55.100ms Loss:31%)
            

            But it started answering pings again and is now more stable:

            Gateways
            WANGW (default)		xxx.xxx.xxx.xxx 	xxx.xxx.xxx.xxx 	6.397ms 	41.771ms 	0.0% 	Online 	
            

            But the state table is still behaving the same way, with only 10 entries and a lot of searches/s. That figure should be approximately the same as the packets per second,
            but pfSense is receiving fewer than 10 pps and reporting 263009.2 searches per second.

            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by stephenw10

              The 'queue appears to be hung' warning often seems to be triggered when the driver starts or restarts. In itself it does not seem to be a problem. Whether or not that warning appears, the NIC works as expected after the driver is loaded, and you don't see that error again.

              Steve

              • E
                elbuit @stephenw10
                last edited by

                @stephenw10 said in Issues with an Intel x710 and pfsense 2.4.5-p1:

                The 'queue appears to be hung' warning seems to often be triggered when the driver starts or re-starts. In itself it does not seem to be a problem.

                I'm not sure about that; the problem started with a similar message:

                Nov 15 02:51:48 vpn2ha kernel: ixl1: Malicious Driver Detection event 2 on TX queue 771, pf number 1
                Nov 15 02:51:48 vpn2ha kernel: ixl1: MDD TX event is for this function!
                Nov 15 02:51:50 vpn2ha kernel: ixl1: Malicious Driver Detection event 2 on TX queue 770, pf number 1
                Nov 15 02:51:50 vpn2ha kernel: ixl1: MDD TX event is for this function!
                Nov 15 02:51:59 vpn2ha kernel: ixl1: WARNING: queue 3 appears to be hung!
                Nov 15 02:52:01 vpn2ha kernel: ixl1: WARNING: queue 2 appears to be hung!
                
                

                Could it be the consequence rather than the cause?
                I don't know.
                It seems to be something related to the NIC driver/firmware, but I can't figure out how it ended up in a "state insertions/deletions loop".
                That's quite weird.
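One way to look at the cause-or-consequence question is to check which kind of message appears first in the log. The following is a minimal sketch that scans the six lines quoted above in file order and records the first timestamp for each kind of event. The function name and the two regular expressions are hypothetical and written against these lines only; other driver versions may word the messages differently.

```python
import re

# Log lines copied verbatim from the excerpt above.
LOG = """\
Nov 15 02:51:48 vpn2ha kernel: ixl1: Malicious Driver Detection event 2 on TX queue 771, pf number 1
Nov 15 02:51:48 vpn2ha kernel: ixl1: MDD TX event is for this function!
Nov 15 02:51:50 vpn2ha kernel: ixl1: Malicious Driver Detection event 2 on TX queue 770, pf number 1
Nov 15 02:51:50 vpn2ha kernel: ixl1: MDD TX event is for this function!
Nov 15 02:51:59 vpn2ha kernel: ixl1: WARNING: queue 3 appears to be hung!
Nov 15 02:52:01 vpn2ha kernel: ixl1: WARNING: queue 2 appears to be hung!
"""

# Hypothetical patterns matching the wording seen in this excerpt.
PATTERNS = {
    "mdd": re.compile(r"Malicious Driver Detection event \d+ on TX queue \d+"),
    "hung": re.compile(r"queue \d+ appears to be hung"),
}


def first_seen(log_text: str) -> dict:
    """Return the syslog timestamp of the first line matching each pattern."""
    first = {}
    for line in log_text.splitlines():
        stamp = line[:15]  # 'Mon DD HH:MM:SS' prefix used by syslog
        for kind, pattern in PATTERNS.items():
            if kind not in first and pattern.search(line):
                first[kind] = stamp
    return first


events = first_seen(LOG)
print(events)
```

In this excerpt the MDD events are logged about ten seconds before the first hung-queue warning. That ordering is compatible with the hang being a consequence of the MDD event, but it does not prove it.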

                Whether or not that appears the NIC works as expected after the driver is loaded and you don't see that error again.

                Yes, the NIC is working correctly and the state loop rate is going down. Right now:

                State Table                          Total             Rate
                  current entries                       10               
                  searches                     39526358708       152927.3/s
                  inserts                        101445909          392.5/s
                  removals                       101445899          392.5/s
                
                

                Steve

                Thanks Steve for your help.
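The two `pfctl -s info` snapshots in this thread can be compared directly. The sketch below assumes the Rate column is the Total divided by the seconds since pf was enabled. That assumption is consistent with the first snapshot (39525266221 searches over 1 day 17:18:29 gives 265789.3/s) but is not confirmed anywhere in this thread. Under it, Total / Rate recovers the runtime at each snapshot, and the difference in Totals over the difference in runtimes estimates the activity between the two. The names are hypothetical.

```python
def runtime_from(total: int, rate: float) -> float:
    """Seconds since pf was enabled, assuming rate == total / runtime."""
    return total / rate


# (Total, Rate) pairs copied from the two State Table outputs in this thread.
first = {"searches": (39525266221, 265789.3), "inserts": (101422148, 682.0)}
second = {"searches": (39526358708, 152927.3), "inserts": (101445909, 392.5)}

# Elapsed time between snapshots, derived from the searches counter.
elapsed = runtime_from(*second["searches"]) - runtime_from(*first["searches"])

# Average events per second over the interval between the two snapshots.
interval_rate = {
    name: (second[name][0] - first[name][0]) / elapsed for name in first
}

print(f"elapsed ~ {elapsed:.0f} s")
for name, rate in interval_rate.items():
    print(f"{name}: ~ {rate:.2f}/s between snapshots")
```

Under that assumption the interval works out to roughly ten searches per second and well under one insert per second, which is in line with the "less than 10 pps" observed on the wire. The falling Rate column would then reflect an old burst being averaged over a longer uptime, not a loop that is still running.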

                • E
                  elbuit
                  last edited by

                  I'm trying to capture these inserts/removals.
                  I've tried tcpdump on all interfaces, including pfsync, pflog, enc0, loopback, ..., and found no packets.
                  Only my SSH session. 😧

                  • G
                    GreenAnt @elbuit
                    last edited by GreenAnt

                    @elbuit I have had a similar issue come up with my XG-1527 recently.

                    Queue hung after driver event. This causes incoming requests to hang.

                    pfsense_error.png

                    I note this thread that suggests disabling TSO:

                    https://www.reddit.com/r/PFSENSE/comments/fqtgmj/intel_x710t4_ixl_malicious_driver_detection/

                    doc here:
                    https://docs.netgate.com/pfsense/en/latest/config/advanced-networking.html#hardware-tcp-segmentation-offloading

                    • G
                      GreenAnt @GreenAnt
                      last edited by

                      @greenant

                      Network (Advanced) Settings for reference

                      0d413487-3593-4a89-88a7-d2f300f28c29-image.png

                      • E
                        elbuit @GreenAnt
                        last edited by

                        Hi @greenant
                        I've disabled TSO. As the pfSense doc you posted says, TSO is not a good choice if you are running a firewall; it's mainly for servers.

                        Thanks for your advice.
                        Regards.

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          TSO should be disabled by default. As you say it doesn't make much sense to have it enabled on a firewall for almost all setups.

                          Steve

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.