Netgate Discussion Forum

    pfSense unresponsive during and for several seconds after an iperf3 test?

    General pfSense Questions
    • T
      Tantamount
      last edited by Tantamount

      Hello friendly people!

      I've recently been upgrading bits and pieces of my network, most recently moving the backhaul between two switches to 10GbE over fiber via the SFP+ ports.

      My pfSense (2.6.0) router uses 2.5GbE UTP NICs; the one connected to the switch plugs into an SFP+ port through an RJ45 adapter.

      The testing computer has a 10GbE network connection at the remote switch (fiber).

      If I run an iperf3 test from that machine, pfSense immediately becomes unresponsive -- pings aren't returned (not just from that machine, but from any). If I abort the test, it takes several seconds for pfSense to start responding again and go back to normal.

      I'm trying to determine what is going on here. Is it hardware? Unstable drivers? Something about TCP/IP not handling the speed mismatch?

      The pfSense unit has 32 GB of RAM and the dashboard is reporting these CPU details:
      Intel(R) Pentium(R) Silver N6005 @ 2.00GHz
      4 CPUs: 1 package(s) x 4 core(s)

      Resource usage is normally in single-digit percentages, and the one thing that could have been suspect (Suricata) is disabled.

      I believe, but need to verify, that these dmesg events occur during the iperf3 tests:
      igc1: link state changed to DOWN
      igc1: link state changed to UP
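One way to verify whether the link flaps line up with the tests is to pull the events out of the system log and check them against the test window. This is only a rough sketch: it assumes syslog-style lines with a leading "Mon DD HH:MM:SS" timestamp (bare dmesg lines like the two above carry no timestamp and will not match), and the function names and the `year` parameter are made up for illustration, since that timestamp format does not record the year.

```python
import re
from datetime import datetime

# Assumed line shape: "Jan  5 03:12:40 host kernel: igc1: link state changed to DOWN"
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*?"
    r"(?P<nic>\w+): link state changed to (?P<state>UP|DOWN)"
)

def link_events(log_text, nic="igc1", year=2023):
    """Return [(timestamp, state), ...] for link state changes on one NIC."""
    events = []
    for line in log_text.splitlines():
        m = LINK_RE.match(line)
        if m and m.group("nic") == nic:
            # Collapse the padded day ("Jan  5") before parsing.
            stamp = " ".join(m.group("ts").split())
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group("state")))
    return events

def events_in_window(events, start, end):
    """Keep only the events that fall inside [start, end]."""
    return [e for e in events if start <= e[0] <= end]
```

Noting the start and end time of each iperf3 run and passing them as `start` and `end` would show whether the DOWN/UP pairs only appear while a test is running.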

      Driver crash? Switch reset?

      TCP/IP is supposed to handle mismatches like this automatically, right? Drop packets, adjust window sizes, and otherwise tell the client to throttle?

      Switch details:
      At the pfSense device: QNAP QSW-M408S: 8 x 1GbE RJ45 ports, 4 x SFP+ 10GbE ports
      At the client device: QNAP QSW-M2106-4S: 6 x 2.5GbE RJ45 ports, 4 x SFP+ 10GbE ports

      If I run a similar test with a client on a 2.5GbE port on the same switch as the 10GbE machine, it works fine and I get the expected 2.3+ Gbit/s.
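For reference, the "2.3+" figure is about what plain framing overhead predicts on a 2.5GbE link. A back-of-the-envelope sketch, assuming IPv4, a 1500-byte MTU, no VLAN tag, and TCP timestamps enabled (12 bytes of options); the function name is just for illustration:

```python
def tcp_goodput_gbps(link_gbps, mtu=1500, tcp_options=12):
    """Estimate TCP payload throughput on an Ethernet link (IPv4, no VLAN tag)."""
    # Payload = MTU minus the IPv4 header (20), TCP header (20) and TCP options.
    payload = mtu - 20 - 20 - tcp_options
    # On the wire each frame also carries the Ethernet header (14), FCS (4),
    # preamble + start delimiter (8) and the inter-frame gap (12).
    on_wire = mtu + 14 + 4 + 8 + 12
    return link_gbps * payload / on_wire

print(round(tcp_goodput_gbps(2.5), 2))  # 2.35
```

So a result a little above 2.3 Gbit/s means that link is essentially running at line rate.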

      Given that pfSense becomes unresponsive during these tests, I'm going to attempt to connect to the console port so that I can look at more details in real time, but if anyone has any idea what's going on here, or which binaries I can run from the console to capture what's happening, I would appreciate it!

      • T
        Tantamount
        last edited by Tantamount

        I just remembered -- the 2.5GbE NIC that attaches to the switch's SFP+ port is using a 10GbE transceiver. I wonder if it shows as 10GbE in the switch -- like maybe the switch can't negotiate down to 2.5GbE because it's not a 2.5GbE transceiver? It's just weird though because it otherwise works -- back in the day, if I tried forcing a 100 Mbit NIC to 1GbE it just wouldn't work at all.

        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          I assume igc1 is the NIC connected to the switch?

          Nothing else logged in pfSense once it becomes available again?

          Are you testing to iperf running in pfSense directly?

          Check the MAC stats with: sysctl dev.igc.1

          Steve

          • T
            Tantamount @stephenw10
            last edited by

            @stephenw10

            Hi Steve,

            Yes, igc1 is connected to the switch.

            Yeah, I've got the iperf package installed and running in server mode.

            Thanks for that sysctl command -- lot of stats! I'll need to run that before and after a test to see what values change.
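A minimal sketch of that before/after comparison, assuming the output was saved to two files (for example `sysctl dev.igc.1 > before.txt` before the test and `sysctl dev.igc.1 > after.txt` afterwards; the file names and function names here are hypothetical). It only looks at lines whose value is a plain integer and prints the ones that changed.

```python
import sys

def parse_sysctl(text):
    """Parse 'name: value' lines into {name: int}, skipping non-integer values."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            counters[name.strip()] = int(value)
    return counters

def changed_counters(before, after):
    """Return {name: delta} for counters present in both snapshots that changed."""
    return {
        name: after[name] - before[name]
        for name in sorted(after)
        if name in before and after[name] != before[name]
    }

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python sysctl_diff.py before.txt after.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        deltas = changed_counters(parse_sysctl(f_before.read()),
                                  parse_sysctl(f_after.read()))
    for name, delta in deltas.items():
        print(f"{name}: {delta:+d}")
```

Packet and byte counters will obviously jump during a test, so the interesting lines are error-type counters that normally stay at zero.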

            After watching the system log overnight, I noticed these would happen on occasion even when not testing:
            igc1: link state changed to DOWN
            igc1: link state changed to UP

            Since moving the connection to an RJ45 1GbE port, that stopped.

            I'm pretty certain at this point that the problem is with the transceiver and/or the SFP+ port not natively supporting 2.5GbE (the switch docs only list 1/10GbE for those ports).

            I've purchased a replacement system with SFP+ ports that can handle 10GbE and will report back. It uses the same CPU and will actually "only" have 16 GB of RAM, so it should be a good test of whether this problem was a resource constraint issue.

            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              It's unlikely you're using anything anywhere near 16GB unless there is a serious memory leak somehow. That should be pretty obvious from the monitoring graphs.

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.