Netgate Discussion Forum

    Intermittent high latency between two LAN interfaces

    General pfSense Questions
    • stephenw10 (Netgate Administrator)

      Is there a specific reason you're not running 22.01 yet?

      • amartin

        No reason, just haven't spent the time to do the upgrade yet.

        I don't see anything in the release notes that would indicate a fix for this issue, so I'd prefer to invest time in gathering additional debug data first rather than upgrading if the act of upgrading by itself is unlikely to resolve this problem.

        • stephenw10 (Netgate Administrator)

          Indeed, and in fact there is a known issue in 22.01 that might present like that. As far as I know, though, nothing in 21.05.2 did.

          What is lagg1 there?

          Do you see errors on any interfaces in Status > Interfaces?

          Are you running any packages?

          Steve

          • amartin

            Thanks, it's good to know about that similar issue in 22.01.

            The lagg1 interface is a lagg between ix0 and ix1 with LAGG Protocol set to FAILOVER and Failover Master Interface set to auto.

            Do you see errors on any interfaces in Status > Interfaces?

            Yes, the LAN interface (gray on the above MTR) has the following:

            In/out errors 0/456 
            Collisions 0 
            

            And the OPT1 interface (blue above) has the following:

            In/out errors 0/166 
            Collisions 0 
            

            Installed packages:

            • Cron
            • mtr-nox11
            • openvpn-client-export
            • pfBlockerNG-devel
            • zabbix-agent4

            • stephenw10 (Netgate Administrator)

              Hmm, that's not ideal. What does netstat -i show at the command line?

              There could be errors on the lagg itself or on one of its members, not just on the VLAN.

              Did this just start happening?

              • amartin

                This has been happening for a few months but seems to have gotten worse over the past few weeks. Here's the output of netstat -i. I removed the specific MAC addresses and the IPv6 lines since those aren't relevant here:

                Name   Mtu    Network        Address            Ipkts        Ierrs  Idrop  Opkts        Oerrs  Coll
                ix0    1500   <Link#1>       00:11:22:33:44:55  70322185682  423    0      70804435446  0      0
                ix1    1500   <Link#2>       00:11:22:33:44:55  24934970     2      0      19091        0      0
                ix2    1500   <Link#3>       66:77:aa:bb:cc:dd  1012533026   0      0      493073228    0      0
                ix3    1500   <Link#14>      66:77:aa:bb:cc:dd  4914         0      0      0            0      0
                lo0    16384  <Link#15>      lo0                2183050      0      0      2183050      0      0
                lo0    -      localhost      localhost          0            -      -      0            -      -
                lo0    -      your-net       localhost          2183049      -      -      2183050      -      -
                enc0*  1536   <Link#16>      enc0               0            0      0      0            0      0
                pflog  33160  <Link#17>      pflog0             0            0      0      134021       0      0
                pfsyn  1500   <Link#18>      pfsync0            0            0      0      0            0      0
                lagg0  1500   <Link#19>      66:77:aa:bb:cc:dd  1012537940   0      0      493073228    3      0
                lagg1  1500   <Link#20>      00:11:22:33:44:55  70347118689  425    0      70804454537  622    0
                lagg0  1500   <Link#21>      66:77:aa:bb:cc:dd  1004528099   0      0      492383298    2      0
                lagg0  -      <WAN-ONE>      <WAN-ONE-IP>       14472999     -      -      35           -      -
                lagg0  1500   <Link#22>      66:77:aa:bb:cc:dd  0            0      0      7            0      0
                lagg1  1500   <Link#23>      00:11:22:33:44:55  45500619462  0      0      24028805736  456    0
                lagg1  -      <GRAY-NETWORK> <GRAY-NETWORK-IP>  40479752     -      -      80336493     -      -
                lagg1  1500   <Link#24>      00:11:22:33:44:55  24821582530  0      0      46775648818  166    0
                lagg1  -      <BLUE-NETWORK> <BLUE-NETWORK-IP>  358175778    -      -      358086480    -      -
                lagg0  1500   <Link#25>      66:77:aa:bb:cc:dd  8004823      0      0      689918       1      0
                lagg0  -      <WAN-TWO>      <WAN-TWO-IP>       526569       -      -      0            -      -
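
A table like this is easier to scan if the rows with non-zero error counters are pulled out automatically. The following is a minimal Python sketch, not part of the original post. It assumes the ten-column layout shown above and looks only at link-level rows (those whose Network column starts with `<Link#`). The function names `parse_link_rows` and `rows_with_errors` are hypothetical, and the sample rows are copied from the output above. Real `netstat -i` output may contain rows with fewer columns; this sketch simply skips them.

```python
def parse_link_rows(text):
    """Return one dict per link-level row of `netstat -i` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: link-level rows have exactly 10 columns and a
        # "<Link#N>" network field; the header and address rows are skipped.
        if len(fields) != 10 or not fields[2].startswith("<Link#"):
            continue
        rows.append({
            "name": fields[0],
            "link": fields[2],
            "ierrs": int(fields[5]),
            "oerrs": int(fields[8]),
        })
    return rows


def rows_with_errors(rows):
    """Keep only rows where Ierrs or Oerrs is non-zero."""
    return [r for r in rows if r["ierrs"] or r["oerrs"]]


# Sample rows copied from the netstat -i output in this thread.
SAMPLE = """\
Name   Mtu    Network        Address            Ipkts        Ierrs  Idrop  Opkts        Oerrs  Coll
ix0    1500   <Link#1>       00:11:22:33:44:55  70322185682  423    0      70804435446  0      0
ix2    1500   <Link#3>       66:77:aa:bb:cc:dd  1012533026   0      0      493073228    0      0
lagg1  1500   <Link#20>      00:11:22:33:44:55  70347118689  425    0      70804454537  622    0
lagg1  1500   <Link#23>      00:11:22:33:44:55  45500619462  0      0      24028805736  456    0
lagg1  -      <GRAY-NETWORK> <GRAY-NETWORK-IP>  40479752     -      -      80336493     -      -
"""

if __name__ == "__main__":
    for row in rows_with_errors(parse_link_rows(SAMPLE)):
        print(row["name"], row["link"], "Ierrs:", row["ierrs"], "Oerrs:", row["oerrs"])
```

On the sample rows this flags ix0 (input errors only), the parent lagg1 (both directions), and the lagg1 row with 456 output errors, which matches the LAN figure from Status > Interfaces.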
                
                • stephenw10 (Netgate Administrator)

                  What is lagg1 connected to? Are you unable to run LACP there?

                  You might try swapping the lagg connections and see if the input errors follow that.

                  None of that explains 9s pings though.

                  I would try going to 22.01 if you can. There is a workaround patch for the known issue there if you hit it. Even if you do hit it, though, you still wouldn't see pings at 9000ms.

                  Steve

                  • amartin

                    Correct, lagg1 cannot support LACP on the other end because the other end of lagg1 is a set of two independent switches (not stacked). I am really only interested in fault tolerance when aggregating the links, so using the FAILOVER protocol is sufficient.

                    Is there a way for me to see which VLAN (e.g. lagg1.X) had those errors?

                    Is there something about 22.01 that would make it more likely to not have this issue, or give me better visibility? I can't explain why I saw 9000ms pings on 21.05.2-RELEASE, so I'm concerned that same situation will just occur again on 22.01.

                    • stephenw10 (Netgate Administrator)

                      Not beyond what you see from 'netstat -i' above. You can see the out errors there on the VLAN subnets.

                      Do you see any link changes when you see the very high latency?

                      Can you test running without the lagg?

                      Steve

                      • amartin

                        I haven't noticed any link changes when I see the high latency. As far as testing without the lagg goes, can I just unplug one of the cables (forcing all of the traffic through one port with no other option), or is that not sufficient? Do I actually need to remove the lagg interface itself from the config?

                        • amartin

                          I captured another instance of this disruption again today; this time the latency is much smaller (but still high) and there's some packet loss:
                          890662dc-5ba4-457e-a76e-938d0024fbc0-image.png

                          When this happens, TCP connections (e.g. an SSH session) get dropped.

                          The errors on ix0 and lagg1 have increased slightly:

                          Name   Mtu    Network        Address            Ipkts        Ierrs  Idrop  Opkts        Oerrs  Coll
                          ix0    1500   <Link#1>       00:11:22:33:44:55  70482923217  426    0      70965439161  0      0
                          lagg1  1500   <Link#20>      00:11:22:33:44:55  70507907764  428    0      70965458252  622    0
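
To put "increased slightly" into numbers, the two snapshots can be diffed per interface. This is a minimal Python sketch under the same assumptions as the table layout above (ten columns, link-level rows identified by `<Link#N>`). The helper names `counters` and `deltas` are hypothetical, and the rows are copied from the two netstat -i outputs in this thread.

```python
def counters(text):
    """Map (name, link) -> packet and error counters for link-level rows."""
    out = {}
    for line in text.splitlines():
        f = line.split()
        # Assumption: link-level rows have 10 columns and a "<Link#N>" field.
        if len(f) == 10 and f[2].startswith("<Link#"):
            out[(f[0], f[2])] = {
                "Ipkts": int(f[4]),
                "Ierrs": int(f[5]),
                "Opkts": int(f[7]),
                "Oerrs": int(f[8]),
            }
    return out


def deltas(before, after):
    """Per-interface counter differences for rows present in both snapshots."""
    result = {}
    for key, new in after.items():
        old = before.get(key)
        if old is not None:
            result[key] = {k: new[k] - old[k] for k in new}
    return result


# Rows copied from the earlier and later netstat -i outputs in this thread.
BEFORE = """\
ix0    1500   <Link#1>       00:11:22:33:44:55  70322185682  423    0      70804435446  0      0
lagg1  1500   <Link#20>      00:11:22:33:44:55  70347118689  425    0      70804454537  622    0
"""
AFTER = """\
ix0    1500   <Link#1>       00:11:22:33:44:55  70482923217  426    0      70965439161  0      0
lagg1  1500   <Link#20>      00:11:22:33:44:55  70507907764  428    0      70965458252  622    0
"""

if __name__ == "__main__":
    for (name, link), d in deltas(counters(BEFORE), counters(AFTER)).items():
        print(name, link, d)
```

Between the two snapshots, ix0 and lagg1 each gained 3 input errors across roughly 160 million received packets, and the lagg1 output error count stayed at 622.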
                          
                          • stephenw10 (Netgate Administrator)

                            Hmm, you would not expect some minor packet loss to cause TCP connections to fail. You would just see retransmissions.

                            Unless all of those failures were happening at the same time so it times out. That would take a while though.

                            This starts to look more like a duplicate IP or a packet loop. You can see that if you have a loop that's prevented by STP and it periodically resets.

                            Removing one link from the lagg entirely might prove that.

                            Steve

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.