Netgate Discussion Forum

    Intermittent high latency between two LAN interfaces

    General pfSense Questions
    15 Posts 2 Posters 1.5k Views
    • amartin

      Hello, I'm running pfSense Plus 21 on an XG-7100 and have configured two LAN interfaces:

      • LAN - lagg1.X - gray network in screenshot
      • OPT1 - lagg1.Y - blue network in screenshot

      When passing traffic between these two LAN interfaces (WAN is not involved at all with this traffic), I occasionally see a small amount of packet loss and some very high latency, as shown in the screenshot below (pfSense is the red line):
      mtr_pfsense.png

      I don't see evidence of CPU or disk contention on pfSense during this time period, and netstat -idb -I <interface> doesn't show any errors or drops on either interface. The Quality graph under Status - Monitoring shows the following (with the time period that the MTR was taken circled in yellow):
      quality_pfsense.png

      What else can I do to diagnose and resolve this intermittent problem between the LAN interfaces? Thanks!

      • stephenw10 Netgate Administrator

        @amartin said in Intermittent high latency between two LAN interfaces:

        Plus 21

        What actual version are you running?

        • amartin @stephenw10

          @stephenw10 21.05.2-RELEASE

          • stephenw10 Netgate Administrator

            Is there a specific reason you're not running 22.01 yet?

            • amartin

              No reason, just haven't spent the time to do the upgrade yet.

              I don't see anything in the release notes that would indicate a fix for this issue, so I'd prefer to invest time in gathering additional debug data first rather than upgrading if the act of upgrading by itself is unlikely to resolve this problem.

              • stephenw10 Netgate Administrator

                Indeed, and in fact there is a known issue in 22.01 that might present like that. As far as I know, though, there's nothing in 21.05.2 that did.

                What is lagg1 there?

                Do you see errors on any interfaces in Status > Interfaces?

                Are you running any packages?

                Steve

                • amartin

                  Thanks, it's good to know about that similar issue in 22.01.

                  The lagg1 interface is a lagg between ix0 and ix1 with LAGG Protocol set to FAILOVER and Failover Master Interface set to auto.

                  Do you see errors on any interfaces in Status > Interfaces?

                  Yes, the LAN interface (gray on the above MTR) has the following:

                  In/out errors 0/456 
                  Collisions 0 
                  

                  And the OPT1 interface (blue above) has the following:

                  In/out errors 0/166 
                  Collisions 0 
                  

                  Installed packages:

                  • Cron
                  • mtr-nox11
                  • openvpn-client-export
                  • pfBlockerNG-devel
                  • zabbix-agent4

                  • stephenw10 Netgate Administrator

                    Hmm, that's not ideal. What does netstat -i show at the command line?

                    The errors could be on the lagg itself or on one of its members, not just on the VLAN.

                    Did this just start happening?

                    • amartin

                      This has been happening for a few months but seems to have gotten worse over the past few weeks. Here's the output of netstat -i - I removed the specific MAC addresses and the IPv6 lines since those aren't relevant here:

                      Name   Mtu    Network        Address            Ipkts        Ierrs  Idrop  Opkts        Oerrs  Coll
                      ix0    1500   <Link#1>       00:11:22:33:44:55  70322185682  423    0      70804435446  0      0
                      ix1    1500   <Link#2>       00:11:22:33:44:55  24934970     2      0      19091        0      0
                      ix2    1500   <Link#3>       66:77:aa:bb:cc:dd  1012533026   0      0      493073228    0      0
                      ix3    1500   <Link#14>      66:77:aa:bb:cc:dd  4914         0      0      0            0      0
                      lo0    16384  <Link#15>      lo0                2183050      0      0      2183050      0      0
                      lo0    -      localhost      localhost          0            -      -      0            -      -
                      lo0    -      your-net       localhost          2183049      -      -      2183050      -      -
                      enc0*  1536   <Link#16>      enc0               0            0      0      0            0      0
                      pflog  33160  <Link#17>      pflog0             0            0      0      134021       0      0
                      pfsyn  1500   <Link#18>      pfsync0            0            0      0      0            0      0
                      lagg0  1500   <Link#19>      66:77:aa:bb:cc:dd  1012537940   0      0      493073228    3      0
                      lagg1  1500   <Link#20>      00:11:22:33:44:55  70347118689  425    0      70804454537  622    0
                      lagg0  1500   <Link#21>      66:77:aa:bb:cc:dd  1004528099   0      0      492383298    2      0
                      lagg0  -      <WAN-ONE>      <WAN-ONE-IP>       14472999     -      -      35           -      -
                      lagg0  1500   <Link#22>      66:77:aa:bb:cc:dd  0            0      0      7            0      0
                      lagg1  1500   <Link#23>      00:11:22:33:44:55  45500619462  0      0      24028805736  456    0
                      lagg1  -      <GRAY-NETWORK> <GRAY-NETWORK-IP>  40479752     -      -      80336493     -      -
                      lagg1  1500   <Link#24>      00:11:22:33:44:55  24821582530  0      0      46775648818  166    0
                      lagg1  -      <BLUE-NETWORK> <BLUE-NETWORK-IP>  358175778    -      -      358086480    -      -
                      lagg0  1500   <Link#25>      66:77:aa:bb:cc:dd  8004823      0      0      689918       1      0
                      lagg0  -      <WAN-TWO>      <WAN-TWO-IP>       526569       -      -      0            -      -
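
With this many rows, it can help to filter the listing down to the link-level rows that actually carry errors. The following is only a sketch: the function name `iface_errors` is made up for illustration, and it assumes every `<Link#N>` row has all ten columns laid out as in the output above (a row with an empty Address column would shift the fields and be misread).

```shell
# Hypothetical helper: reads `netstat -i` style text on stdin and prints
# the <Link#N> rows whose Ierrs (field 6) or Oerrs (field 9) is non-zero.
# Assumes the ten-column layout shown in this thread.
iface_errors() {
  awk '
    $3 ~ /^<Link#/ && ($6 > 0 || $9 > 0) {
      # The Network column is printed too, because VLAN children show up
      # under the truncated parent name (e.g. several "lagg1" rows).
      printf "%s %s Ierrs=%d Oerrs=%d\n", $1, $3, $6, $9
    }
  '
}

# Possible usage on the firewall:
#   netstat -i | iface_errors
```

The Link number is what tells the parent lagg apart from its VLAN children, so it is worth keeping in the output.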
                      
                      • stephenw10 Netgate Administrator

                        What is lagg1 connected to? Are you unable to run LACP there?

                        You might try swapping the lagg connections and see if the input errors follow that.

                        None of that explains 9s pings though.

                        I would try going to 22.01 if you can. There is a workaround patch for the known issue there if you hit it. Even if you do hit it, though, you still wouldn't see pings at 9000ms.

                        Steve

                        • amartin

                          Correct, lagg1 cannot support LACP on the other end because the other end of lagg1 is a set of two independent switches (not stacked). I am really only interested in fault tolerance when aggregating the links, so using the FAILOVER protocol is sufficient.

                          Is there a way for me to see which VLAN (e.g. lagg1.X) had those errors?

                          Is there something about 22.01 that would make it more likely to not have this issue, or give me better visibility? I can't explain why I saw 9000ms pings on 21.05.2-RELEASE, so I'm concerned that same situation will just occur again on 22.01.

                          • stephenw10 Netgate Administrator

                            Not beyond what you see from 'netstat -i' above. You can see the out errors there on the VLAN subnets.

                            Do you see any link changes when you see the very high latency?

                            Can you test running without the lagg?

                            Steve

                            • amartin

                              I haven't noticed any link changes when I see the high latency. As far as testing without the lagg goes, can I just unplug one of the cables (forcing all of the traffic through one port with no other option), or is that not sufficient (do I actually need to remove the lagg interface itself from the config)?
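
One way to check for link changes after the fact, rather than relying on noticing them live, is to search the system log for the kernel's link-state messages. This is a rough sketch: `link_events` is a made-up name, the log path in the usage comment is an assumption about where this install keeps its system log, and the pattern assumes the usual FreeBSD wording "link state changed to UP/DOWN".

```shell
# Hypothetical helper: reads log text on stdin and prints only the lines
# that report an interface link-state transition.
link_events() {
  grep -E 'link state changed to (UP|DOWN)'
}

# Possible usage (log path is an assumption, adjust as needed):
#   link_events < /var/log/system.log
```

If the timestamps of any matches line up with the latency spikes, that points at the lagg members or the switch ports rather than at pfSense itself.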

                              • amartin

                                I captured another instance of this disruption today; this time the latency is much lower (but still high) and there's some packet loss:
                                890662dc-5ba4-457e-a76e-938d0024fbc0-image.png

                                When this happens, TCP connections (e.g. an SSH session) get dropped.

                                The errors on ix0 and lagg1 have increased slightly:

                                Name   Mtu    Network        Address            Ipkts        Ierrs  Idrop  Opkts        Oerrs  Coll
                                ix0    1500   <Link#1>       00:11:22:33:44:55  70482923217  426    0      70965439161  0      0
                                lagg1  1500   <Link#20>      00:11:22:33:44:55  70507907764  428    0      70965458252  622    0
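
To put a number on "increased slightly", two saved `netstat -i` snapshots can be diffed per interface. The sketch below is illustrative only: `netstat_delta` and the file names are made up, and it assumes the same ten-column `<Link#N>` rows as the listings in this thread.

```shell
# Hypothetical helper: netstat_delta BEFORE AFTER
# Prints, for each <Link#N> row present in both snapshots, how much
# Ipkts (field 5), Ierrs (field 6) and Oerrs (field 9) grew.
netstat_delta() {
  awk '
    $3 !~ /^<Link#/ { next }                 # skip header and address rows
    { key = $1 " " $3 }                      # name + link number
    NR == FNR {                              # first file: remember counters
      ipk[key] = $5; ierr[key] = $6; oerr[key] = $9; next
    }
    key in ierr {                            # second file: print the growth
      printf "%s Ipkts+%d Ierrs+%d Oerrs+%d\n", key, $5 - ipk[key], $6 - ierr[key], $9 - oerr[key]
    }
  ' "$1" "$2"
}

# Possible usage:
#   netstat -i > before.txt; sleep 3600; netstat -i > after.txt
#   netstat_delta before.txt after.txt
```

Applied to the two ix0 rows posted in this thread, that works out to 3 new input errors over roughly 160.7 million input packets, with no change in output errors on lagg1 (622 both times).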
                                
                                • stephenw10 Netgate Administrator

                                  Hmm, you would not expect some minor packet loss to cause TCP connections to fail. You would just see retransmissions.

                                  Unless all of those failures were happening at the same time so it times out. That would take a while though.

                                  This starts to look more like a duplicate IP or a packet loop. You can see that if you have a loop that's being prevented by STP and it periodically resets.

                                  Removing one link from the lagg entirely might prove that.

                                  Steve
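
A duplicate IP often leaves traces in the kernel's ARP messages, so the system log is a quick thing to check before pulling cables. The sketch below is only an illustration: `arp_conflicts` is a made-up name, the log path is an assumption, and the two message patterns are the FreeBSD wordings as best recalled ("moved from ... to ..." and "is using my IP address"), so treat the exact regex as an assumption too.

```shell
# Hypothetical helper: reads log text on stdin and prints ARP messages that
# suggest an address flapping between MACs or a host claiming the
# firewall's own IP.
arp_conflicts() {
  grep -E 'arp: .* (moved from .* to |is using my IP address )'
}

# Possible usage (log path is an assumption):
#   arp_conflicts < /var/log/system.log
```

Repeated "moved from" lines for the same address, alternating between two MACs, would fit either a duplicate IP or a loop between the two unstacked switches.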

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.