Netgate Discussion Forum

    LACP doesn't work reliably, "slow" PDU transmission rate suspected

    HA/CARP/VIPs
    • S
      stephan23
      last edited by

      Hi

      I have problems with the reliability of the LACP LAG of 2 1Gb interfaces between pfSense and a Juniper switch stack.

      FW1 (pfSense primary) is connected to switch 1 and 3 via LACP
      FW2 (pfSense secondary) is connected to switch 0 and 2 via LACP

      When switch 0 failed earlier I would have expected the other leg connected to switch 2 in the Virtual Chassis to provide resiliency, but that didn't happen. The secondary firewall went active because it couldn't reach the primary firewall anymore via CARP. At the same time FW2 did have some sort of connectivity, since it could take on traffic to some external IP addresses, and cause havoc in the process.

      One possible cause I would like to explore: the Juniper switch reports that its LACP partners (the firewalls) use the slow transmission rate:

      {master:1}
      stephan@us1-swi> show lacp interfaces ae2 extensive   
      Aggregated interface: ae2
          LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
            ge-2/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
            ge-2/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
            ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
            ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
          LACP protocol:        Receive State  Transmit State          Mux State 
            ge-2/0/1                  Current   Slow periodic Collecting distributing
            ge-0/0/1                  Current   Slow periodic Collecting distributing
          LACP info:        Role     System             System       Port     Port    Port 
                                   priority         identifier   priority   number     key 
            ge-2/0/1       Actor        127  7c:25:86:ce:a4:2f        127       51       3
            ge-2/0/1     Partner      32768  ac:1f:6b:66:fb:a2      32768        5     427
            ge-0/0/1       Actor        127  7c:25:86:ce:a4:2f        127        1       3
            ge-0/0/1     Partner      32768  ac:1f:6b:66:fb:a2      32768        3     427
      

      The explanation for fast and slow timeouts is as follows (from Juniper docs):

      Timeout—LACP timeout preference. Periodic transmissions of LACP PDUs occur at either a slow or fast transmission rate, depending upon the expressed LACP timeout preference (Slow Timeout or Fast Timeout). In a fast timeout, PDUs are sent every second and in a slow timeout, PDUs are sent every 30 seconds. LACP timeout occurs when 3 consecutive PDUs are missed. If LACP timeout is a fast timeout, the time taken when 3 consecutive PDUs are missed is 3 seconds (3x1 second). If LACP timeout is a slow timeout, the time taken is 90 seconds( 3x30 seconds).
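
As a quick sanity check of the arithmetic in the quoted text, here is a minimal shell sketch. The function name `lacp_timeout_seconds` is made up for illustration; the 1 s and 30 s intervals and the factor of 3 are taken from the Juniper description above.

```shell
#!/bin/sh
# Hypothetical helper: seconds until an LACP timeout for a given rate.
# Intervals (1 s fast, 30 s slow) and the "3 missed PDUs" rule come from
# the Juniper text quoted above.
lacp_timeout_seconds() {
    case "$1" in
        fast) interval=1 ;;
        slow) interval=30 ;;
        *) echo "usage: lacp_timeout_seconds fast|slow" >&2; return 1 ;;
    esac
    echo $((3 * interval))
}

lacp_timeout_seconds fast   # prints 3
lacp_timeout_seconds slow   # prints 90
```

So with a slow partner, a dead link may go unnoticed by LACP for up to a minute and a half, unless the physical link itself goes down.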
      

      So this sounds to me like the most likely reason why this fails. And sure enough, other LACP links, e.g. to some of my Linux servers, do use Fast:

      {master:1}
      stephan@us1-swi> show lacp interfaces ae4 extensive    
      Aggregated interface: ae4
          LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
            ge-2/0/4       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
            ge-2/0/4     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
            ge-0/0/4 FUP    Actor   No    No   Yes  Yes  Yes   Yes     Fast    Active
            ge-0/0/4 FUP  Partner   No    No   Yes  Yes  Yes   Yes     Fast    Active
          LACP protocol:        Receive State  Transmit State          Mux State 
            ge-2/0/4                  Current   Fast periodic Collecting distributing
            ge-0/0/4                  Current   Fast periodic Collecting distributing
          LACP info:        Role     System             System       Port     Port    Port 
                                   priority         identifier   priority   number     key 
            ge-2/0/4       Actor        127  7c:25:86:ce:a4:2f        127       53       5
            ge-2/0/4     Partner      65535  c6:e9:d0:d8:94:79        255        1       9
            ge-0/0/4       Actor        127  7c:25:86:ce:a4:2f        127        3       5
            ge-0/0/4     Partner      65535  c6:e9:d0:d8:94:79        255        2       9
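
Since both outputs share the same column layout, the Partner timeout can be picked out mechanically. This is a rough sketch, assuming the `show lacp interfaces ... extensive` output has been saved to a file or is piped in. `slow_partners` is a hypothetical helper name, and the parsing relies only on the layout shown above, where Timeout is the second-to-last column of the LACP state rows.

```shell
#!/bin/sh
# Hypothetical helper: print member ports whose LACP partner advertises
# the Slow timeout. Reads Junos "show lacp interfaces <ae> extensive"
# output on stdin.
slow_partners() {
    awk '
        # Only the LACP state rows end in "<Timeout> <Activity>", so the
        # "LACP protocol" and "LACP info" rows never match here.
        NF >= 3 && $(NF-1) == "Slow" {
            # The Role column may be shifted right by an extra flag such
            # as FUP, so search for it instead of assuming column 2.
            for (i = 2; i < NF - 1; i++) {
                if ($i == "Partner") { print $1; break }
            }
        }
    '
}

# Usage (ae2.txt is an assumed file holding the saved switch output):
#   slow_partners < ae2.txt
```

Run against the ae2 output above, this should list ge-2/0/1 and ge-0/0/1; against ae4 it should print nothing.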
      

      So my question is: How can I configure pfSense to use the fast transmissions for LACP?

      Thanks
      Stephan

      • dotdashD
        dotdash
        last edited by

        The lagg manpage states:
        BUGS
        There is no way to configure LACP administrative variables, including
        system and port priorities. The current implementation always performs
        active-mode LACP and uses 0x8000 as system and port priorities

        • P
          pete.s.
          last edited by pete.s.

          I'm interested in this as well.

          In FreeBSD you are supposed to be able to set this with ifconfig.

           lacp_fast_timeout
               Enable lacp fast-timeout on the interface.
          
           -lacp_fast_timeout
               Disable lacp fast-timeout on the interface.
          

          I don't know where that would be done in pfSense though.

          • P
            pete.s. @stephan23
            last edited by

            @stephan23 said in LACP doesn't work reliably, "slow" PDU transmission rate suspected:

            So my question is: How can I configure pfSense to use the fast transmissions for LACP?

            Maybe you should reword the title of this thread to => "How to configure fast transmission for LACP?"

            • dotdashD
              dotdash @pete.s.
              last edited by

              @pete-s
              Ah, I didn't see that before. I thought the note meant you couldn't tweak any of the LACP knobs.
              I don't have anything to test with right now, but I'd guess it would be:
              ifconfig lagg0 lacp_fast_timeout
              You could use the shellcmd package to set it on boot.
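
Spelled out as a small script, that guess might look like the sketch below. Assumptions: the LAGG is called `lagg0`, and both the `IFCONFIG` override and the function name `enable_lacp_fast_timeout` are made up here so the command can be dry-run with `echo` before touching a real interface. Only `lacp_fast_timeout` itself comes from the ifconfig man page quoted earlier.

```shell
#!/bin/sh
# Hypothetical wrapper around the ifconfig option quoted above.
# Set IFCONFIG=echo to print the command instead of running it.
enable_lacp_fast_timeout() {
    lagg_if="${1:-lagg0}"    # assumed interface name; adjust as needed
    "${IFCONFIG:-ifconfig}" "$lagg_if" lacp_fast_timeout
}

# Dry run:  IFCONFIG=echo; enable_lacp_fast_timeout lagg0
# Real run: unset IFCONFIG; enable_lacp_fast_timeout lagg0
```

For shellcmd itself, the plain one-liner `ifconfig lagg0 lacp_fast_timeout` is probably all that is needed; the wrapper only makes it easier to try safely.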

              • S
                stephan23
                last edited by

                Thanks a lot for pointing out this option, that's excellent news!

                What are the chances of making "fast" the default in pfSense? I can't think of a good reason why the less resilient setting should be the default. Or, if backwards compatibility is a concern, could the option at least be exposed in the WebUI? I've not used the shellcmd package before, but it sounds a bit hacky to me. Would shellcmd work without the risk of race conditions at boot time?

                • dotdashD
                  dotdash
                  last edited by

                  The shellcmd package has several options. I'd advise you to test on a non-production system, but it's just an ifconfig command to set an option on the lagg, so I don't think it's a concern. You can test the command from Diagnostics > Command Prompt to verify it does what you need. As for adding the option to the WebGUI, you could put in a feature request in Redmine.
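
On the race-condition worry: if the command could ever run before the lagg exists, a retry loop is one way to guard against that. This is only a sketch and has not been tested on pfSense. The function name, the `IFCONFIG` override, and the retry count are all assumptions, and whether shellcmd runs early enough for this to matter is not established in this thread.

```shell
#!/bin/sh
# Hypothetical boot-time guard: wait until the lagg interface exists,
# then enable the fast LACP timeout. Gives up after a number of tries.
apply_fast_timeout_when_ready() {
    lagg_if="${1:-lagg0}"                 # assumed interface name
    tries="${2:-10}"                      # assumed retry budget, 1 s apart
    ifconfig_cmd="${IFCONFIG:-ifconfig}"  # override for dry runs and tests
    while [ "$tries" -gt 0 ]; do
        # "ifconfig <if>" fails while the interface does not exist yet.
        if "$ifconfig_cmd" "$lagg_if" >/dev/null 2>&1; then
            "$ifconfig_cmd" "$lagg_if" lacp_fast_timeout
            return $?
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "$lagg_if did not appear, fast timeout not set" >&2
    return 1
}

# Usage: apply_fast_timeout_when_ready lagg0 10
```

If the lagg is always up by the time shellcmd fires, the loop exits on the first pass and behaves like the bare ifconfig command.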

                  • S
                    stephan23
                    last edited by

                    OK thanks, I raised a feature request:

                    https://redmine.pfsense.org/issues/10504

                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.