Netgate Discussion Forum

can't load balance with bgp multipath

FRR
12 Posts 3 Posters 2.1k Views
  • B
    baketopher
    Posted Jul 28, 2023, 1:24 AM; last edited by baketopher Jul 31, 2023, 5:25 PM

    I'm trying to load balance DNS traffic between two machines over BGP anycast. However, no matter what I configure, one is always chosen in preference over the other; i.e., when I send multiple test queries to the anycast address from a test endpoint, the DNS requests always go to one machine and never the other. Is load balancing traffic between these two machines possible in this way?

    Reading the latest comment from Jim in https://redmine.pfsense.org/issues/9545, it would appear it is possible on 2.7.0 CE, but I can't seem to get it to work.

    I've tried adding "maximum-paths 2" to the config manually and re-tested but it doesn't seem to make any difference.

    Using pfSense 2.7.0 CE and frr 1.2_3

    Current Config:

    ##################### DO NOT EDIT THIS FILE! ######################
    ###################################################################
    # This file was created by an automatic configuration generator.  #
    # The contents of this file will be overwritten without warning!  #
    ###################################################################
    !
    frr defaults traditional
    hostname pf.lab
    password 123456
    service integrated-vtysh-config
    !
    router bgp 65248
     bgp router-id 192.168.0.1
     no bgp network import-check
     bgp bestpath as-path multipath-relax
     neighbor 192.168.0.3 peer-group dnsgroup
     neighbor 192.168.0.3 remote-as 65249
     neighbor 192.168.0.3 description DNS1
     neighbor 192.168.0.4 peer-group dnsgroup
     neighbor 192.168.0.4 remote-as 65249
     neighbor 192.168.0.4 description DNS2
     neighbor dnsgroup peer-group
     neighbor dnsgroup remote-as 65249
     neighbor dnsgroup description DNS Servers
     neighbor dnsgroup update-source 192.168.0.1
     !
     address-family ipv4 unicast
      neighbor 192.168.0.3 activate
      neighbor 192.168.0.4 activate
      no neighbor 192.168.0.3 send-community
      no neighbor 192.168.0.4 send-community
      no neighbor dnsgroup send-community
      neighbor dnsgroup route-map Allow-All in
      neighbor dnsgroup route-map Allow-All out
     exit-address-family
     !
    !
    route-map Allow-All permit 100
    !
    line vty
    !
    end
    

    BGP routes as displayed by the frr GUI:

    BGP table version is 2, local router ID is 192.168.0.1, vrf id 0
    Default local pref 100, local AS 65248
    Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                   i internal, r RIB-failure, S Stale, R Removed
    Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
    Origin codes:  i - IGP, e - EGP, ? - incomplete
    
       Network          Next Hop            Metric LocPrf Weight Path
    *= 10.3.0.5/32      192.168.0.4              0             0 65249 i
    *>                  192.168.0.3              0             0 65249 i
    
    Displayed  1 routes and 2 total paths
    

    Thanks!

    • M
      michmoor LAYER 8 Rebel Alliance @baketopher
      Posted Jul 28, 2023, 2:34 PM

      @baketopher The route table looks good.
      How are you monitoring the links to determine if load balancing is working?

      Firewall: NetGate,Palo Alto-VM,Juniper SRX
      Routing: Juniper, Arista, Cisco
      Switching: Juniper, Arista, Cisco
      Wireless: Unifi, Aruba IAP
      JNCIP,CCNP Enterprise

      • B
        baketopher
        Posted Jul 28, 2023, 7:14 PM; last edited by baketopher Jul 28, 2023, 7:16 PM

        @michmoor

        My interpretation of the bgp route table is that ">" indicates a preferred path?

        I'm doing a packet capture on both DNS servers and sending queries from an endpoint.

        These are the BGP routes on pfSense at the time of testing:

        BGP table version is 2, local router ID is 192.168.0.1, vrf id 0
        Default local pref 100, local AS 65248
        Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                       i internal, r RIB-failure, S Stale, R Removed
        Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
        Origin codes:  i - IGP, e - EGP, ? - incomplete
        
           Network          Next Hop            Metric LocPrf Weight Path
        *= 10.3.0.5/32      192.168.0.4              0             0 65249 i
        *>                  192.168.0.3              0             0 65249 i
        
        Displayed  1 routes and 2 total paths
        

        This is the query being sent by the client (192.168.0.100) repeatedly:

        C:\Users\user>nslookup example.com 10.3.0.5
        Server:  UnKnown
        Address:  10.3.0.5
        
        Non-authoritative answer:
        Name:    example.com
        Addresses:  2606:2800:220:1:248:1893:25c8:1946
                  93.184.216.34
        

        DNS server 192.168.0.3:

        ~ # tcpdump -i ens160 src 192.168.0.100 and dst port 53
        tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
        listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
        18:55:19.244666 IP 192.168.0.100.64495 > 10.3.0.5.domain: 4+ A? example.com. (29)
        18:55:19.258017 IP 192.168.0.100.64496 > 10.3.0.5.domain: 5+ AAAA? example.com. (29)
        18:55:20.736584 IP 192.168.0.100.64500 > 10.3.0.5.domain: 4+ A? example.com. (29)
        18:55:20.745781 IP 192.168.0.100.64501 > 10.3.0.5.domain: 5+ AAAA? example.com. (29)
        18:55:21.449836 IP 192.168.0.100.64505 > 10.3.0.5.domain: 4+ A? example.com. (29)
        18:55:21.461138 IP 192.168.0.100.64506 > 10.3.0.5.domain: 5+ AAAA? example.com. (29)
        18:55:22.088173 IP 192.168.0.100.64510 > 10.3.0.5.domain: 4+ A? example.com. (29)
        18:55:22.097876 IP 192.168.0.100.64511 > 10.3.0.5.domain: 5+ AAAA? example.com. (29)
        

        DNS server 192.168.0.4 - no requests received:

         ~ # tcpdump -i ens160 src 192.168.0.100 and dst port 53
        tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
        listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
        

        Based on my reading of the BGP route table, it makes sense that 192.168.0.3 is always chosen as it is designated as the best route by ">".

        My goal is to spread out (load balance) the requests over these two DNS servers. As far as I know, that should be possible if multipathing is supported in both the kernel and the FRR package?

        @jimp, do you mind taking a look, since this is relevant to your ticket https://redmine.pfsense.org/issues/9545 and to your latest comment there?

        • M
          michmoor LAYER 8 Rebel Alliance @baketopher
          Posted Jul 28, 2023, 9:12 PM

          @baketopher There may be a load distribution algorithm at work. ECMP is clearly enabled, but you are testing from the same client, 192.168.0.100.
          I don't think it's going to work in a round-robin way.
          Do you have multiple clients to test from?

          • B
            baketopher
            Posted Jul 28, 2023, 10:19 PM; last edited by baketopher Jul 31, 2023, 5:27 PM

            @michmoor

            Sending DNS requests from two different endpoints at the same time, the requests always land on the same DNS server.

            Clients: 192.168.0.5 and 192.168.0.100

            On 192.168.0.3:

            ~ # tcpdump -i ens160 dst 10.3.0.5 and dst port 53
            tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
            listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
            22:12:11.984097 IP 192.168.0.5.55340 > 10.3.0.5.domain: 40613+ [1au] A? example.com. (52)
            22:12:13.097821 IP 192.168.0.5.47871 > 10.3.0.5.domain: 41677+ [1au] A? example.com. (52)
            22:12:13.771729 IP 192.168.0.5.59100 > 10.3.0.5.domain: 8229+ [1au] A? example.com. (52)
            22:12:16.427651 IP 192.168.0.100.49990 > 10.3.0.5.domain: 40889+ [1au] A? example.com. (52)
            22:12:17.565212 IP 192.168.0.100.49993 > 10.3.0.5.domain: 39667+ [1au] A? example.com. (52)
            22:12:18.375251 IP 192.168.0.100.49996 > 10.3.0.5.domain: 40255+ [1au] A? example.com. (52)
            22:12:19.044664 IP 192.168.0.100.49999 > 10.3.0.5.domain: 21932+ [1au] A? example.com. (52)
            22:12:23.442956 IP 192.168.0.5.58802 > 10.3.0.5.domain: 61311+ [1au] A? example.com. (52)
            22:12:23.963051 IP 192.168.0.5.40557 > 10.3.0.5.domain: 51993+ [1au] A? example.com. (52)
            22:12:24.453484 IP 192.168.0.5.50206 > 10.3.0.5.domain: 45389+ [1au] A? example.com. (52)
            22:12:26.483159 IP 192.168.0.100.50003 > 10.3.0.5.domain: 64178+ [1au] A? example.com. (52)
            22:12:27.015098 IP 192.168.0.100.50006 > 10.3.0.5.domain: 14870+ [1au] A? example.com. (52)
            22:12:27.556100 IP 192.168.0.100.50009 > 10.3.0.5.domain: 56294+ [1au] A? example.com. (52)
            22:12:27.992089 IP 192.168.0.100.50012 > 10.3.0.5.domain: 50358+ [1au] A? example.com. (52)
            

            On 192.168.0.4 - no requests:

            ~ # tcpdump -i ens160 dst 10.3.0.5 and dst port 53
            tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
            listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
            

            Wondering if it might be about the volume of traffic, I spammed DNS requests on both machines with a while loop (while true; do dig @10.3.0.5 example.com +short; done) and got the same results as above: the DNS requests all hit 192.168.0.3.

            • M
              michmoor LAYER 8 Rebel Alliance @baketopher
              Posted Jul 31, 2023, 7:09 PM

              @baketopher Hmm, that's quite a bind. Yeah, I'm not sure how to test this then. The route table shows one thing, but the actual path of the data is different. Unless someone has a better idea, maybe open a Redmine issue?

              • B
                baketopher @michmoor
                Posted Jul 31, 2023, 7:22 PM

                @michmoor

                From the routing table:

                   Network          Next Hop            Metric LocPrf Weight Path
                *= 10.3.0.5/32      192.168.0.4              0             0 65249 i
                *>                  192.168.0.3              0             0 65249 i
                

                This line in particular:

                *>                  192.168.0.3              0             0 65249 i
                

                Doesn't the ">" indicate that 192.168.0.3 is the preferred/best path? I.e., load balancing will never happen across these two hosts while one is chosen as the best?

                • M
                  michmoor LAYER 8 Rebel Alliance @baketopher
                  Posted Jul 31, 2023, 7:28 PM; last edited by michmoor Jul 31, 2023, 7:28 PM

                  @baketopher The > does indicate the best path, but I'm wondering because the = sign is there and both next hops are showing up for the same network...
                  As a test, what happens when you create two static routes to the same network? So forget about eBGP for a second: can pfSense load balance between two static routes (presumably with the same admin distance)?

                  edit: I don't know if pfSense has a concept of admin distance for static routes. FRR does, of course, but what about something that's non-dynamic?
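
For anyone reading the status codes discussed above, here is a minimal Python sketch that reads the two table rows posted in this thread and lists which next hops are flagged best or multipath. It is not part of FRR: `parse_paths` and `ecmp_next_hops` are hypothetical helper names, and the parsing assumes the fixed three-character status column seen in the output above.

```python
# Sample rows copied from the "show bgp" output posted in this thread.
BGP_ROWS = """\
*= 10.3.0.5/32      192.168.0.4              0             0 65249 i
*>                  192.168.0.3              0             0 65249 i
"""

# Status-code meanings taken from the legend FRR prints above the table.
STATUS = {"*": "valid", ">": "best", "=": "multipath"}


def parse_paths(text):
    """Return (prefix, next_hop, [status names]) for each path row."""
    paths = []
    prefix = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: the status codes occupy the first three columns.
        codes, rest = line[:3].strip(), line[3:].split()
        # A row that names a prefix sets the current network;
        # continuation rows reuse the previous prefix.
        if "/" in rest[0]:
            prefix, rest = rest[0], rest[1:]
        names = [STATUS[c] for c in codes if c in STATUS]
        paths.append((prefix, rest[0], names))
    return paths


def ecmp_next_hops(paths):
    """Next hops flagged as either best (>) or multipath (=)."""
    return sorted(nh for _, nh, st in paths if "best" in st or "multipath" in st)


paths = parse_paths(BGP_ROWS)
for prefix, next_hop, status in paths:
    print(prefix, next_hop, status)
print(ecmp_next_hops(paths))
```

On the rows from this thread it lists both 192.168.0.3 and 192.168.0.4, which matches the `show ip route` output posted later in the thread, where both next hops are installed for 10.3.0.5/32.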

                  • jimpJ
                    jimp Rebel Alliance Developer Netgate
                    Posted Aug 1, 2023, 6:46 PM

                    Copying my text here from the Redmine issue:

                    From our local testing here on Plus (23.05.1, 23.09 snaps) and CE (2.7.0, 2.8.0 snaps), it appears to be working with both static and BGP routes. However, be aware that the OS computes outbound flow hashing for connections. What that means is that, similar to lagg, you may only see connections/packets taking the alternate paths if they differ in some way, such as different protocols, src/dst IP address combinations, or TCP/UDP connection port pairs. For example, testing with ICMP only from one to the other with no variation may never see flows take another path. The hashing takes the 5-tuple connection property set "(proto, src, dst, srcport, dstport)" into account.
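
To illustrate the flow hashing described above, here is a rough Python sketch of per-flow next-hop selection. It is only a model: the kernel uses its own hash function, not SHA-256, and `flow_hash` and `pick_next_hop` are made-up names. The point is that an identical 5-tuple always lands on the same gateway, while changing any one field may or may not move the flow.

```python
import hashlib

# The two gateways from this thread.
NEXT_HOPS = ["192.168.0.3", "192.168.0.4"]


def flow_hash(proto, src, dst, sport, dport):
    """Deterministic hash of the 5-tuple (illustrative, not the kernel's hash)."""
    key = f"{proto}|{src}|{dst}|{sport}|{dport}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def pick_next_hop(proto, src, dst, sport, dport, next_hops=NEXT_HOPS):
    """Map a flow onto one of the equal-cost next hops."""
    return next_hops[flow_hash(proto, src, dst, sport, dport) % len(next_hops)]


# The same flow always maps to the same gateway.
first = pick_next_hop("udp", "192.168.0.100", "10.3.0.5", 64495, 53)
again = pick_next_hop("udp", "192.168.0.100", "10.3.0.5", 64495, 53)
print(first == again)  # True

# Varying only the source port may or may not change the choice.
for sport in range(64495, 64503):
    print(sport, pick_next_hop("udp", "192.168.0.100", "10.3.0.5", sport, 53))
```

This says nothing about why the captures earlier in the thread showed a single server; it only shows the kind of per-flow behavior to expect when hashing is in effect.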

                    If the sysctl oid for net.route.multipath is 1 and both routes show in the table, that should be enough to know it's prepared to work. You can check the nexthop data with netstat -4onW and nexthop group data with netstat -4OW and both of those should show both gateways and that they belong to the same "group".
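
The "same group" check mentioned above can also be scripted. The sketch below parses `netstat -4OW` text in the layout posted later in this thread and confirms that both gateways sit in one nexthop group. The column layout is assumed from that single sample and may differ on other versions; `parse_groups` and `same_group` are hypothetical helpers, not an existing tool.

```python
# Sample copied from the "netstat -4OW" output posted later in this thread.
NETSTAT_OW = """\
GrpIdx  NhIdx     Weight   Slots           Gateway     Netif  Refcnt
29        ------- ------- ------- ----------------- ---------       2
              18       1       1       192.168.0.3      vmx1
              20       1       1       192.168.0.4      vmx1
"""


def parse_groups(text):
    """Map group index -> list of (nh_index, weight, gateway, netif)."""
    groups = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "GrpIdx":
            continue  # skip blank lines and the column header
        if fields[1].startswith("-"):
            # Group header row: group index followed by dashed placeholders.
            current = int(fields[0])
            groups[current] = []
        elif current is not None and len(fields) >= 5:
            nh_idx, weight, _slots, gateway, netif = fields[:5]
            groups[current].append((int(nh_idx), int(weight), gateway, netif))
    return groups


def same_group(groups, gateways):
    """True if a single nexthop group contains every listed gateway."""
    wanted = set(gateways)
    return any(wanted <= {m[2] for m in members} for members in groups.values())


groups = parse_groups(NETSTAT_OW)
print(groups)
print(same_group(groups, ["192.168.0.3", "192.168.0.4"]))
```

On a live system the text would come from running the command rather than from a pasted sample.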

                    You might need to adjust your rules to ensure that traffic that egresses over one path can have its replies ingress over the other path, which may also complicate things. But given the current way pf allows states on multiple interfaces, that may be OK as-is.

                    Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                    Need help fast? Netgate Global Support!

                    Do not Chat/PM for help!

                    • B
                      baketopher @jimp
                      Posted Aug 2, 2023, 7:22 AM; last edited by baketopher Aug 2, 2023, 7:24 AM

                      @jimp The only things that would be different between the two clients are the src (192.168.0.5 and 192.168.0.100) and srcport (random ephemeral). I'm not sure if that would be enough for the hashing algorithm? Which rules are you referring to? Something within FRR, I assume?

                      Below is the output from the commands you referenced as well as some vtysh bgp commands. I'm not seeing anything that stands out as a red flag but I don't have much experience looking at these outputs.

                      [2.7.0-RELEASE][admin@pf.lab]/root: sysctl net.route.multipath
                      net.route.multipath: 1
                      
                      [2.7.0-RELEASE][admin@pf.lab]/root: netstat -4onW
                      Nexthop data
                      
                      Internet:
                      Idx   Type         IFA                Gateway             Flags      Use Mtu         Netif     Addrif Refcnt Prepend
                      1       v4/resolve 10.10.0.65         vmx2.1503/resolve                0   1500  vmx2.1503               2 
                      2       v4/resolve 127.0.0.1          lo0/resolve        H           311  16384        lo0               2 
                      3       v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1501     2 
                      4       v4/resolve 10.10.0.97         vmx2.1504/resolve                0   1500  vmx2.1504               2 
                      5       v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1502     2 
                      6       v4/resolve 10.10.0.225        vmx2.1508/resolve                0   1500  vmx2.1508               2 
                      7       v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1503     2 
                      8       v4/resolve 10.10.1.1          vmx2.1509/resolve                0   1500  vmx2.1509               2 
                      9       v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1504     2 
                      10      v4/resolve 10.10.1.33         vmx2.1510/resolve                0   1500  vmx2.1510               2 
                      11      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1505     2 
                      12      v4/resolve 192.168.0.1        vmx1/resolve               6046438   1500       vmx1               4 
                      13      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1506     2 
                      14      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0      vmx1     2 
                      15      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1507     2 
                      16           v4/gw 10.52.0.12         10.52.0.1          GS        28437   1500       vmx0               3 
                      17      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1508     2 
                      18           v4/gw 192.168.0.1        192.168.0.3        GH1           0   1500       vmx1               1 
                      19      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1509     2 
                      20           v4/gw 192.168.0.1        192.168.0.4        GH1           0   1500       vmx1               1 
                      21      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0 vmx2.1510     2 
                      22      v4/resolve 10.52.0.12         vmx0/resolve                     0   1500       vmx0               2 
                      23      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0      vmx0     2 
                      24      v4/resolve 10.10.0.1          vmx2.1501/resolve          3544101   1500  vmx2.1501               2 
                      25      v4/resolve 10.10.0.33         vmx2.1502/resolve                0   1500  vmx2.1502               2 
                      26      v4/resolve 10.10.0.129        vmx2.1505/resolve                0   1500  vmx2.1505               2 
                      27      v4/resolve 10.10.0.161        vmx2.1506/resolve                0   1500  vmx2.1506               2 
                      28      v4/resolve 10.10.0.193        vmx2.1507/resolve                0   1500  vmx2.1507               2 
                      
                      [2.7.0-RELEASE][admin@pf.lab]/root: netstat -4OW
                      Nexthop groups data
                      
                      Internet:
                      GrpIdx  NhIdx     Weight   Slots           Gateway     Netif  Refcnt
                      29        ------- ------- ------- ----------------- ---------       2
                                    18       1       1       192.168.0.3      vmx1
                                    20       1       1       192.168.0.4      vmx1
                      
                      [2.7.0-RELEASE][admin@pf.lab]/root: vtysh
                      
                      Hello, this is FRRouting (version 7.5.1).
                      Copyright 1996-2005 Kunihiro Ishiguro, et al.
                      
                      pf.lab# show ip route
                      Codes: K - kernel route, C - connected, S - static, R - RIP,
                             O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
                             T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
                             F - PBR, f - OpenFabric,
                             > - selected route, * - FIB route, q - queued, r - rejected, b - backup
                      
                      K>* 0.0.0.0/0 [0/0] via 10.52.0.1, 5d05h55m
                      B>* 10.3.0.5/32 [20/0] via 192.168.0.3, vmx1, weight 1, 5d05h52m
                        *                    via 192.168.0.4, vmx1, weight 1, 5d05h52m
                      C>* 10.10.0.0/27 [0/1] is directly connected, vmx2.1501, 5d05h55m
                      C>* 10.10.0.32/27 [0/1] is directly connected, vmx2.1502, 5d05h55m
                      C>* 10.10.0.64/27 [0/1] is directly connected, vmx2.1503, 5d05h55m
                      C>* 10.10.0.96/27 [0/1] is directly connected, vmx2.1504, 5d05h55m
                      C>* 10.10.0.128/27 [0/1] is directly connected, vmx2.1505, 5d05h55m
                      C>* 10.10.0.160/27 [0/1] is directly connected, vmx2.1506, 5d05h55m
                      C>* 10.10.0.192/27 [0/1] is directly connected, vmx2.1507, 5d05h55m
                      C>* 10.10.0.224/27 [0/1] is directly connected, vmx2.1508, 5d05h55m
                      C>* 10.10.1.0/27 [0/1] is directly connected, vmx2.1509, 5d05h55m
                      C>* 10.10.1.32/27 [0/1] is directly connected, vmx2.1510, 5d05h55m
                      C>* 10.52.0.0/24 [0/1] is directly connected, vmx0, 5d05h55m
                      C>* 192.168.0.0/24 [0/1] is directly connected, vmx1, 5d05h55m
                      
                      pf.lab# show bgp detail
                      BGP table version is 2, local router ID is 192.168.0.1, vrf id 0
                      Default local pref 100, local AS 65248
                      Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                                     i internal, r RIB-failure, S Stale, R Removed
                      Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
                      Origin codes:  i - IGP, e - EGP, ? - incomplete
                      
                         Network          Next Hop            Metric LocPrf Weight Path
                      *= 10.3.0.5/32      192.168.0.4              0             0 65249 i
                      *>                  192.168.0.3              0             0 65249 i
                      
                      Displayed  1 routes and 2 total paths
                      
                      pf.lab# show bgp summary
                      
                      IPv4 Unicast Summary:
                      BGP router identifier 192.168.0.1, local AS number 65248 vrf-id 0
                      BGP table version 2
                      RIB entries 1, using 192 bytes of memory
                      Peers 2, using 29 KiB of memory
                      Peer groups 1, using 64 bytes of memory
                      
                      Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
                      192.168.0.3     4      65249     90836     90840        0    0    0 5d06h06m            1        1
                      192.168.0.4     4      65249     90836     90838        0    0    0 5d06h06m            1        1
                      
                      Total number of neighbors 2
                      
                      • jimpJ
                        jimp Rebel Alliance Developer Netgate
                        Posted Aug 2, 2023, 6:01 PM

                        It checks all of proto+srcip+dstip+srcport+dstport so any difference in those could make a flow take a different path.

                        Since the weights are identical it should be balancing flows 50%/50% between the gateways.
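
As a back-of-the-envelope check of the 50%/50% expectation, the sketch below spreads many simulated flows over a slot table built from the two equal weights. It rests on assumptions: the hash is SHA-256 rather than whatever the kernel uses, the rule that a gateway of weight w gets w slots is a simplification, and `build_slots` and `pick` are illustrative names.

```python
import hashlib
from collections import Counter

# Gateways and weights as shown in the nexthop group output (weight 1 each).
WEIGHTED_HOPS = {"192.168.0.3": 1, "192.168.0.4": 1}


def build_slots(weighted_hops):
    """Expand weights into a slot table (assumed: weight w gives w slots)."""
    return [gw for gw, weight in weighted_hops.items() for _ in range(weight)]


def pick(slots, proto, src, dst, sport, dport):
    """Hash the 5-tuple and index into the slot table."""
    key = f"{proto}|{src}|{dst}|{sport}|{dport}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return slots[h % len(slots)]


slots = build_slots(WEIGHTED_HOPS)

# Two clients, 5000 ephemeral source ports each, all querying the anycast address.
counts = Counter(
    pick(slots, "udp", src, "10.3.0.5", sport, 53)
    for src in ("192.168.0.5", "192.168.0.100")
    for sport in range(49152, 49152 + 5000)
)

total = sum(counts.values())
for gateway, n in sorted(counts.items()):
    print(f"{gateway}: {n} flows ({n / total:.1%})")
```

With a well-mixed hash the two shares should come out close to even over many flows, although any short run of flows can still look lopsided.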

                        What I did was watch each interface with a packet capture and tried a variety of connection types to/from different addresses and it was balancing things about how I expected.

                        • M
                          michmoor LAYER 8 Rebel Alliance @jimp
                          Posted Sep 30, 2023, 2:53 AM; last edited by michmoor Sep 30, 2023, 2:54 AM

                          Wanted to come back to this topic and say that multipath works very well.

                          I have two IPsec VPN tunnels running eBGP with ECMP set up.
                          My iperf test is below.
                          My WAN is 500/500Mbps.
                          OCITunnel1 and 2 are IPsec.
                          As you can see, an iperf run with 100 simultaneous connections out the LAN is split quite evenly across both tunnels.

                          9be4c18e-46d2-4326-9e2a-5910f2155c6f-image.png

                          ded9569d-2541-47c4-b1b9-e074a9187f55-image.png

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.