Netgate 7100 NAT/routing poor performance issue
-
Do you see a lot of retries/retransmissions in the iperf test?
Do you see dropped packets or errors on the interfaces?
Try testing to the 7100 from each end but using the opposite IP address, so it's still routing/NATing the traffic but only actually passing one NIC.
Which NICs are you using in the 7100?
Steve
-
Here are some tests.
From internal host to router LAN:
root@internal.vm# iperf3 -c 172.22.2.1
Connecting to host 172.22.2.1, port 5201
[ 4] local 172.22.2.2 port 48876 connected to 94.240.XX.1 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 87.5 MBytes 734 Mbits/sec 0 509 KBytes
[ 4] 1.00-2.00 sec 94.0 MBytes 789 Mbits/sec 0 509 KBytes
[ 4] 2.00-3.00 sec 89.7 MBytes 752 Mbits/sec 0 509 KBytes
[ 4] 3.00-4.00 sec 99.3 MBytes 833 Mbits/sec 0 509 KBytes
[ 4] 4.00-5.00 sec 104 MBytes 870 Mbits/sec 4 469 KBytes
[ 4] 5.00-6.00 sec 101 MBytes 844 Mbits/sec 0 488 KBytes
[ 4] 6.00-7.00 sec 95.6 MBytes 802 Mbits/sec 0 488 KBytes
[ 4] 7.00-8.00 sec 94.4 MBytes 791 Mbits/sec 0 488 KBytes
[ 4] 8.00-9.00 sec 95.9 MBytes 804 Mbits/sec 0 488 KBytes
[ 4] 9.00-10.00 sec 97.3 MBytes 816 Mbits/sec 0 488 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 958 MBytes 804 Mbits/sec 4 sender
[ 4] 0.00-10.00 sec 956 MBytes 802 Mbits/sec receiver
The more parallel connections, the more retries (not sure if that's normal):
root@internal.vm# iperf3 -c 172.22.2.1 -P10
[...]
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 118 MBytes 99.3 Mbits/sec 128 sender
[ 4] 0.00-10.00 sec 118 MBytes 99.3 Mbits/sec receiver
[ 6] 0.00-10.00 sec 109 MBytes 91.0 Mbits/sec 133 sender
[ 6] 0.00-10.00 sec 109 MBytes 91.0 Mbits/sec receiver
[ 8] 0.00-10.00 sec 117 MBytes 97.7 Mbits/sec 132 sender
[ 8] 0.00-10.00 sec 117 MBytes 97.7 Mbits/sec receiver
[ 10] 0.00-10.00 sec 114 MBytes 95.9 Mbits/sec 124 sender
[ 10] 0.00-10.00 sec 114 MBytes 95.9 Mbits/sec receiver
[ 12] 0.00-10.00 sec 114 MBytes 95.6 Mbits/sec 123 sender
[ 12] 0.00-10.00 sec 114 MBytes 95.6 Mbits/sec receiver
[ 14] 0.00-10.00 sec 118 MBytes 98.8 Mbits/sec 129 sender
[ 14] 0.00-10.00 sec 118 MBytes 98.8 Mbits/sec receiver
[ 16] 0.00-10.00 sec 113 MBytes 94.5 Mbits/sec 120 sender
[ 16] 0.00-10.00 sec 113 MBytes 94.5 Mbits/sec receiver
[ 18] 0.00-10.00 sec 108 MBytes 90.7 Mbits/sec 124 sender
[ 18] 0.00-10.00 sec 108 MBytes 90.7 Mbits/sec receiver
[ 20] 0.00-10.00 sec 115 MBytes 96.9 Mbits/sec 124 sender
[ 20] 0.00-10.00 sec 115 MBytes 96.9 Mbits/sec receiver
[ 22] 0.00-10.00 sec 101 MBytes 84.6 Mbits/sec 128 sender
[ 22] 0.00-10.00 sec 101 MBytes 84.6 Mbits/sec receiver
[SUM] 0.00-10.00 sec 1.10 GBytes 945 Mbits/sec 1265 sender
[SUM] 0.00-10.00 sec 1.10 GBytes 945 Mbits/sec receiver
Now from pfSense to external host:
[23.09.1-RELEASE][root@pfsense]/root: iperf3 -c 94.240.XX.19
Connecting to host 94.240.XX.19, port 5201
[ 5] local 94.240.XX.1 port 40745 connected to 94.240.XX.19 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 112 MBytes 941 Mbits/sec 77 100 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 939 Mbits/sec 19 120 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 938 Mbits/sec 26 119 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 939 Mbits/sec 27 104 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 938 Mbits/sec 26 127 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 939 Mbits/sec 23 112 KBytes
[ 5] 6.00-7.00 sec 112 MBytes 938 Mbits/sec 24 82.1 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 939 Mbits/sec 32 81.1 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 939 Mbits/sec 37 112 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 939 Mbits/sec 27 146 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec 318 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec receiver
and additional -P10 test:
[23.09.1-RELEASE][root@pfsense]/root: iperf3 -c 94.240.XX.19 -P10
[...]
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 116 MBytes 97.3 Mbits/sec 1527 sender
[ 5] 0.00-10.00 sec 114 MBytes 95.6 Mbits/sec receiver
[ 7] 0.00-10.00 sec 132 MBytes 110 Mbits/sec 1491 sender
[ 7] 0.00-10.00 sec 130 MBytes 109 Mbits/sec receiver
[ 9] 0.00-10.00 sec 138 MBytes 116 Mbits/sec 1167 sender
[ 9] 0.00-10.00 sec 136 MBytes 114 Mbits/sec receiver
[ 11] 0.00-10.00 sec 118 MBytes 98.6 Mbits/sec 1646 sender
[ 11] 0.00-10.00 sec 116 MBytes 96.9 Mbits/sec receiver
[ 13] 0.00-10.00 sec 105 MBytes 87.7 Mbits/sec 1088 sender
[ 13] 0.00-10.00 sec 103 MBytes 86.0 Mbits/sec receiver
[ 15] 0.00-10.00 sec 114 MBytes 95.2 Mbits/sec 1109 sender
[ 15] 0.00-10.00 sec 112 MBytes 93.6 Mbits/sec receiver
[ 17] 0.00-10.00 sec 89.0 MBytes 74.7 Mbits/sec 1080 sender
[ 17] 0.00-10.00 sec 88.3 MBytes 74.0 Mbits/sec receiver
[ 19] 0.00-10.00 sec 91.8 MBytes 77.0 Mbits/sec 1302 sender
[ 19] 0.00-10.00 sec 89.9 MBytes 75.4 Mbits/sec receiver
[ 21] 0.00-10.00 sec 105 MBytes 87.7 Mbits/sec 1028 sender
[ 21] 0.00-10.00 sec 103 MBytes 86.0 Mbits/sec receiver
[ 23] 0.00-10.00 sec 121 MBytes 101 Mbits/sec 986 sender
[ 23] 0.00-10.00 sec 119 MBytes 99.6 Mbits/sec receiver
[SUM] 0.00-10.00 sec 1.10 GBytes 946 Mbits/sec 12424 sender
[SUM] 0.00-10.00 sec 1.08 GBytes 930 Mbits/sec receiver
There are more retries, but bandwidth stays in the 940 Mbits/sec area.
Now, iperf from internal vm to pfSense WAN IP:
root@internal.vm# iperf3 -c 94.240.XX.1
Connecting to host 94.240.XX.1, port 5201
[ 4] local 172.22.2.2 port 47524 connected to 94.240.XX.1 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 96.4 MBytes 808 Mbits/sec 401 1.05 MBytes
[ 4] 1.00-2.00 sec 106 MBytes 891 Mbits/sec 0 1.12 MBytes
[ 4] 2.00-3.00 sec 97.5 MBytes 818 Mbits/sec 6 426 KBytes
[ 4] 3.00-4.00 sec 104 MBytes 870 Mbits/sec 0 566 KBytes
[ 4] 4.00-5.00 sec 102 MBytes 860 Mbits/sec 0 672 KBytes
[ 4] 5.00-6.00 sec 102 MBytes 860 Mbits/sec 0 768 KBytes
[ 4] 6.00-7.00 sec 104 MBytes 870 Mbits/sec 19 501 KBytes
[ 4] 7.00-8.00 sec 102 MBytes 860 Mbits/sec 0 619 KBytes
[ 4] 8.00-9.00 sec 100 MBytes 839 Mbits/sec 0 717 KBytes
[ 4] 9.00-10.00 sec 105 MBytes 881 Mbits/sec 0 798 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1020 MBytes 856 Mbits/sec 426 sender
[ 4] 0.00-10.00 sec 1018 MBytes 854 Mbits/sec receiver
root@internal.vm# iperf3 -c 94.240.XX.1 -P10
[...]
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 110 MBytes 92.5 Mbits/sec 128 sender
[ 4] 0.00-10.00 sec 109 MBytes 91.9 Mbits/sec receiver
[ 6] 0.00-10.00 sec 112 MBytes 94.2 Mbits/sec 135 sender
[ 6] 0.00-10.00 sec 112 MBytes 93.7 Mbits/sec receiver
[ 8] 0.00-10.00 sec 108 MBytes 90.9 Mbits/sec 126 sender
[ 8] 0.00-10.00 sec 108 MBytes 90.3 Mbits/sec receiver
[ 10] 0.00-10.00 sec 110 MBytes 91.9 Mbits/sec 130 sender
[ 10] 0.00-10.00 sec 109 MBytes 91.4 Mbits/sec receiver
[ 12] 0.00-10.00 sec 113 MBytes 94.8 Mbits/sec 138 sender
[ 12] 0.00-10.00 sec 112 MBytes 94.1 Mbits/sec receiver
[ 14] 0.00-10.00 sec 102 MBytes 85.6 Mbits/sec 143 sender
[ 14] 0.00-10.00 sec 101 MBytes 85.1 Mbits/sec receiver
[ 16] 0.00-10.00 sec 111 MBytes 93.4 Mbits/sec 131 sender
[ 16] 0.00-10.00 sec 110 MBytes 92.7 Mbits/sec receiver
[ 18] 0.00-10.00 sec 110 MBytes 91.9 Mbits/sec 135 sender
[ 18] 0.00-10.00 sec 109 MBytes 91.2 Mbits/sec receiver
[ 20] 0.00-10.00 sec 123 MBytes 103 Mbits/sec 128 sender
[ 20] 0.00-10.00 sec 122 MBytes 103 Mbits/sec receiver
[ 22] 0.00-10.00 sec 126 MBytes 105 Mbits/sec 131 sender
[ 22] 0.00-10.00 sec 125 MBytes 105 Mbits/sec receiver
[SUM] 0.00-10.00 sec 1.10 GBytes 944 Mbits/sec 1325 sender
[SUM] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec receiver
Additional info:
WAN is VLAN 4090 on lagg0 -> Eth1 of the Marvell switch
LAN is VLAN 2 on lagg0 -> Eth2 of the Marvell switch
Interfaces: (are the dropped packets/errors here?)
Network Settings:
-
You can also check the output of netstat -i.
The only odd thing there is in the first result output. The iperf client output still shows it's connected to the pfSense WAN IP even though it's called with the LAN IP.
Also it shows the full public IP there so you might want to edit that!
Hard to see why it would show that... but connecting to it specifically seems to give a better result.
-
@stephenw10 Akismet says my edit is spam :/ and doesn't allow me to edit. Are you able to edit it?
In fact the response from LAN interface is correct:
# iperf3 -c 172.22.2.1
Connecting to host 172.22.2.1, port 5201
[ 4] local 172.22.2.2 port 51774 connected to 172.22.2.1 port 5201
I must have made some pasting mistake. So no issue here anyway.
netstat -i also shows almost no drops or errors:
-
And here are those problematic iperf tests - from local vm to external host over 7100:
# iperf3 -c 94.240.XX.19
Connecting to host 94.240.XX.19, port 5201
[ 4] local 172.22.2.2 port 59614 connected to 94.240.XX.19 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 27.8 MBytes 233 Mbits/sec 125 41.0 KBytes
[ 4] 1.00-2.00 sec 32.6 MBytes 273 Mbits/sec 289 79.2 KBytes
[ 4] 2.00-3.00 sec 28.7 MBytes 241 Mbits/sec 269 18.4 KBytes
[ 4] 3.00-4.00 sec 32.6 MBytes 273 Mbits/sec 249 29.7 KBytes
[ 4] 4.00-5.00 sec 31.6 MBytes 265 Mbits/sec 234 43.8 KBytes
[ 4] 5.00-6.00 sec 32.8 MBytes 275 Mbits/sec 224 50.9 KBytes
[ 4] 6.00-7.00 sec 30.6 MBytes 257 Mbits/sec 280 41.0 KBytes
[ 4] 7.00-8.00 sec 33.2 MBytes 278 Mbits/sec 213 31.1 KBytes
[ 4] 8.00-9.00 sec 33.2 MBytes 278 Mbits/sec 230 69.3 KBytes
[ 4] 9.00-10.00 sec 33.4 MBytes 280 Mbits/sec 254 42.4 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 316 MBytes 265 Mbits/sec 2367 sender
[ 4] 0.00-10.00 sec 316 MBytes 265 Mbits/sec receiver
And with -P10
[root@jumbo ~]# iperf3 -c 94.240.XX.19 -P10
Connecting to host 94.240.XX.19, port 5201
[...]
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 37.7 MBytes 31.6 Mbits/sec 993 sender
[ 4] 0.00-10.00 sec 37.5 MBytes 31.4 Mbits/sec receiver
[ 6] 0.00-10.00 sec 26.3 MBytes 22.1 Mbits/sec 746 sender
[ 6] 0.00-10.00 sec 26.1 MBytes 21.9 Mbits/sec receiver
[ 8] 0.00-10.00 sec 28.5 MBytes 23.9 Mbits/sec 873 sender
[ 8] 0.00-10.00 sec 28.2 MBytes 23.7 Mbits/sec receiver
[ 10] 0.00-10.00 sec 45.3 MBytes 38.0 Mbits/sec 1097 sender
[ 10] 0.00-10.00 sec 45.0 MBytes 37.8 Mbits/sec receiver
[ 12] 0.00-10.00 sec 21.0 MBytes 17.6 Mbits/sec 714 sender
[ 12] 0.00-10.00 sec 20.8 MBytes 17.4 Mbits/sec receiver
[ 14] 0.00-10.00 sec 26.2 MBytes 22.0 Mbits/sec 820 sender
[ 14] 0.00-10.00 sec 25.9 MBytes 21.7 Mbits/sec receiver
[ 16] 0.00-10.00 sec 24.4 MBytes 20.4 Mbits/sec 653 sender
[ 16] 0.00-10.00 sec 24.2 MBytes 20.3 Mbits/sec receiver
[ 18] 0.00-10.00 sec 51.3 MBytes 43.0 Mbits/sec 1227 sender
[ 18] 0.00-10.00 sec 50.9 MBytes 42.7 Mbits/sec receiver
[ 20] 0.00-10.00 sec 27.4 MBytes 23.0 Mbits/sec 819 sender
[ 20] 0.00-10.00 sec 27.2 MBytes 22.8 Mbits/sec receiver
[ 22] 0.00-10.00 sec 26.6 MBytes 22.3 Mbits/sec 796 sender
[ 22] 0.00-10.00 sec 26.3 MBytes 22.0 Mbits/sec receiver
[SUM] 0.00-10.00 sec 315 MBytes 264 Mbits/sec 8738 sender
[SUM] 0.00-10.00 sec 312 MBytes 262 Mbits/sec receiver
Retries aren't that bad, but the throughput is very poor :(
-
Hmm, that number of retries is not great though.
Do you see packet loss to the IP if you run a ping whilst testing?
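One way to compare retry counts across tests that moved different amounts of data is to normalise them per MByte transferred. The following is only a rough sketch: the regular expression, the helper name retries_per_mbyte and the 1024-based unit table are assumptions made for illustration, and the two sample rows are the sender summary lines quoted earlier in this thread.

```python
import re

# Matches an iperf3 sender summary row as pasted in this thread, e.g.
# "[ 4] 0.00-10.00 sec 316 MBytes 265 Mbits/sec 2367 sender"
SUMMARY_RE = re.compile(
    r"sec\s+(?P<amount>[\d.]+)\s+(?P<unit>[KMG])Bytes\s+"
    r"(?P<rate>[\d.]+)\s+Mbits/sec\s+(?P<retr>\d+)\s+sender"
)

# Assumption: the Transfer column uses 1024-based units.
UNIT_TO_MBYTES = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def retries_per_mbyte(summary_line: str) -> float:
    """Return retransmits per MByte sent for one sender summary row."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError("not an iperf3 sender summary line")
    mbytes = float(match["amount"]) * UNIT_TO_MBYTES[match["unit"]]
    return int(match["retr"]) / mbytes


# Sender summary rows copied from the tests in this thread.
firewall_to_external = "[ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec 318 sender"
lan_to_external = "[ 4] 0.00-10.00 sec 316 MBytes 265 Mbits/sec 2367 sender"

for name, line in [
    ("pfSense -> external host", firewall_to_external),
    ("internal vm -> external host", lan_to_external),
]:
    print(f"{name}: {retries_per_mbyte(line):.2f} retries per MByte")
```

On those two rows the routed/NATed path comes out more than twenty times worse per MByte than the test run from the firewall itself, which is why the raw retry count alone understates the difference.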
-
No, 0 packet loss during ping.
But I might have found what causes this problem... or at least where the problem is located. Our top switch is connected directly to the switch of their Mikrotik router. And as soon as our ISP turns our downlink port off (cutting us off from the Internet as a result), iperf from our local vm to our external machine instantly goes up to 940 Mbps.
They say their configuration is good, but it doesn't seem so to me. I also see "Redirect host (New nexthop:)" messages (from the ISP gateway) when I ping our external host from the internal vm. I don't know why pings go as far as the ISP gateway instead of staying within our switches.
Update: on second thought, I believe it's more likely a fault in our configuration. Maybe I configured something incorrectly on the 7100? In the end the problem is limited to our internal hosts that are NATed on pfSense. Any hints on what I should look into? What could make packets leaving my local network hit the ISP gateway/router before arriving at our external machine?
-
Oh, you could have a routing issue here if traffic to either target is forced via its gateway. pfSense has a rule to prevent that though. Traffic from the WAN to some other address in the WAN subnet bypasses the route-to rules that would otherwise force it. Your other hosts or routers may not.
Do you see ICMP redirects anywhere? That would be a sure sign something is wrong.
-
@stephenw10 Yes, I see ICMP redirects on the connections that are causing problems here - when pinging from our internal host(s) - over XG-7100 - to our external host:
[root@172.22.2.2 ~]# ping 94.240.XX.19
PING our.external.host (94.240.XX.19) 56(84) bytes of data.
From our.isp.gateway.ip (94.240.XX.254): icmp_seq=1 Redirect Host(New nexthop: our.external.host (94.240.XX.19))
64 bytes from our.external.host (94.240.XX.19): icmp_seq=1 ttl=63 time=6.25 ms
From our.isp.gateway.ip (94.240.XX.254): icmp_seq=2 Redirect Host(New nexthop: our.external.host (94.240.XX.19))
64 bytes from our.external.host (94.240.XX.19): icmp_seq=2 ttl=63 time=4.64 ms
From our.isp.gateway.ip (94.240.XX.254): icmp_seq=3 Redirect Host(New nexthop: our.external.host (94.240.XX.19))
64 bytes from our.external.host (94.240.XX.19): icmp_seq=3 ttl=63 time=2.54 ms
From our.isp.gateway.ip (94.240.XX.254): icmp_seq=4 Redirect Host(New nexthop: our.external.host (94.240.XX.19))
64 bytes from our.external.host (94.240.XX.19): icmp_seq=4 ttl=63 time=2.50 ms
^C
--- our.external.host ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 2.503/3.987/6.257/1.573 ms
The ISP has confirmed that all traffic initiated from our internal hosts (172.22.2.0/24) and targeted at our external host (94.240.XX.19) goes via their Mikrotik switch interface (the traffic is seen there). It should not do that, as the 94.240.XX.19 host is connected to a switch before the ISP router (please see the network map in the first post).
I don't know what can cause that. We have rather classic Manual outbound NAT (including NATing 172.22.2.0/24 over WAN). We also have several Virtual IPs mapped to WAN and NAT 1:1 for them. But it doesn't matter if I ping from an internal host that is mapped to WAN IP address or has its own mapped 1:1 public address - it's the same. Ping redirects + iperf 290Mbps. What should I look into?
-
Is the WAN subnet size set correctly on the 7100?
Otherwise check the ruleset in /tmp/rules.debug. You should see a rule like:
pass out route-to ( lagg0.4090 94.240.XX.254 ) from 94.240.XX.1 to !94.240.XX.0/24 ridentifier 1000028911 keep state allow-opts label "let out anything from firewall host itself"
In other words it only applies route-to to traffic that isn't inside the WAN subnet. But that shouldn't apply to this traffic you're seeing.
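To make the effect of that "to !<WAN subnet>" condition concrete, here is a minimal Python sketch of the decision it encodes for traffic from the firewall itself. It is a model for illustration, not how pf is implemented. The real WAN prefix is redacted in this thread, so the 203.0.113.0/24 documentation range, the .254 gateway and the function name next_hop are all stand-ins.

```python
import ipaddress

# Hypothetical stand-ins for the redacted WAN subnet and gateway.
WAN_SUBNET = ipaddress.ip_network("203.0.113.0/24")
WAN_GATEWAY = ipaddress.ip_address("203.0.113.254")


def next_hop(destination: str) -> ipaddress.IPv4Address:
    """Model of 'route-to ( <wan if> <gateway> ) ... to !<WAN subnet>'.

    Destinations inside the WAN subnet are on-link, so they are reached
    directly; everything else is forced to the WAN gateway.
    """
    dst = ipaddress.ip_address(destination)
    if dst in WAN_SUBNET:
        return dst  # rule does not match: deliver directly on the segment
    return WAN_GATEWAY  # rule matches: route-to sends it to the gateway


print(next_hop("203.0.113.19"))  # a host in the WAN subnet
print(next_hop("198.51.100.7"))  # a host somewhere else
```

A host in the WAN subnet is reached directly, which is the behaviour you would expect for the external machine here; only off-subnet destinations are handed to the gateway.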
-
WAN seems to be set OK (94.240.XX.1 /24, GW_WAN 94.240.XX.254)
All Virtual IPs are also /24 (i.e. 94.240.XX.2 /24).
I've looked into /tmp/rules.debug and yes, I have such a rule:
[23.09.1-RELEASE][root@pfsense]/root: cat /tmp/rules.debug | grep "lagg0.4090 94.240.XX.254" | grep -v IPsec
GWGW_WAN = " route-to ( lagg0.4090 94.240.XX.254 ) "
GWfailover = " route-to { ( lagg0.4090 94.240.XX.254 ) } "
pass out log route-to ( lagg0.4090 94.240.XX.254 ) from 94.240.XX.1 to !94.240.XX.0/24 ridentifier 1000012111 keep state allow-opts label "let out anything from firewall host itself"
[... many other rules ...]
Looks ok, doesn't it?
Btw, I'm not sure if this is relevant here, but I also have these rules:
pass in quick on $LAN inet from $LAN__NETWORK to <negate_networks> ridentifier 10000001 keep state label "NEGATE_ROUTE: Negate policy routing for destination" label "id:1422071308" label "gw:failover"
pass in quick on $LAN $GWfailover inet from $LAN__NETWORK to any ridentifier 1422071308 keep state label "USER_RULE: Default LAN -> any" label "id:1422071308" label "gw:failover"
as a result of such last rule in fw rules for LAN interface:
where failover is a gateway group of WAN and WAN2:
-
OMG! Was it that GW failover rule that was messing things up here?
As soon as I've added yet another rule ABOVE that last failover rule, set this way (no specific gateway set here):
pings to our external host are not redirected anymore and iperf seems to be OK!
Was I doing the failover the wrong way here? Or is the failover OK, but because of our specific setup (a WAN network with real hosts in it) that additional rule is required? What is the recommended setup here?
-
Aha! Yes. The policy routing via the failover gateway applies route-to to the states regardless of what the outbound rules are doing.
So, yes, you need a more specific rule above that to bypass the policy routing for traffic to the WAN subnet. Which sounds like exactly what you added.
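Why the position of the new rule matters can be sketched as a first-match lookup, which is how "quick" rules behave. This is a simplified model only: the Rule class, the first_match helper, the rule labels and the 203.0.113.0/24 prefix are hypothetical, and the automatic negate_networks rule is left out.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    label: str
    destination: ipaddress.IPv4Network
    gateway: Optional[str]  # None means no policy routing (normal routing table)


def first_match(rules: List[Rule], destination: str) -> Rule:
    """Return the first rule whose destination covers the address."""
    dst = ipaddress.ip_address(destination)
    for rule in rules:
        if dst in rule.destination:
            return rule
    raise LookupError(f"no rule matches {destination}")


ANY = ipaddress.ip_network("0.0.0.0/0")
WAN_NET = ipaddress.ip_network("203.0.113.0/24")  # stand-in for the redacted WAN subnet

# Before: only the catch-all rule that policy-routes via the gateway group.
failover_only = [Rule("Default LAN -> any", ANY, "failover")]

# After: a more specific rule with no gateway, placed ABOVE the catch-all.
with_bypass = [Rule("LAN -> WAN subnet", WAN_NET, None)] + failover_only

for name, ruleset in [("before", failover_only), ("after", with_bypass)]:
    rule = first_match(ruleset, "203.0.113.19")
    print(f"{name}: matched '{rule.label}', gateway={rule.gateway}")
```

In the "before" ruleset the WAN-subnet host is caught by the catch-all and forced to the gateway; in the "after" ruleset the bypass rule wins and the normal routing table delivers it directly. If the bypass rule were placed below the catch-all it would never be reached.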
-
THANK YOU VERY MUCH for helping me analyze this weird issue, which finally led me to the solution! Your support and input were amazing! Thank you!