Big downloads are killing throughput?
Hi. I have a Partaker 4K 3865U (8GB RAM/128GB SSD, 6x Intel PRO/1000) with pfSense 2.4.4-RELEASE-p3 onboard.
Port 1 is connected to the WAN uplink, which is attached to the CPE. Ports 5 and 6 are each attached to a router, and each router has its own public IP. The CPE's public IP acts as the gateway for both routers.
pfSense is configured in transparent mode: Port 1 (WAN), Port 5 and Port 6 are bridged. This lets traffic pass through pfSense directly between the routers and the CPE (gateway), while pfSense firewall rules still decide which traffic may traverse the box and which is dropped (all outgoing traffic is allowed; incoming traffic is allowed only through rule exceptions). No limiters or traffic shapers are configured or enabled, and the proxy is not enabled either.
Now the actual problem. A PC is attached to one of the routers. Regular web browsing on that PC works fine, and ping loss towards Google DNS is 0-1%, which is fine. A speed test almost reaches the full bandwidth of 450 Mbps. However, if I run a second speed test straight after the first, it only gets 30-50 Mbps. After half a minute or so, the speed recovers to ~400 Mbps.
When downloading a big ISO (4 GB), the speed drops from 20 Mbps to 50-100 Kbps within about 15 seconds and stays there. Pings run during the download show 20-60% loss, and the loss starts when the download starts. Even after I stop the download, the ping loss persists for the next 10-20 minutes, and websites either load very slowly or time out. WCPU on the pfSense box is 97-100% idle, so I believe it's not CPU-related.
I checked the maximum MTU that can go over that link: it's 1472, which is pretty standard. I nevertheless set the interfaces on pfSense to 1472 (and even tried smaller values), but that doesn't help. The interfaces on all routers and the CPE are set to autonegotiate speed/duplex, and pfSense is set the same way.
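A note on the 1472 figure: it is most likely the largest ICMP payload that passed with the don't-fragment bit set (an assumption, since the post doesn't say how it was measured; for example `ping -D -s 1472 8.8.8.8` on FreeBSD/pfSense or `ping -M do -s 1472 8.8.8.8` on Linux). That payload excludes the 8-byte ICMP echo header and the 20-byte IPv4 header, so a 1472-byte payload corresponds to a 1500-byte IP MTU, the Ethernet default. A minimal Python sketch of the conversion (the function names are hypothetical):

```python
IPV4_HEADER_BYTES = 20       # IPv4 header without options
ICMP_ECHO_HEADER_BYTES = 8   # ICMP echo request header


def ip_mtu_from_ping_payload(payload_bytes: int) -> int:
    """Convert the largest ping payload that passed with DF set into the path's IP MTU."""
    return payload_bytes + ICMP_ECHO_HEADER_BYTES + IPV4_HEADER_BYTES


def ping_payload_for_mtu(mtu_bytes: int) -> int:
    """Inverse: the largest ping payload that fits in a given IP MTU without fragmenting."""
    return mtu_bytes - ICMP_ECHO_HEADER_BYTES - IPV4_HEADER_BYTES


if __name__ == "__main__":
    print(ip_mtu_from_ping_payload(1472))  # 1500, the Ethernet default
    print(ping_payload_for_mtu(1472))      # 1444
```

Under that assumption, setting the interface MTU to 1472 makes it 28 bytes smaller than what the path actually supports, rather than matching it.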
When I bypass pfSense and connect one of the routers directly to the CPE, the speed is decent and stable, which points to pfSense as the point of failure.
Any ideas? Could this be a caching/queueing problem on pfSense? How can I investigate it further? Thanks!
The bufferbloat test also gives inconsistent results. One run showed 5/30 Mbps with an overall rating of A and a bufferbloat rating of A; a second run showed 430/100 Mbps with A+ overall and "-" for bufferbloat, but with 105 ms pings.
What also looks weird to me: when I download something behind RouterA while capturing packets with tcpdump on RouterB, RouterB also receives random packets coming from the internet and destined for RouterA, and tries to route them on to RouterA's public IP. Is that expected behavior when ports are bridged in pfSense, or is it a leak in pfSense's virtual switch? It seems weird. Or maybe it's irrelevant to the throughput issue; I'm just probing different things here.
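For reference, here is a toy model of a generic transparent (802.1D-style) learning bridge, which is how a FreeBSD if_bridge(4) bridge or an ordinary switch forwards frames: it learns which port each source MAC address sits behind and floods frames for unknown or aged-out destinations out of every other port. Under that model, frames for RouterA that arrive before RouterA's MAC has been learned, or after its entry has aged out, also reach RouterB's port. Whether that accounts for what the capture on RouterB showed is an assumption; `tcpdump -e` prints the link-level header, so it can show which destination MAC those frames carried. The port and MAC names below are hypothetical placeholders.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningBridge:
    """Toy transparent bridge: learn source MACs, flood unknown destinations."""
    ports: list[str]
    mac_table: dict[str, str] = field(default_factory=dict)  # MAC -> port

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> list[str]:
        """Return the list of ports the frame is sent out of."""
        # Learn: the source MAC is reachable via the ingress port.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown unicast: flood to every port except the ingress.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the ingress segment: filter the frame.
            return []
        return [out_port]

    def age_out(self, mac: str) -> None:
        """Simulate a MAC table entry expiring."""
        self.mac_table.pop(mac, None)


if __name__ == "__main__":
    bridge = LearningBridge(["port1", "port5", "port6"])
    # Frame from the CPE for RouterA before RouterA has sent anything: flooded.
    print(bridge.receive("port1", "cpe", "router_a"))  # ['port5', 'port6']
    # RouterA transmits once, so its MAC is learned on port5.
    bridge.receive("port5", "router_a", "cpe")
    print(bridge.receive("port1", "cpe", "router_a"))  # ['port5']
    # The entry ages out and flooding resumes.
    bridge.age_out("router_a")
    print(bridge.receive("port1", "cpe", "router_a"))  # ['port5', 'port6']
```

Note that RouterB seeing such frames in tcpdump does not by itself mean RouterB acts on them: tcpdump normally puts the interface into promiscuous mode, so it also shows frames addressed to other MACs.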
Are there any tools or commands to troubleshoot Layer 2 on pfSense? In particular, I would like to see what is going on between pfSense and the OpenWRT-based router: whether OpenWRT can receive packets/frames at the rate pfSense is sending them, or whether it keeps "saying" hold on, hold on to pfSense. According to the sysctl output there seem to be "pauses" in transmission, but I could be wrong. Hopefully someone can advise whether these stats are OK:
dev.em.4.mac_stats.xoff_txd: 0
dev.em.4.mac_stats.xoff_recvd: 29845
dev.em.4.mac_stats.xon_txd: 0
dev.em.4.mac_stats.xon_recvd: 25281
dev.em.4.mac_stats.good_pkts_recvd: 29853536
dev.em.4.mac_stats.total_pkts_recvd: 29908662
dev.em.4.queue_tx_0.no_desc_avail: 214788
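To make these counters easier to read, here is a minimal Python sketch that parses `sysctl` output (for instance from running `sysctl dev.em.4` in the pfSense shell) and derives a few indicators. The comments follow the usual IEEE 802.3x flow-control semantics: `xoff_recvd` counts PAUSE frames the link partner sent to this port asking it to stop transmitting, and `xon_recvd` counts frames telling it to resume. Reading `no_desc_avail` as "the transmit descriptor ring was full" is my understanding of the em(4) counter, not something the post confirms. The function and output key names are hypothetical.

```python
import re

# Counters exactly as posted for em4 (the port facing the OpenWRT router).
SAMPLE = """\
dev.em.4.mac_stats.xoff_txd: 0
dev.em.4.mac_stats.xoff_recvd: 29845
dev.em.4.mac_stats.xon_txd: 0
dev.em.4.mac_stats.xon_recvd: 25281
dev.em.4.mac_stats.good_pkts_recvd: 29853536
dev.em.4.mac_stats.total_pkts_recvd: 29908662
dev.em.4.queue_tx_0.no_desc_avail: 214788
"""

# "dev.<driver>.<unit>.<counter path>: <integer>"; also matches when several
# counters are run together on a single line.
COUNTER_RE = re.compile(r"dev\.\w+\.\d+\.([\w.]+):\s*(\d+)")


def parse_counters(text: str) -> dict[str, int]:
    """Map a counter path (e.g. 'mac_stats.xoff_recvd') to its integer value."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def summarize(c: dict[str, int]) -> dict[str, float]:
    """Derive a few indicators from em(4) MAC counters."""
    total = c["mac_stats.total_pkts_recvd"]
    good = c["mac_stats.good_pkts_recvd"]
    return {
        # PAUSE (XOFF) frames received: the link partner asked this port to stop sending.
        "pause_frames_from_peer": c["mac_stats.xoff_recvd"],
        # PAUSE (XOFF) frames sent: this port asked the partner to stop sending.
        "pause_frames_to_peer": c["mac_stats.xoff_txd"],
        # Received frames not counted as good; the exact composition
        # (errors, filtered frames, ...) depends on the NIC.
        "rx_not_good": total - good,
        "rx_not_good_pct": 100.0 * (total - good) / total if total else 0.0,
        # Assumed meaning: times a packet could not be queued because TX ring 0 was full.
        "tx_ring_full_events": c["queue_tx_0.no_desc_avail"],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_counters(SAMPLE)).items():
        print(f"{key}: {value}")
```

With the posted numbers, pfSense never sent a PAUSE frame but received 29845 of them, and TX queue 0 ran out of descriptors 214788 times. That is at least consistent with the OpenWRT side throttling pfSense, though the counters alone don't prove it is the cause. Comparing two snapshots taken a few seconds apart during a large download would show whether these counters are still climbing while throughput collapses.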
UPDATE: the same issue described at the beginning of my post also happens when I connect a switch to pfSense and attach RouterA and RouterB to that switch, so both routers hang off a single pfSense port. Since only one port is used in this scenario, it does not seem to be an issue with the virtual switch on pfSense.
Once I separated Port 5 and Port 6 on pfSense into different private subnets and attached RouterA and RouterB to the pfSense box independently (+ NAT with public VIPs), the issue was gone. It appears that when both routers are connected to the same bridge or external switch, they can't work reliably together. I would still appreciate it if someone could point me in the right direction for investigating this further, perhaps with some Layer 2 debugging.