4100 ix Flow Control Help
-
I have moved a configuration from an SG-4680 that passed away to a 4100 Max. After about 36 hours my internet download speeds reduce from ~700Mbps to around ~20Mbps. After a Normal Reboot in the GUI Diagnostics -> Reboot the throughput on the internet restores back to normal. This didn't happen with the SG-4680's interface port labeled WAN (an igb interface).
Other topics mention that flow control on the 4100's ix2 and/or ix3 interfaces can potentially cut interface throughput (see the reference far below).
My current ix3 (WAN) interface reports the following when I run 'ifconfig -vv' at the command prompt:
ix3: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: WAN
    options=4e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
    media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
The reference post and quote below mention that the "rxpause,txpause" portion could be an issue.
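A quick way to check for those flags is to filter the 'media:' line of the ifconfig output. This is only a minimal sketch: the function name pause_flags is made up for illustration, and it assumes the flags appear on the 'media:' line as they do in the output above.

```sh
#!/bin/sh
# Report whether ifconfig output (read from stdin) shows negotiated pause frames.
# pause_flags is a hypothetical helper name, not a pfSense or FreeBSD command.
pause_flags() {
    awk '
        /media:/ {
            if ($0 ~ /rxpause/) rx = 1
            if ($0 ~ /txpause/) tx = 1
        }
        END {
            if (rx || tx)
                print "flow control negotiated:" (rx ? " rxpause" : "") (tx ? " txpause" : "")
            else
                print "flow control off"
        }
    '
}

# On the firewall (interface name taken from this thread):
#   ifconfig ix3 | pause_flags
```

Fed the ix3 output above, this prints "flow control negotiated: rxpause txpause".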
When I go to the pfSense documentation I find this page for setting flow control on the interfaces:
https://docs.netgate.com/pfsense/en/latest/hardware/tune.html#flow-control
When I use the GUI Diagnostics -> Edit File I receive a "File Does not Exist Error" for the file "/boot/loader.conf.local" listed in the documentation.
TL;DR
-
Is it safe to just create a new file as specified in the pfSense documentation, and will pfSense 23.01 read that file on bootup to disable flow control on the ix2 and ix3 interfaces?
-
Additionally, does the command
hw.ix.flow_control="0"
disable flow control for both ix2 and ix3 interfaces or do I need to specify an interface number (e.g., ix3 specifically)?
Other Troubleshooting for Reference
Over the past few weeks I have:
- disabled gateway monitoring action
- disabled gateway monitoring (since then re-enabled)
- removed pfBlocker (since then re-added)
- removed Traffic Shaping
- updated firmware via System -> Netgate Firmware Upgrade
- added the patches from System -> Patches
- increased the Firewall Maximum States and Firewall Maximum Table Entries in System -> Advanced -> Firewall & NAT
None of the above has solved the 36 hour throughput dropout issue on the ix3 interface of the 4100.
Ref flow control topic:
https://forum.netgate.com/topic/175249/throughput-problems-on-4100/24
-
@selfjc For q1, you have to create the .local file. Loader.conf gets overwritten by pfSense.
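A minimal sketch of creating the file from the command prompt follows. The helper name set_loader_tunable is hypothetical, and the sketch only appends a line if the tunable is not already present. As far as I know, hw.ix.* loader tunables are driver-wide rather than per-port, so a single line should cover both ix2 and ix3, but treat that as an assumption to verify with ifconfig after a full reboot.

```sh
#!/bin/sh
# Append NAME="VALUE" to a loader config file unless NAME is already set there.
# set_loader_tunable is a made-up helper name for illustration.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    touch "$file"                          # create the file if it does not exist
    if grep -q "^${name}=" "$file"; then   # note: dots in NAME match any character here
        echo "$name already present in $file"
    else
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}

# On the firewall, followed by Diagnostics -> Reboot -> Normal Reboot:
#   set_loader_tunable /boot/loader.conf.local hw.ix.flow_control 0
```

Running it twice leaves a single hw.ix.flow_control="0" line in the file.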
-
@steveits
Roger that. I'll 'touch' the file and add that command to loader.conf.local. Then I'll report back whether flow control is disabled on both the ix2 and ix3 interfaces.
For now I have repurposed one of the Intel i225 igc interfaces to the WAN.
-
I added the following to '/boot/loader.conf.local':
hw.ix.flow_control="0"
And the 'rxpause,txpause' reappears on ix3.
I then changed '/boot/loader.conf.local' to:
hw.ix.flow_control="0"
hw.ix2.flow_control="0"
hw.ix3.flow_control="0"
And the 'rxpause,txpause' reappears on ix3.
I then changed '/boot/loader.conf.local' to:
hw.ix.flow_control="0"
hw.ix.2.flow_control="0"
hw.ix.3.flow_control="0"
And 'rxpause,txpause' reappears on ix3.
Some background: my WAN connects to a Netgear CM1200 cable modem.
I'll revert to the i225 igc interface for WAN to see if this solves my throughput drop outs after ~36 hours.
For anyone's curiosity, the igc interface returns the following from the 'ifconfig -vv' command:
igc2: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: WAN
    options=4e020bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
-
@selfjc
One more interesting thing to note. I found the old Redmine bug #6766, which discusses flow control being unsettable. When I run:
sysctl -a | grep hw.ix
I receive the output that the flow control is not settable:
<118>Setting up extended sysctls...sysctl: oid 'hw.ix.flow_control' is a read only tunable
<118>sysctl: oid 'hw.ix.flow_control' is a read only tunable
hw.ix.flow_control: 3
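For reference, a small sketch that decodes the reported value. The mapping is my understanding of the FreeBSD ixgbe driver's convention (0 = off, 1 = rx pause, 2 = tx pause, 3 = both), so treat it as an assumption and check it against the driver documentation. The function name fc_mode is made up for illustration.

```sh
#!/bin/sh
# Translate an ix flow control value into words.
# Mapping assumed from the FreeBSD ixgbe driver convention; fc_mode is a hypothetical name.
fc_mode() {
    case $1 in
        0) echo "off" ;;
        1) echo "rx pause" ;;
        2) echo "tx pause" ;;
        3) echo "rx and tx pause" ;;
        *) echo "unknown" ;;
    esac
}

# On the firewall:
#   fc_mode "$(sysctl -n hw.ix.flow_control)"
```

Under that mapping, the value 3 shown above means both rx and tx pause are enabled, which matches the 'rxpause,txpause' seen on ix3.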
For anyone else searching the internet for 4100 and ix flow control: setting flow control via the GUI is tracked in Redmine feature request #11056, whose target release is "Future."
I will continue trying out the i225 igc interface for WAN instead since the igc doesn't report 'rxpause,txpause' when connected to the same Netgear CM1200 modem.
-
@selfjc
The error was user error. A Diagnostics -> Reboot -> Reroot does not apply 'hw.ix.flow_control', but a Diagnostics -> Reboot -> Normal Reboot does. Now the ix3 interface does not show 'rxpause,txpause'.
I will monitor the WAN throughput and report back if there is a drop out in throughput like previously experienced with the flow controls on.
-
@selfjc
Another update:
The ix3 interface still cut the download bandwidth from 320 Mbps to 80 Mbps after approximately 36 hours. A Diagnostics -> Reboot -> Normal Reboot brought the ix3 interface back up to full speed. I am moving my WAN port over to one of the unused igc interfaces to test whether the download bandwidth also drops out there after 36 hours.
-
Is that throughput difference a step change?
Does the link show differently in each case? Are there a lot of errors/collisions on ix3 after it slows?
20Mbps is so low it has to be something pretty low level, like flow control as you state.
-
@stephenw10 said in 4100 ix Flow Control Help:
Is that throughput difference a step change?
Yes, as far as I know. The throughput change happens either when I am not home or during the middle of the night. I first see the change in my Home Assistant, which runs a speed test once an hour. I then confirm the result with a speed test from my laptop and from the command prompt on the 4100.
Does the link show differently in each case? Are there a lot of errors/collisions on ix3 after it slows?
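If the hourly results are logged, a step change can be picked out mechanically. The sketch below assumes a hypothetical log format of one "<timestamp> <mbps>" pair per line (not something Home Assistant produces by default) and flags any sample that falls below half of the previous one. The function name find_step_drops is made up for illustration.

```sh
#!/bin/sh
# Print samples where throughput falls below half of the previous sample.
# Input on stdin: lines of "<timestamp> <mbps>" (hypothetical log format).
find_step_drops() {
    awk '
        NR > 1 && $2 < prev / 2 { printf "%s: %s -> %s Mbps\n", $1, prev, $2 }
        { prev = $2 }
    '
}

# Example:
#   printf '%s\n' '01:00 700' '02:00 690' '03:00 20' | find_step_drops
```

A single flagged line followed by consistently low readings would suggest a step change rather than a gradual decline.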
No, the ix3 interface shows no errors or collisions.
20Mbps is so low it has to be something pretty low level, like flow control as you state.
I think so because the previous igb interfaces on the SG-4680 with the same pfsense configuration did not exhibit this. The 4100's igc (Intel i225) interface so far is holding strong since 10pm Monday night (about 32 hours ago). I'll report back if the igc interface holds the WAN throughput longer than the ix3 interface (probably an update tomorrow).
-
@selfjc
Welp, a few moments after I made that post the download and upload speeds slowed down. I checked the WAN interface (it was on the igc3 interface) and there were no errors or collisions. I wrote down my public IP address.
I swapped the WAN cable from the cable modem back to the ix3 interface from the igc3 interface. I changed the pfsense WAN interface back to ix3 to follow the cable. My ISP handed me a new IP address. I reran speed tests with the same low speed results of ~20 Mbps download. The Traffic Shaper Limiter for CODEL and FQ_CODEL are on.
Then (WAN still in ix3 interface) I disabled my Traffic Shaper's Limiter floating firewall rule for CODEL and FQ_CODEL. Now I get full ISP link speeds again ~914 Mbps download and ~50 Mbps upload.
I admit I am not being super scientific, having just changed two variables at once: the public IP from my ISP and the Traffic Shaper being disabled again. But this is starting to point to the Traffic Shaper's Limiter and floating firewall rule "filling up" or something.
I just now re-enabled my Floating Rule that implements the Traffic Shaper Limiter using CODEL and FQ_CODEL, and the speeds are ~483 Mbps download (the WANdown limit is set to 700 Mbps) and ~20 Mbps upload (the WANup limit is set to 22 Mbps).
Conclusion
The ix3 interface looks to be okay.
My Traffic Shaper Limiter rules seem to be "clogging up" after about 36 hours.
Thank you for the help @stephenw10 !
-
Ah, nice find. That's weird though!
You see any errors logged from the shaper? The queues somehow completely full?
-
@stephenw10
I will try to look at that if/when the download speeds slow down. Does Diagnostics -> Limiter Info contain that information?
-
Ah, sorry I was thinking they were AltQ shapers.
You might see something there though if the Limiters are misbehaving.
-
Welp, this seems to be related to the total connections or total throughput passing through the Traffic Shaper Limiter.
The ix3 interface serving as WAN just suffered the same slowdown after only 12 hours of uptime and ~50GiB. I disabled the floating firewall rule that forces traffic into the WANdownQ and WANupQ. The old states and connections still suffer slow bandwidth, but speed tests to new servers come right back up towards the ISP link speeds.
After a reboot, with the same ix3 interface and the same public IP the download speeds return back to full speed.
This is pointing more and more to the Traffic Shaper Limiter. I will now try just leaving that off to monitor if the bandwidth in and out through the WAN interface slows down again.
-
Hmm, like it hits this after ~50GB every time? That's...odd.
-
if this is a thing - then i wouldn't be surprised if it's actually something like 42.949GB. or 53.687 if there's a bit/byte conversion along the way
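A quick check of the arithmetic behind those two figures: they line up with 40 GiB and 50 GiB expressed in decimal gigabytes, and 40 GiB is exactly ten times 2^32 bytes. Whether a 32-bit counter is actually involved is only speculation at this point. The helper name gib_to_bytes is made up for illustration, and the sketch assumes a shell with 64-bit arithmetic.

```sh
#!/bin/sh
# Convert GiB to bytes (needs 64-bit shell arithmetic).
# gib_to_bytes is a hypothetical helper name.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 40        # 42949672960, about 42.949 GB
gib_to_bytes 50        # 53687091200, about 53.687 GB
echo $(( 1 << 32 ))    # 4294967296, the size of a 32-bit counter range
```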
-
The issue cropped back up today while I was at work, when my Home Assistant notified me that the speed test had slowed. I confirmed by connecting to the 4100 over the IPSec VPN and running speedtest-cli from the Command Prompt.
As an attempt, I have now set the following in /boot/loader.conf.local per the Hardware Tuning Guide:
kern.ipc.nmbclusters="1000000"
kern.ipc.nmbjumbop="524288"
and the following system tunable:
hw.intr_storm_threshold="10000"
I don't expect this to fix my issue, because it also happened on the igc interfaces when set to WAN, not just the ix interfaces. But it's worth exhausting all avenues.
I'll post an update back again if the bandwidth dropout happens again.
At that point I only have the following options left:
- Factory reset and forego the configuration restore
- RMA the 4100 Max
Any suggestions?
-
So that was with the Limiters disabled?
-
No surprise, look at your limiter parameters:
This time is so low that your CPU clock is not high enough to work through the queue.
I use this on the 2100 and 6100, with ECN active:
AQM CoDel target 11ms interval 25ms ECN
-
@nocling
I'll make sure to keep that in mind when I add Traffic Shaping back in. Right now I have flashed the 4100 back to bare pfSense 23.01 because I was having the bandwidth dropout without Traffic Shaping.
The plan is to set up the interfaces with the segregated network IP ranges and only a basic firewall from WAN to the LANs. Hopefully the 4100 doesn't suffer dropouts with this arrangement. Then I will add back the features I had before.