Sudden downstream speed drop
I have been running a PFSense install as my router for about a month or so and I just ran into my first major problem with it. I upgraded my cable modem earlier today and, for some reason, my network downstream speed has taken a massive hit (data below). It is definitely related to the PFSense box: I tested 2 other known-good laptops and a known-good desktop by connecting them directly to the modem, and the issue disappeared. I have not seen anything unusual in any of the logs and there doesn't seem to be any packet loss. Pings respond with a reasonable delay on both sets of systems.
Known-Good systems (Averages):
Athlon XP 1600+
WAN NIC: RealTek 8139
LAN NIC: Digital 21140A
Running PFSense v1.2.1 in Full mode
State Table Usage: 153/10000
MBUF Usage: 35/525
CPU Usage: <10%
Memory Usage: 10%-15%
Swap Usage: 0%
Disk usage: <1%
The only installed package is bandwidthd
Previous modem: Toshiba PCX2200
New modem: Motorola SB5101
Let me know if I need to post any more information!
How have you measured the downstream speed?
I have not investigated this so I don't have an explanation. My ADSL modem typically reports the downstream speed as 5 to 6 Mbps. I routinely download 25MB files with Firefox. On my Linux system Firefox routinely reports the download speed as about 160kBps. On my Win2k desktop (a slower system than the Linux system) Firefox routinely reports the download speed as 80 to 85kBps. I generally download one file at a time but I have observed three concurrent downloads on the Win 2K system with Firefox reporting 80 to 85kBps on each download. Is my download speed 80kBps or 240kBps?
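To put the Firefox figures and the modem figure in the same units, here is a rough sketch of the arithmetic. It assumes Firefox's "kBps" means 1000 bytes per second (it may actually use 1024), and the function name is just something I made up for illustration.

```python
def kbytes_to_mbits(kilobytes_per_second):
    """Convert kB/s to Mbps, assuming 1 kB = 1000 bytes and 1 Mb = 1,000,000 bits."""
    return kilobytes_per_second * 8 / 1000


per_stream_kBps = 85   # upper figure Firefox reports per download on the Win2k box
streams = 3            # concurrent downloads observed

aggregate_kBps = per_stream_kBps * streams
print("single stream:", kbytes_to_mbits(per_stream_kBps), "Mbps")
print("three streams:", kbytes_to_mbits(aggregate_kBps), "Mbps")
```

Either way the total stays well under the 5 to 6 Mbps the modem reports, so the line itself is probably not what limits a single download.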
I have downloaded some CD size files (Linux distributions) by bittorrent on the Win 2K system and I'm sure I've seen reported aggregate download speeds larger than 240kBps but I haven't watched these to any great extent.
I suspect my single stream download speed is limited by TCP window size.
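For what it's worth, the reasoning behind that suspicion is that a single TCP stream can have at most one window of data in flight per round trip, so throughput is bounded by window size divided by round-trip time. A minimal sketch follows; the 100 ms round-trip time is purely an assumption for illustration (nobody in this thread has measured it), and 65535 bytes is the largest window TCP allows without the window scaling option.

```python
MAX_UNSCALED_WINDOW = 65535  # bytes; largest TCP window without window scaling


def window_limited_mbps(window_bytes, rtt_seconds, streams=1):
    """Upper bound on throughput in Mbps when every stream is window-limited.

    Each stream can send at most one full window per round trip, so the
    bound grows linearly with the number of concurrent streams.
    """
    return streams * window_bytes * 8 / rtt_seconds / 1_000_000


ASSUMED_RTT = 0.100  # seconds; hypothetical value, substitute a measured ping time

for n in (1, 2, 3):
    bound = window_limited_mbps(MAX_UNSCALED_WINDOW, ASSUMED_RTT, n)
    print(n, "stream(s):", round(bound, 2), "Mbps at most")
```

If the measured speed per stream sits near this bound and the total grows with the number of streams, the window is a plausible culprit. If the total stays flat no matter how many streams run, something else is the bottleneck.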
Your observed download speed could also depend on how it is measured.
The download speed has been measured two ways, both of which produced numbers that agreed with each other and the above results. The first method I used was to upload a 30MB file to a remote server, that I have control of, 3 times on each system and averaged the total runs. I then downloaded the same file from the server 3 times on each system and averaged those results. In order to double check my method I then went over to DSLReports and used their Parsippany, NJ Java speed test server and ran that about 4 times on each system. The results were the same in every case; my upstream speed has remained fairly constant (2500Kbps ± 300Kbps) while my download speed suffered nearly a 95% speed hit when going through PFSense.
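In case anyone wants to repeat the file-transfer method, this is roughly how the averages and the percentage drop work out. The run times below are made-up placeholders, not my actual measurements, and I am assuming a decimal 30 MB file.

```python
from statistics import mean

FILE_SIZE_BYTES = 30 * 1000 * 1000  # 30 MB test file (decimal megabytes assumed)


def throughput_kbps(size_bytes, seconds):
    """Throughput of one timed transfer in kilobits per second."""
    return size_bytes * 8 / seconds / 1000


def average_throughput_kbps(size_bytes, run_times):
    """Average the per-run throughput over several timed transfers."""
    return mean(throughput_kbps(size_bytes, t) for t in run_times)


def percent_drop(baseline, measured):
    """How far 'measured' has fallen below 'baseline', as a percentage."""
    return (baseline - measured) / baseline * 100


# Hypothetical timings in seconds for three download runs on each setup.
direct_runs = [12.0, 12.5, 11.5]      # machine plugged straight into the modem
router_runs = [240.0, 250.0, 230.0]   # same machine behind the router

direct = average_throughput_kbps(FILE_SIZE_BYTES, direct_runs)
routed = average_throughput_kbps(FILE_SIZE_BYTES, router_runs)
print(round(direct), "Kbps direct,", round(routed), "Kbps routed")
print(round(percent_drop(direct, routed)), "% drop")
```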
I don't know how DSLreports calculates its download speed. Your method will give single TCP stream speed. Can you download two or more copies of the file at the same time? Does each download give about the same speed as your previously reported single download? If so, and you want to get higher speed on a single download, then it would probably be worthwhile significantly increasing the TCP window size. (At the moment I'm not sure how to do that so let's get the numbers first.)
I doubt it's an issue with the default TCP window size as the system was maxing out my older modem (about 6Mbps) whereas it's struggling to reach 1Mbps now, even though the connection itself is more than capable of handling it. As for multiple streams, I get a similar result; all connections are maxed at around 1Mbps (combined) or so with no erroneous messages printed to the log. I tested a direct connection to the modem again to make sure and it's only the PFSense box that is having problems.
I think I found the issue; it appears that I've been targeting the wrong problem all day. The modem itself is operating fine but it appears that one of my NICs, namely the RealTek one, decided to start failing.
So with concurrent connections the transfer speed roughly scales with the number of connections (at least up to the number you tried)?
One thing that goes up with the number of TCP connections is the aggregate TCP window size.
You haven't given much detail about your configuration. Let's assume you have one public IP address and consequently your modem is actually a modem/router. Then without pfSense in the picture your modem is doing Network Address Translation (NAT) which means when you do a file download from the internet you have a TCP connection between your modem and the file server and between the modem and your client system. Each TCP connection has its own distinct window size.
If you add pfSense between your client system and the modem you may end up with three distinct TCP connections: client to pfSense, pfSense to modem, modem to file server. Each of these TCP connections has its own distinct window size. Each of these TCP connections can have their own interesting interactions with NAT. I'll do some research on how these are supposed to interact.
How is the NIC failing (what is it doing that it shouldn't do)?
My network topology is pretty basic: behind the router is a simple star network with a 10/100Base-T switch at the center. The rest of it is as follows:
Internet -> Cable Modem -> PFSense Router -> (switch)
How is the NIC failing (what is it doing that it shouldn't do)?
Simply put, it's slow and, after watching the network throughput on several of my local machines, it seems to randomly stall (no packet loss, however) when transferring data. Your TCP window suggestion just didn't fit because everything was fine last week and, short of doing a release/renew, there were no configuration changes to explain the sudden drop, nor was there any packet loss that would cause the window to shrink.
I tested the card by performing a couple of different transfers to try and isolate the problem. Using a clean install of PFSense (installed onto a secondary HD) with everything at the default values I ran the following tests via SCP using a 1GB file (all tests were done between a single machine, the router and a single remote server):
-> implies "copying to"
Everything looks normal in this first set:
rl0 as WAN, de0 as LAN
Internal server -> Router (using de0) - ≈7MB/s
Router -> Internal server (using de0) - ≈8MB/s
Router -> External Server (using rl0) - 350-500KB/s (normal given my rated connection)
External Server -> Router (using rl0) - 150KB/s (VERY bad, my connection is rated for 20Mb/s)
After this I swapped the interfaces:
de0 as WAN, rl0 as LAN
Internal server -> Router (using rl0) - ≈7MB/s
Router -> Internal server (using rl0) - 170KB/s (!!! Should be WELL above this on an internal, direct connection)
Router -> External Server (using de0) - 350-500KB/s (Normal)
External Server -> Router (using de0) - 1.1MB/s (Normal)
As you can see, the problem is independent of any configuration present on the PFSense system AND it follows the card around; therefore the card is failing. Either that or I have a bad PCI slot.
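Here is the same reasoning written out as a quick sketch. The measured figures are the ones from the two test sets above (using the low end of the 350-500KB/s range); the "floor" values are my own rough guesses at what each path ought to manage, not anything official, and all the names are made up for illustration.

```python
from collections import namedtuple

Transfer = namedtuple("Transfer", "config nic path measured_kBps floor_kBps")

# floor_kBps values are assumed thresholds: ~5 MB/s for a 100 Mbit LAN hop,
# ~1 MB/s for a WAN download, ~300 KB/s for a WAN upload.
TRANSFERS = [
    Transfer("rl0=WAN, de0=LAN", "de0", "internal -> router", 7000, 5000),
    Transfer("rl0=WAN, de0=LAN", "de0", "router -> internal", 8000, 5000),
    Transfer("rl0=WAN, de0=LAN", "rl0", "router -> external", 350, 300),
    Transfer("rl0=WAN, de0=LAN", "rl0", "external -> router", 150, 1000),
    Transfer("de0=WAN, rl0=LAN", "rl0", "internal -> router", 7000, 5000),
    Transfer("de0=WAN, rl0=LAN", "rl0", "router -> internal", 170, 5000),
    Transfer("de0=WAN, rl0=LAN", "de0", "router -> external", 350, 300),
    Transfer("de0=WAN, rl0=LAN", "de0", "external -> router", 1100, 1000),
]


def slow_transfers(transfers):
    """Return the transfers that fell below their assumed floor."""
    return [t for t in transfers if t.measured_kBps < t.floor_kBps]


def suspect_nics(transfers):
    """Return the set of NICs involved in at least one slow transfer."""
    return {t.nic for t in slow_transfers(transfers)}


for t in slow_transfers(TRANSFERS):
    print("SLOW:", t.config, "|", t.path, "via", t.nic, "at", t.measured_kBps, "KB/s")
print("Suspect NIC(s):", suspect_nics(TRANSFERS))
```

Both slow transfers involve rl0 and none involve de0, whichever role each card plays. Note that this only shows the slowness follows rl0; it cannot tell a failing card apart from a driver or slot problem.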
Thanks for all the additional information. I have been puzzling on and off about the low single connection speed I've been seeing on Windows. My configuration also has rl0 as the wan link. Your report has encouraged me to look into the issue a bit more seriously. I'll play around a bit and see if I can do anything to improve things.
Hrm, now the problem gets really interesting. I just dumped the card into a Windows machine and everything is running fine. It now looks more like there is some other issue, so I'm going to try recompiling the kernel using the latest rl(4) driver from the FreeBSD 7 RELENG_7 branch, as there was a recently fixed bug involving bus_dma(9) and rl(4) that caused (you guessed it) slow upload speeds (limited to around 1Mbps!). It's entirely possible I have been affected by this bug since my initial install but, since my upstream speed was limited to around 400Kbps, I just didn't notice it.