Strange problem with slow download speed on WAN



  • Hello guys!
    I want to share with you my weird problem and I hope to get some valuable hints because I tried several things without success.

    My DSL connection/modem status:

    Download: 17.326 Mbps.
    Upload:  1.149 Mbps.

    When I do a speedtest (speedtest.net) with my computer plugged in directly on the modem I get:

    Down: 15+ Mbps
    Upload: 0.95 Mbps

    When my computer is behind the pfsense box I get:

    Down: 4-5 Mbps
    Upload: 0.95 Mbps

    • The problem started about two weeks ago.

    • My pfSense version: 2.0.1-RELEASE (i386).

    • No hardware changes the last 10 months.

    • I did the tests with the simplest possible topology: 1 PC on LAN -> pfSense -> DSL modem

    • The modem is not on bridge mode.

    • My ISP did some tests from their side and they confirmed everything works fine (of course it works since bypassing the pfsense box the speed is fine).

    • On my WAN iface I have 0 errors and 0 collisions!

    • With auto-negotiation on the WAN iface, "100baseTX <full-duplex>" is selected.

    • With "full" speed (5+Mbps stable),pfsense has CPU usage around 15-20% and memory usage around 15%

    • State table size and MBUF Usage have low values too.

    • It seems to affect only the download speed and not the upload.

    • No traffic shaping enabled.

    • There was no bridge between the interfaces.

    • Then I created a bridge for the LAN and used different ports for the tests.

    • I replaced the cables with new ones.

    • I haven't installed squid or squidGuard and during the speedtests I also stopped several services (snort, ntop).

    • I disabled/enabled the hardware checksum offload.

    • I checked different settings for the interface speed but since I get no errors/collisions I do not think it is related.

    Nothing of the above helped! Any ideas???
    Any feedback really welcome since I am running out of troubleshooting ideas… :)





  • @heper:

    you could try this:

    http://help.expedient.net/broadband/mtu_ping_test.shtml

    I tried that and:
    For packet size 1472 I get timeout.
    For packet size 1462 I get normal response.
    For anything above 1472 I get ICMP error for DF flag as expected.

    Now by default the MTU is 1500 but I will also try 1490 (1462) on my interfaces and let you know.

    • UPDATE 1: I get the best results using the default settings (1500)…so the problem still exists.
    • UPDATE 2: Using for all interfaces (LAN,WAN) MTU 1492 which is used by my modem I do not notice any difference on the speed but just higher CPU load on pfSense.
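
    The numbers in the ping test above can be checked with a little arithmetic. This is only a sketch, assuming plain IPv4 with a 20-byte IP header (no options) and an 8-byte ICMP header; the function names are made up for illustration.

```python
# Overhead added to an ICMP echo payload on IPv4 (assumption: no IP options).
IP_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def mtu_from_ping_payload(payload_bytes: int) -> int:
    """Packet size on the wire for a given ping payload size."""
    return payload_bytes + IP_HEADER_BYTES + ICMP_HEADER_BYTES


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


if __name__ == "__main__":
    # Payload sizes tried in the post above.
    for payload in (1462, 1472):
        print(payload, "bytes payload ->", mtu_from_ping_payload(payload), "bytes packet")
    # MTU reported for the modem in UPDATE 2.
    print("MTU 1492 -> max payload", ping_payload_for_mtu(1492))
```

    By this arithmetic a 1462-byte payload corresponds to 1490, and an MTU of 1492 would allow a payload of up to 1464, which would be consistent with 1472 timing out while 1462 gets through.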


  • How long did your pfSense box perform well before running into problems?  Have you made any recent hardware changes?  You might want to try running a test directly from your firewall as described in the thread below.  I've used this method via SSH to the cachefly site successfully many times.

    http://forum.pfsense.org/index.php/topic,45874.0.html



  • This pfsense box was running without problems the last 10 months.
    No hardware related to pfsense or the modem has been changed after the initial setup (10 months ago).

    I tried the following from the box itself:

    fetch -o /dev/null http://cachefly.cachefly.net/100mb.test
    

    and I get average download speed 550kBps (4.4 Mbps).
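
    For comparing the fetch figure with the speedtest and modem figures, a rough conversion sketch follows. It assumes the reported "kBps" means 1000 bytes per second (if the tool counts 1024-byte units the result is slightly higher); the function names are hypothetical.

```python
def kilobytes_per_sec_to_mbps(kbytes_per_sec: float) -> float:
    """Convert kilobytes/s to megabits/s (assumption: decimal units, 1 kB = 1000 bytes)."""
    return kbytes_per_sec * 8 / 1000


def share_of_sync_rate(measured_mbps: float, sync_mbps: float) -> float:
    """Fraction of the modem's sync rate that the measurement reaches."""
    return measured_mbps / sync_mbps


if __name__ == "__main__":
    measured = kilobytes_per_sec_to_mbps(550)    # fetch result from the box itself
    sync = 17.326                                # modem download sync rate from the first post
    print(f"{measured:.1f} Mbps, {share_of_sync_rate(measured, sync):.0%} of sync rate")
```

    So the box itself reaches roughly a quarter of the line's sync rate, in line with the 4-5 Mbps seen from the LAN PC.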


  • Netgate Administrator

    Ok so the problem appears even when the pfSense box isn't routing at all.
    Is your modem in bridge mode? I guess I really mean is your modem configured differently when connected to pfSense than connected directly to a PC?

    If it is then possibly your ISP has changed their setup slightly such that bridge mode is not working as it should. They may not even realise it happened, firmware update on some piece of kit between you and them for instance. Anyway you should be able to test that easily enough by leaving it in bridge mode and using the PPPoE (if that's what you're using) client in your PC.

    If you are double NATing then possibly a bad NIC or cable is to blame. Try reassigning your NICs the other way around to WAN and LAN. Test again.

    Steve



  • Hi Steve,
    No, my modem is not in bridge mode. When I move the PC network cable from pfSense directly to one of the LAN ports on the modem, the speed is high as normal; when I plug it back into the pfSense LAN ports, we are back to ~5Mbps. In general, any PC connected to the LAN ports of the modem has high download speed, and any PC connected to pfSense has low speed. That's why I don't think changing the modem to bridge mode would resolve it.

    I may try to reassign the WAN to another interface. Just for information all the interfaces on the box are Gigabit ethernet.


  • Netgate Administrator

    Hmm. OK.
    That certainly suggests a problem with the pfSense box and since it just happened without any hardware or software/firmware changes I would be looking at a bad NIC or cable. In the webGUI look at Status: Interfaces: Do you see any errors or collisions?
    You could try a test through the modem/router but not via the internet. Connect a server of some type to one of the spare LAN ports on the modem and then try to transfer a file from it through pfSense (or fetch it as before if it's a web server).

    Steve



  • Using the initial interface I had 0 errors/collisions, but anyway I assigned WAN to another interface and replaced the cable.

    When Speed and Duplex is on "autoselect" I get timeouts and the link between pfsense and modem is not stable.
    If I switch it manually to "100baseT full duplex" it works but I get many "errors in" for the WAN.
    If I switch it manually to "100baseTX half duplex" it works but I get some collisions (no more errors). The download speed is again around 5Mbps.

    I noticed that sometimes a restart of the pfSense box has a real impact on issues related to the interfaces. So after a reboot, "autoselect" works fine again: full duplex, no errors, no collisions.

    • UPDATE: I connected a server on a LAN port of the modem. Then I downloaded a file from that server to my client behind pfSense. The download speed was higher than 5Mbps, around 7.5-10Mbps…


  • The outcome till now:

    Download speed behind pfsense is higher for downloading locally (through modem) than downloading from the internet…so it seems that the pfSense interface works fine.

    At the same time downloading from the internet, bypassing pfSense (connected to the modem) works fine.

    Does it make sense? Anybody with a similar issue maybe?


  • Netgate Administrator

    @/CS:

    When Speed and Duplex is on "autoselect" I get timeouts and the link between pfsense and modem is not stable.
    If I switch it manually to "100baseT full duplex" it works but I get many "errors in" for the WAN.
    If I switch it manually to "100baseTX half duplex" it works but I get some collisions (no more errors). The download speed is again around 5Mbps

    This is your problem. You have a mismatch somewhere or a bad cable or socket where not all pins are connected. Hard to imagine how that just happened but I guess physical damage can 'just happen' if your box is vulnerably placed.

    If you set the pfSense box to anything other than 'auto' then you MUST have the same settings at both ends of the cable. If you cannot set the speed/duplex at the router you have a problem.  ;)

    @/CS:

    Download speed behind pfsense is higher for downloading locally (through modem) than downloading from the internet…so it seems that the pfSense interface works fine.

    Not really. If you are only getting 7-10Mbps through a local connection then something is wrong unless your server is very slow. Or perhaps the other end is set to 10baseT.

    Usually if you have a duplex mismatch the speed is very slow indeed, like <500K.  Hard to say quite what's happening here. Have you tried different LAN ports on the modem?

    Try setting both ends to 100TX.

    Steve



  • So to recap, with your original WAN interface set to auto-negotiation, you experience slow speed but no errors or collisions.  Using a second interface for the WAN, you get timeouts and what appears to be high-latency conditions.  Adjusting the speed/duplex on the second interface introduces errors or collisions.

    You've swapped out the cable and tested the modem's network interface with a directly-connected PC, so those two components are assumed good.

    I agree with Steve in that it's a bad cable or interface problem, so with the cable considered good, that would lead us to the interface.  But the second is experiencing issues as well?  What are the hardware specs of your pfSense firewall?  What type of network interfaces are you using (Intel, Broadcom, etc.)?  You noted the interfaces on your firewall are all gigabit, is the interface on the PC you've directly connected to the modem gigabit as well?  Sorry for all the questions, just trying to get a better feel for your setup.



  • Hi guys,
    feel free to ask anything! :)

    So my pfSense box is an embedded board (net6501-50: http://soekris.com/products/net6501.html) with 4x Intel 82574L Gigabit Ethernet ports. Yes my computer connected to pfSense has also a Gigabit interface. Unfortunately I cannot define the interface speed on the specific modem but the problem is more generic affecting LAN interfaces too.

    I agree that it seems to be something related to the interfaces and not the cables, since I used around 5 different cables! With the same cable connected to the modem I have normal speed. ;)

    I get the same weird behavior trying 3 different LAN ports of the board! When the "autoselect" doesn't work, I need to define the speed and then I get errors. I am afraid the issue is a hardware failure of the board…but I cannot understand what happened and messed the interfaces. Right now the WAN with "autoselect" works without any errors/collisions and there is a problem with the LAN ifaces.

    I do not install any updates automatically and nothing has been changed.

    The condition of the interfaces/pins is perfect and the box is located in a protected area indoors. There is no physical damage and the age of the box is around 10 months.



  • Ah, a Soekris box.  That's a pretty popular board, so certainly not an uncommon configuration by any means.  I'll admit I don't have much exposure to the Soekris products, other than what I've read here and other places on the Interwebs.  I do run a little Atom board with pfSense using the same Intel nics and they've performed admirably.

    Are the nics on your Soekris box connected to an expansion card that perhaps needs to be reseated?  If your firewall is anything like mine, then it's tucked away behind a media center collecting more than its fair share of dust and dirt.  Are you running the embedded (nanobsd) version of pfSense or the full version?  On a CF card or other storage device?


  • Netgate Administrator

    @/CS:

    I noticed that sometimes a restart of the pfSense box has a real impact on issues related to the interfaces. So after a reboot, "autoselect" works fine again: full duplex, no errors, no collisions.

    When you did the above was the speed still limited?

    One thing I have seen in the past that resulted in similarly flaky network connections is a dying power supply. Especially an external power brick. Does it run hot? Do you have anything you could replace it with for a test?
    As Trunix said that is a popular board. If it is a PSU failure someone will have seen that before.

    Oh and what modem do you have?

    Steve



  • Yeah, after looking at the pics of your Soekris and realizing all the nics are mounted directly to the board, I was thinking psu as a possible culprit as well…but Steve beat me to it!  My other thoughts were your storage device, but of the two, I think a failing psu aligns better with the problems you're experiencing.



  • PSU? Never thought about that but it makes sense!
    Just to mention that during ALL kind of tests behind pfSense mentioned above the speed was limited. Even when I have no errors/collisions on the interfaces the speed is limited.

    I have installed nanobsd version of pfSense on an external 2.5" Fujitsu SATA disk.
    Modem: DSL-EasyBOX 803 A by Astoria Networks/TWONKY.



  • I would swap the psu and see if that fixes things.  In any case, it's probably a good idea to have a spare one on-hand.  Let us know if that's the solution so others can benefit!

    I wonder if your external hdd is increasing your power draw enough that it's shortening your overall psu lifetime.  If the psu is indeed the problem, and the replacement fails in a similar time period, you may want to consider a mSATA ssd.



  • Thanks for your support guys.
    I will check with a new PSU and let you know. :)



  • I contacted Soekris and they also suggested to use a new PSU.
    I just did and … the same problem! :(
    What else could it be?  ???


  • Netgate Administrator

    Hmm. What did you replace the PSU with? Was it of sufficient amperage?
    Hard to say what else could be causing the negotiation problems, or even if that is a symptom rather than a cause.
    You could try putting a switch/hub in between the modem and pfSense box if you have one to hand.

    Steve



  • @stephenw10:

    What did you replace the PSU with? Was it of sufficient amperage?

    You could try putting a switch/hub in between the modem and pfSense box if you have one to hand.

    My first and my second PSU were the recommended parts by Soekris for this board (12V, 3A).
    I will use a hub between pfSense and the modem and will update this post within half an hour.

    • UPDATE1:
        I used a hub between :
        >pfSense WAN iface and the modem…same problem.
          [PC--(1000baseT<full-duplex>)--pfSenseLAN--pfSenseWAN--(100baseTX<full-duplex>)--Modem], 0 errors, 0 collisions
        >pfSense LAN iface and the PC…same problem.

    • UPDATE2:
        Downloading (Copy-Paste large file) from the LAN PC to a WLAN PC over RDP through pfsense box gives me around 15+ Mbps...not bad and with low CPU/Memory usage.



  • Huh.  I figured a new psu would fix you up.  You might want to try running iperf from one wired pc on one interface to another wired pc on another interface and see what results you get.  Jperf provides a nice little gui for running under Windows, it can be a little finicky to get working, but I've used it many times with good results.

    I wonder if your pcb got zapped somehow, affecting all your onboard nics, the controller or something else.



  • It may also be worthwhile to reinstall pfsense to ensure it's definitely a hardware issue and not bad behavior from an add-on package.



  • If I finally reinstall pfSense nanobsd version in a new HDD which one would you recommend for Soekris board net6501-50?
    mSATA SSD maybe? Any recommended vendor/product?

    I'm not yet convinced to reinstall the system from scratch but I'm asking in case I will do so.



  • I would think any of the Transcend mSATA SSDs that are listed on the product page of your 6501 would be great choices.  I've even bought ones off eBay like the one below.  I think any of these are reliable enough that you could run the full version of pfSense instead of the nanobsd image if you wished.

    http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=221163403530

    Can your Soekris board boot off usb?  That would be a non-destructive way to temporarily run another image to see if that improves your situation.



  • @trunix:

    Can your Soekris board boot off usb?  That would be a non-destructive way to temporarily run another image to see if that improves your situation.

    Yes it can, so I will try to boot from USB and test before any hardware replacement or complete reinstallation.



  • I'd be interested to see if that changes things.  Definitely don't want to put you through a spending spree and not have it fix anything, though I don't mind spending other people's $$ :D

    When you accomplished the large file transfer from the LAN PC to your WLAN PC over RDP, were both machines wired directly to the GE interfaces?



  • @trunix:

    I'd be interested to see if that changes things.  Definitely don't want to put you through a spending spree and not have it fix anything, though I don't mind spending other people's $$ :D

    Lol, don't worry trunix, if I would think your recommendation doesn't make sense I wouldn't follow it. ;)
    The whole troubleshooting didn't cost me anything till now.

    @trunix:

    When you accomplished the large file transfer from the LAN PC to your WLAN PC over RDP, were both machines wired directly to the GE interfaces?

    The LAN PC was on GE interface and the WLAN PC on wireless 11g. Both connected to pfSense box.



  • Is your wireless connection a separate access point, or is it integrated in either the modem or the 6501?  What was different from this PC transfer test that netted 15+ Mbps than the earlier one you conducted that only resulted in 7-10 Mbps?



  • @trunix:

    Is your wireless connection a separate access point, or is it integrated in either the modem or the 6501?  What was different from this PC transfer test that netted 15+ Mbps than the earlier one you conducted that only resulted in 7-10 Mbps?

    Wireless integrated on 6501 board.
    The difference is that before there were problems with the interfaces causing errors and collisions. These problems were resolved by themselves by rebooting the board and setting the interfaces to specific speed and then back to auto. Right now they are all to "auto" without errors.



  • Got it.  Previously you had a PC connected to the LAN port of your DSL modem, so I guess that was a small config difference as well.  Hmmm, having to fiddle with the interface settings to get things stabilized isn't good.  Hardware checksum offload is back to enabled (checkbox cleared), correct?  Technically your nics are capable of hardware TCP segmentation offload, but I'd leave that off for now and it may be something to optimize later when everything is back to working okay.

    Hardware large receive offload and device polling are disabled as well, correct?  Disabled for LRO means the checkbox is marked and disabled for device polling means the checkbox is cleared.  I mention that only because I get turned around and flip-flop the setting from time to time when I'm making adjustments.

    Any improvement booting off usb?



  • Unfortunately I didn't manage to boot it from USB. I tried several supported distributions and USB sticks without success. Do you recommend any distro which I may not have tried yet?

    My BIOS version supports booting from USB: "comBIOS ver. 1.41a  20111203"
    I got the same error for all my attempts:

    comBIOS Monitor.  Press ? for help.

    boot 80
    No Boot device available, enter monitor.

    comBIOS Monitor.  Press ? for help.

    I will check the settings you mentioned above, even though I think they are correct.
    Just for information, Soekris suggested to send them the board for inspection.



  • I'm sure you've already tried the i386 memstick distro, and that's always worked for me.  More often than not, I just keep the latest LiveCD handy and use a usb cd/dvd drive.

    Sorry, I'm not gonna be any help with the Soekris error messages, perhaps Steve or someone else familiar with the hardware can chime in?  If I were you, I'd want to try this last bit of troubleshooting before mailing the firewall back to Soekris, but be mindful your warranty doesn't expire in the meantime.



  • After several unsuccessful tests, I upgraded my comBIOS to the latest version:

    comBIOS ver. 1.41c  20121115

    Then I tried with FreeBSD (FreeBSD-9.0-RELEASE) following these two guides without success:
    http://www.macfreek.nl/memory/FreeBSD_9_on_Soekris_net6501
    http://wiki.soekris.info/Net6501_freebsd

    Actually for these tests with USB sticks I have unplugged the SATA disk and each time the command "boot 80" takes some seconds and then always gives the "No boot device available" error.

    I think I may try to reinstall pfsense on the same SATA disk.
    The board has 3 years warranty and it's 10 months old, so the expiration is not a problem. I want to try everything before sending it back.



  • Happy New Year guys!

    It's time for an update on this topic.
    So…I finally found a USB stick that was recognized by the comBIOS, so I installed the full version of the latest pfSense release (2.0.2) on a different disk, a Transcend mSATA SSD. I used the "pfSense-memstick-serial-2.0.2-RELEASE-i386-20121207-1630.img" image and the "Win32DiskImager" software to write it to the USB stick.

    After the installation I assigned only the WAN and LAN interfaces and did my speed tests using two specific cables and reassigning the LAN/WAN interfaces.
    To keep it short I get slow speeds ONLY when a specific interface (em3) is assigned to WAN/LAN. This iface was always assigned to WAN or LAN when I was doing my tests in the past and it was the reason I was getting the slow speeds. So it seems to be a hardware related issue.

    I had a closer look at this interface and I cannot see any difference compared with the others. I mean all 8 pins are clean and without any signs of strain.

    What do you recommend? Should I send it to Soekris service? Is there anything else I could do?
    The board is quite new and considering its price I am not willing to use only 3 of the interfaces…


  • Netgate Administrator

    Hmm, seems odd.
    Before any of your trouble started were you using em3 and getting full speed?
    If so then it pretty much can only be a hardware problem of some kind.
    The only other thing to consider is how the NICs are connected internally. For instance my home box has 3 Intel Gigabit NICs but they are actually provided by one Gigabit chip and one Dual Gigabit chip. These are connected on separate PCI bus connections. I can achieve a greater throughput by arranging the highest traffic to use both chips. It's pretty marginal though!  ;)

    Steve



  • FINALLY SOLVED!!! :o

    Hi guys,
    let me explain the root cause and how I found it out.

    I noticed that the speed is slow only when that interface is assigned to WAN and NOT to LAN. It means when em3 was LAN the speed was high and when WAN, slow! That pointed me to the right direction. I remembered that in the past I also had slow speeds with other interfaces as WAN…
    So the same interface is slow as WAN and fast as LAN...but what's the difference?
    WAN had always a specific IP (DHCP reservation on the DSL router). So, I logged in to the router and changed the reservation to a different IP and suddenly I got high speed again for WAN! The only setting on the router which is related to the specific IP is the DMZ, since I had it enabled for this specific IP to be able to use pfSense for VPN etc. That was necessary since the DSL modem was not on bridge mode.

    So finally, when I disable the DMZ setting for the WAN IP the speed is high, and when I enable it again the speed is slow!!! It's clear that it's a malfunction of the DSL-Easybox!

    Actually, it explains why I had this problem even if I was reassigning the WAN in the past. Back then I was updating that DHCP reservation to point to the same IP (to be able to use my VPN) and I always had the same problem. Yesterday I built the system from scratch again (new HDD) and I did my tests without caring to get this specific IP (without updating the reservation) and that's why I got the different results for one of the interfaces…

    Thank you all guys for the feedback and the ideas you provided. I enjoyed the troubleshooting and for me it was indeed an interesting case!  ;D

    ***UPDATE: Using port forward (NAT settings) instead of the DMZ setting works fine for VPN without affecting the speed! :)


  • Netgate Administrator

    Ah! Well deduced.  :)
    There is no QoS / traffic shaping on the Easybox?

    Steve



  • @stephenw10:

    There is no QoS / traffic shaping on the Easybox?

    The QoS menu is locked/disabled by default and sooner or later I need to reset and rebuild it from scratch too.
    I'm afraid there are some default settings which are currently not visible to the user.

