Scp & rsync traffic randomly stalling



  • Since I upgraded to 2.0 (Jan 10 snapshot), I have consistently seen scp/rsync transfers stall at random. I have a 100 Mbit/s connection and an IPsec tunnel between sites. If I scp directly, bypassing IPsec, it still happens, though less often. When it works, it works great (~90 Mbit/s)… but then it randomly stalls. If I kill the transfer and restart it, it starts again at the peak rate. I had a separate ping running, and it never lost a packet, so I was not losing connectivity.

    Any ideas?


  • Rebel Alliance Developer Netgate

    This doesn't affect any other protocols, just scp/rsync?

    Under Advanced options, some settings worth toggling (they may or may not make a difference) are: disable hardware checksum offload, disable TSO (TCP segmentation offload), disable LRO (large receive offload), and disable scrub.

    Change one at a time and see if the behavior stays.

    Also, you may want to look at the output of "netstat -ni" and "netstat -m" and check for interface errors or anything filling up. There are other error counters too, but which ones apply depends on what type of interfaces you have.
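
    If Python 3 happens to be available (for example, on a workstation where the output is pasted), a rough sketch like the following can flag nonzero error counters in "netstat -ni" output. It assumes the FreeBSD column layout shown further down in this thread (Name, Mtu, Network, Address, then the counters) and reads only the <Link#N> rows, which carry the per-interface totals. The function name and the set of watched columns are illustrative choices, not part of any pfSense tool.

```python
# Sketch: flag interfaces whose error/drop/collision counters are nonzero in
# `netstat -ni` output. Only the <Link#N> rows carry the interface totals;
# the per-address rows show "-" in those columns and are skipped.

WATCHED = ("Ierrs", "Idrop", "Oerrs", "Odrop", "Coll")


def nonzero_errors(netstat_ni_text):
    """Return {interface: {counter: value}} for watched counters above zero."""
    lines = netstat_ni_text.strip().splitlines()
    columns = lines[0].split()[4:]  # counters follow Name, Mtu, Network, Address
    report = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 3 or not fields[2].lower().startswith("<link#"):
            continue  # not a link-level row
        iface = fields[0].rstrip("*")  # a trailing '*' marks an interface that is down
        # The Address column may be empty, so take the counters from the right.
        counters = dict(zip(columns, fields[-len(columns):]))
        bad = {name: int(value) for name, value in counters.items()
               if name in WATCHED and value.isdigit() and int(value) > 0}
        if bad:
            report[iface] = bad
    return report
```

    Fed the "netstat -ni" output posted further down in this thread, it returns an empty dict: every link-level error, drop and collision counter there is zero.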



  • I'm seeing exactly the same issue. I don't have anything fancy set up; I'm new to pfSense, a first-time user, with a basic setup running "2.0-BETA5 (i386) built on Fri Feb 4 15:47:28 EST 2011".

    I will try your ideas and report back.



  • I have the same issue, FYI. I only noticed it a day or two ago, as I don't use it that much. I'll try those suggestions next time I work on it and see whether I can figure out what is going on.

    Dave



  • So far, the suggestions have not solved the issue.

    It is definitely only an upload issue, not a download issue. I just downloaded a 30MB file over scp with no issues, but when I upload the same file back to the host I just downloaded it from, also over scp, it stalls after about 3.7MB.
    It completely kills the internet connection and times out my session from the remote location; I even have to ssh back in.

    Maybe NAT reflection is needed here…



  • @jimp:

    Also you may want to look at the output of "netstat -ni", "netstat -m" and check for interface errors or anything filling up. There are other error counters but without knowing what type of interfaces you have.

    To my untrained eye, netstat doesn't seem to show much wrong:

    [2.0-BETA5][admin@pfsense.rizal]/root(1): netstat -ni
    Name       Mtu Network       Address           Ipkts Ierrs Idrop    Opkts Oerrs  Coll
    dc0       1500 <Link#1>      00:50:bf:9f:88:e5 55788     0     0    60555     0     0
    dc0       1500 10.3.3.0/24   10.3.3.1          23708     -     -    29618     -     -
    dc0       1500 fe80:1::250:b fe80:1::250:bfff:     0     -     -        1     -     -
    rl0       1500 <Link#2>      00:50:ba:c8:86:e1 79469     0     0    68593     0     0
    rl0       1500 fe80:2::250:b fe80:2::250:baff:     0     -     -        0     -     -
    lo0      16384 <Link#3>                          281     0     0      281     0     0
    lo0      16384 127.0.0.0/8   127.0.0.1           281     -     -      281     -     -
    lo0      16384 ::1/128       ::1                   0     -     -        0     -     -
    lo0      16384 fe80:3::1/64  fe80:3::1             0     -     -        0     -     -
    pflog0*  33200 <Link#4>                            0     0     0    10909     0     0
    pfsync0*  1460 <Link#5>                            0     0     0        0     0     0
    enc0*     1536 <Link#6>                            0     0     0        0     0     0
    pppoe0    1492 <Link#7>                        77282     0     0    66402     0     0
    pppoe0    1492 59.167.255.13 59.167.255.134    45843     -     -        1     -     -
    pppoe0    1492 fe80:7::250:b fe80:7::250:bfff:     0     -     -        2     -     -
    ovpns1    1500 <Link#8>                            0     0     0        3     0     0
    ovpns1    1500 fe80:8::250:b fe80:8::250:bfff:     0     -     -        0     -     -
    ovpns1    1500 10.0.8.1/32   10.0.8.1              0     -     -        0     -     -
    [2.0-BETA5][admin@pfsense.rizal]/root(2): netstat -m
    195/195/390 mbufs in use (current/cache/total)
    194/68/262/8512 mbuf clusters in use (current/cache/total/max)
    193/63 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/5/5/4256 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/2128 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/1064 16k jumbo clusters in use (current/cache/total/max)
    436K/204K/641K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0/6/2384 sfbufs in use (current/peak/max)
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
    0 calls to protocol drain routines
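
    As a rough companion check for the "anything filling up" part, the sketch below (Python 3; the helper name is hypothetical) scans "netstat -m" output for nonzero denied, delayed or drain counters and for any pool whose total has reached a chosen fraction of its max. The 80% threshold is an arbitrary assumption.

```python
import re

# Pool lines ending like this report current/cache/total/max counts.
POOL_SUFFIX = "(current/cache/total/max)"


def mbuf_warnings(netstat_m_text, fill_threshold=0.8):
    """Return warnings for denied/delayed requests, drain calls, or pools near max."""
    warnings = []
    for raw in netstat_m_text.splitlines():
        # Relevant lines start with slash-separated integers, then a label.
        m = re.match(r"\s*([\d/]+)\s+(.+)", raw)
        if not m:
            continue  # e.g. "436K/204K/641K bytes allocated ..." carries unit suffixes
        numbers = [int(n) for n in m.group(1).split("/")]
        label = m.group(2).strip()
        if any(word in label for word in ("denied", "delayed", "drain")):
            if any(numbers):
                warnings.append(f"{m.group(1)} {label}")
        elif label.endswith(POOL_SUFFIX) and len(numbers) == 4:
            total, limit = numbers[2], numbers[3]
            if limit and total / limit >= fill_threshold:
                warnings.append(f"{label}: {total}/{limit} ({total / limit:.0%} of max)")
    return warnings
```

    For the output above it returns an empty list: no requests were denied or delayed, and mbuf clusters sit at 262 of 8512 (about 3%).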



  • OK, now I think it is purely an upload issue. Sending an email with a 2.7MB attachment fails at 48% every time, with an error saying it failed to contact the SMTP server.

    Is there an "upload" bug in 2.0 Beta5?


  • Rebel Alliance Developer Netgate

    I haven't had any issues, and I upload lots of stuff all the time, over IPsec, OpenVPN, scp, you name it.

    Before doing anything else, make sure that you are on the most current snapshot. The snapshots mentioned elsewhere in this thread are quite old.



  • I am on:

    2.0-BETA5 (i386)
    built on Wed Feb 23 15:42:56 EST 2011

    I will do some more isolation testing this weekend to see whether my particular issue lies somewhere else.

    It does seem to be the firewall, though.

    I will report back.



  • Just another question: how best can one troubleshoot an issue like this on the pfSense box?

    Thanks.


  • Rebel Alliance Developer Netgate

    You could run a really large packet capture and hope to catch something; otherwise, all you can really do is periodically check the error counters to see whether they increase.

    Ultimately you'd probably need the packet captures to have any hope of figuring out what is causing the connection to stall.
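
    One way to keep a "really large" capture manageable is a ring buffer limited to the stalling flow. The sketch below only assembles a tcpdump command line: -C starts a new file after the given size (in millions of bytes), and -W caps the number of files so the oldest are overwritten. The interface pppoe0, the peer 203.0.113.10 (a documentation address), the file path and the helper name are all placeholders to adapt.

```python
import shlex


def ring_capture_cmd(iface, peer, port=22, file_mb=20, files=5,
                     path="/tmp/stall.pcap"):
    """Build a tcpdump argv that captures one flow into a bounded ring of files.

    -s 0 keeps whole packets, -C rotates after file_mb million bytes, and
    -W limits the ring to `files` files, so disk use stays bounded.
    """
    return [
        "tcpdump", "-i", iface, "-n", "-s", "0",
        "-C", str(file_mb), "-W", str(files), "-w", path,
        f"host {peer} and tcp port {port}",
    ]


if __name__ == "__main__":
    # Hypothetical values: WAN interface from this thread, documentation-range peer.
    cmd = ring_capture_cmd("pppoe0", "203.0.113.10")
    print(shlex.join(cmd))  # paste the printed line into a shell on the firewall
```

    Stop the capture with Ctrl-C shortly after a stall so the newest file holds the last packets of the flow. The list can also be handed to subprocess.run instead of being printed.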



  • Thanks, jimp.

    I'll try that, and also try isolating the modem etc. on the weekend. I really need to get to the bottom of this.



  • This is my WAN interface card. Could this be the cause? Outbound traffic generally works, but a steady stream at maximum upload speed is what ends up stalling and basically "crashing" the connection.

    rl0@pci0:9:0: class=0x020000 card=0x13011186 chip=0x13001186 rev=0x10 hdr=0x00
    vendor = 'D-Link System Inc'
    device = 'DL 10038C or 10038D (Remark of Realtek RTL-8139) Fast Ethernet Adapter'
    class = network
    subclass = ethernet


  • Rebel Alliance Developer Netgate

    It's certainly possible, those rl chips are not known for their quality.



  • Thanks, jimp.

    I think that makes more sense given the symptoms… maybe. I'm not a networking expert, though.
    To recap: outbound traffic works in bursts, but a continuous flow of outbound traffic halts/stalls/crashes at around the 3MB mark.

    I don't see any errors or lost packets.

    I tried running with a 760 MTU. No difference.

    Running in bridge mode with a TP-Link 8840T ADSL2+ modem.



  • OK, I think I should start a new thread, but in which forum? The 2.0-specific one or not?

    So I SSHed into the pfSense box itself, opened a shell, and scp'd a 10MB file to an external host.

    Just as it had been doing from a PC on the network, it got to around 3.6-3.7MB and stalled, forcing me to close the terminal and log in again.

    So I tried this:

    scp -l 100 10mb external.domain:

    This completed successfully: slowly, but it finished.

    So, more specifically, the issue is that saturating the uplink causes a dropout/stall.
    Max upload is about 900KB/s and download is 1.11MB/s.

    Downloading at full speed is no problem, though, so the card can handle speeds much greater than a fully saturated upload link.

    Tomorrow I will run a number of tests, including taking the pfSense box out of the equation entirely.

    This is a new house, new cables, new net connection, so anything is possible.

    Thanks for reading.
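
    scp's -l option takes its limit in Kbit/s, so "scp -l 100" caps the transfer at roughly 100 Kbit/s (about 12.5 KB/s), far below the ~900KB/s uplink reported here. Stepping the limit up toward the uplink rate would show roughly where stalls begin. The sketch below converts a KB/s rate into -l values and estimates transfer time; it assumes decimal units and ignores protocol overhead, so the numbers are approximate, and the helper names are hypothetical.

```python
# Rough helpers for choosing `scp -l` values. scp's -l limit is in Kbit/s;
# decimal units (1 KB = 1000 bytes, 1 Kbit = 1000 bits) are assumed here.


def scp_limit_for(rate_kbytes_per_s, fraction=1.0):
    """Kbit/s value to pass to `scp -l` to use `fraction` of a KB/s rate."""
    return round(rate_kbytes_per_s * 8 * fraction)


def transfer_time_s(size_mbytes, limit_kbit_per_s):
    """Approximate seconds to send size_mbytes at limit_kbit_per_s, no overhead."""
    size_kbit = size_mbytes * 1000 * 8
    return size_kbit / limit_kbit_per_s


if __name__ == "__main__":
    uplink = 900  # KB/s, the maximum upload rate reported in this thread
    for fraction in (0.5, 0.75, 0.9, 1.0):
        print(f"{fraction:.0%} of uplink -> scp -l {scp_limit_for(uplink, fraction)}")
    print(f"10MB at -l 100: ~{transfer_time_s(10, 100) / 60:.0f} minutes")
```

    For the 10MB test file at -l 100 this gives about 800 seconds (roughly 13 minutes), consistent with a slow but successful transfer.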



  • So I switched the modem from bridge mode to PPPoE, and changed the pfSense box's WAN interface from PPPoE to DHCP client.

    No more issues with uploads/outbound traffic.
    So it is not the interface (Realtek) or anything hardware-related. The issue lies either with the modem in bridge mode or with the pfSense box doing PPPoE.

    Another note: I had the WAN MTU set to 760, which caused tremendous problems for all LAN clients when I first switched to this configuration. I cleared the MTU field, but the problem remained. Checking with ifconfig in a shell on the pfSense box showed the MTU was still 760. Rebooting pfSense resolved this, resetting the MTU to 1500.
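
    For reference, the 1492 MTU shown on pppoe0 in the netstat -ni output earlier in this thread is the Ethernet 1500 minus 8 bytes of PPPoE/PPP encapsulation. The sketch below does that arithmetic and the resulting TCP MSS, assuming 20-byte IPv4 and TCP headers with no options; it is background only and does not by itself explain the stall. The function names are illustrative.

```python
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20    # minimum IPv4 header, no options
TCP_HEADER = 20     # minimum TCP header, no options (timestamps add 12 more)


def pppoe_mtu(ethernet_mtu=ETHERNET_MTU):
    """MTU left for IP packets once PPPoE/PPP encapsulation is added."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(ip_mtu):
    """Largest TCP payload per segment for a given IP MTU (IPv4, no options)."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    for label, mtu in (("Ethernet", ETHERNET_MTU), ("PPPoE", pppoe_mtu()),
                       ("WAN set to 760", 760)):
        print(f"{label:15} MTU {mtu:4}  ->  MSS {tcp_mss(mtu)}")
```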


  • Rebel Alliance Developer Netgate

    One of my WANs is PPPoE and I run quite a bit of scp over it. Granted the upload on that circuit is only ~0.5Mbit so it probably isn't stressing anything anywhere.



  • I did a bunch of tests over the weekend. The issue is with the pfSense box.

    I started a new topic on this; I felt bad hijacking this guy's thread.

    http://forum.pfsense.org/index.php/topic,33709.0.html



  • I am having the same trouble with scp/rsync, but only when running rsync between two pfSense boxes over IPsec. When I run rsync directly over the WAN it works just fine; when I run it over IPsec it stalls.

    pfSense 1.2.3

