Dismal squid performance



  • pfsense 2.0 beta4
    August 7 build
    AMD64

    Atom D510 dual-core
    4GB RAM
    dual Intel GBE (em)
    OCZ Vertex 2 240GB

    I used the squid package in 1.2.3 on a Soekris net5501 (500MHz, 512MB) with a 5400 rpm drive. Cached downloads would come in at 30-50 mbps, which is awesome when your internet connection is 6/1 mbps.

    Now I've upgraded to the hardware and software listed above, and my internet connection is 32/4. As a test file, I downloaded direct to ramdisk an Ubuntu iso from a local mirror using wget on a LAN host, and it came in at a steady 10-11 mbps.

    Then I logged into another LAN host and downloaded to ramdisk the same iso from the same mirror using wget. This time the download speed was a steady 3 mbps. I know it came out of squid's cache because pfsense reported no WAN traffic throughout the download.

    Here are some holdups that I've ruled out and my rationale:

    network: I've transferred files between vlans at up to 200 mbps and I have no doubt it can push a lot more than that. The LAN interface was virtually idle while I pulled the iso from cache.

    CPU: I ran top on pfsense during the download and the idle process hovered around 90%.

    disk: The vertex 2 is one of the fastest 2.5" SSDs available, whether for sequential or random data. The HDD activity light on the firewall flashed briefly every couple of seconds during the download. It looks far busier than that when booting pfsense.

    Things I think it could be:

    squid misconfiguration: I didn't change much from default. It's running in transparent mode and I changed the cache system to diskd. I set the memory cache size to 1.5GB and the hard disk cache size to 150GB. I set the level 1 subdirectories to 256. The only one of these options I haven't tried on other systems is diskd.

    kernel boot configuration: I added the following lines to /boot/loader.conf:

    kern.ipc.nmbclusters="32768"
    kern.maxfiles="65536"
    kern.maxfilesperproc="32768"
    net.inet.ip.portrange.last="65535"

    These were recommended in this forum and I had good performance with them in 1.2.3 on the net5501.
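
One way to confirm that a box really carries these four tunables (the VM later in the thread turned out to have an empty loader.conf) is to parse the file and compare. This is only a rough Python sketch: the function names are made up for illustration, and it assumes the simple `name="value"` layout shown above with `#` comments.

```python
# Tunables quoted in the post above; values are compared as strings.
WANTED = {
    "kern.ipc.nmbclusters": "32768",
    "kern.maxfiles": "65536",
    "kern.maxfilesperproc": "32768",
    "net.inet.ip.portrange.last": "65535",
}


def parse_loader_conf(text):
    """Return {name: value} for every name="value" line in loader.conf text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip('"')
    return settings


def differences(text, wanted=WANTED):
    """Return {name: (found, expected)} for tunables that are missing or differ."""
    have = parse_loader_conf(text)
    return {
        name: (have.get(name), expected)
        for name, expected in wanted.items()
        if have.get(name) != expected
    }
```

Calling `differences(open("/boot/loader.conf").read())` on the firewall would return an empty dict when all four lines are present with the values listed.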

    software issues: Are there known issues with the squid package on the AMD64 pfsense platform? Does somebody have it running well?

    What could I look at to find what's causing the poor performance?

    Thanks.


  • Rebel Alliance Developer Netgate

    Have you tried it on 32-bit?

    I'm not sure how well 64-bit is supposed to perform on the Atoms.

    I've tried squid on amd64 in a VM (Though it's on a core i5-750) and it was fine at the time.



  • @jimp:

    Have you tried it on 32-bit?

    No, because I wanted to use all of my RAM. I may have to if I don't get some resolution though. Still, I would have expected to see something not working out if it wasn't well supported, like high CPU usage or so.

    Of course, now that I've done a fresh reinstall, squid won't even start, so maybe that's a sign.


  • Rebel Alliance Developer Netgate

    Well it's hard to say what the exact cause might be in this case, I was just curious if the performance was the same or different on 32 vs 64. If they are both slow, it's probably more related to the network card or some other chipset issue.



  • Right. I will give 32 bit a try.


  • Rebel Alliance Developer Netgate

    Looks like amd64 packages are having a problem. Somehow some of them are being mixed up with their 32-bit counterparts, resulting in libs that won't load on amd64. I'm checking into it.


  • Rebel Alliance Developer Netgate

    32-bit and 64-bit squid should be OK now. Both were rebuilt today and I've tried them out and they are starting again.



  • I reinstalled squid 64-bit in a vm today and performance is worse than before. For example, downloading an iso from http://mirror.csclub.uwaterloo.ca, without squid it comes in ~6mbps. Using squid (non-transparent mode) it's a steady 1.75+/-.02 mbps. Using squid on a 32-bit pfsense 1.2.3 vm, the same download comes out of cache ~200 mbps. Both VMs are on the same ESXi server using 2 Xeon 5150 CPU cores and Intel GBE. PF 2.0 has 6GB RAM and 1.2.3 has 3.0GB, so hardware should not be an issue. Squid settings are identical between the two.


  • Rebel Alliance Developer Netgate

    What is the output of "ifconfig -a" when running on 64-bit?

    If it says anything about TSO and/or LRO, go into the advanced options and make sure the options to disable those are checked, and you may as well disable checksums.



  • $ ifconfig -a
    em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    	options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    	ether 00:0c:29:59:fa:4e
    	inet6 fe80::20c:29ff:fe59:fa4e%em0 prefixlen 64 scopeid 0x1
    	inet 172.21.252.1 netmask 0xfffffe00 broadcast 172.21.253.255
    	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
    	media: Ethernet autoselect (1000baseT <full-duplex>)
    	status: active
    em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    	options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    	ether 00:0c:29:59:fa:58
    	inet 172.21.33.58 netmask 0xfffffe00 broadcast 172.21.33.255
    	inet6 fe80::20c:29ff:fe59:fa58%em1 prefixlen 64 scopeid 0x2
    	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
    	media: Ethernet autoselect (1000baseT <full-duplex>)
    	status: active
    plip0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> metric 0 mtu 1500
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    	options=3<RXCSUM,TXCSUM>
    	inet 127.0.0.1 netmask 0xff000000
    	inet6 ::1 prefixlen 128
    	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
    	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
    pfsync0: flags=0<> metric 0 mtu 1460
    	syncpeer: 224.0.0.240 maxupd: 128
    pflog0: flags=100<PROMISC> metric 0 mtu 33152
    enc0: flags=0<> metric 0 mtu 1536
    

    TSO and LRO were already disabled. I disabled checksum offloading and DL from cache is still 1.77mbps. This is on a VM.


  • Rebel Alliance Developer Netgate

    Looks normal then in that regard.

    Is it VMware?
    If so, can you confirm if the normal VMware settings are still there?

    /etc/sysctl.conf should have:

    kern.timecounter.hardware=i8254
    

    /boot/loader.conf should have:

    kern.hz="100"
    


  • I'll try those. By default my sysctl.conf has no uncommented lines, and loader.conf is empty.



  • I copied /boot/loader.conf from another 2.0 machine, added those two lines from your post to their respective files, and rebooted. DL from cache is now 2.3 +/- .2 mbps. This is on ESXi, pfsense installed from pfsense.iso



  • Are you getting good performance from yours now? If so I will try reinstalling it on my home pfsense. I just set up the vm because of the crashing that happened during the 32/64 fiasco. I want to minimize testing on my home box but if there's a chance that my issues are specific to this vm then I'll drop it like a rock.


  • Rebel Alliance Developer Netgate

    I don't have a VM setup "behind" it to test right now. Mine is also in VirtualBox so it's a bit different setup.



  • I installed squid on a fresh Beta4 i386 system this morning and... same problem. The iso comes out of cache at 3.3 mbps. About halfway through the download it jumps to 10 mbps. No detectable signs of strain from any part of the system, just slow.


  • Rebel Alliance Developer Netgate

    I tried it out on my amd64 box today, I moved an XP VM behind the amd64 pfSense VM with squid, and it came through at the full wire speed of my cable, 10Mbps.



  • Uncached objects always download at the expected rate, it's the cached objects that are slow without apparent cause. Try deleting the download from your xp machine and then download it again. Assuming your squid cache was configured to cache objects of that size, you will see what I mean.

    The expected behaviour is that cached objects should download at a speed that is bounded only by CPU, LAN network, or disk speed. Try this exercise in 1.2.3 and see the difference.
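
The download-twice test described above can be scripted so both runs are timed the same way. The following is only a minimal Python sketch: `timed_fetch` and `mbps` are hypothetical helper names, the body is discarded rather than written to ramdisk, and it assumes the client reaches squid transparently (or via the usual proxy environment variables).

```python
import time
import urllib.request


def mbps(num_bytes, seconds):
    """Megabits per second for num_bytes transferred in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def timed_fetch(url, chunk_size=1 << 16):
    """Download url, throw the data away, and return (bytes, seconds)."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


def compare_runs(url, runs=2):
    """Fetch the same URL several times and return the rate of each run."""
    results = []
    for _ in range(runs):
        size, seconds = timed_fetch(url)
        results.append(mbps(size, seconds))
    return results
```

With a cacheable iso URL, the first entry returned by `compare_runs` should reflect the WAN speed and the second the cache speed, provided the object fits under squid's maximum object size.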


  • Rebel Alliance Developer Netgate

    Ah, well I noticed in my haste of configuring squid I left its cache size as 100mb, not very effective if I want to store ISOs while testing…

    So I bumped it up to 3GB, and tried again, and the download came to me at 17MByte/s (byte, not bit). I deleted the downloaded file and cleared firefox's cache between attempts. Squid's access log shows a TCP_HIT for the request so I know it was coming from the cache.
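
Since this thread mixes Mbit/s and MByte/s, it can help to take both the hit/miss status and the transfer rate straight from squid's access log. The sketch below is in Python and assumes squid's default native log format (timestamp, elapsed milliseconds, client, result/status, bytes, method, URL, ...). The helper names are made up for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    client: str
    result: str       # e.g. TCP_HIT or TCP_MISS
    status: int       # HTTP status code
    size_bytes: int
    elapsed_ms: int
    url: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one native-format access.log line; return None if it is too short."""
    fields = line.split()
    if len(fields) < 7:
        return None
    result, _, status = fields[3].partition("/")
    return Entry(
        client=fields[2],
        result=result,
        status=int(status),
        size_bytes=int(fields[4]),
        elapsed_ms=int(fields[1]),
        url=fields[6],
    )


def rates(entry: Entry) -> Optional[Tuple[float, float]]:
    """Return (MByte/s, Mbit/s) for the request, or None if elapsed is zero."""
    seconds = entry.elapsed_ms / 1000
    if seconds == 0:
        return None
    mbyte_per_s = entry.size_bytes / seconds / 1_000_000
    return mbyte_per_s, mbyte_per_s * 8
```

By this arithmetic, a `TCP_HIT` that moves 170,000,000 bytes in 10,000 ms works out to 17 MByte/s, which is 136 Mbit/s. That is the scale to keep in mind when comparing the figure above with the ~3 mbps reported earlier.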

    I'll try it a few more times with different download locations to see if I get any differences.

    This is on an amd64 VM on a core i5-750, with the only tweak being the single loader.conf line tweaking nmbclusters to 32768.


  • Rebel Alliance Developer Netgate

    Tried a few more iso downloads and the lowest one I got was 10MByte/s. (Though it went back and forth between 10-11) and the fastest was that first one, about 17MB/s.

    I did notice that at least one place, Knoppix, looked like it was an http link on a mirror but actually redirected to an ftp link so it bypassed the cache entirely. (I only figured that out when I right clicked on the download in FF and copied the link to be sure I got the same thing twice).



  • Well that's way better than I can get out of it. Can I copy your config? I must be doing something wrong, or there is a hardware problem that is common to my home machine and the vm at work.


  • Rebel Alliance Developer Netgate

    Nothing special about the config, just a stock install (I haven't even changed the hostname or many other basic options really)

    Squid and Lightsquid are the only packages on it.

    Here's the squid sections:

    
    <squid>
      <config>
        <active_interface>lan</active_interface>
        <allow_interface>on</allow_interface>
        <transparent_proxy>on</transparent_proxy>
        <private_subnet_proxy_off/>
        <defined_ip_proxy_off/>
        <log_enabled>on</log_enabled>
        <log_dir>/var/squid/log</log_dir>
        <log_rotate/>
        <proxy_port>3128</proxy_port>
        <icp_port/>
        <visible_hostname>localhost</visible_hostname>
        <admin_email>admin@localhost</admin_email>
        <error_language>English</error_language>
        <disable_xforward/>
        <disable_via/>
        <uri_whitespace>strip</uri_whitespace>
        <dns_nameservers/>
        <disable_squidversion/>
      </config>
    </squid>
    <squidcache>
      <config>
        <harddisk_cache_size>3000</harddisk_cache_size>
        <harddisk_cache_system>aufs</harddisk_cache_system>
        <harddisk_cache_location>/var/squid/cache</harddisk_cache_location>
        <memory_cache_size>8</memory_cache_size>
        <minimum_object_size>0</minimum_object_size>
        <maximum_object_size>50000000</maximum_object_size>
        <level1_subdirs>16</level1_subdirs>
        <memory_replacement_policy>heap GDSF</memory_replacement_policy>
        <cache_replacement_policy>heap LFUDA</cache_replacement_policy>
        <cache_swap_low>90</cache_swap_low>
        <cache_swap_high>95</cache_swap_high>
        <donotcache/>
        <enable_offline/>
      </config>
    </squidcache>
    
    


  • Just thought I would add my $0.02 here.  I was running into the exact same problems with squid v2.7.9_1 on pfSense 2.0-BETA4 - Sept 1, 2010.  After enabling squid, I was getting extremely poor performance from the local cache compared to just downloading from the internet, and my ping times went through the roof.  Prior to squid, I ran a speed test (speedtest.net) and my ping times were 20ms-30ms.  After enabling squid, my ping times went to 300+ms.  Also, once a file was in the local squid cache, its download rate would vary wildly from 1MB/sec to 25MB/sec (I was running "wget <cached_url>" from my Mac terminal app).

    After reading thru this thread, I made the changes both clarknova and jimp mentioned (/boot/loader.conf and /etc/sysctl.conf) as well as rebooting pfSense and stopping/(re)starting squid and mucking around with the /var/squid directory.  Finally, something "clicked" on my firewall and squid started working properly.  Here are some things I did which finally caused squid to behave properly:

    • Edited the /boot/loader.conf with this info:

      kern.ipc.nmbclusters="32768"
      kern.maxfiles="65536"
      kern.maxfilesperproc="32768"
      net.inet.ip.portrange.last="65535"

    • Edited /etc/sysctl.conf with this info:


    net.inet.tcp.inflight.enable=0
    net.inet.tcp.hostcache.expire=1

    • Enabled all the TSO/LRO options in System --> Advanced --> Networking (including Disable hardware checksum offload, Disable hardware TCP segmentation offload, and Disable hardware large receive offload).  I rebooted the system, saw no difference, then DISABLED these options again (I think this is what caused squid to start working properly).

    • Edited the squid configuration options:


    • Dropped the RAM cache to 8MB

    • Modified the disk cache to 50G

    • Changed the disk cache system to ufs (I think this is also what caused squid to work properly)

    • Recreated the /var/squid directory:


    • From the CLI, stopped squid (/usr/local/etc/rc.d/squid stop)
    • Renamed /var/squid to /var/squid.old
    • Created a new /var/squid directory
    • Copied /var/squid.old/. /var/squid
    • Restarted squid (/usr/local/etc/rc.d/squid start)

    After mucking around for about 2hrs, squid started working as it should, and I am now able to download from the local disk cache at +35MB/sec.  I think it has something to do with my SSD and disk options set in the squid configuration.

    All in all, I don't have the magic bullet to fix the problem - just a set of things I did that made my install magically work.

    Hope this helps...



  • I gave up and went to 32-bit. These are the type of results I expect (not unlike jimp's results). I changed my nmbclusters to 32768 and increased my RAM cache, disk cache, and max file size. Not sure why I can't get these results in 64-bit pfsense (and not able to test now due to signal 6 ;)

    Notice the "out" rate of ~10mbps on the first half of the graph. This is me downloading an ubuntu iso from mit.edu to a LAN host for the first time. Then the graph shoots up to 100-150mbps as I download the same iso from the same server. The WAN graph meanwhile sat around baseline, a clear indication that the second download came from cache (besides the fact that my WAN is limited ~30mbps right now).




  • rkelleyrtp,

    Thanks for the thorough information. I tried all your suggestions (on i386) and changed the hard disk cache system to diskd. Cached downloads went from ~120mbps to ~3mbps, the same as I was seeing on 64-bit, and the same reason I switched to 32-bit!

    I undid all the changes except nmbclusters, since Jim and many others had no problem with it. No change. Then I rebooted pfsense and speeds are back to 100+mbps. Odd, because the system tunables appear to change immediately when the Apply button is hit, right? Could it be diskd that is causing my misery?

    Also odd: any other download from the web while downloading (slowly) from cache will cause the cached download to increase in speed! Every time I loaded a web page my cached download would surge to 20mbps. If I downloaded a video file that came in around 15mbps, my cached download would rise to 50mbps and stay there until the video was done downloading, then settle back down to 3mbps!

    I re-added all your suggested changes, because having read up on them, they all look sensible. In fact, with or without those modifications my d/l speed from cache is consistently 12-12.4 MB/s on a 700MB iso download, although on a loaded multi-user system it might be a different story.

    I rebooted just to be sure, and d/l speed is good. diskd has to be the culprit. I guess I have to go back to 64-bit pfsense now to see if I can get similar performance using aufs.

    Thanks rkelleyrtp and Jim for your input.

    ps, I'm still curious how you're getting 35 MB/s when I'm getting not even half that. top -S shows that my idle process is over 100% (on a dual-core) while downloading at 12 MB/s, and the vertex 2 is certainly capable of better speeds.

    I'm also curious why you set your squid RAM cache to 8MB. It's generally recommended to set it to 1/10 the amount of disk cache, although I have no idea what effect varying that will have.



  • Definitely the disk cache type. I did a '/usr/local/etc/rc.d/squid.sh restart' between tests, changing only the disk cache type, and repeated it a few times to ensure consistent results.

    Would be interesting to see comparative results with multiple simultaneous users. I'll report back when I've had a chance to test 64-bit on the same hardware with the same config.




  • @clarknova:

    rkelleyrtp,

    I re-added all your suggested changes, because having read up on them, they all look sensible. In fact, with or without those modifications my d/l speed from cache is consistently 12-12.4 MB/s on a 700MB iso download, although on a loaded multi-user system it might be a different story.

    I rebooted just to be sure, and d/l speed is good. diskd has to be the culprit. I guess I have to go back to 64-bit pfsense now to see if I can get similar performance using aufs.

    Thanks rkelleyrtp and Jim for your input.

    ps, I'm still curious how you're getting 35 MB/s when I'm getting not even half that. top -S shows that my idle process is over 100% (on a dual-core) while downloading at 12 MB/s, and the vertex 2 is certainly capable of better speeds.

    I'm also curious why you set your squid RAM cache to 8MB. It's generally recommended to set it to 1/10 the amount of disk cache, although I have no idea what effect varying that will have.

    Clarknova,

    No problem, I am glad to help.  Debugging these sorts of issues takes time, patience, and a very methodical approach.

    As far as dropping the RAM cache to 8MB, I was just experimenting with various tunables to see if something would click.  I have since set it back to 1G

    However, I think you have found the main issue:  Hard disk cache type.  This morning, I reinstalled squid and set it back to the default configuration.  From my Mac, I opened two terminal windows to pfSense and a third window to download an ISO image.  On the first window, I tailed the /var/squid/log/access file and watched the output as I ran "wget  http://centos.secsup.org/4.8/isos/alpha/centos-4.3-alpha-disc4.iso" in a separate window.  The first download peaked around 1M/s, and I had a TCP_MISS entry for the file in the squid access log.  Now that the file was in the Squid cache, I started experimenting with the disk cache type.  Here is what I found:

    • Disk Cache set to ufs - Once the initial file was in cache, subsequent "wget" calls hit 35+MB/s and I had a TCP_HIT entry in the squid access log file.  All subsequent calls were at 35MB/sec and I saw no slowdown whatsoever.

    • Disk Cache set to aufs - After changing to aufs and restarting squid from the CLI ("/usr/local/etc/rc.d/squid restart"), download speeds ranged wildly from 500K/s to 10M/sec but seemed to average around 3M/sec.  I saw a TCP_HIT entry in the squid access log file.  Funny thing is, any non-cached content is very fast (normal internet speeds).  So there is definitely something wrong with this setting, as cached content ranges from fast to slow.

    • Disk Cache set to diskd - After changing to diskd and restarting squid from the CLI ("/usr/local/etc/rc.d/squid restart"), download speeds were at normal internet speeds, as if squid had simply ignored the cached file.  In fact, it even seemed slightly slower (averaged 350K/sec instead of 1M/sec).  The squid access log file had a TCP_HIT, but the file was definitely not delivered very fast.

    So, what do we make of all this?  The default disk cache (ufs) seems to be the most reliable option for squid.  The other options cause download speeds to become either very sporadic (aufs) or much slower than normal (diskd).  From what I can see, keeping the disk cache set to ufs is the best option.
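
For anyone comparing squid.conf files by hand: the disk cache type is the first argument of squid's `cache_dir` directive, followed by the directory, the size in MB, and the level-1 and level-2 subdirectory counts. This small Python sketch builds that line from the GUI values. The helper name is hypothetical, and the level-2 default of 256, along with the claim that the package writes the line in this shape, is an assumption.

```python
# Storage types discussed in this thread.
CACHE_SYSTEMS = ("ufs", "aufs", "diskd")


def cache_dir_line(system, location, size_mb, level1, level2=256):
    """Build a squid.conf cache_dir directive from the package's GUI fields.

    level2=256 is an assumed default, not something confirmed in this thread.
    """
    if system not in CACHE_SYSTEMS:
        raise ValueError(f"unknown cache system: {system!r}")
    return f"cache_dir {system} {location} {size_mb} {level1} {level2}"
```

With the settings from the config posted earlier (3000 MB, 16 level-1 directories, /var/squid/cache), switching the type to ufs would give `cache_dir ufs /var/squid/cache 3000 16 256`.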

    I hope this helps…



  • @clarknova:

    the system tunables appear to change immediately when the Apply button is hit, right?

    I'm not sure about the particular system tunables you were modifying but some system tunables are used only at startup - that is, only the boot time setting has any significance.


  • Rebel Alliance Developer Netgate

    Those are some good stats, my only concern would be long-term performance with thousands upon thousands of objects in the cache, and dozens (or hundreds) of clients. I wonder which one would hold up then.

    I may change the default disk type in the package to ufs though, see if anyone even notices. It would only affect fresh installs of course.



  • @jimp:

    Those are some good stats, my only concern would be long-term performance with thousands upon thousands of objects in the cache, and dozens (or hundreds) of clients. I wonder which one would hold up then.

    I may change the default disk type in the package to ufs though, see if anyone even notices. It would only affect fresh installs of course.

    For me, I have to wonder why the other choices (aufs and diskd) perform so poorly compared to ufs.  Is there some underlying issue with the SATA controller?


  • Rebel Alliance Developer Netgate

    From what I have heard, there is some dark magic involved with disk cache layout; A lot of things affect performance: Too much in the cache, too little in the cache, underlying filesystem layout, file system size, filesystem type, disk hardware, disk placement (front of the disk vs end of the disk), other items on the same disk controller, other general hardware items, etc, etc.

    I'm not sure any one choice would be better for everybody long-term, but ufs may be the best place to start.

