HAProxy Slows At 1500+ connections Really Need some help to figure out why
-
So I had HAProxy installed (as a front end for a cluster of Squid proxies) on a low-end Dell server running pfSense 2.1.5 and was experiencing slowdowns at 1500+ connections, so I built up a new pfSense 2.2.4 machine on a brand new Dell R630 with 64 GB RAM, dual CPUs, bad ass RAID disks, etc., and loaded and configured HAProxy with several Squid backends and some ICAP backends. Things work great until I hit about 1500 or more connections, and then everything just slows to a crawl. Restarting HAProxy helps momentarily, but it slows back down again very quickly. If I offload clients to the point of only 300-400 connections, it becomes responsive again. The HAProxy stats page shows 97% idle or similar, and the output from top shows maybe 5% CPU for HAProxy. If I configure a browser client to use one of the Squid backends directly, it works fast, but as soon as I point the browser proxy config back at the HAProxy frontend IP, it slows down again.
I am not really sure how to troubleshoot this and would appreciate any help. I have done the usual searching and tried many of the fixes others have posted, but my problem continues. I can post any info here that would help someone determine where my problem may be; I am just not sure what is useful. Below are a few of my essential configs to start with.
TIA..!
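In case it helps, here is roughly the kind of data I can collect on the box (standard FreeBSD commands; the socket path matches the stats socket in my config below, and socat would need to be installed from ports):

```
# mbuf/cluster usage, to rule out mbuf exhaustion
netstat -m

# count sockets per TCP state
netstat -an | awk '{print $6}' | sort | uniq -c

# interrupt load per device (NIC queues)
vmstat -i

# query HAProxy runtime info over the admin socket
echo "show info" | socat stdio /tmp/haproxy.socket
echo "show stat" | socat stdio /tmp/haproxy.socket
```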
/var/etc/haproxy.cfg file contents:
global
maxconn 50000
log /var/run/log local0 info
stats socket /tmp/haproxy.socket level admin
uid 80
gid 80
nbproc 1
chroot /tmp/haproxy_chroot
daemon
	spread-checks 5

listen HAProxyLocalStats
bind 127.0.0.1:2200 name localstats
mode http
stats enable
stats admin if TRUE
stats uri /haproxy_stats.php?haproxystats=1
timeout client 5000
timeout connect 5000
	timeout server 5000

frontend HTPL_PROXY
bind 10.1.4.105:8181 name 10.1.4.105:8181
mode http
log global
option http-server-close
option forwardfor
acl https ssl_fc
reqadd X-Forwarded-Proto:\ http if !https
reqadd X-Forwarded-Proto:\ https if https
timeout client 30000
	default_backend HTPL_WEB_PROXY_http_ipvANY

frontend HTPL_CONTENT_FILTER
bind 10.1.4.106:8182 name 10.1.4.106:8182
mode tcp
log global
timeout client 30000
	default_backend HTPL_CONT_FILTER_tcp_ipvANY

backend HTPL_WEB_PROXY_http_ipvANY
mode http
cookie SERVERID insert indirect
stick-table type ip size 1m expire 5m
stick on src
balance roundrobin
timeout connect 50000
timeout server 50000
retries 3
server HTPL-PROXY-01 10.1.4.103:3128 cookie HTPLPROXY01 check inter 60000 weight 150 fastinter 1000 fall 5
server HTPL-PROXY-02 10.1.4.104:3128 cookie HTPLPROXY02 check inter 60000 weight 100 fastinter 1000 fall 5
server HTPL-PROXY-03 10.1.4.107:3128 cookie HTPLPROXY03 check inter 60000 weight 50 fastinter 1000 fall 5
server HTPL-PROXY-04 10.1.4.108:3128 cookie HTPLPROXY04 check inter 60000 weight 200 fastinter 1000 fall 5
server HTHPL-PROXY-01 10.1.4.101:3128 cookie HTHPLPROXY1 check inter 60000 weight 150 fastinter 1000 fall 5
	server HTHPL-PROXY-02 10.1.4.102:3128 cookie HTPHLPROXY02 check inter 60000 weight 100 fastinter 1000 fall 5

backend HTPL_CONT_FILTER_tcp_ipvANY
mode tcp
balance roundrobin
timeout connect 50000
timeout server 50000
retries 3
server HTHPL-PROXY-01 10.1.4.101:1344 check inter 60000 disabled weight 100 fastinter 1000 fall 5
server HTHPL-PROXY-02 10.1.4.102:1344 check inter 60000 disabled weight 100 fastinter 1000 fall 5
server HTPL-WEB-01 10.1.4.153:1344 check inter 60000 weight 200 fastinter 1000 fall 5
	server HTPL-WEB-02 10.1.4.154:1344 check inter 60000 weight 200 fastinter 1000 fall 5

Some sysctl stuff:
kern.ostype: FreeBSD
kern.osrelease: 10.1-RELEASE-p15
kern.osrevision: 199506
kern.version: FreeBSD 10.1-RELEASE-p15 #0 c5ab052(releng/10.1)-dirty: Sat Jul 25 20:20:58 CDT 2015
root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10
kern.maxvnodes: 200000
kern.maxproc: 70788
kern.maxfiles: 204800
kern.argmax: 262144
kern.securelevel: -1
kern.hostname: HTPL-PROXY-03.hth.hightechhigh.org
kern.hostid: 1053306123
kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
kern.posix1version: 200112
kern.ngroups: 1023
kern.job_control: 1
kern.saved_ids: 0
kern.boottime: { sec = 1443678149, usec = 901465 } Wed Sep 30 22:42:29 2015
kern.domainname:
kern.osreldate: 1001000
kern.bootfile: /boot/kernel/kernel
kern.maxfilesperproc: 300000
kern.maxprocperuid: 63709
kern.ipc.maxsockbuf: 4262144
kern.ipc.sockbuf_waste_factor: 8
kern.ipc.max_linkhdr: 16
kern.ipc.max_protohdr: 60
kern.ipc.max_hdr: 76
kern.ipc.max_datalen: 76
kern.ipc.maxmbufmem: 217774080
kern.ipc.nmbclusters: 262144
kern.ipc.nmbjumbop: 13291
kern.ipc.nmbjumbo9: 11814
kern.ipc.nmbjumbo16: 8860
kern.ipc.nmbufs: 1048590
kern.ipc.maxpipekva: 1071579136
kern.ipc.pipekva: 163840
kern.ipc.pipefragretry: 0
kern.ipc.pipeallocfail: 0
kern.ipc.piperesizefail: 0
kern.ipc.piperesizeallowed: 1
kern.ipc.msgmax: 16384
kern.ipc.msgmni: 40
kern.ipc.msgmnb: 8192
kern.ipc.msgtql: 2048
kern.ipc.msgssz: 32
kern.ipc.msgseg: 512
kern.ipc.semmni: 50
kern.ipc.semmns: 340
kern.ipc.semmnu: 150
kern.ipc.semmsl: 340
kern.ipc.semopm: 100
kern.ipc.semume: 50
kern.ipc.semusz: 632
kern.ipc.semvmx: 32767
kern.ipc.semaem: 16384
kern.ipc.shmmax: 536870912
kern.ipc.shmmin: 1
kern.ipc.shmmni: 192
kern.ipc.shmseg: 128
kern.ipc.shmall: 131072
kern.ipc.shm_use_phys: 0
kern.ipc.shm_allow_removed: 0
kern.ipc.soacceptqueue: 4096
kern.ipc.numopensockets: 3448
kern.ipc.maxsockets: 2092935
kern.ipc.sendfile.readahead: 1
kern.dummy: 0
kern.ps_strings: 140737488351200
kern.usrstack: 140737488351232
kern.logsigexit: 1
kern.iov_max: 1024
kern.hostuuid: 1d9f393c-6870-11e5-9ebd-000e1e9c38d0
kern.cam.sort_io_queues: 1
kern.cam.boot_delay: 0
kern.cam.num_doneqs: 6
kern.cam.dflags: 0
kern.cam.debug_delay: 0
kern.cam.pmp.retry_count: 1
kern.cam.pmp.default_timeout: 30
kern.cam.pmp.hide_special: 1
kern.cam.cam_srch_hi: 0
kern.cam.scsi_delay: 5000
kern.cam.cd.poll_period: 3
kern.cam.cd.retry_count: 4
kern.cam.cd.timeout: 30000
kern.cam.ada.legacy_aliases: 1
kern.cam.ada.retry_count: 4
kern.cam.ada.default_timeout: 30
kern.cam.ada.send_ordered: 1
kern.cam.ada.spindown_shutdown: 1
kern.cam.ada.spindown_suspend: 1
kern.cam.ada.read_ahead: 1
kern.cam.ada.write_cache: 1
kern.cam.da.poll_period: 3
kern.cam.da.retry_count: 4
kern.cam.da.default_timeout: 60
kern.cam.da.send_ordered: 1
kern.cam.enc.emulate_array_devices: 1
kern.tty_pty_warningcnt: 1
kern.random.adaptors: yarrow,dummy
kern.random.active_adaptor: yarrow
kern.random.live_entropy_sources: Hardware, Intel Secure Key RNG
kern.random.yarrow.gengateinterval: 10
kern.random.yarrow.bins: 10
kern.random.yarrow.fastthresh: 96
kern.random.yarrow.slowthresh: 128
kern.random.yarrow.slowoverthresh: 2
kern.random.sys.seeded: 1
kern.random.sys.harvest.ethernet: 0
kern.random.sys.harvest.point_to_point: 0
kern.random.sys.harvest.interrupt: 0
kern.random.sys.harvest.swi: 1
kern.rndtest.retest: 120
kern.rndtest.verbose: 1
kern.vt.enable_altgr: 1
kern.vt.debug: 0
kern.vt.deadtimer: 15
kern.vt.suspendswitch: 1
kern.vt.kbd_halt: 1
kern.vt.kbd_poweroff: 1
kern.vt.kbd_reboot: 1
kern.vt.kbd_debug: 1
kern.vt.kbd_panic: 0
kern.disks: mfisyspd9 mfisyspd8 mfisyspd7 mfisyspd6 mfisyspd5 mfisyspd4 mfisyspd3 mfisyspd2 mfisyspd1 mfisyspd0
kern.geom.eli.version: 7
kern.geom.eli.debug: 0
kern.geom.eli.tries: 3
kern.geom.eli.visible_passphrase: 0
kern.geom.eli.overwrites: 5
kern.geom.eli.threads: 0
kern.geom.eli.batch: 0
kern.geom.eli.boot_passcache: 1
kern.geom.eli.key_cache_limit: 8192
kern.geom.eli.key_cache_hits: 0
kern.geom.eli.key_cache_misses: 0
kern.geom.dev.delete_max_sectors: 262144
kern.geom.disk.mfisyspd0.led:
kern.geom.disk.mfisyspd1.led:
kern.geom.disk.mfisyspd2.led:
kern.geom.disk.mfisyspd3.led:
kern.geom.disk.mfisyspd4.led:
kern.geom.disk.mfisyspd5.led:
kern.geom.disk.mfisyspd6.led:
kern.geom.disk.mfisyspd7.led:
kern.geom.disk.mfisyspd8.led:
kern.geom.disk.mfisyspd9.led:
kern.geom.transient_maps: 33202
kern.geom.transient_map_retries: 10
kern.geom.transient_map_hard_failures: 0
kern.geom.transient_map_soft_failures: 0
kern.geom.inflight_transient_maps: 0
kern.geom.confxml:
-
This was fixed with the sysctls for your bge network interfaces:
http://marc.info/?l=haproxy&m=144399351725189&w=2

hw.bge.tso_enable=0
hw.pci.enable_msix=0
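As far as I know, on pfSense/FreeBSD those two are boot-time loader tunables rather than runtime sysctls, so the persistent place for them would be /boot/loader.conf.local, something along these lines (reboot required to take effect):

```
# /boot/loader.conf.local  (pfSense/FreeBSD loader tunables; reboot required)
hw.bge.tso_enable=0     # disable TCP segmentation offload on bge NICs
hw.pci.enable_msix=0    # fall back from MSI-X interrupts
```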
I changed them just now and was able to easily hit these numbers without a hitch:
pid = 50054 (process #1, nbproc = 1)
uptime = 0d 0h03m25s
system limits: memmax = unlimited; ulimit-n = 100047
maxsock = 100047; maxconn = 50000; maxpipes = 0
current conns = 5562; current pipes = 0/0; conn rate = 64/sec
Running tasks: 1/5587; idle = 97 %
-
PiBA, yes, I totally boneheaded it and put bce instead of bge. I have several servers, some with bce NICs and some with bge, and I just mixed them up. After making the change and rebooting, it seems to be working better. I am slowly ramping up the users, but so far so good at 2500+. The stats I posted above were from Apache Bench, so I need real-world clients to really test it out.
Thanks for reminding me to post back to the group.
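For reference, the Apache Bench runs were along the lines of the following (the target URL and numbers here are illustrative, not my exact test; -X points ab at the HAProxy frontend as an HTTP proxy):

```
# 10k requests, 500 concurrent, routed through the HAProxy frontend
ab -n 10000 -c 500 -X 10.1.4.105:8181 http://www.example.com/
```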