HAProxy slows at 1500+ connections: really need some help to figure out why



  • So I had HAProxy installed (as a front end for a cluster of Squid proxies) on a low-end Dell server with pfSense 2.1.5 and was experiencing slowdowns with 1500+ connections, so I built a new pfSense 2.2.4 machine on a brand new Dell R630 with 64 GB RAM, dual CPUs, badass RAID disks, etc. I loaded and configured HAProxy with several Squid backends and some ICAP backends. Things work great until I hit about 1500 or more connections, and then everything slows to a crawl. Restarting HAProxy helps momentarily, but it slows back down again very quickly. If I offload clients to the point of only 300-400 connections, it becomes responsive again. The HAProxy stats page shows 97% idle or similar, and the output from top shows maybe 5% CPU for haproxy. If I configure the browser client to use one of the Squid backends directly it works fast, but as soon as I put the browser proxy config back to the HAProxy frontend IP it slows down.

    I am not really sure how to troubleshoot this and would appreciate any help. I have done the usual searching and tried many of the fixes others have posted, but my problem continues. I can post any info here that would help someone determine where my problems may be; I am just not sure what is useful. Below are a few of my essential configs to start with.

    TIA..!

    /var/etc/haproxy.cfg file contents:
    global
    maxconn 50000
    log /var/run/log local0 info
    stats socket /tmp/haproxy.socket level admin
    uid 80
    gid 80
    nbproc 1
    chroot /tmp/haproxy_chroot
    daemon
    spread-checks 5

    listen HAProxyLocalStats
    bind 127.0.0.1:2200 name localstats
    mode http
    stats enable
    stats admin if TRUE
    stats uri /haproxy_stats.php?haproxystats=1
    timeout client 5000
    timeout connect 5000
    timeout server 5000

    frontend HTPL_PROXY
    bind 10.1.4.105:8181 name 10.1.4.105:8181 
    mode http
    log global
    option http-server-close
    option forwardfor
    acl https ssl_fc
    reqadd X-Forwarded-Proto:\ http if !https
    reqadd X-Forwarded-Proto:\ https if https
    timeout client 30000
    default_backend HTPL_WEB_PROXY_http_ipvANY

    frontend HTPL_CONTENT_FILTER
    bind 10.1.4.106:8182 name 10.1.4.106:8182 
    mode tcp
    log global
    timeout client 30000
    default_backend HTPL_CONT_FILTER_tcp_ipvANY

    backend HTPL_WEB_PROXY_http_ipvANY
    mode http
    cookie SERVERID insert indirect
    stick-table type ip size 1m expire 5m
    stick on src
    balance roundrobin
    timeout connect 50000
    timeout server 50000
    retries 3
    server HTPL-PROXY-01 10.1.4.103:3128 cookie HTPLPROXY01 check inter 60000  weight 150 fastinter 1000 fall 5
    server HTPL-PROXY-02 10.1.4.104:3128 cookie HTPLPROXY02 check inter 60000  weight 100 fastinter 1000 fall 5
    server HTPL-PROXY-03 10.1.4.107:3128 cookie HTPLPROXY03 check inter 60000  weight 50 fastinter 1000 fall 5
    server HTPL-PROXY-04 10.1.4.108:3128 cookie HTPLPROXY04 check inter 60000  weight 200 fastinter 1000 fall 5
    server HTHPL-PROXY-01 10.1.4.101:3128 cookie HTHPLPROXY1 check inter 60000  weight 150 fastinter 1000 fall 5
    server HTHPL-PROXY-02 10.1.4.102:3128 cookie HTPHLPROXY02 check inter 60000  weight 100 fastinter 1000 fall 5

    backend HTPL_CONT_FILTER_tcp_ipvANY
    mode tcp
    balance roundrobin
    timeout connect 50000
    timeout server 50000
    retries 3
    server HTHPL-PROXY-01 10.1.4.101:1344 check inter 60000 disabled weight 100 fastinter 1000 fall 5
    server HTHPL-PROXY-02 10.1.4.102:1344 check inter 60000 disabled weight 100 fastinter 1000 fall 5
    server HTPL-WEB-01 10.1.4.153:1344 check inter 60000  weight 200 fastinter 1000 fall 5
    server HTPL-WEB-02 10.1.4.154:1344 check inter 60000  weight 200 fastinter 1000 fall 5
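
    For reference, here is a rough sketch of how the server weights in the HTPL_WEB_PROXY_http_ipvANY backend translate into traffic shares. It assumes plain weighted round robin with all six servers up, and it ignores the effect of `stick on src` and the SERVERID cookie, which pin returning clients to one server, so real traffic may be split differently.

```python
# Weights copied from the "server" lines of backend HTPL_WEB_PROXY_http_ipvANY.
weights = {
    "HTPL-PROXY-01": 150,
    "HTPL-PROXY-02": 100,
    "HTPL-PROXY-03": 50,
    "HTPL-PROXY-04": 200,
    "HTHPL-PROXY-01": 150,
    "HTHPL-PROXY-02": 100,
}


def traffic_shares(weights):
    """Return each server's fraction of new connections under weighted round robin."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = traffic_shares(weights)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

    With these weights, HTPL-PROXY-04 would take a bit over a quarter of new connections and HTPL-PROXY-03 under a tenth.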

    Some sysctl output:
    kern.ostype: FreeBSD
    kern.osrelease: 10.1-RELEASE-p15
    kern.osrevision: 199506
    kern.version: FreeBSD 10.1-RELEASE-p15 #0 c5ab052(releng/10.1)-dirty: Sat Jul 25 20:20:58 CDT 2015
        root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10

    kern.maxvnodes: 200000
    kern.maxproc: 70788
    kern.maxfiles: 204800
    kern.argmax: 262144
    kern.securelevel: -1
    kern.hostname: HTPL-PROXY-03.hth.hightechhigh.org
    kern.hostid: 1053306123
    kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
    kern.posix1version: 200112
    kern.ngroups: 1023
    kern.job_control: 1
    kern.saved_ids: 0
    kern.boottime: { sec = 1443678149, usec = 901465 } Wed Sep 30 22:42:29 2015
    kern.domainname:
    kern.osreldate: 1001000
    kern.bootfile: /boot/kernel/kernel
    kern.maxfilesperproc: 300000
    kern.maxprocperuid: 63709
    kern.ipc.maxsockbuf: 4262144
    kern.ipc.sockbuf_waste_factor: 8
    kern.ipc.max_linkhdr: 16
    kern.ipc.max_protohdr: 60
    kern.ipc.max_hdr: 76
    kern.ipc.max_datalen: 76
    kern.ipc.maxmbufmem: 217774080
    kern.ipc.nmbclusters: 262144
    kern.ipc.nmbjumbop: 13291
    kern.ipc.nmbjumbo9: 11814
    kern.ipc.nmbjumbo16: 8860
    kern.ipc.nmbufs: 1048590
    kern.ipc.maxpipekva: 1071579136
    kern.ipc.pipekva: 163840
    kern.ipc.pipefragretry: 0
    kern.ipc.pipeallocfail: 0
    kern.ipc.piperesizefail: 0
    kern.ipc.piperesizeallowed: 1
    kern.ipc.msgmax: 16384
    kern.ipc.msgmni: 40
    kern.ipc.msgmnb: 8192
    kern.ipc.msgtql: 2048
    kern.ipc.msgssz: 32
    kern.ipc.msgseg: 512
    kern.ipc.semmni: 50
    kern.ipc.semmns: 340
    kern.ipc.semmnu: 150
    kern.ipc.semmsl: 340
    kern.ipc.semopm: 100
    kern.ipc.semume: 50
    kern.ipc.semusz: 632
    kern.ipc.semvmx: 32767
    kern.ipc.semaem: 16384
    kern.ipc.shmmax: 536870912
    kern.ipc.shmmin: 1
    kern.ipc.shmmni: 192
    kern.ipc.shmseg: 128
    kern.ipc.shmall: 131072
    kern.ipc.shm_use_phys: 0
    kern.ipc.shm_allow_removed: 0
    kern.ipc.soacceptqueue: 4096
    kern.ipc.numopensockets: 3448
    kern.ipc.maxsockets: 2092935
    kern.ipc.sendfile.readahead: 1
    kern.dummy: 0
    kern.ps_strings: 140737488351200
    kern.usrstack: 140737488351232
    kern.logsigexit: 1
    kern.iov_max: 1024
    kern.hostuuid: 1d9f393c-6870-11e5-9ebd-000e1e9c38d0
    kern.cam.sort_io_queues: 1
    kern.cam.boot_delay: 0
    kern.cam.num_doneqs: 6
    kern.cam.dflags: 0
    kern.cam.debug_delay: 0
    kern.cam.pmp.retry_count: 1
    kern.cam.pmp.default_timeout: 30
    kern.cam.pmp.hide_special: 1
    kern.cam.cam_srch_hi: 0
    kern.cam.scsi_delay: 5000
    kern.cam.cd.poll_period: 3
    kern.cam.cd.retry_count: 4
    kern.cam.cd.timeout: 30000
    kern.cam.ada.legacy_aliases: 1
    kern.cam.ada.retry_count: 4
    kern.cam.ada.default_timeout: 30
    kern.cam.ada.send_ordered: 1
    kern.cam.ada.spindown_shutdown: 1
    kern.cam.ada.spindown_suspend: 1
    kern.cam.ada.read_ahead: 1
    kern.cam.ada.write_cache: 1
    kern.cam.da.poll_period: 3
    kern.cam.da.retry_count: 4
    kern.cam.da.default_timeout: 60
    kern.cam.da.send_ordered: 1
    kern.cam.enc.emulate_array_devices: 1
    kern.tty_pty_warningcnt: 1
    kern.random.adaptors: yarrow,dummy
    kern.random.active_adaptor: yarrow
    kern.random.live_entropy_sources: Hardware, Intel Secure Key RNG
    kern.random.yarrow.gengateinterval: 10
    kern.random.yarrow.bins: 10
    kern.random.yarrow.fastthresh: 96
    kern.random.yarrow.slowthresh: 128
    kern.random.yarrow.slowoverthresh: 2
    kern.random.sys.seeded: 1
    kern.random.sys.harvest.ethernet: 0
    kern.random.sys.harvest.point_to_point: 0
    kern.random.sys.harvest.interrupt: 0
    kern.random.sys.harvest.swi: 1
    kern.rndtest.retest: 120
    kern.rndtest.verbose: 1
    kern.vt.enable_altgr: 1
    kern.vt.debug: 0
    kern.vt.deadtimer: 15
    kern.vt.suspendswitch: 1
    kern.vt.kbd_halt: 1
    kern.vt.kbd_poweroff: 1
    kern.vt.kbd_reboot: 1
    kern.vt.kbd_debug: 1
    kern.vt.kbd_panic: 0
    kern.disks: mfisyspd9 mfisyspd8 mfisyspd7 mfisyspd6 mfisyspd5 mfisyspd4 mfisyspd3 mfisyspd2 mfisyspd1 mfisyspd0
    kern.geom.eli.version: 7
    kern.geom.eli.debug: 0
    kern.geom.eli.tries: 3
    kern.geom.eli.visible_passphrase: 0
    kern.geom.eli.overwrites: 5
    kern.geom.eli.threads: 0
    kern.geom.eli.batch: 0
    kern.geom.eli.boot_passcache: 1
    kern.geom.eli.key_cache_limit: 8192
    kern.geom.eli.key_cache_hits: 0
    kern.geom.eli.key_cache_misses: 0
    kern.geom.dev.delete_max_sectors: 262144
    kern.geom.disk.mfisyspd0.led:
    kern.geom.disk.mfisyspd1.led:
    kern.geom.disk.mfisyspd2.led:
    kern.geom.disk.mfisyspd3.led:
    kern.geom.disk.mfisyspd4.led:
    kern.geom.disk.mfisyspd5.led:
    kern.geom.disk.mfisyspd6.led:
    kern.geom.disk.mfisyspd7.led:
    kern.geom.disk.mfisyspd8.led:
    kern.geom.disk.mfisyspd9.led:
    kern.geom.transient_maps: 33202
    kern.geom.transient_map_retries: 10
    kern.geom.transient_map_hard_failures: 0
    kern.geom.transient_map_soft_failures: 0
    kern.geom.inflight_transient_maps: 0
    kern.geom.confxml:



  • Was this fixed with the sysctls for your bge network interfaces?
    http://marc.info/?l=haproxy&m=144399351725189&w=2

    hw.bge.tso_enable=0
    hw.pci.enable_msix=0
    
    

    I changed them just now and was able to easily achieve these numbers without a wink:
    pid = 50054 (process #1, nbproc = 1)
    uptime = 0d 0h03m25s
    system limits: memmax = unlimited; ulimit-n = 100047
    maxsock = 100047; maxconn = 50000; maxpipes = 0
    current conns = 5562; current pipes = 0/0; conn rate = 64/sec
    Running tasks: 1/5587; idle = 97 %
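
    To put those numbers in context, a small sketch like the one below can pull the `key = value` pairs out of the stats text and compare current connections with maxconn. The parsing rules are an assumption based only on the lines pasted above, not on any documented HAProxy output format.

```python
# Stats text copied from the post above.
STATS = """\
pid = 50054 (process #1, nbproc = 1)
uptime = 0d 0h03m25s
system limits: memmax = unlimited; ulimit-n = 100047
maxsock = 100047; maxconn = 50000; maxpipes = 0
current conns = 5562; current pipes = 0/0; conn rate = 64/sec
Running tasks: 1/5587; idle = 97 %
"""


def parse_stats(text):
    """Collect 'key = value' pairs; pairs on one line are separated by ';'."""
    fields = {}
    for line in text.splitlines():
        for part in line.split(";"):
            if "=" in part:
                # Split on the first '=' only, so values may contain '=' themselves.
                key, _, value = part.partition("=")
                fields[key.strip()] = value.strip()
    return fields


fields = parse_stats(STATS)
conns = int(fields["current conns"])
maxconn = int(fields["maxconn"])
usage = conns / maxconn
print(f"{conns} of {maxconn} connections in use ({usage:.1%})")
```

    On the figures above this works out to about 11% of maxconn, so the configured connection limit is nowhere near being reached.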



  • PiBA, yes, I totally boneheaded it and put bce instead of bge. I have several servers, some with bce and some with bge, and I just confused them. After making the change and rebooting, it seems to be working better. I am slowly ramping up the users, but so far so good at 2500+. The stats I posted below were from Apache Bench, so I need real-world clients to really test it out.

    Thanks for reminding me to post back to the group.
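
    Since the root cause here was a driver-name mix-up (bce vs. bge), a quick sanity check along these lines might catch it earlier. This is only a sketch: the function name, the interface names `bge0`/`bge1`, and the idea of feeding it the tunables as text are all assumptions for illustration, and it only looks at `hw.<driver>.tso_enable` entries.

```python
import re

# Matches driver-scoped TSO tunables such as hw.bge.tso_enable.
TSO_TUNABLE = re.compile(r"^hw\.(\w+)\.tso_enable$")


def mismatched_tso_tunables(tunables_text, interface_names):
    """Return tso_enable tunables whose driver matches none of the interfaces."""
    # "bge0" -> "bge": strip the trailing unit number from each interface name.
    drivers = {re.sub(r"\d+$", "", name) for name in interface_names}
    mismatched = []
    for line in tunables_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        match = TSO_TUNABLE.match(name)
        if match and match.group(1) not in drivers:
            mismatched.append(name)
    return mismatched


# Hypothetical interface list for a machine with bge NICs.
interfaces = ["bge0", "bge1"]
wrong = "hw.bce.tso_enable=0\nhw.pci.enable_msix=0\n"
right = "hw.bge.tso_enable=0\nhw.pci.enable_msix=0\n"

wrong_result = mismatched_tso_tunables(wrong, interfaces)
right_result = mismatched_tso_tunables(right, interfaces)
print(wrong_result)
print(right_result)
```

    The first call flags `hw.bce.tso_enable` because no bce interface is present; the second returns an empty list.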

