Unholy device eats 100% cpu php-fpm by accessing captive portal like hell

deajan

Hello,

[EDIT] This is not an upgrade issue as it seemed[/EDIT]

I have a couple of SG 2440 devices running mostly identical configurations (clones from a master).

I've updated two of them from pfSense 2.3.1-1 to 2.3.1-5.
One of the two devices has shown constant 100% CPU usage since the upgrade (it was 30-40% before).
If I reboot it, CPU usage goes back to 100% after a couple of minutes.

Here's what I get from ps aux


USER      PID %CPU %MEM    VSZ    RSS TT  STAT STARTED     TIME COMMAND
root    83276 31.3  0.9 274440  37832  -  R     6:16PM  0:06.94 php-fpm: pool nginx (php-fpm)
root    64146 31.0  0.9 274440  37728  -  R     6:16PM  0:10.65 php-fpm: pool nginx (php-fpm)
root    99202 23.5  0.9 274440  37836  -  R     6:17PM  0:03.62 php-fpm: pool nginx (php-fpm)
root       12 22.0  0.0      0    416  -  WL    3:28PM 20:26.92 [intr]
squid   61240 20.9 15.4 737832 640356  -  S     3:29PM 12:00.18 (squid-1) -f /usr/local/etc/squid/squid.conf (squid)
root       11  3.9  0.0      0     32  -  RL    3:28PM 95:49.77 [idle]
root    50094  0.8  0.2  38848   7600  -  S     3:28PM  0:33.70 nginx: worker process (nginx)
root    49329  0.5  0.2  38848   7560  -  S     3:28PM  0:34.55 nginx: worker process (nginx)
squid   87161  0.4  0.3  33564  12040  -  S     4:57PM  0:13.72 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
root    38758  0.1  0.1  14508   2320  -  Ss    3:29PM  0:54.43 /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /var/etc/syslog.conf -b 127.0.0.1
root        0  0.0  0.0      0    304  -  DLs   3:28PM  1:54.39 [kernel]
root        1  0.0  0.0   9136    820  -  ILs   3:28PM  0:00.03 /sbin/init --
root        2  0.0  0.0      0     16  -  DL    3:28PM  0:00.00 [crypto]
root        3  0.0  0.0      0     16  -  DL    3:28PM  0:00.00 [crypto returns]
root        4  0.0  0.0      0     32  -  DL    3:28PM  0:00.03 [cam]
root        5  0.0  0.0      0     16  -  DL    3:28PM  0:08.31 [pf purge]
root        6  0.0  0.0      0     16  -  DL    3:28PM  0:00.00 [sctp_iterator]
root        7  0.0  0.0      0     32  -  DL    3:28PM  0:01.03 [pagedaemon]
root        8  0.0  0.0      0     16  -  DL    3:28PM  0:00.00 [vmdaemon]
root        9  0.0  0.0      0     16  -  DL    3:28PM  0:00.00 [pagezero]
root       10  0.0  0.0      0     16  -  DL    3:28PM  0:00.00 [audit]
root       13  0.0  0.0      0     32  -  DL    3:28PM  0:00.00 [ng_queue]
root       14  0.0  0.0      0     48  -  DL    3:28PM  0:00.46 [geom]
root       15  0.0  0.0      0     16  -  DL    3:28PM  0:19.30 [rand_harvestq]
root       16  0.0  0.0      0     80  -  DL    3:28PM  0:00.54 [usb]
root       17  0.0  0.0      0     16  -  DL    3:28PM  0:00.02 [idlepoll]
root       18  0.0  0.0      0     32  -  DL    3:28PM  0:01.22 [bufdaemon]
root       19  0.0  0.0      0     16  -  DL    3:28PM  0:03.63 [syncer]
root       20  0.0  0.0      0     16  -  DL    3:28PM  0:00.06 [vnlru]
root       56  0.0  0.0      0     16  -  DL    3:28PM  0:00.35 [md0]
root      274  0.0  0.6 270344  24804  -  Ss    3:28PM  0:04.39 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root      313  0.0  0.1  18888   2444  -  INs   3:28PM  0:00.03 /usr/local/sbin/check_reload_status
root      315  0.0  0.1  18888   2288  -  IN    3:28PM  0:00.00 check_reload_status: Monitoring daemon of check_reload_status
root      327  0.0  0.1  13624   4836  -  Ss    3:28PM  0:00.01 /sbin/devd -q
root     3066  0.0  0.0  12268   1872  -  Is    3:29PM  0:00.00 /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr/local/bin/ping_hosts.sh
root     3306  0.0  0.0  12268   1884  -  I     3:29PM  0:00.00 minicron: helper /usr/local/bin/ping_hosts.sh  (minicron)
root     3646  0.0  0.0  12268   1872  -  Is    3:29PM  0:00.00 /usr/local/bin/minicron 3600 /var/run/expire_accounts.pid /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts
root     3819  0.0  0.0  12268   1884  -  I     3:29PM  0:00.00 minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts  (minicron)
root     3961  0.0  0.0  12268   1872  -  Is    3:29PM  0:00.00 /usr/local/bin/minicron 86400 /var/run/update_alias_url_data.pid /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data
root     4510  0.0  0.0  12268   1884  -  I     3:29PM  0:00.00 minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data  (minicron)
root     6586  0.0  0.2  59068   6348  -  Is    3:28PM  0:00.00 /usr/sbin/sshd
root     6791  0.0  0.1  14612   2108  -  Is    3:28PM  0:00.01 /usr/local/sbin/sshlockout_pf 15
root     7335  0.0  0.1  21624   5696  -  Ss    4:57PM  0:00.85 /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
nobody   9957  0.0  0.1  16836   4824  -  Ss    4:57PM  0:15.62 /usr/local/sbin/darkstat -i igb0 -b 172.16.1.1 -b 127.0.0.1 -p 666
nobody   9971  0.0  0.1  16836   2368  -  Is    4:57PM  0:00.00 darkstat: DNS child (darkstat)
root    10295  0.0  0.1  16676   2232  -  Ss    3:28PM  0:05.41 /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
squid   10929  0.0  0.1  37752   4084  -  S     3:46PM  0:01.62 (pinger) (pinger)
root    11185  0.0  0.1  18896   2400  -  Is    3:28PM  0:00.02 /usr/local/sbin/xinetd -syslog daemon -f /var/etc/xinetd.conf -pidfile /var/run/xinetd.pid
root    12579  0.0  0.4 776064  18252  -  Is    4:57PM  0:00.04 /usr/local/sbin/radiusd
root    13546  0.0  0.1  16652   2104  -  S     6:17PM  0:00.00 ping -c 2 -i .2 172.16.1.62
root    19086  0.0  0.2  42836   6864  -  I     4:57PM  0:00.00 /usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid
root    19305  0.0  0.2  58280   8860  -  Ss    4:57PM  0:02.86 /usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid
squid   21185  0.0  0.1  37752   4084  -  S     3:35PM  0:01.61 (pinger) (pinger)
squid   22991  0.0  0.1  37752   4084  -  S     3:46PM  0:01.84 (pinger) (pinger)
root    23250  0.0  0.1  14428   2116  -  I     3:29PM  0:00.01 /usr/libexec/getty al.Pc ttyv0
root    23348  0.0  0.1  15012   2292  -  Ss    4:57PM  0:00.91 /usr/local/bin/dpinger -S -r 0 -i CompletelFibre -B XX.XX.XX.XX -p /var/run/dpinger_CompletelFibre_XX.XX.XX.XX_8.8.4.4.pi
root    23384  0.0  0.1  15012   2292  -  Ss    4:57PM  0:00.90 /usr/local/bin/dpinger -S -r 0 -i CompletelFibre20 -B YY.YY.YY.YY -p /var/run/dpinger_CompletelFibre20_YY.YY.YY.YY_208.67.2
root    23422  0.0  0.1  14612   2180  -  Is    3:29PM  0:00.00 /usr/local/sbin/sshlockout_pf 15
unbound 26075  0.0  0.8  59220  32612  -  Ss    4:57PM  0:25.63 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
root    26514  0.0  0.1  38848   5876  -  Is    3:28PM  0:00.00 nginx: master process /usr/local/sbin/nginx -c /var/etc/nginx-webConfigurator.conf (nginx)
root    26813  0.0  0.2  38848   7000  -  S     3:28PM  0:02.10 nginx: worker process (nginx)
root    26857  0.0  0.2  38848   7036  -  S     3:28PM  0:02.23 nginx: worker process (nginx)
root    26989  0.0  0.1  16532   2260  -  Ss    3:28PM  0:00.13 /usr/sbin/cron -s
root    30368  0.0  0.4  30140  17968  -  Ss    3:28PM  0:00.66 /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
dhcpd   30747  0.0  0.4  24800  15036  -  Ss    4:57PM  0:01.30 /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid igb0
root    31063  0.0  0.2  82268   7372  -  Ss    6:14PM  0:00.06 sshd: root@pts/0 (sshd)
root    31868  0.0  0.1  17000   2584  -  IN    4:57PM  0:01.42 /bin/sh /var/db/rrd/updaterrd.sh
squid   32799  0.0  0.1  37752   4084  -  S     3:35PM  0:01.66 (pinger) (pinger)
root    41937  0.0  0.1  16532   2264  -  I     6:15PM  0:00.00 cron: running job (cron)
root    42288  0.0  0.1  17000   2496  -  Is    6:15PM  0:00.01 sh /usr/local/bin/rangecheck-launcher.sh
root    43020  0.0  0.1  17760   3224  -  S     6:15PM  0:00.22 bash /usr/local/bin/rangecheck.sh
root    48914  0.0  0.0   8168   1824  -  IN    6:16PM  0:00.00 sleep 60
root    49230  0.0  0.1  38848   6228  -  Is    3:28PM  0:00.00 nginx: master process /usr/local/sbin/nginx -c /var/etc/nginx-lan-CaptivePortal.conf (nginx)
root    49536  0.0  0.2  38848   7484  -  S     3:28PM  0:32.72 nginx: worker process (nginx)
root    49784  0.0  0.2  38848   7572  -  S     3:28PM  0:36.73 nginx: worker process (nginx)
root    49852  0.0  0.2  38848   7532  -  S     3:28PM  0:30.47 nginx: worker process (nginx)
root    50402  0.0  0.2  38848   7540  -  S     3:28PM  0:37.58 nginx: worker process (nginx)
root    50506  0.0  0.0  12268   1876  -  Is    3:28PM  0:00.00 /usr/local/bin/minicron 60 /var/run/cp_prunedb_lan.pid /etc/rc.prunecaptiveportal lan
root    50763  0.0  0.0  12268   1884  -  I     3:28PM  0:00.00 minicron: helper /etc/rc.prunecaptiveportal lan (minicron)
squid   52008  0.0  0.3  33564  11212  -  I     5:29PM  0:00.12 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   52078  0.0  0.3  33564  11132  -  I     5:29PM  0:00.12 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   52268  0.0  0.3  33564  11048  -  I     5:29PM  0:00.09 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   52656  0.0  0.3  33564  10932  -  I     5:29PM  0:00.12 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
root    60835  0.0  0.3  55860  13028  -  Is    3:29PM  0:00.00 /usr/local/sbin/squid -f /usr/local/etc/squid/squid.conf
mysql   62083  0.0  0.1  17000   2484  -  Is    3:29PM  0:00.03 /bin/sh /usr/local/bin/mysqld_safe --defaults-extra-file=/var/db/mysql/my.cnf --user=mysql --datadir=/var/db/mysql --pid-file
squid   65001  0.0  0.1  37752   4084  -  S     4:57PM  0:00.96 (pinger) (pinger)
squid   66407  0.0  0.1  37752   4080  -  S     3:29PM  0:01.63 (pinger) (pinger)
root    67386  0.0  0.0   8168   1824  -  I     6:16PM  0:00.00 sleep 55
squid   68772  0.0  0.3  33564  11596  -  S     5:19PM  0:00.09 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   68796  0.0  0.3  33564  11476  -  I     5:19PM  0:00.12 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   68989  0.0  0.3  33564  11252  -  I     5:19PM  0:00.08 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   69238  0.0  0.3  33564  11252  -  I     5:19PM  0:00.08 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
root    76722  0.0  0.9 274340  38936  -  S     6:09PM  0:00.81 /usr/local/bin/php-cgi -f /etc/rc.prunecaptiveportal lan
squid   78216  0.0  0.1  37752   4084  -  S     3:29PM  0:01.64 (pinger) (pinger)
mysql   84761  0.0 11.3 683080 469892  -  I     3:29PM  0:03.21 /usr/local/libexec/mysqld --defaults-extra-file=/var/db/mysql/my.cnf --basedir=/usr/local --datadir=/var/db/mysql --plugin-di
root    87431  0.0  0.0  14408   1952  -  Ss    3:29PM  0:01.31 /usr/sbin/powerd -b hadp -a hadp -n hadp
squid   87539  0.0  0.3  33564  12032  -  S     4:57PM  0:01.82 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   87589  0.0  0.3  33564  12032  -  S     4:57PM  0:00.56 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   87655  0.0  0.3  33564  12032  -  S     4:57PM  0:00.31 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   87901  0.0  0.3  33564  12016  -  S     4:57PM  0:00.17 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   87914  0.0  0.3  33564  12016  -  S     4:57PM  0:00.16 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   88067  0.0  0.3  33564  12016  -  S     4:57PM  0:00.10 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   88124  0.0  0.3  33564  11796  -  S     4:57PM  0:00.12 (squidGuard) -c /usr/local/etc/squidGuard/squidGuard.conf (squidGuard)
squid   88237  0.0  0.1  37752   4084  -  S     4:57PM  0:01.34 (pinger) (pinger)
root    91247  0.0  0.1  17000   2516  -  I     4:57PM  0:00.09 /bin/sh /usr/local/pkg/sqpmon.sh
root    23284  0.0  0.1  43440   2668 u1  Is    3:29PM  0:00.01 login [pam] (login)
root    23486  0.0  0.1  17000   2636 u1  I     3:29PM  0:00.01 -sh (sh)
root    23617  0.0  0.1  17000   2524 u1  I+    3:29PM  0:00.00 /bin/sh /etc/rc.initial
root    17974  0.0  0.1  18676   2308  0  R+    6:17PM  0:00.00 ps aux
root    32712  0.0  0.1  17000   2620  0  Is    6:14PM  0:00.01 -sh (sh)
root    32955  0.0  0.1  17000   2528  0  I     6:14PM  0:00.00 /bin/sh /etc/rc.initial
root    39900  0.0  0.1  17340   3676  0  S     6:14PM  0:00.02 /bin/tcsh
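
With a listing this long, summing %CPU per command name makes the heavy hitters easier to spot. A minimal sketch, assuming the column layout shown above (%CPU in column 3, command starting in column 11); the function name is only illustrative:

```shell
# Sum the %CPU column of `ps aux` output (read from stdin) per command name.
# ASSUMPTION: %CPU is field 3 and the command starts at field 11, as in the
# listing above; only the first word of the command is used as the key.
cpu_by_command() {
    awk 'NR > 1 { cpu[$11] += $3 }
         END { for (c in cpu) printf "%.1f %s\n", cpu[c], c }' | sort -rn
}
```

It could be used as `ps aux | cpu_by_command | head`. On the listing above, the php-fpm lines add up to about 85.8% between them.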

It seems that php-fpm now eats up my CPU.
Where should I start my investigation ?

Regards,
Ozy.

kejianshi

Do you have your web access open to the world?

deajan

I do have web access via WAN on a random port.
A quick netstat showed me that I'm the only one connected to it.

kejianshi

Close that and use a vpn or SSH. Opening your web access to the WAN is stupid.

Don't feel bad - Everyone is stupid before they get smart.

deajan

Remote web access is already limited to my IP so I guess this isn't the problem here.
Is there any php profiling available in pfSense ?

kejianshi

Not sure. I'm often wrong about these things, but personally I'd limit it to VPN access only.

cmb

Wouldn't end up with that even if your GUI was open to the world (though that would be a bad idea regardless).

Anything in your system log?

A truss of what the process is doing might be telling. For instance on the top PID you're showing there:

truss -p 83276 -o /root/83276-truss.txt

Let that run for 30 seconds or so and hit ctrl-c, then download that file and attach it. Can repeat same for the other php-fpm PIDs if it's like what you're showing there, with 3 separate ones chewing up a good deal of CPU.
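
Once a capture file exists, a quick tally of which system calls dominate can help when reading it. A minimal sketch, assuming the plain one-call-per-line layout that truss writes for a single process; the function name is made up for illustration:

```shell
# Count how often each system call appears in truss output read from stdin.
# ASSUMPTION: each call is on its own line in the form "name(args) = result";
# lines that do not start with "name(" (signals, exit notices) are ignored.
syscall_histogram() {
    grep -E '^[a-z_0-9]+\(' | sed 's/(.*//' | sort | uniq -c | sort -rn
}
```

For the file named above this would be `syscall_histogram < /root/83276-truss.txt | head`.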

deajan

System logs show messages like the following roughly every hour:


Jun 23 09:31:35	kernel		sonewconn: pcb 0xfffff80061c4d188: Listen queue overflow: 193 already in queue awaiting acceptance (1288 occurrences)
Jun 23 09:30:34	kernel		sonewconn: pcb 0xfffff80061c4d188: Listen queue overflow: 193 already in queue awaiting acceptance (523 occurrences)

The number 193 and the pcb address stay the same across all messages.
I tried to find which process owns that pcb address with lsof | grep 0xfff…. but got no result.
I also tried raising kern.ipc.somaxconn from 128 to 512, but this doesn't change the CPU usage.
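
The constant 193 looks consistent with a listen backlog of 128. As far as I recall, the FreeBSD kernel reports an overflow once the queue holds more than 1.5 times the backlog, but that is an assumption worth checking against the sonewconn() source for your release. A small arithmetic sketch under that assumption (the function name is illustrative):

```shell
# Smallest queue length that would be reported as an overflow for a given
# listen backlog, ASSUMING the kernel's limit is 3 * backlog / 2 (integer math).
overflow_threshold() {
    echo $(( 3 * $1 / 2 + 1 ))
}

overflow_threshold 128   # prints 193
```

If that reading is right, the listening socket was still using a backlog of 128. A backlog is fixed when the daemon calls listen(), so a raised kern.ipc.somaxconn would probably only apply once the listening process is restarted, and a deeper queue would not reduce the request rate itself.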

The php-fpm processes spawn and exit after like 30 seconds, so truss output might be shorter than 30 seconds.
Attached are two truss outputs.

PS: as for the open web configurator access, I have been working on a tap VPN bridge over the past few days, and in the meantime I had to be able to configure the box :)

truss.73254.log.tar.gz
truss.13089.log.tar.gz

cmb

It's captive portal that's getting beat up there. Guessing you don't have a "Maximum concurrent connections" limit set in your CP, and some device is on that network, not logged into the portal, with an update checker or something running in the background that's causing numerous non-stop requests to hit CP.

deajan

Thank you for the insight.
I renamed the thread to something more explicit. Can a forum admin move it to captiveportal perhaps ?

Once I knew what to look for, I isolated the problem to a single device that requests a URL 50-70 times a second.

tail /var/log/nginx.log


Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /index.php?zone=lan&redirurl=http%3A%2F%2Fweatherblink.wdgserv.com%2Fweatherblink%2Flookup%2FLyon%2C+France HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /index.php?zone=lan&redirurl=http%3A%2F%2Fweatherblink.wdgserv.com%2Fweatherblink%2Flookup%2FLyon%2C+France HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /index.php?zone=lan&redirurl=http%3A%2F%2Fweatherblink.wdgserv.com%2Fweatherblink%2Flookup%2FLyon%2C+France HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
Jun 24 10:53:09 pfsensebox pfsensebox.company.local nginx: 172.16.2.238 - - [24/Jun/2016:10:53:09 +0200] "GET /weatherblink/lookup/Lyon,%20France HTTP/1.1" 302 5 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
.......
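
Picking the noisy client out of a busy log is easier with a tally than by eye. A minimal sketch, assuming the syslog-style lines shown above, where the client address is the field right after the "nginx:" tag; both function names are made up for illustration:

```shell
# Requests per client address, busiest first (log lines read from stdin).
# ASSUMPTION: the client address is the field right after the "nginx:" tag.
top_clients() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "nginx:") { n[$(i + 1)]++; break } }
         END { for (ip in n) print n[ip], ip }' | sort -rn
}

# Requests per second for one client address ($1), keyed on the syslog
# timestamp held in the first three fields of each line.
per_second() {
    awk -v ip="$1" '{ for (i = 1; i < NF; i++)
                          if ($i == "nginx:" && $(i + 1) == ip) { n[$1 " " $2 " " $3]++; break } }
         END { for (t in n) print n[t], t }' | sort -rn
}
```

For example, `tail -n 5000 /var/log/nginx.log | top_clients | head` lists the busiest addresses, and `tail -n 5000 /var/log/nginx.log | per_second 172.16.2.238 | head` shows the peak per-second counts for the device seen above.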

The setup I have is an open internet access for 200 student flats, where the user account is created on the fly by the CP page.
Everyone can bring as many devices as they want, that's the demand of the customer.

Is there any way to limit CP requests on a per-IP or per-MAC basis, in order to mitigate some random pri*** using the worst software ever written to flood my pfSense CP ?

I was thinking that between 2.3.1-1 and 2.3.1-5, the keepalive_timeout for the CP nginx instance was disabled (#6421). Maybe this triggered the new behavior ?
Also, maybe lowering the value of limit_conn addr could help.

Any thoughts ?

Regards,
Ozy.

deajan

OMG ! Reading through the code in system.inc, I realized that limit_conn addr = "Maximum concurrent connections" in the CP web interface.
I lowered the default value (10) to 3. Will see what happens.

Any idea what's a "good" value for the max concurrent connections value ?

cmb

Good point, didn't think about it but I bet the keepalive timeout being disabled now probably makes the simultaneous connections limit significantly less effective than it used to be. Those stupid apps that are doing repeated calls like that are likely using some kind of library to do it that will utilize the keepalive. When the keepalive is disabled, they'll issue a new request every time and close out the former immediately, so if requests are issued sequentially you can rack up a huge load with something really persistent.

How many is appropriate depends on what's in your portal page. If it's just the HTML page with no images, no CSS, js, etc. - nothing other than one page to fetch, then 2-3 is probably a fine limit.

Derelict

Maybe a keepalive of like 2-5 seconds would be better? Enough to reap the benefits in these rapid-fire cases but should work for all but the fastest CP logins?

If it is establishing a new session every time it won't matter anyway. Have to see a tcpdump.

cmb

Guessing since it started being noticeable post-upgrade it was probably obeying the keepalive. Disabling it is definitely nicer for purposes of redirecting users instantaneously in all cases. Maybe a smaller keepalive would be more appropriate, something like 2 seconds possibly. There are also rate limiting options in nginx which might be more appropriate instead, or maybe in addition to. Something like 1 request/sec, though for users who have CSS, images, etc. on the portal page that's too low. Maybe make that a configurable option as well.
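
For reference, the three knobs discussed in this thread map onto stock nginx directives roughly as follows. This is a hand-written sketch, not the configuration pfSense generates: the zone name cp_req, the zone sizes and the burst value are assumptions picked for illustration, and only the zone name addr and the values 3, 2 and 1 request/sec come from the thread.

```nginx
# Hypothetical fragment for a captive portal nginx instance (illustration only).
http {
    # Per-client connection counter; "addr" is the zone name mentioned in the thread.
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    # Per-client request rate; zone name "cp_req" and its size are made up here.
    limit_req_zone $binary_remote_addr zone=cp_req:10m rate=1r/s;

    # Short keepalive so rapid-fire clients reuse one connection.
    keepalive_timeout 2;

    server {
        # "Maximum concurrent connections" per client address.
        limit_conn addr 3;

        # Allow a short burst so a portal page with CSS and images still loads.
        limit_req zone=cp_req burst=10 nodelay;
    }
}
```

The burst allowance is one way to address the concern that 1 request/sec is too low for portal pages that pull in extra assets; a suitable value depends on how many files the page loads.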

deajan

My CP page is kinda complex with mysql, css, bootstrap and a lot of php code.

Lowering maximum concurrent connections from 10 to 3 decreased CPU usage from 100% to 30%.
Setting keepalive to 2 gave an additional drop; CPU usage is now 7%-15%.

Maybe setting keepalive to 2 is indeed a good solution for next release ?

Regards,
Ozy.