Captive portal random deaths
-
Ok. So we had another event yesterday. The captive portal was throwing Internal Server Error. This was the top 50 from the server log. What do I need to look at next?
Last 50 system log entries
Oct 26 15:21:21 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-2
Oct 26 15:21:21 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1085
Oct 26 15:21:21 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-0
Oct 26 15:21:21 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 2 load: 1085
Oct 26 15:21:21 lighttpd[32515]: (mod_fastcgi.c.3587) all handlers for /index.php?zone=cpzone&redirurl=/sj/data.gif?intype=32&andver=5.0&rom=0&actionname=kbd_main&kong=0&imei=351776064868573&mcc=310&serial=b888b894&root=0&prodid=2&channel=10000014&kvercode=4171000&androidid=221da9d43d0fbaaa&pid=3517760648685737445ee16a99140d388c9ae9ca3046d34&did=ims8xwf5fexpfhgwtmlsw54jqwhl&mac=48:5A:3F:03:6A:3F&busi_type=2&intime=20140502&newer=0&osver=21&cl=en&click=charging_dialog_show&display=10801920&brand=samsung&mode=SM-N9005&kbdver=4.17.1&gaid=7e7ba8fd-f29b-4656-a0a1-9e864c89df3c on .php are down.
Oct 26 15:21:23 php-fpm[93200]: /index.php: Successful login for user 'admin' from: REMOVED
Oct 26 15:21:23 php-fpm[93200]: /index.php: Successful login for user 'admin' from: REMOVED
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-4
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1085
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-2
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1085
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-0
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 2 load: 1085
Oct 26 15:21:24 lighttpd[32515]: (mod_fastcgi.c.3587) all handlers for /index.php?zone=cpzone&redirurl=/sj/data.gif?intype=32&andver=5.0&rom=0&actionname=kbd_main&kong=0&imei=351776064868573&mcc=310&serial=b888b894&root=0&prodid=2&channel=10000014&kvercode=4171000&androidid=221da9d43d0fbaaa&pid=3517760648685737445ee16a99140d388c9ae9ca3046d34&did=ims8xwf5fexpfhgwtmlsw54jqwhl&mac=48:5A:3F:03:6A:3F&busi_type=2&intime=20140502&newer=0&osver=21&cl=en&click=REPORT_ACTIVE_UM_V5&display=10801920&brand=samsung&mode=SM-N9005&kbdver=4.17.1&gaid=7e7ba8fd-f29b-4656-a0a1-9e864c89df3c on .php are down.
Oct 26 15:21:24 lighttpd[32515]: (request.c.1125) POST-request, but content-length missing -> 411
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-4
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1085
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-2
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1085
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-0
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 2 load: 1085
Oct 26 15:21:27 lighttpd[32515]: (mod_fastcgi.c.3587) all handlers for /index.php?zone=cpzone&redirurl=/sj/data.gif?intype=32&andver=5.0&rom=0&actionname=kbd_main&kong=0&imei=351776064868573&mcc=310&serial=b888b894&root=0&prodid=2&channel=10000014&kvercode=4171000&androidid=221da9d43d0fbaaa&pid=3517760648685737445ee16a99140d388c9ae9ca3046d34&did=ims8xwf5fexpfhgwtmlsw54jqwhl&mac=48:5A:3F:03:6A:3F&busi_type=2&intime=20140502&newer=0&osver=21&cl=en&display=10801920&brand=samsung&mode=SM-N9005&kbdver=4.17.1&gaid=7e7ba8fd-f29b-4656-a0a1-9e864c89df3c&REPORT_ACTIVE=SelfAlarm_1445872006093_1445872005790_rescd_500 on .php are down.
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-4
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1085
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-2
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1085
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-0
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 2 load: 1085
Oct 26 15:21:30 lighttpd[32515]: (mod_fastcgi.c.3587) all handlers for /index.php?zone=cpzone&redirurl=/sj/data.gif?intype=32&andver=5.0&rom=0&actionname=kbd_main&kong=0&imei=351776064868573&mcc=310&serial=b888b894&root=0&prodid=2&channel=10000014&kvercode=4171000&androidid=221da9d43d0fbaaa&pid=3517760648685737445ee16a99140d388c9ae9ca3046d34&did=ims8xwf5fexpfhgwtmlsw54jqwhl&mac=48:5A:3F:03:6A:3F&busi_type=2&intime=20140502&newer=0&osver=21&cl=en&display=10801920&brand=samsung&mode=SM-N9005&kbdver=4.17.1&gaid=7e7ba8fd-f29b-4656-a0a1-9e864c89df3c&REPORT_ACTIVE=SelfAlarm_1445872006093_1445872005790_rescd_500 on .php are down.
Oct 26 15:21:30 lighttpd[32515]: (mod_evasive.c.183) 172.18.8.102 turned away. Too many connections.
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.2779) fcgi-server re-enabled: 0 /tmp/php-fastcgi-cpzone.socket
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-4
Oct 26 15:21:33 kernel: sonewconn: pcb 0xfffff8002c506e10: Listen queue overflow: 193 already in queue awaiting acceptance (63 occurrences)
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1085
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-2
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 1 load: 1085
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.1754) connect failed: Connection refused on unix:/tmp/php-fastcgi-cpzone.socket-0
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 2 load: 1085
Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.3587) all handlers for /index.php?zone=cpzone&redirurl=/sj/data.gif?intype=32&andver=5.0&rom=0&actionname=kbd_main&kong=0&imei=351776064868573&mcc=310&serial=b888b894&root=0&prodid=2&channel=10000014&kvercode=4171000&androidid=221da9d43d0fbaaa&pid=3517760648685737445ee16a99140d388c9ae9ca3046d34&did=ims8xwf5fexpfhgwtmlsw54jqwhl&mac=48:5A:3F:03:6A:3F&busi_type=2&intime=20140502&newer=0&osver=21&cl=en&click=charging_dialog_show&display=1080*1920&brand=samsung&mode=SM-N9005&kbdver=4.17.1&gaid=7e7ba8fd-f29b-4656-a0a1-9e864c89df3c on .php are down.
-
The root cause there is PHP dying. Since it's FastCGI, I guess that's a 2.1.x or older version on there. Upgrade to 2.2.4 first; php-fpm is better in that regard if it's a scalability issue, and you could be triggering some problem in the old PHP version.
-
This guy:
@carzin:Oct 26 15:21:30 lighttpd[32515]: (mod_evasive.c.183) 172.18.8.102 turned away. Too many connections.
Is it a client on the captive portal?
If so, it's probably a case of a poorly written 'app' that doesn't understand what a 'portal' is and is hammering your portal. The portal sends over a 'login page', the client (172.18.8.102) doesn't want that page and keeps asking again and again ... until there are 'no more resources' and PHP breaks.
But, hey, that's just a thought. I can't remember these issues well with ancient versions ;)
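One way to test the 'bad client' theory is to count how often each client shows up in the failures. A rough Python sketch, assuming the system log has been saved as plain text lines; the function name `summarize` and the shortened sample lines are made up for illustration:

```python
import re
from collections import Counter

# lighttpd mod_evasive rejection, e.g.
# "(mod_evasive.c.183) 172.18.8.102 turned away. Too many connections."
EVASIVE_RE = re.compile(r"\(mod_evasive\.c\.\d+\) (\S+) turned away")
# Client MAC carried in the request URL on "all handlers ... are down" lines.
MAC_RE = re.compile(r"all handlers for \S*?[?&]mac=([0-9A-Fa-f:]{17})")


def summarize(lines):
    """Return (turned-away count per IP, failed-request count per MAC)."""
    turned_away = Counter()
    failed_by_mac = Counter()
    for line in lines:
        m = EVASIVE_RE.search(line)
        if m:
            turned_away[m.group(1)] += 1
        m = MAC_RE.search(line)
        if m:
            failed_by_mac[m.group(1).upper()] += 1
    return turned_away, failed_by_mac


if __name__ == "__main__":
    # Abbreviated sample lines; the real URLs are much longer.
    sample = [
        "Oct 26 15:21:30 lighttpd[32515]: (mod_evasive.c.183) 172.18.8.102 "
        "turned away. Too many connections.",
        "Oct 26 15:21:33 lighttpd[32515]: (mod_fastcgi.c.3587) all handlers for "
        "/index.php?zone=cpzone&redirurl=/sj/data.gif?intype=32"
        "&mac=48:5A:3F:03:6A:3F&busi_type=2 on .php are down.",
    ]
    ips, macs = summarize(sample)
    print("turned away:", ips.most_common(5))
    print("failed requests by MAC:", macs.most_common(5))
```

If one IP or MAC dominates both counts around the time of the crash, that would support the idea of a single client hammering the portal.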
-
Yes, that would be a client. The fact that the client connection limit is being hit should prevent it from exhausting the PHP resources. But that is along the lines of what I was thinking, except that something it was doing repeatedly caused PHP to crash rather than just run out of resources.
-
All: this box was running 2.2.4, so I'm on the latest and greatest. I've had this problem since we started using pfSense years ago, across multiple builds.
-
It's probably not PHP. On a lower level you have this:
Oct 26 15:21:33 kernel: sonewconn: pcb 0xfffff8002c506e10: Listen queue overflow: 193 already in queue awaiting acceptance (63 occurrences)
Google FreeBSD + sonewconn (so you know that you are not the only one) and try what the first link proposes.
The other links will help you nail down the process, port, etc.
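To pull the numbers out of those kernel messages, something like the following Python sketch could help. The helper names are invented here, and `implied_backlog` rests on an assumption: as far as I know, FreeBSD logs this message once the queue exceeds 1.5 times the listen backlog, which would make "193 already in queue" consistent with a backlog of 128.

```python
import re

SONEWCONN_RE = re.compile(
    r"sonewconn: pcb (0x[0-9a-f]+): Listen queue overflow: "
    r"(\d+) already in queue awaiting acceptance(?: \((\d+) occurrences\))?"
)


def parse_overflow(line):
    """Extract pcb address, queue length and occurrence count, or None."""
    m = SONEWCONN_RE.search(line)
    if not m:
        return None
    pcb, queued, occurrences = m.groups()
    return {
        "pcb": pcb,
        "queued": int(queued),
        "occurrences": int(occurrences or 1),
    }


def implied_backlog(queued):
    """Rough estimate only. Assumes the overflow is logged when the queue
    first exceeds 1.5x the backlog, so queued - 1 is about 1.5 * backlog."""
    return (queued - 1) * 2 // 3


if __name__ == "__main__":
    line = (
        "Oct 26 15:21:33 kernel: sonewconn: pcb 0xfffff8002c506e10: "
        "Listen queue overflow: 193 already in queue awaiting acceptance "
        "(63 occurrences)"
    )
    rec = parse_overflow(line)
    print(rec, "implied backlog:", implied_backlog(rec["queued"]))
```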
-
I need some spoon feeding. I am not a Linux guru. From the searches, I ran the following command (netstat -Lan) and saw a bunch of:
tcpX 0/0/128 which should tell me the queue size is 128.
The instructions tell you to issue the command:
sysctl kern.ipc.somaxconn=2048 and I get a readout of: kern.ipc.somaxconn: 128 -> 2048
However, when I run the netstat -Lan command again, it still shows a queue value of 128. What else do I need to do?
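As far as I know, kern.ipc.somaxconn only caps the backlog for later listen() calls, so sockets that were already listening keep showing 128 in netstat -Lan until the service that owns them is restarted (and the service may itself ask for a smaller backlog). A small Python sketch for checking the queues after a restart, assuming the usual FreeBSD `netstat -Lan` layout of qlen/incqlen/maxqlen; the function names and the sample output are invented, not taken from this box:

```python
# Hypothetical sample of `netstat -Lan` output, for illustration only.
SAMPLE = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/128        *.8000
unix  193/0/128      /tmp/php-fastcgi-cpzone.socket-0
"""


def parse_listen_queues(text):
    """Return (proto, qlen, incqlen, maxqlen, address) for each listener."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have the three slash-separated counters in column 2.
        if len(parts) < 3 or parts[1].count("/") != 2:
            continue
        try:
            qlen, incqlen, maxqlen = (int(x) for x in parts[1].split("/"))
        except ValueError:
            continue  # header or unexpected line
        rows.append((parts[0], qlen, incqlen, maxqlen, parts[2]))
    return rows


def saturated(rows):
    """Listeners whose accept queue has reached its configured maximum."""
    return [r for r in rows if r[1] >= r[3]]


if __name__ == "__main__":
    rows = parse_listen_queues(SAMPLE)
    for proto, qlen, incqlen, maxqlen, addr in rows:
        print(f"{proto:5} {addr:40} {qlen}/{maxqlen}")
    print("saturated:", [r[4] for r in saturated(rows)])
```

Feeding it the real output (for example saved to a file) shows which listener still has a maxqlen of 128 and whether its queue is actually filling up.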
-
I need some spoon feeding. I am not a Linux guru. …..
It's even worse: Linux is not FreeBSD (at all).
Anyway, without putting my hands on your system, I cannot explain why your identical pfSense is behaving differently from mine.
Adapting the queues is just a countermeasure, because
-> Your system can't handle the load (the queues are filling up without pfSense being able to handle it)
or
-> (so) analyze this 'load' … what's coming into your pfSense? Is it the WAN, the LAN, or another interface that is flooding? Can you limit the number of users?
Can tcpdump tell you something?
What did you change from the default setup?
Note that I'm not a network expert either, but these are the steps that I would take to dig up the problem.
-
Well, there isn't much I can do to limit the users. The pfSense virtual machines (4 of them) are what I use to authenticate users when they connect to a setup SSID and funnel them to the appropriate configuration website. I use the DNS forwarding functionality to limit what they have access to after they connect. So, I have no control over how the users connect, or really, how many connect.
I suspect I see a lot more load on my boxes than most of you. At peak, I can have 100s of users connecting through at a single instance. And the box works just fine with that load. The pfSense death happens for apparently no reason, and is not generally associated with load. Which is why I liked the idea of a 'bad client' basically beating the hell outta the server until it dies.
-
Just a thought.
You said:
Well, there isn't much I can do to limit the users
but you really 'nag' them with this:
I use the DNS forwarding functionality to limit what they have access to after they connect.
What I make of it:
The user's device knows it is connected (there is a DNS server, a gateway): the link seems up.
But many DNS requests will not receive a reply, or will receive a wrong reply.
What does the 'app' do in this situation? A request to resolve e.g. facebook.com will yield many retries because it 'won't work'. So: use tcpdump on incoming port 53, both UDP and TCP, to see if your DNS resolver gets swamped …
=> This is just an idea ....
-
This is fun. Another zone, different from the last time, died. And this is in the syslog:
Nov 1 10:58:17 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 10:58:17 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
Nov 1 11:08:19 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 11:08:19 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
Nov 1 11:18:21 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 11:18:21 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
Nov 1 11:28:23 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 11:28:23 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
Nov 1 11:38:25 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 11:38:25 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
Nov 1 11:48:27 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 11:48:27 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
Nov 1 11:51:04 lighttpd[34493]: (connections.c.305) SSL: 1 error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
Nov 1 11:54:23 lighttpd[34493]: (connections.c.305) SSL: 1 error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
Nov 1 11:58:29 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 11:58:29 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
Nov 1 12:02:27 lighttpd[34493]: (connections.c.305) SSL: 1 error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
Nov 1 12:05:33 lighttpd[34493]: (connections.c.305) SSL: 1 error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
Nov 1 12:08:31 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
Nov 1 12:08:31 lighttpd[34493]: (connections.c.305) SSL: 1 error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number
-
Probably a client connecting to port 443 (HTTPS) without actually 'talking' HTTPS.
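To see which kind of SSL error dominates, the lines can be tallied by reason. A minimal Python sketch, assuming the log lines keep the lighttpd/OpenSSL format shown above; the function name `tally` is made up. The 'http request' reason is the one that fits plain HTTP being sent to the HTTPS port, while 'no shared cipher' and 'wrong version number' look more like a client whose TLS settings don't match the server's.

```python
import re
from collections import Counter

# e.g. "SSL: 1 error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request"
SSL_ERR_RE = re.compile(r"SSL: \d+ error:[0-9A-F]+:SSL routines:(\w+):(.+)$")


def tally(lines):
    """Count SSL errors by their human-readable reason string."""
    reasons = Counter()
    for line in lines:
        m = SSL_ERR_RE.search(line.strip())
        if m:
            reasons[m.group(2)] += 1
    return reasons


if __name__ == "__main__":
    sample = [
        "Nov 1 11:48:27 lighttpd[34493]: (connections.c.305) SSL: 1 "
        "error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher",
        "Nov 1 11:48:27 lighttpd[34493]: (connections.c.305) SSL: 1 "
        "error:1408A10B:SSL routines:SSL3_GET_CLIENT_HELLO:wrong version number",
        "Nov 1 11:51:04 lighttpd[34493]: (connections.c.305) SSL: 1 "
        "error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request",
    ]
    for reason, count in tally(sample).most_common():
        print(f"{count:4d}  {reason}")
```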