Captive Portal lighttpd crashing - sort of an emergency

Derelict

Out of the blue I started getting this:

Nov 26 17:59:24 pfsense lighttpd[98908]: (request.c.1113) GET/HEAD with content-length -> 400
Nov 26 17:59:25 pfsense lighttpd[98908]: (request.c.1113) GET/HEAD with content-length -> 400
Nov 26 17:59:25 pfsense lighttpd[98908]: (request.c.1113) GET/HEAD with content-length -> 400
Nov 26 18:00:45 pfsense lighttpd[98908]: (mod_fastcgi.c.3387) got a FDEVENT_OUT and didn't know why: 5
Nov 26 18:00:45 pfsense lighttpd[98908]: (mod_fastcgi.c.3387) got a FDEVENT_OUT and didn't know why: 5
Nov 26 18:00:45 pfsense kernel: pid 98908 (lighttpd), uid 0: exited on signal 6 (core dumped)

Does anyone know 1) why this might be happening, and 2) whether I can restart lighttpd without rebooting and, if so, what command I would run?

The first time it dumped core, I tried just saving the CP in question in the webGUI, and that rebooted the router.

Derelict

Is this what I should try if it crashes again?

/usr/local/sbin/lighttpd -f /var/etc/lighty-cpzone-CaptivePortal.conf

Derelict

I had to turn off the captive portal at this site because lighttpd kept crashing.

Before I turned it off I ran a packet capture on the portal IP address. I received 3520 TCP SYN requests to port 8000 in 67 seconds (52/sec).
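For anyone repeating this measurement, here is a minimal Python sketch of the arithmetic. It assumes the SYN packet timestamps have already been exported from the capture as seconds; the function name `syn_rates` and the sample timestamps are made up for illustration and are not part of pfSense or any capture tool.

```python
from collections import Counter


def syn_rates(timestamps):
    """Return (average_per_sec, peak_per_sec) for packet times given in seconds.

    Hypothetical helper: `timestamps` is assumed to be a list of floats
    exported from a packet capture, one per TCP SYN seen.
    """
    if not timestamps:
        return 0.0, 0
    duration = max(timestamps) - min(timestamps)
    # Avoid dividing by zero when every packet shares one timestamp.
    average = len(timestamps) / duration if duration > 0 else float(len(timestamps))
    # Bucket packets by whole second to find the busiest second.
    per_second = Counter(int(t) for t in timestamps)
    return average, max(per_second.values())


# Figures quoted in the post: 3520 SYNs in 67 seconds.
overall_rate = 3520 / 67  # roughly 52.5 per second

# Made-up sample timestamps, only to show the call.
sample_average, sample_peak = syn_rates([0.0, 0.1, 0.2, 1.5, 2.0])
```

The average hides bursts, so the per-second peak is often the more useful number when deciding whether a few clients are hammering the portal.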

There is one thing I am thinking of trying to do to increase the capability of this hardware before turning the portal back on.

My portal page uses bootstrap CSS for formatting. I am loading individual .js files instead of concatenating only what I need into one file. Seems I ought to be able to reduce the number of requests to load the portal page from about 12 to 3.

I might also serve the bootstrap js files, images, etc from another, more capable web server, which should allow me to get the lighttpd requests down to 1.

Note that this was an uncommon mix of people and devices using the portal network. The network has never given me any problem before, peaking at about 3500 concurrent sessions and seemingly ready for much more than that. A great many of the users were from China. Would a bunch of requests with oversized headers (Unicode?) be an issue for lighttpd?

Am I on the right track to fixing this? It also crashed a few times. I have a crash dump. I am seriously considering opening an incident with ESF for this one. I'm kind of in a bind.

cmb

That kind of scale is no problem for lighttpd; that's actually a low load. Whether you make 12 requests or 1 to the portal page, it shouldn't matter. Make sure the other resources in your page aren't generating redirects themselves. Also enable "Maximum concurrent connections" in CP, so that if some system out there is making tons of requests in the background (usually malware), lighttpd isn't trying to deal with potentially thousands of requests a second from a bot. The only thing I see on that crash is a lighttpd bug that was fixed 4 years ago, and that fix has long been in our releases.
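To illustrate the idea behind a per-client connection cap, here is a rough Python sketch. It is not how pfSense or lighttpd implement "Maximum concurrent connections"; the class `PerClientLimiter`, its methods, and the IP addresses are hypothetical and only show the general technique of counting open connections per source address and refusing new ones past a limit.

```python
from collections import defaultdict


class PerClientLimiter:
    """Hypothetical per-IP cap on concurrent connections (illustration only)."""

    def __init__(self, max_per_client):
        self.max_per_client = max_per_client
        self.active = defaultdict(int)  # client IP -> open connection count

    def try_open(self, client_ip):
        """Return True and count the connection, or False if the client is at its cap."""
        if self.active[client_ip] >= self.max_per_client:
            return False
        self.active[client_ip] += 1
        return True

    def close(self, client_ip):
        """Release one connection slot for the client."""
        if self.active[client_ip] > 0:
            self.active[client_ip] -= 1
        if self.active[client_ip] == 0:
            # Drop idle clients so the table does not grow without bound.
            del self.active[client_ip]


limiter = PerClientLimiter(max_per_client=2)
results = [limiter.try_open("10.0.0.5") for _ in range(3)]  # third attempt is refused
other_client = limiter.try_open("10.0.0.6")  # a different client is unaffected
```

With a cap like this, a single misbehaving device can only occupy a few slots, while ordinary users loading the portal page are unaffected.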

Support is probably your best bet. One of us can check out the crash dump and track it down from there. If you go that route, include a link to this thread in the ticket for history.