Seemingly random Captive Portal issues

jeffpfse

I just found out that one of our branches has been having issues with the wifi (which is supplied by a pfSense box with Captive Portal) for quite some time now. Apparently, when they cannot connect to the wifi, they reboot the pfSense box, and after it restarts they are able to connect again.

Since this problem has been going on for a while now and wasn't previously reported, I am having a difficult time figuring out the cause.

I do see a lot of apinger alerts in the system log for delay and loss. If they have a spotty internet connection over there, could this be a contributing factor? I'm at a loss for a direction to proceed in troubleshooting this issue right now.

jeffpfse

Oh, and I am running 2.0.1 Release, but the problems were occurring even before I upgraded from 2.0 RC.

stompro

I have seen the same intermittent captive portal issues. Running 2.0.1 on an Alix (256MB).

I'll get reports that the public wireless (served by the pfSense Captive Portal) has stopped working and users are no longer receiving the portal page. Rebooting the firewall restores service.

I'm not quite sure how to track down the problem either. I would be happy to try suggestions, though.

Josh

mhab12

Me too. Our CP quit working several months back. Users were not being redirected to the CP login page, so they could not authenticate unless they manually browsed to the auth page. As a result, we simply disabled the CP for the time being. Would love to see this resolved.

stompro

This redmine ticket may be related to our Captive Portal issues.

http://redmine.pfsense.org/issues/2475

According to the submitter, the captive portal page limits don't actually work because a lighttpd module isn't included with pfSense.

I'll try and test this out also to see if that is what is happening.
Josh

stompro

Oh, and should this thread be moved to the captive portal section?
Josh

jeffpfse

Users were not able to access the internet at all when this happened. It recently happened again, and instead of rebooting pfSense, I just power cycled the wireless access point, which seemed to fix the issue as well. Would really love to know what is causing the problem, though.

stompro

My guess is that, because the per-IP limiting of captive portal connections doesn't work at all, the firewall runs out of memory. A user connects to the wireless and opens Firefox, which restores 30 saved tabs and starts 60 HTTP and HTTPS connections. All of those get redirected to the captive portal, and each one spawns a PHP process, which together suck down all the memory. The ipfw and dummynet rules then sometimes fail to get created, either because there isn't enough memory or because the PHP process that triggers those rules gets killed for running out of memory.
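To put rough numbers on that guess, here is a back-of-the-envelope sketch. The 60 connections per client come from the scenario above; the 10 MB per PHP process and the 128 MB of free memory are purely assumed figures for illustration, not measurements from any pfSense box, and the function names are hypothetical.

```python
def worst_case_php_memory_mb(clients, connections_per_client, mb_per_php_process):
    """Memory needed if every redirected connection spawns its own PHP process."""
    return clients * connections_per_client * mb_per_php_process


def clients_until_exhaustion(free_memory_mb, connections_per_client, mb_per_php_process):
    """How many such clients fit before free memory is used up (integer floor)."""
    return free_memory_mb // (connections_per_client * mb_per_php_process)


# 60 connections per client is taken from the scenario described above.
# 10 MB per PHP process and 128 MB of free memory are assumed, not measured.
print(worst_case_php_memory_mb(1, 60, 10))    # 600
print(clients_until_exhaustion(128, 60, 10))  # 0
```

Under those assumptions a single client with a full set of restored tabs would already ask for more memory than is free, which is at least consistent with the symptoms. Real per-process memory use would have to be measured on the box.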

Anyone else willing to go in on a bounty to make sure that mod_evasive gets added and actually tested to make sure it works for 2.0.2 and 2.1?

I'm a little baffled that there are GUI elements and config options for mod_evasive, but that it was apparently never tested to see if it worked when the feature was developed. Or did it get dropped from the package at some point by accident?

Josh

cwilkinson

Also experiencing this problem at a medium-sized motel. We went from 1.2.3 to 2.0.1 and the issue still remains.
This is an Atom box with 1GB RAM.
Any temporary solutions for this?

Slam

@jeffpfse:

Users were not able to access the internet at all when this happened. It recently happened again, and instead of rebooting pfSense, I just power cycled the wireless access point, which seemed to fix the issue as well. Would really love to know what is causing the problem, though.

I intermittently face the same problems as reported here. I also found that power cycling the AP solves the issue without having to restart the CP, which keeps the authenticated user sessions intact. I would love to get this annoyance figured out.

cmb

The only firewall-related "seemingly random captive portal issues" we see are when the CP hard timeout and DHCP lease length are misconfigured. The DHCP lease length must be at least as long as the CP hard timeout. If the IP is reassigned to a different device before the CP session expires, it won't work by design (IP-MAC association is enforced for the duration of the session).

That was a problem on Abdsalem's system.
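The rule above can be written as a quick sanity check. This is only a sketch: it assumes the DHCP lease length is given in seconds and the CP hard timeout in minutes (the units used elsewhere in this thread), and the function name and example values are hypothetical, not taken from any real configuration.

```python
def lease_covers_hard_timeout(dhcp_lease_seconds, cp_hard_timeout_minutes):
    """True if the DHCP lease is at least as long as the CP hard timeout."""
    # Convert the hard timeout to seconds so both values use the same unit.
    return dhcp_lease_seconds >= cp_hard_timeout_minutes * 60


# Hypothetical values: a 30-minute lease against a 60-minute hard timeout
# would let an IP be reassigned while the CP session is still active.
print(lease_covers_hard_timeout(1800, 60))   # False
# A 24-hour lease comfortably covers the same 60-minute hard timeout.
print(lease_covers_hard_timeout(86400, 60))  # True
```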

stompro

CMB, thanks for posting this. I just checked one of my firewalls: the default lease time is set to 1800 (30 minutes) and the max to 5400 (90 minutes), with no hard timeout set and a 120-minute idle timeout. I'll change the settings according to your guidelines and see how that works out.
Josh

jeffpfse

@cmb:

The only firewall-related "seemingly random captive portal issues" we see are when the CP hard timeout and DHCP lease length are misconfigured. The DHCP lease length must be at least as long as the CP hard timeout. If the IP is reassigned to a different device before the CP session expires, it won't work by design (IP-MAC association is enforced for the duration of the session).

That was a problem on Abdsalem's system.

I have the captive portal idle timeout set to 30 minutes and the DHCP expiration set to 24 hours.

jeffpfse

I should also mention that we have about 20 machines running pfSense providing captive portals to our different offices, and this seems to be the only one that is giving us any trouble. Normally they just reboot the machine when they are unable to connect, but this is happening several times a day.

If you power cycle the wireless AP that is connected to the LAN interface, it also seems to fix the problem and then customers are redirected to the captive portal page. I tried installing a new AP, hoping that would fix the problem, but it has not.

Any ideas would be very much appreciated.

jeffpfse

Looking through the system logs, it looks like clients are being assigned IP addresses/DHCP leases but are not being redirected to the captive portal to log in.

stompro

The mod_evasive issue is fixed in 2.0.3. That, in combination with the suggested DHCP timeout changes, seems to have solved many of the CP problems we were having. I haven't had a CP issue for several weeks now.
Thanks
Josh