Slow reboots due to captive portal rule regeneration?
-
We are having an issue with pfSense on reboot which we believe is related to the captive portal and some kind of rule sync routine. It occurs on both clean and unclean restarts, and appears to take longer the more uptime the box has had since the last reboot.
The console shows the following messages for about 15-20 minutes:
ipfw: rule 10023 does not exist
ipfw: rule 10009 does not exist
ipfw: rule 10109 does not exist
….Note the rule numbers all appear to be above 10,000. After it finishes, it runs the final part of the boot sequence (bringing up interfaces etc.). Note that SSH is already loaded at this point: we can SSH to the box and ping outside (WAN) from the SSH session. But users behind the firewall (on the LAN) are unable to reach the WAN (the internet in this case) until the rule messages and everything else have finished. This causes us headaches. (Again, we assume it's the captive portal or something similar causing this?)
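For anyone wanting to see which per-user rules actually survive a reboot, here's a minimal sketch that filters `ipfw list`-style output down to rule numbers in the 10,000+ range. The sample text stands in for real output so the pipeline is runnable anywhere; on a live box you would pipe `ipfw list` in instead (the >= 10000 cutoff is an assumption based on the messages above):

```shell
# Stand-in for `ipfw list` output; replace with: ipfw list
sample='00100 allow ip from any to any via lo0
10005 count ip from 192.168.1.50 to any
10023 count ip from 192.168.1.77 to any'

# Print only rule numbers in the captive portal's per-user range.
printf '%s\n' "$sample" | awk '$1 + 0 >= 10000 { print $1 }'
```

Comparing that list against the rule numbers in the "does not exist" messages shows how many stale entries the boot routine is churning through.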
Running pfSense 1.2.3-RELEASE (built on Sun Dec 6 23:21:36 EST 2009). Using the traffic shaper and Captive Portal in this install.
Any advice on how to disable this / look at optimizing it / where it occurs / why it's needed… or just help all round would be genuinely appreciated.
-
ipfw would be either captive portal or schedule rules. It doesn't need to delete anything at boot, though; I've never seen anything like that. 2.0 has a number of enhancements in how things are handled and works well for CP, so that might be something to try. Alternatively, you can find the cause by tracking down how ipfw is handled in captiveportal.inc.
-
Thanks for that. I can see where it's occurring now:
When the Captive Portal starts up it calls the routine captiveportal_radius_stop_all(), which then attempts to send a RADIUS stop request for each username in the captive portal db file.
The problem is it also calls getVolume(rulenumber) to get the number of input/output bytes for each username/rule from the db file. That rule doesn't exist, so it sits there and errors out. It has to go through each and every username/rule in the db file, and this is a slow process when you have a large number of users still in the file (guessing it's slow due to error handling or something).
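The loop above could be cheap if it checked rule existence once up front instead of erroring per user. A minimal sketch of that idea, with stand-in data throughout: the `rules` variable plays the role of `ipfw show` output and `db` plays the role of the captive portal db file (the real db format differs; the rule-number/username layout here is an assumption for illustration only):

```shell
# Stand-in for `ipfw show` output: rule number, packet/byte counters, rule body.
rules='10005 12 3400 count ip from 192.168.1.50 to any'

# Stand-in for captive portal db entries: rule number and username.
db='10005 alice
10023 bob'

# Check the snapshot of live rules and skip entries whose rule is gone,
# instead of letting each counter lookup fail and eat the error.
printf '%s\n' "$db" | while read -r ruleno user; do
  if printf '%s\n' "$rules" | awk -v r="$ruleno" '$1 == r { found = 1 } END { exit !found }'; then
    echo "would send RADIUS stop with counters for $user (rule $ruleno)"
  else
    echo "skipping $user: rule $ruleno no longer exists"
  fi
done
```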
Hmmm… what's the best way to clear this up, I wonder?
- Can't let it run through the routine closing all accounting records for users, so we could just delete the captive portal db file on service startup (that would sort out my unclean reboot issues), but that causes a mess with concurrent logon checks
- Could just send a dummy username through to RADIUS to truncate the RADIUS accounting table on startup, but that seems a crazy way to handle it
- Could run a query against the MySQL radacct table manually on service startup (bypassing RADIUS altogether). Not such a detached model though, and we also don't have the extensions in place to hit MySQL directly
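For what it's worth, the first option reduces to a couple of lines at service start. A hedged sketch, using a /tmp path so it's runnable anywhere rather than touching a real install (the actual db lives under /var/db on pfSense; this is a sketch, not a drop-in boot script, and it doesn't solve the concurrent-logon mess noted above):

```shell
# Stand-in path; on pfSense this would be the captive portal db under /var/db.
DB=/tmp/captiveportal.db
printf 'stale entry\n' > "$DB"   # pretend leftover sessions exist

# Clear the stale db before the service starts, so the RADIUS stop-all
# loop has nothing to iterate over.
rm -f "$DB"
[ ! -f "$DB" ] && echo "captive portal db cleared"
```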
Who's got a good idea?