Captive Portal Stopped working… Lock file.
-
Captive Portal stopped working today. I looked under Status->System Logs->Portal Auth
Mar 2 23:44:37 logportalauth[97784]: LOCKWARNING: waiting for lock for 8 minute/s!
Mar 2 23:45:04 logportalauth[749]: LOCKWARNING: waiting for lock for 5 minute/s!
Mar 2 23:45:04 logportalauth[99681]: LOCKWARNING: waiting for lock for 3 minute/s!
Mar 2 23:45:26 logportalauth[927]: LOCKWARNING: waiting for lock for 4 minute/s!
Mar 2 23:45:37 logportalauth[97784]: LOCKWARNING: waiting for lock for 9 minute/s!
Mar 2 23:46:04 logportalauth[749]: LOCKWARNING: waiting for lock for 6 minute/s!
Mar 2 23:46:04 logportalauth[99681]: LOCKWARNING: waiting for lock for 4 minute/s!
Mar 2 23:46:26 logportalauth[927]: LOCKWARNING: waiting for lock for 5 minute/s!
Mar 2 23:46:37 logportalauth[97784]: LOCKWARNING: waiting for lock for 10 minute/s!
Mar 2 23:46:37 logportalauth[97784]: LOCKERROR: waiting for lock for 10 minute/s - EXITING PROCESS!To fix the problem I read the PHP code and found the lock mechanism and how to clear the lock.
Go to Diagnostics -> Command in the web gui and run the following command.
rm /var/run/captiveportal.lockThe last change I did in the firewall was changing the webGUI protocol from http to https and I did this while people were using Captive Portal. After discovering the problem and the answer I went back to http and then changed the webGUI protocol back to https and the problem occurred again. So I believe what is happening is during the change it interrupts Captive Portal and the lock file does not get deleted.
I'm posting this information as documentation and in hopes that it will help someone else.
Best Regards!
-
Thanks, just commited a fix.
-
Thank-you! An amazingly fast response. I told my boss that you already committed a fix we are both impressed.
-
@mcrane:
Captive Portal stopped working today. I looked under Status->System Logs->Portal Auth
Mar 2 23:44:37 logportalauth[97784]: LOCKWARNING: waiting for lock for 8 minute/s!
Mar 2 23:45:04 logportalauth[749]: LOCKWARNING: waiting for lock for 5 minute/s!
Mar 2 23:45:04 logportalauth[99681]: LOCKWARNING: waiting for lock for 3 minute/s!
Mar 2 23:45:26 logportalauth[927]: LOCKWARNING: waiting for lock for 4 minute/s!
Mar 2 23:45:37 logportalauth[97784]: LOCKWARNING: waiting for lock for 9 minute/s!
Mar 2 23:46:04 logportalauth[749]: LOCKWARNING: waiting for lock for 6 minute/s!
Mar 2 23:46:04 logportalauth[99681]: LOCKWARNING: waiting for lock for 4 minute/s!
Mar 2 23:46:26 logportalauth[927]: LOCKWARNING: waiting for lock for 5 minute/s!
Mar 2 23:46:37 logportalauth[97784]: LOCKWARNING: waiting for lock for 10 minute/s!
Mar 2 23:46:37 logportalauth[97784]: LOCKERROR: waiting for lock for 10 minute/s - EXITING PROCESS!To fix the problem I read the PHP code and found the lock mechanism and how to clear the lock.
Go to Diagnostics -> Command in the web gui and run the following command.
rm /var/run/captiveportal.lockWhen this happened did the web interface become extremely slow?
-
Yes it can make it slow and completely unresponsive if the page you are viewing has anything to do with captive portal. At the point if people are trying to access captive portal it seems to completely bog down the web server on PFSense. It happened again today when I was enabling the optional interface. I had to delete the lock file again. The instructions to do that are:
Go to Diagnostics -> Command in the web gui and run the following command.
rm /var/run/captiveportal.lockThen used the console through ssh option 11 to 'Restart the webConfigurator' and then captive portal worked again.
It may be that the fix that was committed may cover this in the future when the fix becomes available in release.
I wonder if the lock file should eventually age and automatically expire kind of like a TTL. At this time if the lock file ages over 10 minutes I doubt it will ever finish and unlock by itself without manual intervention. The lock file so far only seems to fail to get removed if I'm making changes to the firewall while the captive portal is being used.
-
I have practically the same issue, but due to RADIUS server go down. Also this issue was connected with DHCP Server (logs show duplicate uids).
-
I've the same pb :
Jul 8 11:20:55 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 1 minute/s! Jul 8 11:21:53 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 1 minute/s! Jul 8 11:21:55 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 2 minute/s! Jul 8 11:22:53 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 2 minute/s! Jul 8 11:22:55 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 3 minute/s! Jul 8 11:23:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 3 minute/s! Jul 8 11:23:56 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 4 minute/s! Jul 8 11:24:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 4 minute/s! Jul 8 11:24:56 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 5 minute/s! Jul 8 11:25:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 5 minute/s! Jul 8 11:25:56 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 6 minute/s! Jul 8 11:26:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 6 minute/s! Jul 8 11:26:56 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 7 minute/s! Jul 8 11:27:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 7 minute/s! Jul 8 11:27:56 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 8 minute/s! Jul 8 11:28:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 8 minute/s! Jul 8 11:28:56 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 9 minute/s! Jul 8 11:29:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 9 minute/s! Jul 8 11:29:56 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 10 minute/s! Jul 8 11:29:56 192.168.1.250 logportalauth[87619]: LOCKERROR: waiting for lock for 10 minute/s - EXITING PR$ Jul 8 11:30:54 192.168.1.250 logportalauth[87814]: LOCKWARNING: waiting for lock for 10 minute/s! Jul 8 11:30:54 192.168.1.250 logportalauth[87814]: LOCKERROR: waiting for lock for 10 minute/s - EXITING PR$ Jul 8 11:31:08 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 1 minute/s! Jul 8 11:32:08 192.168.1.250 logportalauth[87619]: LOCKWARNING: waiting for lock for 2 minute/s! Jul 8 11:32:55 192.168.1.250 logportalauth[89108]: LOCKWARNING: waiting for lock for 1 minute/s!
you have an idea to solve this and especially why it happens?
thank you
-
My issue was usually caused after a reboot many users would hit the radius at the same time. This overwhelmed the web server that runs the GUI including the captive portal login. The problem you described with the Radius server error could easily cause a similar issue. The delay can overwhelm the web server. The web server used in pfSense is powerful but the configuration for it with pfSense 1.2 it is set with only enough resources to handle 1-2 connections at a time. While this saves ram and cpu resources it wreaks havoc on captive portal in an environment that is larger than one would expect for home use.
Some changes have been made that will hopefully be in the final version of 1.3. But right now to have the changes for 1.2 they have to be put in place manually.
Now addressing the lock warning. The way captive portal was developed when someone begins authentication the code drops a file on the file system to represent that no other user can logon while the current one finishes processing. If the login fails to complete then this lock file does not get deleted. And needs to be manually deleted in order for things to get moving again. I work as a contractor and for one system I maintain that lockwarning would occur occasionally. When it did I would manually remove the lock file. Eventually that got old so I finally automated it by deleting the lock file automatically when it was older than 3-4 minutes. I figure if it doesn't clear up automatically within 3-4 minutes then it is not likely to clear up at any time. After making these changes captive portal has worked pretty well.
-
Have you done a cron to delete the .lock file ?
I think it's interessant to give the solution for other people :)
thanks a lot !
-
The cool thing is that it doesn't require a cron job. There is code that gives a warning that the lock file is 3 or more minutes old. Look for that and use an php file delete command 'unlink' then change the log message to something like 'LOCKWARNING: lock file automatically removed'
-
can you give your code ? wich file must be modified ?
-
Attached is the file you requested. It has been tested and used on pfSense 1.2.
/etc/inc/captiveportal.incI marked the change with //custom remove lock file if older than 3 minutes
To change the contents of the file use the pfSense web gui and go to 'Diagnostics: Edit File'. Put '/etc/inc/captiveportal.inc' in the input box then press 'Load'. After adding the next code then press 'Save'. Other option is you can use Firefox with Fire FTP extension with 'sftp' selected.
The change was designed to remove the lock file if it has gotten older than 3 minutes. My thoughts have been if it is longer than that it is likely broken. This will remove the lock file and allow the system to process users without manual intervention.
This a band-aid so the system can heal itself. I think it would be best to find why the captive portal lock file gets stuck. The info mentioned by soisses at http://forum.pfsense.org/index.php/topic,5547.0.html may be helpful. Given enough time I believe I could resolve this issue the trick is finding the time.
-
I just sync'd the changes from 1.3 -> 1.2 for the records.
-
Thanks! I look forward to testing it.