Error during table cpzone1 creation. Error message: database is locked



  • Hi,

    I am using Pfsense (2.2.5) captive portal authentication with uses are on freeradius 2 + daloradius. Freeradius2 + daloradius is on another Ubuntu Server.

    I am using squid2 transparent + squidguard and CP is on LAN interface.

    There are 13500 registered users in freeradius. There are 1500-3000 concurrent users 24/7.

    Internet Bandwidth usage varies from 250-450Mbps and every captive portal user is given 2048 kbps download/upload speed.

    Problem is when a pfsense captive portal page pops up for a user for authentication and after user enters login-id and password (both are correct), logout page does not comes rather then again portal page pops up but on top of the page a message is written :-

    " logportalauth[7685]: Zone: cpzone1 - Error during table cpzone1 creation. Error message: database is locked:"

    Again user enters login details and again portal page with error message pops up, but if user refreshes the page then logout page comes only which is a normal behavior.

    This happens with every 3rd or 4th user not with all users.

    Logout CP page is available using https://forum.pfsense.org/index.php?topic=77143.0

    Regards



  • Hi

    Gertjan, thanks for your reply.

    I am using Server, HP-DL360-G6, Dual Intel Xeon Processor, X5550 @2.67Ghz, 4 cores per CPU, 12Mb cache per CPU with 16Gb of RAM.

    Processor usage on pfsense most of the time is 15-20% and rarely reaches 50-70% for a very little time when pfsense is restarted as so many concurrent users want to login simultaneously.

    Gertjan, I do not  understand how to do it.
    Pls provide more details as I am not able to implement and test your suggestions



  • Hi,

    Gertjan, thanks for your explanation

    File size you asked is 550912, it not even in Mbs

    I noted file size at 7am and there were only 512 user online at that time, might be file size grows when there are more users.

    Server CPU type copied from pfsense

    Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
    Current: 1665 MHz, Max: 2665 MHz
    16 CPUs: 2 package(s) x 4 core(s) x 2 SMT threads

    I am using a 7200 RPM SATA 1Tb HDD

    Regards



  • Hi

    Gertjan,

    I have implemented the changes suggested by you, to see results I need to check it for atleast 1 day. I ll update about the result.

    Hope it ll work

    Regards



  • Good.

    Do you have a tool like this : https://www.test-domaine.fr/munin/brit-hotel-fumel.net/pfsense.brit-hotel-fumel.net/portalusers.html ? (my real time captive portal users stats).

    Keep my fingers crossed.



  • Hi,

    Nope, I don't have any tool like this, I check users and bandwidth graphs on RRd. What more functionalities does your graph tool give than RRD?

    Usage increased up to 14% and users currently logged in CP are 1520.

    copied from pfsense
    /var/run (ufs in RAM): 14% of 3.4M

    Max users expected to log in CP are up to 3000, so if it increases to 40% then I think there won't be any problem.

    Good thing is I myself have faced this error of database is locked till yesterday, but after changes, it doesn't occur before me, but updates from other users are still pending…

    Regards



  • @amitaussie:

    Nope, I don't have any tool like this, I check users and bandwidth graphs on RRd. What more functionalities does your graph tool give than RRD?

    I wanted to have a tool that monitors my servers (on the Internet). I added also some graphs coming from pfSense, which isn't "on the net" of course.
    It helps me to see from everywhere 'on the planet' how things are going without logging in remotely.
    If I was responsible for the access of several thousand of users, I would really consider adding such tools. It could run on a 'couple of $' VPS.

    My pfsense setup https://www.test-domaine.fr/munin/brit-hotel-fumel.net/pfsense.brit-hotel-fumel.net/index.html (and look one level up, you could see all my servers).

    @amitaussie:

    Good thing is I myself have faced this error of database is locked till yesterday, but after changes, it doesn't occur before me, but updates from other users are still pending…

    So far, so good  :)
    If users have logging problems related to the errors you showed above, then log-lines present in pfsense WILL tell you even before they come to you.

    Keep checking the size of the RAM disk - when this one goes full, pfsense WILL panic. (but I guess it won't happen).



  • Check out this https://forum.pfsense.org/index.php?topic=103732.0 (it concerns a pfsense Beta 2.3 but the part where the errors come from looks the same / didn't change … )
    Time for a  :( I guess ....
    Probably we are on the wrong track.

    => Still no errors anymore ?

    Btw : your are using a VM or is your pfsense running on a native device ?



  • Hi,

    Gertjan,

    There are no database errors reported from anywhere in the network till

    I think that changes have worked, I ll keep checking it for few more days.

    Thank you,

    I have read that thread you mentioned, but those errors might be due to vmware.

    I am not using vmware. Pfsense is running on native hardware.

    I am facing another CP error though, this error has not started after the changes but it was already happening.

    Error is CP service stops and then CP  page does not pop up.
    CP service does stop in 48 hours. I have also seen sometimes squidguard service does stop but it starts automatically but CP service does not start automatically, I manually have to restart.



  • @amitaussie:

    ….
    Error is CP service stops and then CP  page does not pop up.
    CP service does stop in 48 hours. I have also seen sometimes squidguard service does stop but it starts automatically but CP service does not start automatically, I manually have to restart.

    Before (2.2.5 ?) squid and Captive portal was impossible.
    Now : I don't know (I don't know why I should use Squid …)
    My Captive Portail never dies.

    All I know is : Squid + Captive portal …. unstable, or worse.



  • Hi,

    I am using Pfsense at 2 more places and at every place no pf users are 1200-1400. I have never faced any database locked error or CP service stopped error and there are always 200-400 concurrent users 24/7.

    And there also I am using squid2.7 + squid-guard and CP authentication and using 2.2.5 version, but started from version 2.0.1 and never faced any such difficulties .

    Regards



  • This is the root issue, we're working on it.
    https://redmine.pfsense.org/issues/5622



  • Hi,

    Hope to see a solution soon.

    Thanks and regards





  • ok, I got it.

    So should I undo the earlier changes made in /var/run ?

    Regards



  • @amitaussie:

    So should I undo the earlier changes made in /var/run ?

    Absolutely, yes. That was never a good idea. Though you can just wait until a fixed snapshot's available and upgrade to it to make sure you're back to a sane state.

    I just committed what should be a good fix for the issue here. 2.2.6 snapshot builders just restarted, we'll have a new snapshot in a couple hours that'll hopefully fix this.



  • Those who are seeing this issue, please try the newest available 2.2.6 snapshot and report back. It looks to be fixed, but I would like additional confirmation.
    64 bit
    https://snapshots.pfsense.org/FreeBSD_releng/10.1/amd64/pfSense_RELENG_2_2/updates/?C=M;O=D

    32 bit
    https://snapshots.pfsense.org/FreeBSD_releng/10.1/i386/pfSense_RELENG_2_2/updates/?C=M;O=D



  • Ok,

    I,ll apply the 2.2.6-64bit snapshot and report back.

    Thanks and Regards,



  • Sorry for late reply. As I cant afford to down server even for 15 mins for last 4 days for some urgent time bound jobs.

    I just upgraded to 2.2.6 release.

    AND again I just saw the  following error at CP login page, as this is the best time to check this error as lots of hit for CP authentication are coming, what I do, I keep on refreshing the CP authentication page, I do not enter my login id and password. So after 2-3 refreshes I am able to see this error on my system. So anywhere else in the network, other people would also be seeing this error.

    Warning: SQLite3::exec(): database is locked in /etc/inc/captiveportal.inc on line 2045 Warning: SQLite3::query(): Unable to prepare statement: 5, database is locked in /etc/inc/captiveportal.inc on line 2057

    I haven't seen this error in last 4-5 days when I implemented changes suggest by Gertjan.

    I am reverting back to Gertjan suggested changes for the time being…

    Regards

    Regards & Thanks



  • @amitaussie:

    Warning: SQLite3::exec(): database is locked in /etc/inc/captiveportal.inc on line 2045

    Line 2045 : https://github.com/pfsense/pfsense/blob/RELENG_2_2/etc/inc/captiveportal.inc#L2045 - a simple log line
    @amitaussie:

    Warning: SQLite3::query(): Unable to prepare statement: 5, database is locked in /etc/inc/captiveportal.inc on line 2057

    Line 2057 : https://github.com/pfsense/pfsense/blob/RELENG_2_2/etc/inc/captiveportal.inc#L2057 == empty line ??,

    Your file /etc/inc/captiveportal.inc contains the edits that puts the database file on RAM (what I suggested in the beginning), but this doesn't explain why your version of /etc/inc/captiveportal.inc is so much different as the original : https://github.com/pfsense/pfsense/blob/RELENG_2_2/etc/inc/captiveportal.inc
    The database error should happen around lines 1300 - 1400 ….
    You're NOT using an original pfsense setup, it has been patched with something else ??



  • Hi,

    Gertjan!

    My captiveportal.inc is different as I am using a file which is patched for manual logout as mentioned in this post of yours
    https://forum.pfsense.org/index.php?topic=77143.15

    But after making changes in /var/run database error is certainly not appearing and everything is working ok.

    Regards



  • Hi

    Gertjan!

    I am not able to find changes you suggested for /var/run.

    Can you please re-send the changes as these were helping.
    Regards



  • @amitaussie:

    I am not able to find changes you suggested for /var/run.

    Can you please re-send the changes as these were helping.

    They should be on the forum - I didn't remove anything.
    It was a question of changing the the 2 places where the data base was created.

    But, be careful : read this https://forum.pfsense.org/index.php?topic=103707.msg580857#msg580857 (post above).
    I don't know why it is a bad idea, but if cmb says so, I will not discus that (although I do want to the the 'why' part  ;)).



  • @amitaussie:

    Hi

    Gertjan!

    I am not able to find changes you suggested for /var/run.

    I removed them because they weren't related to the root issue and weren't a good idea. Just upgrade to 2.2.6, that will properly fix the root issue.



  • @cmb:

    I removed them because ….....

    That's what I figured out.

    @amitaussie : I advise you to use a clean "2.2.6" version (undo ALL changes in /etc/inc/captiveportal.inc).
    If you still see the same issues ("database locked") then I really start to think that your hardware isn't simply fast enough ….

    edit : @cmb : I wasn't looking for solving a programming error, but using a RAM disk instead of ordinary disk so treatment would speed up.
    From what I made of it, database locking and waiting for it wasn't optimal, so their was a chance an execution path (user logging in) could fall into the attempt of another user logging in at the same time. The database would be locked at the same, the last attempts bails out.
    Now, if locking, updating and unlocking takes time, and enough attempt are made, the locking issue start to pop up (race condition).
    That's why I was thing of borrowing temporarily some RAM space.

    Of course, when pfSense is rebooted, this RAM based database file will be gone which produces new unpredictable results.
    When the (very small) RAM disk fills up, new, totally unpredictable results will pop up.
    Etc ....

    The test existed in the fist place to see if "time" could be an issue. It clearly was.



  • Indeed because the RAM disk would be much faster than the hard drive, the locked operations would complete faster, which would lessen the chance of hitting the lock contention. It was a fine suggestion for something to try at the time, when we hadn't identified the root cause or fixed it yet. But once the lock issue was fixed, making those kinds of changes aren't desirable, which is why I removed it after the root problem was fixed. Otherwise what we've seen in similar situations in the past is people making changes that were intended to address an issue that no longer exists and just causing problems in the process (then reporting problems, not mentioning the changes they made, and we chase our tails wasting time trying to figure out what the issue is, when it was actually user modification induced).

    If anyone's still seeing any locking issues, please make sure you're on a stock 2.2.6 release (no modifications), and start a new thread describing what's happening. The issue in this thread was resolved, so something different is happening if you're still seeing that.



  • Hi

    Gertjan & CMB

    Thanks for your valuable inputs.

    As suggested I have made a clean install of 2.2.6 and no more database locked errors are found.

    Then I changed 2.2.6 captiveportal.inc & index.php for manual logout page as mentioned in https://forum.pfsense.org/index.php?topic=77143.0

    Everything is working fine now with manual logout page.

    Earlier when i upgraded to 2.2.6, I used to copy modified captiveportal.inc & index.php from version 2.2.4 to get manual logout page. I didnt modify 2.2.6 version captiveportal.inc & index.php for manual logout page. So that's why I was getting database error after upgrading to 2.2.6.

    Sorry, it was my fault, I should have made changes to 2.2.6 version captiveportal.inc & index.php files to get logout page.

    Thanks & regards


Log in to reply