[SOLVED] Crash reporter - Captive Portal
-
Hi to all,
Using 2.3.a.20151208.1928 with a load of 10-20 users, the errors are shown in the crash report:
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP Stack trace:
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 1. {main}() /usr/local/captiveportal/index.php:0
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 2. captiveportal_isip_logged() /usr/local/captiveportal/index.php:73
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 3. captiveportal_read_db() /etc/inc/captiveportal.inc:2411
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 4. captiveportal_opendb() /etc/inc/captiveportal.inc:1456
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 5. SQLite3->exec() /etc/inc/captiveportal.inc:1434
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP Stack trace:
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 1. {main}() /usr/local/captiveportal/index.php:0
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 2. captiveportal_isip_logged() /usr/local/captiveportal/index.php:73
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 3. captiveportal_read_db() /etc/inc/captiveportal.inc:2411
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 4. captiveportal_opendb() /etc/inc/captiveportal.inc:1456
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 5. SQLite3->exec() /etc/inc/captiveportal.inc:1442
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP Stack trace:
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 1. {main}() /usr/local/captiveportal/index.php:0
[09-Dec-2015 12:33:24 America/Sao_Paulo] PHP 2. header() /usr/local/captiveportal/index.php:95
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP Stack trace:
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 1. {main}() /usr/local/captiveportal/index.php:0
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 2. captiveportal_isip_logged() /usr/local/captiveportal/index.php:73
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 3. captiveportal_read_db() /etc/inc/captiveportal.inc:2411
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 4. captiveportal_opendb() /etc/inc/captiveportal.inc:1456
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 5. SQLite3->exec() /etc/inc/captiveportal.inc:1434
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP Stack trace:
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 1. {main}() /usr/local/captiveportal/index.php:0
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 2. captiveportal_isip_logged() /usr/local/captiveportal/index.php:73
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 3. captiveportal_read_db() /etc/inc/captiveportal.inc:2411
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 4. captiveportal_opendb() /etc/inc/captiveportal.inc:1456
[09-Dec-2015 12:50:09 America/Sao_Paulo] PHP 5. SQLite3->exec() /etc/inc/captiveportal.inc:1442
[09-Dec-2015 12:57:39 America/Sao_Paulo] PHP Stack trace:
[09-Dec-2015 12:57:39 America/Sao_Paulo] PHP 1. {main}() /usr/local/captiveportal/index.php:0
[09-Dec-2015 12:57:39 America/Sao_Paulo] PHP 2. portal_allow() /usr/local/captiveportal/index.php:261
[09-Dec-2015 12:57:39 America/Sao_Paulo] PHP 3. portal_reply_page() /etc/inc/captiveportal.inc:2233
[09-Dec-2015 12:57:39 America/Sao_Paulo] PHP 4. header() /etc/inc/captiveportal.inc:1878
[09-Dec-2015 13:03:05 America/Sao_Paulo] PHP Stack trace:
[09-Dec-2015 13:03:05 America/Sao_Paulo] PHP 1. {main}() /usr/local/captiveportal/index.php:0
[09-Dec-2015 13:03:05 America/Sao_Paulo] PHP 2. portal_allow() /usr/local/captiveportal/index.php:261
[09-Dec-2015 13:03:05 America/Sao_Paulo] PHP 3. portal_reply_page() /etc/inc/captiveportal.inc:2233
[09-Dec-2015 13:03:05 America/Sao_Paulo] PHP 4. header() /etc/inc/captiveportal.inc:1878 -
Hi,
You see this
SQLite3->exec() /etc/inc/captiveportal.inc
a lot.
Knowing that /var/db/captiveportal{$cpzone}.db
({$cpzone} is replaced by your zone ID)
is one of the most read/written file on the system, and accessing it fails, I'm pretty sure you have "disk" problems.Time to run some
[MENU] Diagnostics: S.M.A.R.T. Monitor Tools
tests. -
Hi, Gertjan.
Thank you for the reply, but I´m running that on my VMware cluster, so I think "disk" is not my problem. May be a VMware related issue or database concurrency?
I started a fresh new install ( new VM on another VMware host ) and the errors still showing up.Thanks
Bilhega -
"Database concurrency", maybe, but check this out : https://forum.pfsense.org/index.php?topic=103707.0
If you have several thousands of users, ok ….. not "20".Have you attributed enough 'horsepower' to your pfsense 'vm' ?
-
I think so, as follow (from pfSense):
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
4 CPUs: 2 package(s) x 2 core(s)
with 2GB memory.Looking into the logs, I found these messages:
Dec 10 12:22:51 logportalauth 37687 Zone: portalwifigratuito - Error during table portalwifigratuito creation. Error message: database is locked. Resetting and trying again.
Dec 10 12:22:51 logportalauth 37687 Zone: portalwifigratuito - Still unable to create tables for portalwifigratuito. Error message: database is locked. Remove the database file manually and try again.
Dec 10 12:22:51 logportalauth 36740 Zone: portalwifigratuito - Error during table portalwifigratuito creation. Error message: disk I/O error. Resetting and trying again.
Dec 10 12:22:52 logportalauth 36740 Zone: portalwifigratuito - Successfully reinitialized tables for portalwifigratuito – database has been reset.
Dec 10 12:23:09 logportalauth 92880 Zone: portalwifigratuito - ACCEPT: unauthenticated, xx:xx:xx:xx:xx:xx, 192.168.xx.xxand the question is ... "who is locking the database on that moment?" ???
Bilhega
-
…..
and the question is ... "who is locking the database on that moment?" ???As far as I known, a database (a file) exists for every unique captive portal.
The engine is "sqllite", build into PHP.
The errors come from here https://github.com/pfsense/pfsense/blob/master/src/etc/inc/captiveportal.inc#L1399A captive portal login session will execute captiveportal_read_db() and captiveportal_write_db() and ones of these, … well, just can't.
To make things even less difficult to understand : login attempts from several users can happen simultaneously. This means the file can be written out, read, and, why not, written out by another instance at about nearly the same time.
When a file is created for (over) writing, the access to that unique structure is locked, the OS is flushing out the bytes, and the file is unlocked.
All this to guarantee that no two (or more) 'programs' write to the same file at the same moment, which will (this is the easy part) create a mess ....What you see is : the file is opened (for read or for write), but 'someone else' is currently still 'writing it'.
That triggers an error.I never saw this on my pfsense setup, but your are already the third who mentions the issue.
I showed you the thread, I thought it was the load (number of users login in) on the system, but now, given your number of users, I have a doubt that that's the reason.
It's something else ..... -
Maybe the DDL command to check if the table exists is not a good ideia at this point.
On my "box", I added the code below in captiveportal.inc to check table_exists with a SELECT statement, so no lock is applied… and until now I got 40 connections without any errors.
$tableCheck = $DB->query("SELECT name FROM sqlite_master WHERE name='captiveportal'");
if ($tableCheck->fetchArray() === false)
{
if (! $DB->exec($createquery)) {...
-
Tables, database …. whatever.
If the file (which contains a 'sqllite3' database) is written out by another 'thread', it will be locked ....As an alternative, what out a simple wait for 1 millisecond 'usleep(1000)' ?
-
This is the root issue for the "Error message: database is locked. Resetting and trying again" and similar logs, we're looking into it.
https://redmine.pfsense.org/issues/5622 -
I running about 2 days without any erros ( crashes or locking logs ) with the use of busytimeout :
$DB->busyTimeout(5000);
if (! $DB->exec($createquery)) {
…"Sets a busy handler that will sleep until the database is not locked or the timeout is reached."
public bool SQLite3::busyTimeout ( int $msecs )reference: http://php.net/manual/en/sqlite3.busytimeout.php
I hope this is a good solution.
-
hi.
can you show exactly where you enter these parameters in captiveportal.inc
thanks
-
can you show exactly where you enter these parameters in captiveportal.inc
Look (Ctrl-F) for :
if (! $DB->exec($createquery)) {
;)
-
;D ;D
-
That one spot's probably adequate for the issue most are hitting, but it needs to be done in more than that spot. This should be the proper fix.
https://git.pfmechanics.com/pfsense/pfsense/commit/7c4b3b2060c291a9c166d8196d053afac97861afNext snapshot after the time of this post will have that.
-
New snapshots including that fix are out, please try and report back.
-
@cmb:
New snapshots including that fix are out, please try and report back.
On my "spot", i got about 120 simultaneous connections and it's working fine. I think it's solved, but could someone with big numbers give us more reports ?
Thanks
JC Bilhega