System reached maximum login capacity

magura

Besides deleting it, is there any way to solve this bug?
Will pfSense fix this problem?

Gertjan

Hello.

What you should check is: are users actually getting disconnected by the portal interface?
If not, things will go wrong.

For some ideas about how to test whether disconnecting is working, see here: https://forum.pfsense.org/index.php?topic=67739.0

magura

At that time, the disconnecting did not seem to work normally! But in the future there may be 2000+ users, and with this maximum-number bug things will go wrong!

About the disconnecting: is there a command that can be executed to restart the disconnect procedure, in the CLI or the UI?

eri--

Can you upgrade to 2.1.2 and verify that this is not happening anymore?

Some system specifications would also be welcome here.

tpramos

I have a 2.1.2 install with the same problem.

codywood

I also just hit this issue on 2.1.2.

deltix

Is this a valid fix for this issue, or does it fix something else?

https://github.com/pfsense/pfsense/pull/1070

Gertjan

This patch looks simple ;)

What about checking whether the patch is actually being used?

Instead of applying the patch as is:

			/* Release unused pipe number */
			captiveportal_free_dn_ruleno($pipeno);

use this:

			if ($pipeno) {
				captiveportal_logportalauth($cpentry[4],$cpentry[3],$cpentry[2],"CONCURRENT LOGIN - REUSING IP {$cpentry[2]} - removing pipe number {$pipeno}");
				captiveportal_free_dn_ruleno($pipeno);
			}

It seems more logical (to me) to test the value of $pipeno (it should be non-zero) before using it.
And, of course, write a log entry so you can see that the patch was executed (and actually freed up a pipe rule).
Remember: running out of free pipe rules is what provokes the message "System reached maximum login capacity".

deltix

This patch is not in 2.1.3 and it should be.

Can somebody reopen this bug?

doktornotor

https://redmine.pfsense.org/projects/pfsense/repository/revisions/4ec6b54d189dbd84265cf1f7dc18050ae941df7c

uaxero

Hi, my last update was to 2.1.3.

The problem has not happened again.

Thanks.

paoloc

Hi, I have 2.1.3 but I still have this problem.

Paolo

jimp

Apply the patch above or upgrade to 2.1.4 when it comes out.

magura

24 days after boot, the problem has occurred.
Version: 2.1.3

Waiting for the 2.1.4 upgrade :-[

Gertjan

@magura:

24 days after boot, the problem has occurred.
Version: 2.1.3

Waiting for the 2.1.4 upgrade :-[

I saw your other posts.
Also this one: https://forum.pfsense.org/index.php?topic=75854.msg413813#msg413813

I saw in the log that people are (sometimes) logging in less than 10 seconds apart.

An issue might be:
The file /var/db/captiveportaldn.rules is locked, read, unserialized, updated, serialized, written, and unlocked for every login.
The fact is: this file isn't a "small file":
-rw-r--r--   1 root      wheel     791904 Jun 16 22:44 captiveportaldn.rules
About 750 Kbytes.

It's not only the users logging in who are competing here. The cron task that runs every minute calls the function captiveportal_prune_old() in /etc/inc/captiveportal.inc. It walks over all connected users to see if a timeout has been reached (hard timeout, idle timeout, etc.) and will also "lock, read, unserialize, update, serialize, write, and unlock" the file /var/db/captiveportaldn.rules for every user that is about to be kicked off the portal network.

What I think happens in your case (remember: 2000 clients connected): you're hitting a "can't handle it fast enough" ceiling.

Better yet, look at the cron task that calls the function captiveportal_prune_old() in /etc/inc/captiveportal.inc.
This file: /etc/rc.prunecaptiveportal

// Usual blabla …..

require_once("captiveportal.inc");

global $g;

$cpzone = str_replace("\n", "", $argv[1]);

// A flag file means a previous run has not (yet) cleaned up after itself
if (file_exists("{$g['tmp_path']}/.rc.prunecaptiveportal.{$cpzone}.running")) {
	$stat = stat("{$g['tmp_path']}/.rc.prunecaptiveportal.{$cpzone}.running");
	// Flag is 120 seconds old or more: treat it as stale and remove it
	if (time() - $stat['mtime'] >= 120)
		@unlink("{$g['tmp_path']}/.rc.prunecaptiveportal.{$cpzone}.running");
	else {
		// Flag is fresh: assume another instance is pruning and bail out
		log_error("Skipping CP prunning process because previous/another instance is already running");
		return;
	}
}

// Set the flag, prune, then clear the flag again
@file_put_contents("{$g['tmp_path']}/.rc.prunecaptiveportal.{$cpzone}.running", "");
captiveportal_prune_old();
@unlink("{$g['tmp_path']}/.rc.prunecaptiveportal.{$cpzone}.running");

?>

This makes me think the following:
The minicron executes a first time.
It starts doing its job, but often gets "locked" (read: it has to wait, or better: compete) because many users are logging in, so it can't really finish its job.
A minute later, a new (second) minicron run starts!
This one will stop right away with this message in the main pfSense log:

log_error("Skipping CP prunning process because previous/another instance is already running");

Again, a minute later, another (third) minicron run starts to prune the list.
The running flag of the first run is cleared (whether the first one is still running or not!), because the flag file is now about 120 seconds old, and this one will start (!). The function captiveportal_prune_old() is called again.
Now, two captiveportal_prune_old() calls are competing …..
Both want to "lock" the big "captiveportaldn.rules" to do their work (whenever a client that timed out is found).
As I see it, things will go from bad to worse.
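
The flag-file decision described above can be restated as a small shell sketch. The function name prune_decision and the idea of passing the flag age as an argument are illustrative assumptions, not part of pfSense; only the 120-second threshold comes from the script quoted above, and the ages 60 and 120 assume the flag is written right when the first run starts.

```sh
#!/bin/sh
# Hypothetical helper (not part of pfSense): mimics the decision that
# /etc/rc.prunecaptiveportal takes when it looks for the ".running" flag file.
# $1 = age of the flag file in seconds, or "none" when no flag file exists.
prune_decision() {
	age="$1"
	if [ "$age" = "none" ]; then
		echo "run"      # no flag file: start pruning
	elif [ "$age" -ge 120 ]; then
		echo "run"      # flag counts as stale: it is removed and pruning starts,
		                # even if the first instance is in fact still working
	else
		echo "skip"     # flag is fresh: log the "Skipping CP prunning" message
	fi
}

# A slow first run set the flag at t=0; minicron fires again 60 and 120 seconds later.
for t in 60 120; do
	echo "t=${t}s: $(prune_decision "$t")"
done
```

With these assumed timings the second run is skipped and the third one starts next to the first, which is the overlap described above.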

You could check:
Do you see this in your main log file:

log_error("Skipping CP prunning process because previous/another instance is already running");

??
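
A minimal sketch for counting how often that message was logged. The function name count_prune_skips is made up for illustration; it only filters whatever log text it receives on standard input.

```sh
#!/bin/sh
# Hypothetical helper: counts the lines on stdin that contain the
# "skipping" message from /etc/rc.prunecaptiveportal.
count_prune_skips() {
	# grep -c prints 0 and exits non-zero when nothing matches; keep the status clean
	grep -c 'Skipping CP prunning process' || true
}

# Demo with the message text itself plus an unrelated line
printf '%s\n' \
	'Skipping CP prunning process because previous/another instance is already running' \
	'some unrelated log line' | count_prune_skips
```

On pfSense 2.1 the system log is, as far as I know, a circular log that is read with clog, so the real check would probably look like "clog /var/log/system.log | count_prune_skips". Treat that path and command as an assumption and verify them on your own box.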

Run this command every second:

ps ax | grep '/etc/rc.prunecaptiveportal'

Once a minute, you will see an extra line:

21124 ?? RL 0:00.23 /usr/local/bin/php -f /etc/rc.prunecaptiveportal cpzone

This is our "pruning in progress".
Is it gone in the next second? (Keep running the "ps ax | grep '/etc/rc.prunecaptiveportal'" command.)
If not, how long does it (the /usr/local/bin/php -f /etc/rc.prunecaptiveportal cpzone process) stay active?
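
Instead of typing the command by hand every second, the polling can be sketched as a small loop. The function names count_prune_workers and watch_prune are made up for this example; the match string is taken from the ps line shown above.

```sh
#!/bin/sh
# Hypothetical helper: reads "ps ax" output on stdin and prints how many
# prune workers are running. Matching on "php -f" skips the minicron lines,
# and the [.] keeps this grep from matching its own command line.
count_prune_workers() {
	grep -c 'php -f /etc/rc[.]prunecaptiveportal' || true
}

# Poll once per second for $1 seconds and print a timestamped worker count.
watch_prune() {
	seconds="$1"
	i=0
	while [ "$i" -lt "$seconds" ]; do
		echo "$(date '+%H:%M:%S') workers=$(ps ax | count_prune_workers)"
		sleep 1
		i=$((i + 1))
	done
}
```

Running "watch_prune 180" covers three minicron cycles. A count that stays at 1 for many seconds, or that reaches 2, would support the overlap theory.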

Btw, when I run

ps ax | grep '/etc/rc.prunecaptiveportal'

I see this:
19489 ?? Is 0:00.00 /usr/local/bin/minicron 60 /var/run/cp_prunedb_cpzone.pid /etc/rc.prunecaptiveportal cpzone
19498 ?? I 0:01.02 minicron: helper /etc/rc.prunecaptiveportal cpzone (minicron)
93137 0 S+ 0:00.00 grep /etc/rc.prunecaptiveportal

The last line is just our grep command finding itself as a process.
Lines 1 and 2 are our minicron, which sleeps and wakes up every minute, as I showed above.

Please understand that you have to debug a little bit yourself.
I don't have a pfSense portal with many users. My personal record is 24.
I'm running pfSense on a dual-core, Intel i5-like desktop PC with 4 Gbytes of (fast) memory and a fast hard disk.

@Jimp: It would be a great thing if you could say right away: "No way, you missed a thing - it isn't working like that (at all)".

Idea: instead of running this minicron every 60 seconds, would it help if it started every 300 seconds?
People will still be disconnected, and some will get up to (300-60) seconds of bonus time.
But the system will be 'working' less.

Btw: stopping the captive portal interface will flush the "captiveportaldn.rules" file (for that 'zone'). Start it right after and everything will be 'clean'.

@magura:

Waiting for the 2.1.4 upgrade :-[

You applied the proposed patch, right?
If not, do so first. It's just a matter of editing a PHP file. Easy, you will see.

magura

-rw-r--r-- 1 root wheel 805060 Jun 17 11:29 captiveportaldn.rules
-rw-r--r-- 1 root wheel 1262900 May 21 17:29 captiveportaldn.rules.1030521
-rw-r--r-- 1 root wheel 1262900 Jun 16 09:26 captiveportaldn.rules.1030616

My solution: when the file size grows to 1262900 bytes, users cannot log in to the CP, so I move the file away.

Currently there are no more than 2000+ users.
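
A minimal sketch of watching the file size instead of waiting for logins to fail. The function name rules_size_state is made up for illustration, and the 1262900-byte limit is only the size observed on this one system, not a documented pfSense constant.

```sh
#!/bin/sh
# Hypothetical helper: prints "warn" when the file is at least <limit> bytes, else "ok".
# $1 = path to the rules file, $2 = size limit in bytes.
rules_size_state() {
	size=$(wc -c < "$1")
	size=$((size + 0))      # strip the padding some wc implementations add
	if [ "$size" -ge "$2" ]; then
		echo "warn"
	else
		echo "ok"
	fi
}
```

Example call: rules_size_state /var/db/captiveportaldn.rules 1262900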

About the idea: instead of running this minicron every 60 seconds, would it help if it started every 300 seconds?

How do I change 60 to 300?

In the CLI, enter the command: /usr/local/bin/minicron 300 /var/run/cp_prunedb_ZZZ.pid /etc/rc.prunecaptiveportal ZZZ

or

edit captiveportal.inc with vi and change

 $croninterval = $cpcfg['croninterval'] ? $cpcfg['croninterval'] : 60;

to

 $croninterval = $cpcfg['croninterval'] ? $cpcfg['croninterval'] : 300;

================================================
Which approach is right?


Gertjan

@magura:

-rw-r--r-- 1 root wheel 805060 Jun 17 11:29 captiveportaldn.rules
-rw-r--r-- 1 root wheel 1262900 May 21 17:29 captiveportaldn.rules.1030521
-rw-r--r-- 1 root wheel 1262900 Jun 16 09:26 captiveportaldn.rules.1030616

My solution: when the file size grows to 1262900 bytes, users cannot log in to the CP, so I move the file away.
Currently there are no more than 2000+ users.

1.3 Mbytes … :o
Btw: it means the file "captiveportaldn.rules" also keeps growing.

@magura:

How do I change 60 to 300?

Edit captiveportal.inc with vi and change
 $croninterval = $cpcfg['croninterval'] ? $cpcfg['croninterval'] : 60;
to
 $croninterval = $cpcfg['croninterval'] ? $cpcfg['croninterval'] : 300;
================================================
Which approach is right?

That's the way to go!
Normally, <croninterval> isn't defined in the config.xml, so yes, just changing 60 to 300 at that spot should do the job.

magura

If I use crontab to rm captiveportaldn.rules on a schedule,
will it cause other problems?

I ask because when I restart the captive portal, all online clients must re-authenticate, and users complain :'(

Gertjan

@magura:

If I use crontab to rm captiveportaldn.rules on a schedule,
will it cause other problems?

I ask because when I restart the captive portal, all online clients must re-authenticate, and users complain :'(

Don't do that!

If this file were not useful, why would pfSense generate it in the first place?
It contains the relationship between all logged-in users and their related pipes.
If you remove it, the pipes will no longer be removed when a user logs out.

The number of pipes in the system will continue to grow ….. and pfSense with it.

The file captiveportaldn.rules can be "cleaned": that is done when you stop (a zone in) the portal interface.

xzmz

Guys, I have a hard version

2.1.4-RELEASE (i386)
built on Fri Jun 20 12:59:29 EDT 2014
FreeBSD 8.3-RELEASE-p16

And I have a similar problem. Once the number of users reaches 1500, captive portal authentication stops working.

Has anyone found a proper solution?