RADIUS Accounting Server not Multiselect
-
We run three RADIUS servers (synchronized DB, three because of quorum) in order to have redundancy in case of failure.
For authentication we can select multiple servers (pfSense 2.4.4-p2). For accounting and interim updates we can select only one server. If this server fails, we get no response and CP fails. We have had this situation.
Could you change CP either to use the same servers as in the authentication section or to allow selecting multiple servers?
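What I have in mind is roughly the following. This is only a sketch in Python, not pfSense code: the function name, the `send` callback and the server names are made up for illustration, and the real RADIUS exchange is left out.

```python
from typing import Callable, Iterable, Optional

# Hypothetical callback: sends one accounting update to one server and
# returns True if the server acknowledged it. It may raise OSError
# (which includes timeouts) when the server is unreachable.
Sender = Callable[[str], bool]


def send_accounting_update(servers: Iterable[str], send: Sender) -> Optional[str]:
    """Try the servers in order; return the first that acknowledged, else None."""
    for server in servers:
        try:
            if send(server):
                return server
        except OSError:
            # Unreachable or timed out: try the next server in the list.
            continue
    return None


def demo_send(server: str) -> bool:
    """Stand-in sender for the example: the first server is down."""
    if server == "radius1.example":
        raise TimeoutError("no reply")
    return True


acked_by = send_accounting_update(
    ["radius1.example", "radius2.example", "radius3.example"], demo_send
)
```

With the same ordered list as in the authentication section, one dead server would just be skipped instead of breaking accounting.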
Thanks
-
@erik_ch said in RADIUS Accounting Server not Multiselect:
For authentication we can select multiple servers (pfSense 2.4.4-p2). For accounting and interim updates we can select only one server. If this server fails, we get no response and CP fails. We have had this situation.
Could you explain what "CP is failing" means?
As far as I can tell, a failure of the accounting server should not have any impact on end users. Users would still be logged in on the captive portal even if RADIUS accounting is failing... am I wrong?
-
@free4
Yes, the interim update didn't get an "access-accept" back; instead CP recognized that the RADIUS server wasn't available. Therefore users were logged out. When users then tried to log in again, they got the message that they were already logged in. This is because the authentication process found the two remaining RADIUS servers.
I can't understand why the accounting section doesn't use the same server list as defined in the authentication section.
-
@erik_ch said in RADIUS Accounting Server not Multiselect:
@free4
Yes, the interim update didn't get an "access-accept" back; instead CP recognized that the RADIUS server wasn't available. Therefore users were logged out.
Whoops. This is the real issue, in my opinion. Getting no reply from an accounting/authentication server during accounting updates should NOT disconnect users...
I will try to reproduce this bug within the next weeks.
I can't understand why the accounting section doesn't use the same server list as defined in the authentication section.
The short answer is that it would add a lot of complexity to the current code, and therefore a lot of technical debt, for a feature that would be too specific and not widely used.
From a purely technical/coding point of view, performing accounting updates on multiple servers while keeping the technical debt low is a little less easy than it sounds.
From a less technical point of view, accounting reports (using RADIUS or something else; I am not talking about a specific protocol here) are only a small use case for the captive portal, and one that doesn't fit very well with the new path that has been drawn for it. You cannot report accounting to a remote server when you are performing LDAP/Local/OAuth2 authentication... That's why it was decided not to implement RADIUS accounting on multiple servers: it was not worth spending so much effort on a feature that wouldn't be widely used.
Just to be clear: I am not saying that what you are asking should not be done, or is a bad idea. I am just explaining why this was not implemented in the 2.4 captive portal update.
-
@free4
Something really isn't working with CP and RADIUS since release 2.4.4-p2.
As I wrote, we have three synchronized RADIUS servers. Since 2.4.4-p2 users can log in correctly. After about 24 hours (potentially idle logout) users are stopped on RADIUS, accounting isn't set correctly, and they aren't logged out on the captive portal. Please check the attached RADIUS accounting log.
If users open their browser they directly get the "redirect to..." page in a loop without really being redirected. If I check the CP status I see that "last activity" is equal to "session start".
If I log the user out manually in pfSense, the user can log in as usual until the same behaviour starts again.
Another pfSense at another hotspot (other building) with release 2.4.4 (not p1/p2), using the same RADIUS servers, works as usual.
For us, CP on 2.4.4-p2 is not usable. I had to disable CP.
-
@erik_ch that's strange... there was no change in RADIUS accounting between 2.4.4 and 2.4.4-p2...
Also, I tried to replicate the previous bug that you reported, but I haven't succeeded yet. Failures in accounting updates are not disconnecting users on my side... I am probably missing something.
-
@free4 Unfortunately I don't know your name. Are you responsible for the Captive Portal code? In some parts of the code (e.g. index.php and captiveportal.inc) there were some changes between 2.4.4 and 2.4.4-p2.
I tried to find out more about my issue. I have some findings, but I still have to collect more logs. My current findings are:
- I found some entries in the log like
/index.php: Submission to captiveportal with unknown parameter zone:
(zone is empty). Why this happens isn't clear yet (perhaps an antivirus scanner blocks parameters in hidden fields).
- In line 41 of index.php the function portal_reply_page is called because $cpcfg isn't defined, which is caused by the missing "zone" parameter in the request.
- Now it gets worse. This call of type "error" tries to get the error page content with the code
$htmltext = get_include_contents("{$g['varetc_path']}/captiveportal-{$cpzone}-error.html");
This function call expects a correct value for $cpzone, which isn't defined. No error page is provided to the user's browser.
- Instead my user gets the logout window (his pop-up blocker is disabled), which has a JS code line
document.location.href="<?=\$my_redirurl;?>";
which "redirects" to nothing. The never-ending loop begins. At the moment I don't know why the logout window pops up if line 41 in index.php is called. I assume it is because the user is still logged in.
I don't know if only one user has missing parameters. In the log entry I can't see the user's IP, which would help to find out whether several users have the parameter issue. They probably don't see the issue because they have blocked the logout window. I will add the user's IP address to the log line 40 in index.php.
Anyway, independent of the missing parameter issue: you shouldn't call portal_reply_page without zone information. Either index.php should find the correct zone by the user's IP, or portal_reply_page should have a standard error page as fallback.
I will send you more details after further logs and more detailed investigation.
-
@erik_ch said in RADIUS Accounting Server not Multiselect:
@free4 Unfortunately I don't know your name. Are you responsible for the Captive Portal code? In some parts of the code (e.g. index.php and captiveportal.inc) there were some changes between 2.4.4 and 2.4.4-p2.
I tried to find out more about my issue. I have some findings, but I still have to collect more logs. My current findings are:
- I found some entries in the log like
/index.php: Submission to captiveportal with unknown parameter zone:
(zone is empty). Why this happens isn't clear yet (perhaps an antivirus scanner blocks parameters in hidden fields).
[...]
Anyway, independent of the missing parameter issue: you shouldn't call portal_reply_page without zone information. Either index.php should find the correct zone by the user's IP, or portal_reply_page should have a standard error page as fallback.
I am an active contributor from the open source community. I am not responsible for the captive portal code, nor part of Netgate, but I am carefully watching the captive portal forums, as I submitted multiple pull requests and could have accidentally created other issues ("if you break it, you fix it!"). My name on GitHub is Augustin-FL.
There were some changes in the captive portal between 2.4.4 and 2.4.4-p2, but no real change to accounting was made during this time (NAS-Identifier changes aside).
The error you are describing now has no relationship with RADIUS accounting; it's another subject...
What happens is that when users hit the captive portal, they are redirected to http(s)://captiveportalip:800{zoneid or zoneid+1 in case of https}/index.php?cpzone={zone}. Users could manually remove the GET parameters if they play with the URL, resulting in errors in your logs. This is not harmful for your pfSense. Your visitors would also only see a blank page.
I really think these logs are generated by a manual action (someone removed the GET parameters from the URL) and not an automatic one (AV or something else...)... but any proof that I'm wrong is welcome!
I agree with you that cpzone should always be defined based on the visitor IP or (maybe better? not sure...) based on the port used when connecting (or at least captiveportal_reply_page should be aware that cpzone might not be defined). You could maybe create a Redmine feature request?
-
@erik_ch said in RADIUS Accounting Server not Multiselect:
In line 41 in index.php function portal_reply_page is called because $cpcfg isn't defined caused by missing "zone" parameter in request.
Nice catch.
portal_reply_page() uses $cpzone, derived from $_REQUEST['zone'], without testing. No good.
@free4 :
cpzone should be always defined based on visitor IP or (maybe better? not sure...) based on the port used ....
The visitor has an IP.
An IP belongs to an interface.
Portal zones are interface based.
So there is (must be ?!) a strict relation between IP and captive portal instance. If this is always true, then support for $_REQUEST['zone'] could be dropped altogether.
-
@Gertjan I fully agree with you. The zone is defined by the user's IP network. cpzone is needed in several places, so there should be no risk.
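To illustrate the idea of finding the zone from the user's IP, here is a minimal sketch. It is Python, not the PHP of index.php, and the zone names, the subnets and the function name are invented for the example; a real lookup would have to go through the interfaces each zone is bound to.

```python
import ipaddress
from typing import Mapping, Optional


def zone_for_client(client_ip: str, zone_networks: Mapping[str, str]) -> Optional[str]:
    """Return the zone whose network contains client_ip, or None if no zone matches."""
    addr = ipaddress.ip_address(client_ip)
    for zone, cidr in zone_networks.items():
        if addr in ipaddress.ip_network(cidr):
            return zone
    return None


# Hypothetical zone-to-subnet table, for illustration only.
ZONES = {"guests": "192.168.10.0/24", "staff": "192.168.20.0/24"}

zone = zone_for_client("192.168.10.57", ZONES)
```

If no zone matches, the caller could then show a generic error page instead of relying on a zone-specific file.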
@free4 I did some more investigation, tests and code review (without a debugger). I think I found the reason for my issue. Plus I found two other bugs.
The handling of the logout form has changed from older versions to 2.4.4. In the past the logout form was always a pop-up form with JavaScript for the redirect. So on the affected hotspot I had uploaded a logout form like the standard form (with a different design).
In my user's case, he had disabled the pop-up blocker. Then he called our host on port 8002. Instead of getting the page "You are connected" he got the logout page with "redirect to" and the logout pop-up. This redirected to our host again, and so on. I suppose that my user has our host as the start page in his browser, or he calls it to get the logout page.
My proposal: please document this changed behaviour, saying that a user-specific logout page mustn't call the logout pop-up with redirect. Or better, remove the pop-up version because all browsers block it. Instead there should be a designed standard logout page.
While testing I found three other issues:
- This is a critical issue: if users are logged in and I change a parameter on the CP service page and save, CP isn't reinitialized. Users stay logged in BUT the ipfw data are lost. Last activity is reset to the session start time, used data is 0, and the user can't pass data anymore. If the user calls ourhost:8002 he gets the answer "You are connected". I can reproduce this bug.
It could be that my user from case 1 had this issue. In combination with issue one the loop is perfect. I have seen on the production hotspot that all users had the same "last activity time" as "session time". But I hadn't touched the CP service page. Could it be that a cron job or another event resets the ipfw tables?
- While reviewing your code I saw that the login form uses the action method "post", but in function "portal_reply_page" in line 2163 the action is our host plus GET params like "zone=xxx&redirurl=yyy". Mixing a POST action with GET params works but isn't clean design. You never know if some browsers could cut the URL params. Use either POST or GET.
- I tested what happens with the fields "pre-authentication-redirect-url" and "after-authentication-redir-url". None! With both I get the login page with the GET param redirurl set to what I requested before. After login both redirect to the content URL. Pre-auth takes a little bit longer. It's only another way in the code to get the same result. As I understand it, pre-auth should be an alternative URL if no URL other than our host is provided in the request.
I don't want to bother you. Thank you for your good work.
-
@erik_ch said in RADIUS Accounting Server not Multiselect:
I suppose that my user has our host as the start page in his browser, or he calls it to get the logout page.
Such a situation exists when time == money, like paid portal access. The user would be motivated to disconnect, and she/he is of course blocking pop-ups.
Semi-solution:
- Set a rather low idle timeout value, like 5 minutes.
- Tell users that they should disconnect their Wi-Fi (radio) access (the idle timeout will clean up automatically).
- If users start looking for the logout page, like https://portal.pfsense.tld, they will not know that they have to add the ?cpzone=abcfeg part (if they don't, they trigger what you saw earlier). The complete logout link can be shown on the login page, of course. But, well, users will need it after they have left the login page.
@erik_ch said in RADIUS Accounting Server not Multiselect:
This is a critical issue: if users are logged in and I change a parameter on the CP service page and save, CP isn't reinitialized. Users stay logged in BUT the ipfw data are lost. Last activity is reset to the session start time, used data is 0, and the user can't pass data anymore. If the user calls ourhost:8002 he gets the answer "You are connected". I can reproduce this bug.
It could be that my user from case 1 had this issue. In combination with issue one the loop is perfect. I have seen on the production hotspot that all users had the same "last activity time" as "session time". But I hadn't touched the CP service page. Could it be that a cron job or another event resets the ipfw tables?
This has been solved: https://forum.netgate.com/topic/130420/any-change-and-save-update-captive-portal-bug
You will find a patch there; it can be installed using the patch package.
See https://github.com/pfsense/pfsense/pull/4042
@erik_ch said in RADIUS Accounting Server not Multiselect:
While reviewing your code I saw that the login form uses the action method "post", but in function "portal_reply_page" in line 2163 the action is our host plus GET params like "zone=xxx&redirurl=yyy". Mixing a POST action with GET params works but isn't clean design. You never know if some browsers could cut the URL params. Use either POST or GET.
True.
@erik_ch said in RADIUS Accounting Server not Multiselect:
I tested what happens with field "pre-authentication-redirect-url" and "after-authentication-redir-url". None
True : pre-auth is broken.
This https://docs.netgate.com/pfsense/en/latest/captiveportal/configuring-a-pre-authentication-redirect-for-captive-portal-users.html is a no-go these days.
-
@erik_ch said in RADIUS Accounting Server not Multiselect:
My proposal: [...] remove the pop-up version because all browsers block it. Instead there should be a designed standard logout page.
Agreed. There was a plan to update the login page, but I guess it will take a while since that's not a very critical problem.
- I tested what happens with the fields "pre-authentication-redirect-url" and "after-authentication-redir-url". None! With both I get the login page with the GET param redirurl set to what I requested before. After login both redirect to the content URL. Pre-auth takes a little bit longer. It's only another way in the code to get the same result. As I understand it, pre-auth should be an alternative URL if no URL other than our host is provided in the request.
Pre-auth only redirects you after a successful authentication, and only if the captive portal doesn't know the URL the user originally tried to access (if no $redirurl is present in the URL). It is highly recommended to set this field.
If this field is empty and no $redirurl is given by the user, then the captive portal will redirect the user to an empty page saying "you are connected".
Post-auth redirects all users to this address after successful authentication, regardless of the original URL the user tried to access.
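To make the order of precedence concrete, here is how I would summarize it. This is a Python sketch of the behaviour as described above, not the actual PHP from captiveportal.inc; the function and parameter names are invented for illustration.

```python
from typing import Optional


def redirect_target(redirurl: Optional[str],
                    preauth_url: Optional[str],
                    postauth_url: Optional[str]) -> Optional[str]:
    """Pick where to send the user after a successful login.

    Returns None when nothing applies, meaning the plain
    "you are connected" page is shown instead.
    """
    if postauth_url:
        # Post-auth wins for everybody, whatever was requested originally.
        return postauth_url
    if redirurl:
        # The portal knows the URL the user originally tried to reach.
        return redirurl
    if preauth_url:
        # No original URL known: fall back to the pre-auth field.
        return preauth_url
    return None
```

So with only the pre-auth field set, a user who arrived with a redirurl never sees the pre-auth address at all.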
That's the difference between the two fields.
-
I understand what you mean and the risks you under-count. But I guess it was better to ask about this from the very beginning) Anyway, I think in this case, you will not suffer from a server failure. I work in an accounting company that provides outsource services. Imagine the amount of data we work with and the level of responsibility. So, when they were selecting the working platform, the consulting agency showed us this one and said that the failure has absolutely to no effect on the end consumer) I guess they made already some improvements relating real-time data backups during a server failure, at least we never experienced problems.
-
@free4 said in RADIUS Accounting Server not Multiselect:
Post-auth redirects all users to this address after successful authentication, regardless of the original URL the user tried to access.
See also https://redmine.pfsense.org/issues/11842
Just recently (about a week ago), jimp edited this part:
https://github.com/pfsense/pfsense/commit/697a99c1e176b2f546460fc124a6e15075e5ef49#diff-8720aa105443be11a290dd059b115d08def735b84c7889190e8aba2b93dfcde8
I'll update my 'index.php' from here:
https://github.com/pfsense/pfsense/blob/RELENG_2_5_1/src/usr/local/captiveportal/index.php
Update also the other file, /etc/inc/captiveportal.inc, and see if it's 'better'.
If you decide to use the more recent files from source, and you use https login, apply my proposed change (line 2275->2261) as stated on Redmine.
Or be ready to face the NGINX gods.
-
@veralder said in RADIUS Accounting Server not Multiselect:
I understand what you mean and the risks you under-count. But I guess it was better to ask about this from the very beginning) Anyway, I think in this case, you will not suffer from a server failure. I work in an accounting company that provides outsource services. Imagine the amount of data we work with and the level of responsibility. So, when they were selecting the working platform, the https://szwedaconsulting.com/tax-preparation/ showed us this one and said that the failure has absolutely to no effect on the end consumer) I guess they made already some improvements relating real-time data backups during a server failure, at least we never experienced problems.
I forgot to mention that we worked with Redmine, but I find it a bit outdated.
-
Nice !!
I replied yesterday - see above - not to the spammer just above, but to posts by Erik_CH and @free4 that date from 2019...
Woken up by some BS from Veralder, who wanted me to look at some Swedish consultancy site, I found issue https://redmine.pfsense.org/issues/11842, tested that solution, and posted a solution for the solution...
Only to discover just now that it was actually a spammer (?) that made me contribute to pfSense.
Great.
I need a drink.