RADIUS Accounting Server not Multiselect



  • We work with 3 RADIUS servers (synchronized DB; 3 because of quorum) with the intention of having redundancy in case of failure.

    For authentication (pfSense 2.4.4-p2) we can select multiple servers. For accounting and interim updates we can select only one server. If this server fails, we get no response and the captive portal (CP) fails. We have already had this situation.

    Could you change the CP to either use the same servers as in the authentication section, or allow multiple servers to be selected?

    Thanks


  • Rebel Alliance

    @erik_ch said in RADIUS Accounting Server not Multiselect:

    For authentication (pfSense 2.4.4-p2) we can select multiple servers. For accounting and interim updates we can select only one server. If this server fails, we get no response and the captive portal (CP) fails. We have already had this situation.

    Could you explain what you mean by "CP is failing"?

    As far as I can tell, a failure of the accounting server should not have any impact on end users. Users would still be logged in to the captive portal, even if RADIUS accounting is failing... am I wrong?



  • @free4
    Yes. The interim update didn't get an "Access-Accept" back; instead, the CP recognized that the RADIUS server wasn't available and therefore logged users out. When users afterwards wanted to log in again, they got the message "You are already logged in". This is because the authentication process found the two remaining RADIUS servers.

    I can't understand why the accounting section doesn't use the same server list as defined in the authentication section.


  • Rebel Alliance

    @erik_ch said in RADIUS Accounting Server not Multiselect:

    @free4
    Yes. The interim update didn't get an "Access-Accept" back; instead, the CP recognized that the RADIUS server wasn't available and therefore logged users out.

    Whoops. In my view, this is the real issue: getting no reply from an accounting/authentication server during accounting updates should NOT disconnect users...
    I will try to reproduce this bug within the next few weeks.
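    A minimal sketch of the behaviour argued for here: a missing answer to an interim update is counted and can raise an alert, but it never ends the user's session. The names (InterimResult, session_action, max_failures) are hypothetical and this is not pfSense's code. Note that a RADIUS accounting server answers an Accounting-Request with an Accounting-Response (RFC 2866), not with an Access-Accept.

```python
from enum import Enum
from typing import Tuple


class InterimResult(Enum):
    """Outcome of one Accounting-Request (Interim-Update) sent to the server(s)."""
    RESPONSE = "response"  # an Accounting-Response came back
    TIMEOUT = "timeout"    # no configured accounting server answered


def session_action(result: InterimResult, failures_so_far: int,
                   max_failures: int = 3) -> Tuple[str, int]:
    """Return (action, updated failure count) for a session after an interim update.

    Accounting only reports usage, so a silent server never disconnects the user;
    repeated failures only raise an alert for the administrator.
    """
    if result is InterimResult.RESPONSE:
        return "keep", 0  # server reachable again: reset the counter
    failures = failures_so_far + 1
    if failures >= max_failures:
        return "keep_and_alert", failures
    return "keep", failures
```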

    I can't understand why the accounting section doesn't use the same server list as defined in the authentication section.

    The short answer: it would add a lot of complexity to the current code, and therefore a lot of technical debt, for a feature that is too specific and not widely used.

    From a purely technical/coding point of view, performing accounting updates on multiple servers while keeping the technical debt low is a bit harder than it sounds.
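    To illustrate why multi-server accounting needs extra state, here is a minimal sketch of failover with per-session stickiness: Start, Interim-Update and Stop for one session should ideally reach the same server, so the sender has to remember which server holds which session. The class, the send callback and the packet dict are hypothetical; with a synchronized database like the one described at the top of this thread, the stickiness matters less.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical transport: returns True if the server answered, False on timeout.
SendFn = Callable[[str, dict], bool]


class AccountingFailover:
    """Send accounting packets to the first reachable server, keeping each
    session "sticky" to the server that last accepted its packets."""

    def __init__(self, servers: List[str], send: SendFn) -> None:
        self.servers = servers
        self.send = send
        self.session_server: Dict[str, str] = {}

    def _ordered(self, session_id: str) -> List[str]:
        # Try the session's previous server first, then the rest in config order.
        preferred = self.session_server.get(session_id)
        if preferred is None:
            return list(self.servers)
        return [preferred] + [s for s in self.servers if s != preferred]

    def account(self, session_id: str, packet: dict) -> Optional[str]:
        for server in self._ordered(session_id):
            if self.send(server, packet):
                self.session_server[session_id] = server
                if packet.get("Acct-Status-Type") == "Stop":
                    # Session is over: forget which server held it.
                    self.session_server.pop(session_id, None)
                return server
        return None  # every server timed out; the caller should NOT log the user out
```

    For example, with radius1 down a Start goes to radius2; if radius2 then fails too, the next interim update moves to radius3, and a None result means every server timed out, which should be logged rather than disconnect the user.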

    From a less technical view, accounting reports (using RADIUS or something else; I am not talking about a specific protocol here) are only a small use case of the captive portal, and they don't fit very well with the new path that has been drawn for the captive portal. You cannot report accounting to a remote server when you are performing LDAP/Local/OAuth2 authentication... That's why it was decided not to implement RADIUS accounting on multiple servers: it was not worth spending so much effort on a feature that wouldn't be widely used.

    Just to be clear: I am not saying that what you are asking should not be done, or that it is a bad idea. I am just explaining why this was not implemented in the 2.4 captive portal update.



  • @free4
    Something really hasn't been working with CP and RADIUS since release 2.4.4-p2.

    As I wrote, we have three synchronized RADIUS servers. Since 2.4.4-p2, users can log in correctly. After about 24 hours (possibly the idle logout), users are stopped on RADIUS, accounting isn't set correctly, and they aren't logged out of the captive portal. Please check the attached RADIUS accounting log.

    If users open their browser, they directly get the "redirect to..." page in a loop, without actually being redirected. If I check the CP status, I see that "last activity" is equal to "session start".

    If I log out the user manually in pfSense, the user can log in as usual until the same behaviour begins again.

    On another pfSense at another hotspot (another building), running release 2.4.4 (not p1/p2) and using the same RADIUS servers, everything works as usual.

    For us, CP on 2.4.4-p2 is not usable. I had to disable CP.

    [Attachment: radius accounting statement.png]


  • Rebel Alliance

    @erik_ch that's strange... there was no change in RADIUS accounting between 2.4.4 and 2.4.4-p2...

    Also, I tried to reproduce the previous bug that you reported, but I haven't succeeded yet. Failures in accounting updates are not disconnecting users on my side... I am probably missing something.



  • @free4 Unfortunately I don't know your name. Are you responsible for the Captive Portal code? In some code parts (e.g. index.php and captiveportal.inc) there were some changes between 2.4.4 and 2.4.4-p2.

    I tried to find out more about my issue. I have some findings, but I still have to collect more logs. My current findings are:

    • I found some log entries like /index.php: Submission to captiveportal with unknown parameter zone: (zone is empty). Why this happens isn't clear yet (perhaps an antivirus scanner strips parameters in hidden fields).
    • In line 41 of index.php, the function portal_reply_page is called because $cpcfg isn't defined, which is caused by the missing "zone" parameter in the request.
    • Now it gets worse. This call, of type "error", tries to get the error page content with $htmltext = get_include_contents("{$g['varetc_path']}/captiveportal-{$cpzone}-error.html");. This call expects a correct value for $cpzone, which isn't defined, so no error page is delivered to the user's browser.
    • Instead, my user gets the logout window (his pop-up blocker is disabled), which contains the JS line document.location.href="<?=\$my_redirurl;?>"; that "redirects" to nothing. The loop begins and never ends. At the moment I don't know why the logout window pops up when line 41 in index.php is reached. I assume it is because the user is still logged in.

    I don't know if only one user has missing parameters. The log entry doesn't show the user's IP, which could help to find out whether several users have the parameter issue. They probably don't see the issue because they have blocked the logout window. I will add the user's IP address to the log line 40 in index.php.

    Anyway, independent of the missing-parameter issue, you shouldn't call portal_reply_page without zone information. Either index.php should find the correct zone from the user's IP, or portal_reply_page should have a standard error page as a fallback.
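    A minimal sketch of the suggested fallback, written in Python rather than PHP for illustration: the zone taken from the request is accepted only if it names a configured zone; otherwise (or if that zone's error page cannot be read) a generic built-in error page is returned. VARETC_PATH stands in for $g['varetc_path'] and the file name pattern mirrors the one quoted above; DEFAULT_ERROR_HTML and the function names are hypothetical.

```python
import os
from typing import Callable, Dict, Optional

# Stand-in for $g['varetc_path']; DEFAULT_ERROR_HTML is a hypothetical built-in fallback page.
VARETC_PATH = "/var/etc"
DEFAULT_ERROR_HTML = (
    "<html><body><p>Captive portal error: unknown or missing zone.</p></body></html>"
)


def error_page_path(cpzone: str, zones: Dict[str, dict],
                    varetc_path: str = VARETC_PATH) -> Optional[str]:
    """Return the zone-specific error page path, or None if the zone is unusable."""
    # Only accept a zone that is actually configured; an empty or unknown value
    # from the request (e.g. a stripped "zone" parameter) is rejected here,
    # which also keeps arbitrary strings out of the file path.
    if not cpzone or cpzone not in zones:
        return None
    return os.path.join(varetc_path, f"captiveportal-{cpzone}-error.html")


def error_page_html(cpzone: str, zones: Dict[str, dict],
                    read_file: Callable[[str], str]) -> str:
    """Return the zone's error page, falling back to a generic one."""
    path = error_page_path(cpzone, zones)
    if path is None:
        return DEFAULT_ERROR_HTML
    try:
        return read_file(path)
    except OSError:
        # The zone is valid but its page is missing or unreadable.
        return DEFAULT_ERROR_HTML
```

    In real use, read_file could be lambda p: pathlib.Path(p).read_text(encoding="utf-8"); passing it in keeps the sketch testable without files.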

    I will send you more details after further logging and a more detailed investigation.


  • Rebel Alliance

    @erik_ch said in RADIUS Accounting Server not Multiselect:

    @free4 Unfortunately I don't know your name. Are you responsible for the Captive Portal code? In some code parts (e.g. index.php and captiveportal.inc) there were some changes between 2.4.4 and 2.4.4-p2.

    I tried to find out more about my issue. I have some findings, but I still have to collect more logs. My current findings are:

    • I found some log entries like /index.php: Submission to captiveportal with unknown parameter zone: (zone is empty). Why this happens isn't clear yet (perhaps an antivirus scanner strips parameters in hidden fields).
      [...]
      Anyway, independent of the missing-parameter issue, you shouldn't call portal_reply_page without zone information. Either index.php should find the correct zone from the user's IP, or portal_reply_page should have a standard error page as a fallback.

    I am an active contributor from the open source community. I am neither responsible for the captive portal code nor part of Netgate, but I am carefully watching the captive portal forums, as I submitted multiple pull requests and could have accidentally created other issues ("if you break it, you fix it!"). My name on GitHub is Augustin-FL.

    There were some changes in the captive portal between 2.4.4 and 2.4.4_p2, but no real change to accounting was made during that time (NAS-Identifier changes aside).

    The error you are describing now has no relationship with RADIUS accounting; it's another subject...

    What happens is that when users hit the captive portal, they are redirected to http(s)://captiveportalip:800{zoneid, or zoneid+1 in case of HTTPS}/index.php?cpzone={zone}.
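    A small sketch of the redirect target described above. It reads 800{zoneid} as 8000 + zoneid for HTTP and 8001 + zoneid for HTTPS (an assumption that matches the ourhost:8002 seen elsewhere in this thread), and it takes the query parameter name as an argument because the thread mentions both zone and cpzone. The function name and the example address are hypothetical.

```python
from urllib.parse import urlencode


def portal_url(portal_ip: str, zoneid: int, zone: str,
               https: bool = False, param: str = "zone") -> str:
    """Build the portal URL a client is redirected to for a given zone."""
    # Assumed default ports: 8000 + zoneid for HTTP, 8001 + zoneid for HTTPS.
    port = 8000 + zoneid + (1 if https else 0)
    scheme = "https" if https else "http"
    # urlencode() escapes the zone name; the parameter name is configurable
    # because the thread refers to both "zone" and "cpzone".
    return f"{scheme}://{portal_ip}:{port}/index.php?{urlencode({param: zone})}"
```

    For example, portal_url("192.168.1.1", 2, "guest") returns "http://192.168.1.1:8002/index.php?zone=guest". Removing the query string from such a URL is exactly what produces the empty-zone log entries discussed above.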

    Users could manually remove the GET parameters if they play with the URL, resulting in errors in your logs. This is not harmful to your pfSense; your visitors would simply see a blank page.

    I really think these logs are generated by a manual action (someone removed the GET parameters from the URL) and not an automatic one (AV or something else)... but any proof that I'm wrong is welcome!

    I agree with you that cpzone should always be defined based on the visitor's IP, or (maybe better? not sure...) based on the port used when connecting (or at least portal_reply_page should be aware that cpzone might not be defined). Could you maybe create a Redmine feature request?



  • @erik_ch said in RADIUS Accounting Server not Multiselect:

    In line 41 of index.php, the function portal_reply_page is called because $cpcfg isn't defined, which is caused by the missing "zone" parameter in the request.

    Nice catch.
    portal_reply_page() uses $cpzone, derived from $_REQUEST['zone'] without testing it. Not good.

    @free4 :

    cpzone should always be defined based on the visitor's IP or (maybe better? not sure...) based on the port used ....
    The visitor has an IP.
    An IP belongs to an interface.
    Portal zones are interface based.
    So there is (must be?!) a strict relation between an IP and a captive portal instance. If this is always true, then support for $_REQUEST['zone'] could be dropped altogether.
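    This reasoning can be sketched as a lookup from the visitor's address to the zone whose interface network contains it. The sketch assumes a hypothetical mapping of zone name to interface networks (in pfSense that information would come from each zone's configured interfaces); no match or overlapping matches return None, so the caller can fall back to an error page instead of guessing.

```python
import ipaddress
from typing import Dict, List, Optional


def zone_for_ip(client_ip: str, zone_networks: Dict[str, List[str]]) -> Optional[str]:
    """Return the captive portal zone whose interface network contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    matches = {
        zone
        for zone, networks in zone_networks.items()
        for net in networks
        # strict=False accepts interface-style notation such as "192.168.10.1/24";
        # an IPv4 address is never "in" an IPv6 network (and vice versa).
        if addr in ipaddress.ip_network(net, strict=False)
    }
    # Exactly one zone is the "strict relation" argued for above; no match or
    # several overlapping matches are left to the caller (e.g. a generic error page).
    return matches.pop() if len(matches) == 1 else None
```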



  • @Gertjan I fully agree with you. The zone is defined by the user's IP network. cpzone is needed in several places anyway, so there should be no risk.

    @free4 I did some more inquiries, tests and code review (without a debugger). I think I found the reason for my issue. Plus, I found two other bugs.

    The handling of the logout form has changed from older versions to 2.4.4. In the past, the logout form was always a pop-up with JavaScript for the redirect. So on the affected hotspot I uploaded the same logout form (with a different design) as the standard form.

    So in my user's case, he had disabled the pop-up blocker. Then he called our host on port 8002. Instead of getting the page "You are connected", he got the logout page with "redirect to" and the logout pop-up. This redirected to our host again, and so on. I suppose that my user has our host as the start page in his browser, or he calls it to get the logout page.

    My proposal: please document this changed behaviour, stating that a custom logout page mustn't call the logout pop-up with a redirect. Or better, remove the pop-up version, because all browsers block it. Instead, there should be a designed standard logout page.

    While testing I found three other issues:

    • This is a critical issue: if users are logged in and I change a parameter on the CP service page and save, the CP isn't reinitialized. Users stay logged in, BUT the ipfw data are lost. Last activity is reset to the session start time, used data is 0, and the user can't pass traffic anymore. If the user calls ourhost:8002, he gets the answer "You are connected". I can reproduce this bug.
      It could be that my user from case 1 had this issue; in combination with the first issue, the loop is perfect. I have seen on the production hotspot that all users had the same "last activity time" as "session time", but I hadn't touched the CP service page. Could it be that a cron job or another event resets the ipfw tables?

    • While reviewing your code, I saw that the login form uses the "post" method, but in function "portal_reply_page" at line 2163 the action is our host plus GET params like "zone=xxx&redirurl=yyy". Mixing a POST action with GET params works, but isn't clean design: you never know whether some browsers might cut the URL params. Use either POST or GET.

    • I tested what the fields "pre-authentication-redirect-url" and "after-authentication-redir-url" do. Nothing! With both, I get the login page with the GET param redirurl I requested before. After login, both redirect to the content URL; pre-auth just takes a little longer. It's only another path in the code to the same result. As I understand it, pre-auth should be an alternative URL used when the request provides no URL other than our host.
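    A minimal sketch of the "use either POST or GET" suggestion from the list above: the form action carries no query string, and zone/redirurl travel as HTML-escaped hidden POST fields. The function name is hypothetical, and the credential field names (auth_user, auth_pass, accept) are assumptions about the portal's login form, used only for illustration.

```python
from html import escape


def login_form(action: str, zone: str, redirurl: str) -> str:
    """Build a login form that uses POST only: no query string on the action URL."""
    if "?" in action:
        raise ValueError("keep parameters out of the action URL; send them as POST fields")
    hidden = "".join(
        f'<input type="hidden" name="{name}" value="{escape(value)}">'
        for name, value in (("zone", zone), ("redirurl", redirurl))
    )
    return (
        f'<form method="post" action="{escape(action)}">'
        f"{hidden}"
        # Assumed credential field names, for illustration only.
        '<input type="text" name="auth_user">'
        '<input type="password" name="auth_pass">'
        '<input type="submit" name="accept" value="Login">'
        "</form>"
    )
```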

    I don't want to bother you. Thank you for your good work.



  • @erik_ch said in RADIUS Accounting Server not Multiselect:

    I suppose that my user has our host as standard page in his browser or he calls it to get the logout page.

    Such a situation exists when time == money, like paid portal access. The user would be motivated to disconnect, and she/he is of course blocking pop-ups.
    Semi-solution:

    1. Set a rather low idle timeout value, like 5 minutes.
    2. Tell users that they should disconnect their Wi-Fi (radio) access (the idle timeout will clean up automatically).
    3. If users start looking for the logout page, like https://portal.pfsense.tld, they will not know that they have to add the ?cpzone=abcfeg part (if they don't, they trigger what you saw earlier). The complete logout link can be shown on the login page, of course. But, well, users will need it after they have left the login page.

    @erik_ch said in RADIUS Accounting Server not Multiselect:

    This is a critical issue: if users are logged in and I change a parameter on the CP service page and save, the CP isn't reinitialized. Users stay logged in, BUT the ipfw data are lost. Last activity is reset to the session start time, used data is 0, and the user can't pass traffic anymore. If the user calls ourhost:8002, he gets the answer "You are connected". I can reproduce this bug.
    It could be that my user from case 1 had this issue; in combination with the first issue, the loop is perfect. I have seen on the production hotspot that all users had the same "last activity time" as "session time", but I hadn't touched the CP service page. Could it be that a cron job or another event resets the ipfw tables?

    This has been solved: https://forum.netgate.com/topic/130420/any-change-and-save-update-captive-portal-bug
    You will find a patch there; it can be installed using the System Patches package.
    See https://github.com/pfsense/pfsense/pull/4042

    @erik_ch said in RADIUS Accounting Server not Multiselect:

    While reviewing your code, I saw that the login form uses the "post" method, but in function "portal_reply_page" at line 2163 the action is our host plus GET params like "zone=xxx&redirurl=yyy". Mixing a POST action with GET params works, but isn't clean design: you never know whether some browsers might cut the URL params. Use either POST or GET.

    True.

    @erik_ch said in RADIUS Accounting Server not Multiselect:

    I tested what the fields "pre-authentication-redirect-url" and "after-authentication-redir-url" do. Nothing

    True: pre-auth is broken.
    This https://docs.netgate.com/pfsense/en/latest/captiveportal/configuring-a-pre-authentication-redirect-for-captive-portal-users.html is a no-go these days.


  • Rebel Alliance

    @erik_ch said in RADIUS Accounting Server not Multiselect:

    My proposal: [...] remove the pop-up version because all browsers block it. Instead there should be a designed standard logout page.

    Agreed. There was a plan to update the login page, but I guess it will take a while, since it's not a very critical problem.

    • I tested what the fields "pre-authentication-redirect-url" and "after-authentication-redir-url" do. Nothing! With both, I get the login page with the GET param redirurl I requested before. After login, both redirect to the content URL; pre-auth just takes a little longer. It's only another path in the code to the same result. As I understand it, pre-auth should be an alternative URL used when the request provides no URL other than our host.

    Pre-auth redirects you after a successful authentication only if the captive portal doesn't know the URL the user originally tried to access (if no $redirurl is present in the URL). It is highly recommended to set this field.
    If this field is empty and no $redirurl is given by the user, the captive portal will redirect the user to an empty page saying "You are connected".

    Post-auth redirects all users to this address after successful authentication, regardless of the original URL the user tried to access.
    That's the difference between the two fields.
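    The rules for the two fields described in this post can be written as one decision function. This is a minimal sketch with hypothetical names (redirect_after_login, CONNECTED_PAGE), not the actual portal code.

```python
from typing import Optional

# Hypothetical marker for the built-in "You are connected" page.
CONNECTED_PAGE = "you-are-connected"


def redirect_after_login(redirurl: Optional[str], preauth_url: Optional[str],
                         postauth_url: Optional[str]) -> str:
    """Pick the page a user lands on after a successful login."""
    # Post-auth URL wins for everyone, regardless of what the user originally requested.
    if postauth_url:
        return postauth_url
    # Otherwise send the user back to the URL they originally tried to open.
    if redirurl:
        return redirurl
    # The pre-auth URL is only a fallback when the original URL is unknown.
    if preauth_url:
        return preauth_url
    return CONNECTED_PAGE
```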

