    Netgate Discussion Forum
    Is there a way to backrev to 2.2.5? 2.2.6 is no good for me… cp dies often

    Captive Portal
    • C
      carzin last edited by

      We've been on 2.2.6 for a few weeks now, and now that students are back, the captive portal has been dying a good amount (several times a day).  I haven't been able to find a problem in the logs.  I am looking more now.  Since I thought it couldn't get any worse, I snapshotted and then upgraded to 2.3, but it looks as though the captive portal code has changed enough that it generated a PHP error when submitting a username/password for RADIUS.  That was grasping at straws.  But at this point, I want to roll back.

      Can someone point me to the 2.2.5 images?  I may just do a total backup, then build a 2.2.5 image and do a restore (hoping it works).

      • dotdash
        dotdash last edited by

        2.2.5 is still on the mirrors. Just go to the download page, select the 2.2.6 image, and click the link 'just show me the mirrors'

        • Derelict
          Derelict LAYER 8 Netgate last edited by

          Then reinstall the 2.2.5 backup config you took before you upgraded.

          • Y
            yanqian last edited by

            Yes, I have the same issue, CP service dies often in my 2.2.6 box too, but I have only two boxes, not sure if this is a common issue.

            I will downgrade to 2.2.4, and check if the issue still exists.

            • D
              demco last edited by

              @carzin:

              We've been on 2.2.6 for a few weeks now, and now that students are back, the captive portal has been dying a good amount (several times a day).  I haven't been able to find a problem in the logs.

              What are the symptoms when this happens? Are new users unable to log in? Are existing sessions affected?

              • Gertjan
                Gertjan last edited by

                @demco:

                @carzin:

                We've been on 2.2.6 for a few weeks now, and now that students are back, the captive portal has been dying a good amount (several times a day).  I haven't been able to find a problem in the logs.

                What are the symptoms when this happens? Are new users unable to log in? Are existing sessions affected?

                Adding to these questions:
                Is the captive portal web server still up?
                (run
                ps ax | grep 'light.*Captive'
                to check)
                What do you see in the logs, "System" and "Portal Auth"? (Activate "Log errors from the web server process." on the Status: System logs: Settings page.)
                How do your users authenticate?
                Describe your hardware.
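
The process check above can also be scripted. This is only a minimal sketch in Python: it reuses the 'light.*Captive' pattern from the grep command, and the function names and the sample process lines (modelled on output posted later in this thread, with a made-up zone name) are illustrative assumptions.

```python
import re

# Same pattern as the grep above: a lighttpd process whose command line
# mentions a CaptivePortal config file.
PORTAL_PATTERN = re.compile(r"light.*Captive")


def portal_processes(ps_output: str) -> list:
    """Return the lines of `ps ax` output that look like captive portal web servers."""
    return [line for line in ps_output.splitlines() if PORTAL_PATTERN.search(line)]


def portal_is_up(ps_output: str) -> bool:
    """True if at least one captive portal lighttpd process is listed."""
    return bool(portal_processes(ps_output))


# Illustrative input only; "myzone" is a hypothetical zone name.
SAMPLE_UP = (
    "11329  -  S  0:40.39 /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf\n"
    "13040  -  S  3:42.73 /usr/local/sbin/lighttpd -f /var/etc/lighty-myzone-CaptivePortal.conf\n"
)
SAMPLE_DOWN = (
    "11329  -  S  0:40.39 /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf\n"
)

print(portal_is_up(SAMPLE_UP), portal_is_up(SAMPLE_DOWN))
```

On a real box the input would be the text produced by `ps ax`; unlike the shell pipeline, the script cannot accidentally match its own grep process.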

                • N
                  nappy_d last edited by

                  Out of curiosity, I was having issues with my captive portal.  I had to kill the process and restart it, and ever since I've had no issues.

                  https://forum.pfsense.org/index.php?topic=105635.msg589548#msg589548

                  • E
                    Easter last edited by

                    Hi,
                    the Captive Portal dies on my installation too.  It never happened with version 2.2.5.

                    I have 3 pfSense 2.2.6 VMs (ESXi), each one with a CP, and every day at least one lighttpd process dies for no apparent reason.

                    Last kernel log entries:
                    CP1:
                    Feb 5 17:15:01 kernel: pid 31555 (lighttpd), uid 0: exited on signal 10 (core dumped)

                    CP3:
                    Feb 6 22:04:52 kernel: pid 91600 (lighttpd), uid 0: exited on signal 6 (core dumped)

                    CP5:
                    Feb 6 17:25:05 kernel: pid 41922 (lighttpd), uid 0: exited on signal 11 (core dumped)
                    Feb 9 08:18:21    kernel: pid 57014 (lighttpd), uid 0: exited on signal 11 (core dumped)

                    Only the process serving the SSL CP dies (port 8003), and I'm not using plain HTTP (no connections to port 8002); I have to restart the CP service from the GUI.
                    The portalauth log correctly lists "USER LOGIN" entries until the process dies; after that I only see "TIMEOUT" for the users already logged in, and no new logins through my RADIUS server, because users cannot load the login page and POST their credentials.

                    I hope this information is useful. For now I have to monitor the lighttpd process and manually restart the CP as soon as possible (that's really a big problem: last time it was down for the 2 days of the weekend). If I don't find a fix soon, I will have to downgrade to 2.2.5.
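
Since the crashes only show up as kernel log lines, watching for them can be automated. The following is a rough Python sketch, assuming the log lines have exactly the shape quoted above; the regular expression and the function name are illustrative and not part of pfSense.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Feb 5 17:15:01 kernel: pid 31555 (lighttpd), uid 0: exited on signal 10 (core dumped)
CRASH_RE = re.compile(
    r"kernel: pid (?P<pid>\d+) \((?P<proc>[^)]+)\), uid (?P<uid>\d+): "
    r"exited on signal (?P<signal>\d+)"
)


def find_crashes(log_text, process="lighttpd"):
    """Return (pid, signal) pairs for every crash of `process` found in the log text."""
    crashes = []
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match and match.group("proc") == process:
            crashes.append((int(match.group("pid")), int(match.group("signal"))))
    return crashes


# The four log lines quoted in this post.
SAMPLE = """\
Feb 5 17:15:01 kernel: pid 31555 (lighttpd), uid 0: exited on signal 10 (core dumped)
Feb 6 22:04:52 kernel: pid 91600 (lighttpd), uid 0: exited on signal 6 (core dumped)
Feb 6 17:25:05 kernel: pid 41922 (lighttpd), uid 0: exited on signal 11 (core dumped)
Feb 9 08:18:21    kernel: pid 57014 (lighttpd), uid 0: exited on signal 11 (core dumped)
"""

crashes = find_crashes(SAMPLE)
# Count how often each signal number occurs.
print(Counter(sig for _pid, sig in crashes))
```

Grouping by signal number makes it easier to see whether the crashes share one cause or are scattered across several.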

                    • Gertjan
                      Gertjan last edited by

                      From what I make of it:
                      FreeBSD lighttpd signal 10 => https://www.freebsd.org/cgi/man.cgi?sektion=3&query=signal => bus error => possibly a hardware error.
                      FreeBSD lighttpd signal 6 => http://lists.freebsd.org/pipermail/freebsd-questions/2004-April/042665.html => lighty is killing itself because "he can't stand the load anymore …".

                      Not enough resources?
                      Idea: get pfSense out of this "ESXi" thing, "do what we all do", and check again ...
                      Another idea: 'tcpdump' the NIC of your portal .... is someone overloading it?
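
For reference, the signal numbers in those kernel messages can be translated to names with a small lookup. A sketch in Python, assuming the FreeBSD numbering from the signal(3) page linked above; the table is hard-coded because Python's own signal module reports the numbering of the host it runs on, which differs on Linux (where 10 is SIGUSR1). The function name is illustrative.

```python
# FreeBSD signal numbers seen in this thread, with the short descriptions
# from the FreeBSD signal(3) manual page.
FREEBSD_SIGNALS = {
    6: ("SIGABRT", "abort() call"),
    10: ("SIGBUS", "bus error"),
    11: ("SIGSEGV", "segmentation violation"),
}


def describe_signal(number):
    """Return a readable name for a FreeBSD signal number from a kernel log line."""
    name, meaning = FREEBSD_SIGNALS.get(number, ("unknown", "not in this small table"))
    return f"{name} ({meaning})"


for sig in (10, 6, 11):
    print(sig, describe_signal(sig))
```

All three are signals a process typically receives because of a fault inside the program itself (a bad memory access or a deliberate abort), so on their own they do not prove a hardware or load problem.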

                      • C
                        carzin last edited by

                        Ok, so I am glad other people are experiencing this.  I forgot about my post.  What is interesting is that when the portal dies, there is nothing in the log.  It just stops logging.  But everything else is functioning.  I actually decided to go entirely crazy, and in a production environment, I upgraded to the 2.3 experimental track when 2.2.6 was having to be rebooted MULTIPLE times in a day.  It was easier for me to snapshot the server (it's a VM), then upgrade and see.  I had to add a 'zone' value in the login page, however.

                        I upgraded to whatever the daily build was last Thursday, and the server has been running without issue since then.  I am going to declare victory if I make it through tomorrow.

                        I can only hope whatever bug is in 2.2.6 does not make it to the 2.3 track.

                        To answer some of the questions…

                        I run 4 pfSense boxes on VMware.  I have given them 4 cores and 8 gigs of memory (plenty).  When the service dies, if a person connects to our onboarding SSID, the DNS responder correctly gives the person the pfSense IP address for whatever DNS entry they are looking up, then the connection just spins.  No page is launched.  I can't tell you for sure what happens to pre-existing connections.  We have 10-minute timeouts, and by the time I discover the problem, the 10 minutes are up and captive portal status reports 0 users.  I am not going to run this on physical hardware, because in our environment, it can be just too problematic.  Since I run about 200 VMs just for my group, and nothing else has issues, I am going to say this is a pfSense issue, not a platform issue (I have about 60 servers on FreeBSD).  This is not a rinky dink environment, and it is on a platform that cost millions.  In past versions, lighttpd would die periodically (maybe once a month), and it would be logged to Splunk and I would get an alert and reboot the box.  It didn't happen enough to really bother me (this is free, so what can you do?).  But with 2.2.6, it was dying in a way that I got no messages in my log, and I would eventually get calls from the helpdesk.  It was so bad, I was considering pulling out the entire service and doing something else.

                        Users authenticate via an HTTPS portal which connects to a RADIUS backend (off server).

                        This was the old trusty message I would look for to see if the captive portal died.  The new 'dying' method doesn't do this:

                        network    lighttpd                1                        syslog        IP_REMOVED                    Sun Jan 24 10:17:57 2016    splunkindex0p    /var/log/SPOOL/networking-current/daemon-networking.log    Jan 24 10:17:57 152.2.78.166 lighttpd[31698]: (mod_fastcgi.c.2912) connection was dropped after accept() (perhaps the fastcgi process died), write-offset: 8192 socket: unix:/tmp/php-fastcgi-cpzone.socket-5

                        • E
                          Easter last edited by

                          Hi,
                          I was going to install version 2.2.5 when I found this issue:
                          https://redmine.lighttpd.net/issues/2700
                          resolved in version 1.4.39  https://www.lighttpd.net/2016/1/2/1.4.39/

                          The lighttpd version on pfSense 2.2.5 is 1.4.37, but on 2.2.6 I have lighttpd/1.4.38.

                          Now I'm going to run
                          pkg install lighttpd
                          to try lighttpd/1.4.39.

                          @carzin, what is the lighttpd version on your pfSense 2.3?

                          I hope this helps.
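
To check several boxes quickly, the version comparison above can be put into a few lines of Python. This is a sketch only: it encodes what this post reports (1.4.38 ships with 2.2.6 and crashes, the linked issue is fixed in 1.4.39) and nothing more, and the function names are illustrative.

```python
import re

AFFECTED = (1, 4, 38)   # version reported on pfSense 2.2.6 in this thread
FIXED_IN = (1, 4, 39)   # release that resolved the linked lighttpd issue


def parse_version(text):
    """Extract (major, minor, patch) from strings like 'lighttpd/1.4.38' or 'lighttpd-1.4.39_1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def status(text):
    """Classify a lighttpd version string according to the reports in this thread."""
    version = parse_version(text)
    if version == AFFECTED:
        return "affected"
    if version >= FIXED_IN:
        return "fixed"
    return "older release, not reported as crashing here"


for banner in ("lighttpd/1.4.37", "lighttpd/1.4.38", "lighttpd-1.4.39_1"):
    print(banner, "->", status(banner))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "1.4.9" as newer than "1.4.39".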

                          • H
                            heper last edited by

                            lighttpd has been removed from 2.3

                            • C
                              carzin last edited by

                              Easter:  Thank you very much for figuring that out.    On the 3 remaining boxes, version 1.4.38 is installed.

                              I am glad this package is gone with 2.3.  It has been the source of much grief for me.

                              • E
                                Easter last edited by

                                Hi carzin,
                                more than 20 hours after installing lighttpd-1.4.39_1, there have been no crashes, and the CP is now handling more than 1500 concurrently logged-in users.
                                On Monday I can provide more data, but for now I think the issue can be resolved this way.

                                The sequence I used is:

                                pkg upgrade

                                (just to install/upgrade the pkg binaries)

                                pkg install lighttpd

                                On the pfSense console menu choose: 11) Restart webConfigurator
                                (to restart the lighttpd processes without forcing users to log out)

                                • C
                                  carzin last edited by

                                  Thank you very much!  I've done this on all my boxes.  Time will tell.  I'll just keep hoping that my box running 2.3 continues to do so without issue :)

                                  • E
                                    Easter last edited by

                                    Hi carzin,
                                    my setup has been working without interruption since the lighttpd "upgrade".

                                    I hope an official update for the pfSense Captive Portal will be released.

                                    How can I report this bug to the packager?

                                    • C
                                      cmb last edited by

                                      It's an issue in lighttpd itself, which we won't do anything about since lighttpd is removed entirely from the next release (because they kept putting out buggy releases; this is one example of several in the past few months). So no need to report it, it's already fixed going forward.

                                      • C
                                        carzin last edited by

                                        Thanks, cmb.  We have always had some issues with captive portal stability, seemingly linked to the lighttpd process.  I usually get a call from the helpdesk once every month or two when the portal stops working and I restart the server.  It went crazy with the latest release, to the point that I was thinking of abandoning pfsense.  The beta 2.3 build has been super smooth in the nearly 2 weeks it has been up.  When do you anticipate a proper release?

                                        • H
                                          heper last edited by

                                          @carzin:

                                          When do you anticipate a proper release?

                                          It's beta_x now. No clue if there is going to be a beta_y or z. Then there will be at least one release candidate.
                                          I doubt they have set a fixed release date at this point. I'm hoping/guessing it'll be somewhere between April and June.

                                          • D
                                            denis31 last edited by

                                            Hi,

                                            Same issues for me (lighttpd core dump).
                                            Sometimes several times per day or a few times per week.
                                            Already authenticated users are OK but no captive portal page for newcomers.
                                            I am also trying the lighttpd upgrade to 1.4.39_1… Time will tell if it does the job.

                                            What will replace lighttpd in pfSense 2.3?

                                            DM

                                            • 2
                                              21hertz last edited by

                                              Same problem here.

                                              Happens a few times a week, seems random.
                                              Services status still reports captiveportal as Up, so the only way to know it's down is when a user reports it or when you use external monitoring on the CP's http/https.
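
External monitoring of this kind can be as simple as a periodic TCP connection attempt to the portal's listening port. A minimal Python sketch, assuming the ports mentioned earlier in the thread (8002 for HTTP, 8003 for SSL on the first zone); the host address in the usage comment is a placeholder and the function name is illustrative.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable host, and similar failures.
        return False


# Example use from a monitoring host (192.0.2.1 is a placeholder address):
#
#     if not port_is_open("192.0.2.1", 8003):
#         print("captive portal SSL listener is not answering")
```

This only proves that something is listening on the port. A web server that is hung but still holding its socket would pass the check, so a full HTTP(S) request for the login page is the stricter test.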

                                              • Gertjan
                                                Gertjan last edited by

                                                @denis31:

                                                What will replace lighttpd in pfSense 2.3?

                                                Beta section of the forum https://forum.pfsense.org/index.php?board=65.0 : third line ….

                                                • C
                                                  cmb last edited by

                                                  @21hertz:

                                                  Same problem here.

                                                  Happens a few times a week, seems random.
                                                  Services status still reports captiveportal as Up, so the only way to know it's down is when a user reports it or when you use external monitoring on the CP's http/https.

                                                  Try upgrading lighttpd as others have done successfully, 'pkg install lighttpd', then option 16 followed by option 11 at the console.

                                                  • D
                                                    denis31 last edited by

                                                    @21hertz:

                                                    Same problem here.

                                                    Happens a few times a week, seems random.
                                                    Services status still reports captiveportal as Up, so the only way to know it's down is when a user reports it or when you use external monitoring on the CP's http/https.

                                                    You can check this using CLI (ssh public key auth for automation):
                                                    ps -aux | grep http
                                                    root  13324  7.0  1.4 169520 115368  -  S    2:52PM    57:12.67 /usr/local/sbin/lighttpd -f /var/etc/lighty-ssid_ups-CaptivePortal-SSL.co
                                                    root  11329  0.0  0.1  56884  7736  -  S    2:52PM    0:40.39 /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf
                                                    root  13040  0.0  0.2  67120  17404  -  S    2:52PM    3:42.73 /usr/local/sbin/lighttpd -f /var/etc/lighty-ssid_ups-CaptivePortal.conf

                                                    Or check the log file for lighttpd 'core dumped' errors.

                                                    Denis.

                                                    • P
                                                      perjoh last edited by

                                                      Does anyone know if this problem is solved in 2.3?

                                                      • Gertjan
                                                        Gertjan last edited by

                                                        Well ….. knowing that "lighttpd" isn't part of pfSense 2.2.3 anymore .......  ;)

                                                        2.2.3 works fine for me ....

                                                        Oops : I meant 2.3.

                                                        • S
                                                          sebastiannielsen last edited by

                                                          Gertjan: The new version is named 2.3
                                                          The version you are referring to is prior to 2.2.5 and 2.2.6 and still contains lighttpd.

                                                          • C
                                                            cmb last edited by

                                                            Yes it is solved in 2.3. lighttpd no longer exists, that's been switched to nginx which has no such problems.
