Netgate Discussion Forum

    Hanging webGUI fix

    22 Posts 5 Posters 6.0k Views
      daq

      I searched the forum but couldn't find any working suggestions for restarting the webGUI after it crashes, so I thought this could help someone. It's not enough to just kill lighttpd, because it will hang again as soon as you start it. You also need to remove the PHP sockets from /tmp:

      
      srwxr-xr-x  1 root  wheel       0 Apr 23 18:59 php-fastcgi.socket-0
      srwxr-xr-x  1 root  wheel       0 Apr 23 18:59 php-fastcgi.socket-1
      
      

      So the complete solution is

      Optionally confirm that you've got a few zombies:

      ps -ajx |grep Z
      

      You can then kill by PPID (3rd column) or just:

      killall -9 php; killall -9 lighttpd
      

      Delete the sockets:

      rm /tmp/php-fastcgi.socket*
      

      Restart lighttpd:

      /etc/rc.restart_webgui
      

      Restarting the entire box will work too, but sometimes it's not an option.
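The steps above can be collected into one small script. This is only a sketch under stated assumptions: the helper names `zombies`, `run` and `recover_webgui` and the `DRY_RUN` switch are made up for illustration and are not part of pfSense. `DRY_RUN` defaults to 1, so nothing is killed or deleted unless you explicitly ask for it.

```shell
#!/bin/sh
# Sketch of the recovery steps from this post (hypothetical helper names).
# DRY_RUN defaults to 1: commands are only printed. Set DRY_RUN=0 to run them.

# Keep the header line plus rows whose STAT column (3rd column here) starts
# with Z. Expects input shaped like: ps -axo pid,ppid,stat,command
zombies() {
    awk 'NR == 1 || $3 ~ /^Z/'
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_webgui() {
    # Kill the hung PHP FastCGI workers and the web server.
    run killall -9 php
    run killall -9 lighttpd
    # Remove the stale sockets, otherwise lighttpd hangs again on start.
    run rm -f /tmp/php-fastcgi.socket*
    # Let pfSense bring the webGUI back up.
    run /etc/rc.restart_webgui
}
```

To use it, source the file, check for zombies with `ps -axo pid,ppid,stat,command | zombies`, review the output of `recover_webgui`, and then run `DRY_RUN=0 recover_webgui` as root. Filtering on the STAT column avoids the false matches that a plain `grep Z` can produce.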

        cmb

        You must be on 2.0.2; the better fix is to upgrade to 2.0.3, where fastcgi doesn't crash anymore.

          daq

          I am on 2.0.3. Still crashes regularly.

            marama

            @daq:

            I am on 2.0.3. Still crashes regularly.

            I am also running 2.0.3, nano on Alix boards (have tried 2 x D13 boards).
            Crashes every few minutes. I found no entry in the logs.
            Am very sad.

              kejianshi

              I'd love to know if there is some common factor among the few people whose web GUIs are crashing.

              Memory maxed out?  Certain combination of packages?  Web GUI mods?  Hardware?  Install type?  Anything?

              Because 2.0.3 for me is rock solid.

                doktornotor Banned

                On Alix? Yeah, out of RAM plus insane packages installed (such as squid, snort etc.)

                  kejianshi

                  Hmmmm.  Well I suppose treating an Alix build like it had dual xeon processors and 64GB of RAM would do that…

                    marama

                    @doktornotor:

                    On Alix? Yeah, out of RAM plus insane packages installed (such as squid, snort etc.)

                    No, no packages installed. I've disabled local logging, am using remote syslog.

                    Aug 26 13:27:27 xxx.xxx.xxx.xxx check_reload_status: Syncing firewall
                    Aug 26 13:27:28 xxx.xxx.xxx.xxx lighttpd[56597]: (server.c.1546) server stopped by UID = 0 PID = 61135
                    Aug 26 13:27:28 xxx.xxx.xxx.xxx lighttpd[56597]: (server.c.1546) server stopped by UID = 0 PID = 61135
                    Aug 26 13:27:28 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
                    Aug 26 13:27:28 xxx.xxx.xxx.xxx kernel: IP firewall unloaded
                    Aug 26 13:27:28 xxx.xxx.xxx.xxx check_reload_status: Reloading filter
                    Aug 26 13:27:33 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
                    Aug 26 13:27:53 xxx.xxx.xxx.xxx sshd[17069]: fatal: Write failed: Operation not permitted
                    Aug 26 13:27:53 xxx.xxx.xxx.xxx sshd[17069]: fatal: Write failed: Operation not permitted

                    Also, SSH seems to "crash" (my SSH connection is terminated).

                    I either wait or do this in order to bring the webGUI back up:

                    killall -9 php; killall -9 lighttpd; /etc/rc.restart_webgui

                    The only package I have installed is darkstat; I will uninstall it and watch the behavior. Actually, darkstat also "crashes" when I choose more than 1 interface to monitor, so I am hoping it might be the cause of the webGUI crashing.

                    Name XXX
                    Version 2.0.3-RELEASE (i386)
                    built on Fri Apr 12 10:22:18 EDT 2013
                    FreeBSD 8.1-RELEASE-p13

                    You are on the latest version.
                    Platform nanobsd (4g)
                    NanoBSD Boot Slice pfsense0 / ad0s1
                    CPU Type Geode(TM) Integrated Processor by AMD PCS
                    Uptime
                    Current date/time
                    Mon Aug 26 13:36:31 CEST 2013
                    DNS server(s) 127.0.0.1
                    xxx.xxx.xxx.xxx
                    xxx.xxx.xxx.xxx
                    Last config change Mon Aug 26 13:34:00 CEST 2013
                    State table size
                    Show states
                    MBUF Usage 646/8640
                    CPU usage 2%
                    Memory usage 32%
                    Disk usage 9%

                    top:
                    CPU:  9.6% user,  0.0% nice,  5.1% system,  0.0% interrupt, 85.4% idle
                    Mem: 32M Active, 43M Inact, 44M Wired, 34M Buf, 115M Free

                    I don't think RAM or CPU is the problem, but this info is from after I uninstalled DarkStat. I'll watch and let you know. The board is a "PC Engines ALIX.2D13" (500 MHz AMD Geode LX800 CPU, 256 MB SDRAM). BTW, I was using DarkStat to trace the heavy IP usage (LAN and WAN), but was not happy without being able to use more than one interface at a time. Are there any good alternatives? I just need to see the top X heavy users by source and dest IP. pfTop was too much info (I didn't find my way around it well), and I have tried bandwidthd, also not so happy. What do you guys use?

                    In case the Alix is too weak, a Soekris would also be affordable. I am not happy with the Alix throughput anyway (50-80 Mb/s); I need to reach much higher speeds between LAN <=> DMZ. Would I be better off with a Soekris net6501-30 (600 MHz) or maybe a net6501-50 (1 GHz Atom)? I wouldn't like to oversize it. We are using an ASA 5510 for VPN, so I just need the routing and some basic monitoring.

                      doktornotor Banned

                      1/ Considering the lighttpd gets stopped right after this:

                      
                      Aug 26 13:27:28 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
                      
                      

                      you might turn off the captive portal as well. Does not even look like it's killed, just regular stop.

                      2/ Darkstat can ONLY listen on one interface; at least until it gets updated.

                      P.S. Using exact same board on multiple places with 2.1RC with no GUI crashes at all.

                        marama

                        @doktornotor:

                        1/ Considering the lighttpd gets stopped right after this:

                        
                        Aug 26 13:27:28 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
                        
                        

                        you might turn off the captive portal as well. Does not even look like it's killed, just regular stop.

                        2/ Darkstat can ONLY listen on one interface; at least until it gets updated.

                        P.S. Using exact same board on multiple places with 2.1RC with no GUI crashes at all.

                        OK, will give it a try. I've just had another crash, so Darkstat is not causing the problems.

                        Aug 26 13:48:48 xxx.xxx.xxx.xxx check_reload_status: Syncing firewall
                        Aug 26 13:48:49 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
                        Aug 26 13:48:49 xxx.xxx.xxx.xxx kernel: ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding enabled, default to accept, logging disabled
                        Aug 26 13:48:51 xxx.xxx.xxx.xxx check_reload_status: Reloading filter
                        Aug 26 13:48:56 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
                        Aug 26 13:49:16 xxx.xxx.xxx.xxx logportalauth[1184]: LOGIN: orbit, 00:0c:29:ca:be:91, 172.16.0.100
                        Aug 26 13:50:44 xxx.xxx.xxx.xxx logportalauth[1184]: FAILURE: orbit, 00:0c:29:ca:be:91, 172.16.0.100
                        Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459
                        Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459
                        Aug 26 13:51:06 xxx.xxx.xxx.xxx check_reload_status: Syncing firewall
                        Aug 26 13:51:07 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
                        Aug 26 13:51:07 xxx.xxx.xxx.xxx kernel: IP firewall unloaded
                        Aug 26 13:51:08 xxx.xxx.xxx.xxx check_reload_status: Reloading filter
                        Aug 26 13:51:13 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
                        Aug 26 13:51:13 xxx.xxx.xxx.xxx kernel: ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding enabled, default to accept, logging disabled
                        Aug 26 13:51:15 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
                        Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                        Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                        Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
                        Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14

                        I am turning the captive portal off (I just set it up 10 minutes ago, but I've had webGUI crashes for days, so CP alone is not the problem either).

                          doktornotor Banned

                          Something is regularly STOPPING your webserver.

                          Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0  PID = 54459
                          Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459

                          So you need to find out what process has the PID that appears on those log lines.

                          Also, from that log, I cannot see how you disabled the captive portal; the log suggests pretty clearly that it is NOT disabled at all.

                          
                          Aug 26 13:48:49 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
                          Aug 26 13:49:16 xxx.xxx.xxx.xxx logportalauth[1184]: LOGIN: orbit, 00:0c:29:ca:be:91, 172.16.0.100
                          Aug 26 13:50:44 xxx.xxx.xxx.xxx logportalauth[1184]: FAILURE: orbit, 00:0c:29:ca:be:91, 172.16.0.100
                          Aug 26 13:51:07 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
                          Aug 26 13:51:13 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
                          
                          

                          You also apparently have some networking issues:

                          
                          Aug 26 13:51:15 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
                          
                          

                          before the lighttpd closes the connection:

                          
                          Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                          Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                          Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
                          Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
                          
                          

                          (Well, no wonder it seems that it "crashes" when your network is down.)
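To see what happens right before each stop, it can help to pull the preceding lines out of the remote syslog file. A minimal sketch, assuming a grep that supports the `-B` (lines of leading context) option, as the GNU and FreeBSD versions do; the function name `stop_context` is made up for illustration.

```shell
#!/bin/sh
# Print each "server stopped by" line together with the N log lines that
# precede it (default 3). Reads syslog text on stdin.
stop_context() {
    grep -B "${1:-3}" 'server stopped by UID'
}
```

Usage would be something like `stop_context 5 < /path/to/remote-syslog.log`, where the log path is a placeholder for wherever your syslog server writes the pfSense messages.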

                            marama

                            @doktornotor:

                            Something is regularly STOPPING your webserver.

                            Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0  PID = 54459
                            Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459

                            So you need to find out what process has the PID that appears on those log lines.

                            Hm, how could I do that? I tried running top and simply looking for the PID on the screen when a crash occurs (just did), but the PID was not on the screen. Either the killer PID was low in usage so it didn't show in top, or it was an ad-hoc process that didn't exist before. Can I somehow send PID info to the syslog server when a process is created? Simply cronjobbing "ps -aux" will probably not be effective.
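One possible approach, sketched here under assumptions rather than taken from pfSense itself: pull the stopping PID out of the lighttpd log lines, and separately log a process-table snapshot at short intervals so that a short-lived PID can be looked up afterwards. The function names `stopper_pids` and `watch_processes`, the syslog tag `pswatch` and the one-second default interval are all made up for illustration.

```shell
#!/bin/sh
# Extract the PID(s) that lighttpd reports as having stopped it.
# Reads syslog lines on stdin and prints each distinct PID once.
stopper_pids() {
    sed -n 's/.*server stopped by UID = *[0-9][0-9]* *PID = *\([0-9][0-9]*\).*/\1/p' | sort -u
}

# Send a process-table snapshot to syslog once per interval (seconds).
# Runs until interrupted; the tag "pswatch" is an arbitrary choice.
watch_processes() {
    interval="${1:-1}"
    while :; do
        ps -axo pid,ppid,command | logger -t pswatch
        sleep "$interval"
    done
}
```

With `watch_processes` running in the background, the PID printed by `stopper_pids` can be searched for among the `pswatch` entries logged just before the stop; the PPID column then shows which process started it. A snapshot every second is fairly heavy for a small board, so a longer interval may be needed, at the cost of possibly missing very short-lived processes.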

                            BTW, since there is no point in using Alix board if I cannot use Darkstat or Captive Portal, I turned both of them on and am looking for the killer-PID.

                            Also, from that log, I cannot see how you disabled the captive portal; the log suggests pretty clearly that it is NOT disabled at all.

                            Sorry for not being clear: the log was made BEFORE I turned the captive portal off, which explains the CP entries in the log. Just for clarification, an Alix board should be able to cope with captive portal, right?

                            You also apparently have some networking issues:

                            
                            Aug 26 13:51:15 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
                            
                            

                            I could of course remove the Gateway Group entry, but the line is there every time I have a crash. I will remove it and watch.

                            before the lighttpd closes the connection:

                            
                            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
                            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
                            
                            

                            (Well, no wonder it seems that it "crashes" when your network is down.)

                            It was a dirty log, don't remember what I've done at that exact moment. But thanks for the hint, I will try to exclude the networking issue by bringing the pfSense box and attaching it to my workstation directly. Though, we have some 60 Workstations on the switch, no problems. No firewall rules or any limiter entries that could cause problems from pfSense side.

                            Thanks for helping me out ;)

                              doktornotor Banned

                              Please, post something useful, not "dirty" logs. The log shows that your network crashes and that's pretty much it. I'd suggest wiping the config and restarting from scratch.

                                marama

                                @doktornotor:

                                Please, post something useful, not "dirty" logs. The log shows that your network crashes and that's pretty much it. I'd suggest wiping the config and restarting from scratch.

                                By "dirty" I meant that I was constantly restarting services (darkstat, captive portal…), so I was not sure whether I had made the logs dirty by my own actions.
                                The installation is from scratch; I only had Darkstat running for a few days and tried to get the captive portal running today. I also have some basic port forwards. No additional services or weird settings. I've tried the failover route configuration; it was running on WAN and it was no rocket science, so I excluded it as a cause of my WebGUI problems, but in order to eliminate possible causes, I removed it too (as you indirectly suggested). I am out of the office now, but I have another CF that I will copy pfSense to and will give it a go from scratch tomorrow. I will let you know how it goes.

                                For me it would be important to know whether I am overstretching the hardware. Should an Alix board be able to handle a 12 Mbit internet connection, some firewall rules, RRD graphs (default ones), captive portal, DHCP server, DNS forwarder, Darkstat monitoring, Gateway Failover, and maybe 5 VLANs? An Alix board should be able to handle that easily, right?

                                Anyway, thanks a lot for helping. My main suspect is the gateway I removed; I will test tomorrow and do a clean install if necessary. Will let you know how it goes.

                                  kejianshi

                                  Perhaps you have a hardware problem?  Something about to fail?

                                    doktornotor Banned

                                    @kejianshi:

                                    Perhaps you have a hardware problem?  Something about to fail?

                                    Most likely. Though, with statements like "The installation is from scratch" and "My main suspect is the gateway I removed"…  ::)

                                      kejianshi

                                      doktornotor - You missed your calling as depression counselor…  :D

                                        doktornotor Banned

                                        @kejianshi:

                                        doktornotor - You missed your calling as depression counselor…  :D

                                          marama

                                          @doktornotor:

                                          @kejianshi:

                                          Perhaps you have a hardware problem?  Something about to fail?

                                          Most likely. Though, with statements like "The installation is from scratch" and "My main suspect is the gateway I removed"…  ::)

                                          Hardware… I hope not, because it would be difficult to diagnose. I hope it's a topology configuration problem (outside pfSense); I still don't know. We have a 3-line DSL that has one line down, and we are waiting for the ISP to fix that. But it's on the WAN side, so I don't think it could crash the WebGUI just like that. The thing with the gateway is that I had a gateway on LAN: the main gateway on the WAN side and the failover on the LAN side. I've tested the failover and it worked, but I might have concluded too easily that it's the right way to do it. Anyway, I bought another managed switch so I will be able to get 2 WAN ports and have a clean installation on that side. I think there is not much point in pursuing the WebGUI issues before making sure the environment is the right one. But I'll repeat once more: I've really done nothing "unusual" to pfSense, and I wouldn't expect much from an install from scratch; the few settings I've made shouldn't be crashing the WebGUI.
                                          I hope it was the failover gateway configuration causing problems.

                                            kejianshi

                                            I think having all your gateways on WAN is a good idea…

                                            Hope it goes well when you get your new equipment.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.