Netgate Discussion Forum

    Hanging webGUI fix

    webGUI
      kejianshi

      Hmmmm.  Well I suppose treating an Alix build like it had dual xeon processors and 64GB of RAM would do that…

        marama

        @doktornotor:

        On Alix? Yeah, out of RAM plus insane packages installed (such as squid, snort etc.)

        No, no packages installed. I've disabled local logging, am using remote syslog.

        Aug 26 13:27:27 xxx.xxx.xxx.xxx check_reload_status: Syncing firewall
        Aug 26 13:27:28 xxx.xxx.xxx.xxx lighttpd[56597]: (server.c.1546) server stopped by UID = 0 PID = 61135
        Aug 26 13:27:28 xxx.xxx.xxx.xxx lighttpd[56597]: (server.c.1546) server stopped by UID = 0 PID = 61135
        Aug 26 13:27:28 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
        Aug 26 13:27:28 xxx.xxx.xxx.xxx kernel: IP firewall unloaded
        Aug 26 13:27:28 xxx.xxx.xxx.xxx check_reload_status: Reloading filter
        Aug 26 13:27:33 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
        Aug 26 13:27:53 xxx.xxx.xxx.xxx sshd[17069]: fatal: Write failed: Operation not permitted
        Aug 26 13:27:53 xxx.xxx.xxx.xxx sshd[17069]: fatal: Write failed: Operation not permitted

        SSH also seems to "crash" (my SSH connection is terminated).

        I either wait or do this in order to bring the webGUI back up:

        killall -9 php; killall -9 lighttpd; /etc/rc.restart_webgui

        The only package I have installed is darkstat; I will uninstall it and watch the behavior. Actually, darkstat also "crashes" when I choose more than 1 interface to monitor, so I am hoping it is the cause of the webGUI crashing.

        Name XXX
        Version 2.0.3-RELEASE (i386)
        built on Fri Apr 12 10:22:18 EDT 2013
        FreeBSD 8.1-RELEASE-p13

        You are on the latest version.
        Platform nanobsd (4g)
        NanoBSD Boot Slice pfsense0 / ad0s1
        CPU Type Geode(TM) Integrated Processor by AMD PCS
        Uptime
        Current date/time
        Mon Aug 26 13:36:31 CEST 2013
        DNS server(s) 127.0.0.1
        xxx.xxx.xxx.xxx
        xxx.xxx.xxx.xxx
        Last config change Mon Aug 26 13:34:00 CEST 2013
        State table size
        Show states
        MBUF Usage 646/8640
        CPU usage 2%
        Memory usage 32%
        Disk usage 9%

        top:
        CPU:  9.6% user,  0.0% nice,  5.1% system,  0.0% interrupt, 85.4% idle
        Mem: 32M Active, 43M Inact, 44M Wired, 34M Buf, 115M Free

        I don't think RAM or CPU is the problem, but this info is from after I uninstalled Darkstat. I'll watch and let you know. The board is a "PC Engines ALIX.2D13" (500 MHz AMD Geode LX800 CPU, 256 MB SDRAM). BTW, I was using Darkstat to trace heavy IP usage (LAN and WAN), but was not happy about being unable to monitor more than one interface at a time. Are there any good alternatives? I just need to see the top X heavy users by source and destination IP. pfTop was too much info (I couldn't find my way around it), and I have tried bandwidthd, which I was not so happy with either. What do you guys use?

        In case the Alix is too weak, a Soekris would also be affordable. I am not happy with the Alix throughput anyway (50-80 Mb/s); I need much more speed between LAN <=> DMZ. Would I be better off with a Soekris net6501-30 (600 MHz) or maybe a net6501-50 (1 GHz Atom)? I wouldn't like to oversize it. We are using an ASA 5510 for VPN, so I just need the routing and some basic monitoring.

          doktornotor Banned

          1/ Considering the lighttpd gets stopped right after this:

          
          Aug 26 13:27:28 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
          
          

          you might turn off the captive portal as well. Does not even look like it's killed, just regular stop.

          2/ Darkstat can ONLY listen on one interface; at least until it gets updated.

          P.S. Using exact same board on multiple places with 2.1RC with no GUI crashes at all.

            marama

            @doktornotor:

            1/ Considering the lighttpd gets stopped right after this:

            
            Aug 26 13:27:28 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
            
            

            you might turn off the captive portal as well. Does not even look like it's killed, just regular stop.

            2/ Darkstat can ONLY listen on one interface; at least until it gets updated.

            P.S. Using exact same board on multiple places with 2.1RC with no GUI crashes at all.

            OK, will give it a try. I've just had another crash, so Darkstat is not causing the problems.

            Aug 26 13:48:48 xxx.xxx.xxx.xxx check_reload_status: Syncing firewall
            Aug 26 13:48:49 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
            Aug 26 13:48:49 xxx.xxx.xxx.xxx kernel: ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding enabled, default to accept, logging disabled
            Aug 26 13:48:51 xxx.xxx.xxx.xxx check_reload_status: Reloading filter
            Aug 26 13:48:56 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
            Aug 26 13:49:16 xxx.xxx.xxx.xxx logportalauth[1184]: LOGIN: orbit, 00:0c:29:ca:be:91, 172.16.0.100
            Aug 26 13:50:44 xxx.xxx.xxx.xxx logportalauth[1184]: FAILURE: orbit, 00:0c:29:ca:be:91, 172.16.0.100
            Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459
            Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459
            Aug 26 13:51:06 xxx.xxx.xxx.xxx check_reload_status: Syncing firewall
            Aug 26 13:51:07 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
            Aug 26 13:51:07 xxx.xxx.xxx.xxx kernel: IP firewall unloaded
            Aug 26 13:51:08 xxx.xxx.xxx.xxx check_reload_status: Reloading filter
            Aug 26 13:51:13 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
            Aug 26 13:51:13 xxx.xxx.xxx.xxx kernel: ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding enabled, default to accept, logging disabled
            Aug 26 13:51:15 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
            Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14

            I am turning the captive portal off (I only set it up 10 minutes ago, and I've had webGUI crashes for days, so CP alone is not the problem either).

              doktornotor Banned

              Something is regularly STOPPING your webserver.

              Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0  PID = 54459
              Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459

              So you need to find out what process has the PID that appears on those log lines.
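To pull those PIDs out of a larger log, something like the following may help. It is a minimal sketch, assuming the remote syslog file keeps the "server stopped by UID = ... PID = ..." wording shown above; `stopper_pids` is a made-up helper name, not part of pfSense or lighttpd.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct PID that lighttpd reports as
# having stopped it. Reads syslog lines on stdin.
stopper_pids() {
    # Keep only the number after "PID =" on "server stopped by" lines,
    # tolerating one or more spaces around the values, then de-duplicate.
    sed -n 's/.*lighttpd\[[0-9]*\]: .*server stopped by UID = *[0-9]* *PID = *\([0-9][0-9]*\).*/\1/p' | sort -u
}
```

Run it on the syslog server as `stopper_pids < /path/to/pfsense.log` (the path is a placeholder). For the lines quoted in this thread it prints 54459 and 61135. That only tells you which PIDs to look for; it does not tell you what those processes were.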

              Also, from that log, I cannot see how you disabled the captive portal; the log suggests pretty clearly that it is NOT disabled at all.

              
              Aug 26 13:48:49 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
              Aug 26 13:49:16 xxx.xxx.xxx.xxx logportalauth[1184]: LOGIN: orbit, 00:0c:29:ca:be:91, 172.16.0.100
              Aug 26 13:50:44 xxx.xxx.xxx.xxx logportalauth[1184]: FAILURE: orbit, 00:0c:29:ca:be:91, 172.16.0.100
              Aug 26 13:51:07 xxx.xxx.xxx.xxx minicron: (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated: 15)
              Aug 26 13:51:13 xxx.xxx.xxx.xxx logportalauth[27062]: Restarting captive portal.
              
              

              You also apparently have some networking issues:

              
              Aug 26 13:51:15 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
              
              

              before the lighttpd closes the connection:

              
              Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
              Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
              Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
              Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
              
              

              (Well, no wonder it seems that it "crashes" when your network is down.)

                marama

                @doktornotor:

                Something is regularly STOPPING your webserver.

                Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0  PID = 54459
                Aug 26 13:51:06 xxx.xxx.xxx.xxx lighttpd[63368]: (server.c.1546) server stopped by UID = 0 PID = 54459

                So you need to find out what process has the PID that appears on those log lines.

                Hm, how could I do that? I tried running top and simply looking for the PID on the screen when the crash occurs (one just did), but the PID was not on the screen. Either the killer PID was low in usage so it didn't show in top, or it was an ad-hoc process that didn't exist before. Can I somehow send PID info to the syslog server when a process is created? Simply running "ps -aux" from cron will probably not be effective.
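One way to approach this is to snapshot the process table in a tight loop and hand the output to logger(1), so it travels to the remote syslog server with everything else. This is a rough sketch, not a tested pfSense recipe: the function names and the "pidsnap" tag are made up, and it assumes `ps` accepts `-A -ww -o pid,ppid,args` and that messages from `logger` are forwarded like other local syslog entries. A very short-lived process can still slip between two snapshots, and a one-second loop adds some load on a 500 MHz board.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here ships with pfSense.

# Print one snapshot of the process table: PID, parent PID, full command line.
snapshot_once() {
    ps -A -ww -o pid,ppid,args
}

# Take a snapshot every INTERVAL seconds (default 1) and pass each line to
# logger(1) under the arbitrary tag "pidsnap". Runs until it is killed.
snapshot_loop() {
    interval="${1:-1}"
    while :; do
        snapshot_once | logger -t pidsnap
        sleep "$interval"
    done
}
```

Start it from an SSH session with `snapshot_loop 1 &`, wait for the next "server stopped by ... PID = N" line, then search the "pidsnap" entries on the syslog server for N in the PID column. The parent PID column may help even when the process itself was too short-lived to be caught.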

                BTW, since there is no point in using Alix board if I cannot use Darkstat or Captive Portal, I turned both of them on and am looking for the killer-PID.

                Also, from that log, I cannot see how you disabled the captive portal; the log suggests pretty clearly that it is NOT disabled at all.

                Sorry for not being clear: the log was made BEFORE I turned the captive portal off, which explains the CP entries in the log. Just for clarification, an Alix board should be able to cope with captive portal, right?

                You also apparently have some networking issues:

                
                Aug 26 13:51:15 xxx.xxx.xxx.xxx php: : MONITOR: GW3G is down, removing from routing group
                
                

                I could of course remove the Gateway Group entry, but the line is there every time I have a crash. I will remove it and watch.
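To check that "every time" impression against the whole log rather than by eye, a small filter can list each lighttpd stop together with the delay until the next gateway-down message. This is only a sketch under assumptions: `stop_to_gw_down` is a made-up name, the log lines are in chronological order and all from the same day (only HH:MM:SS is compared), and the message wording matches the excerpts in this thread.

```shell
#!/bin/sh
# Hypothetical helper: for every "server stopped" line, print its time and the
# number of seconds until the next "MONITOR: ... is down" line.
# Reads syslog lines on stdin; the time is expected in the third field.
stop_to_gw_down() {
    awk '
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    BEGIN { n = 0 }
    /lighttpd\[[0-9]+\]: .*server stopped by/ {
        # Remember the stop time, skipping the duplicated copy of the line.
        if (n == 0 || pending[n] != $3) pending[++n] = $3
        next
    }
    /MONITOR: .* is down/ {
        # Report every stop that is still waiting for a gateway-down message.
        for (i = 1; i <= n; i++) print pending[i], secs($3) - secs(pending[i])
        n = 0
    }
    END {
        # Stops that were never followed by a gateway-down message.
        for (i = 1; i <= n; i++) print pending[i], "none"
    }
    '
}
```

Fed the log excerpts posted earlier in this thread, it prints `13:27:28 5` and `13:51:06 9`. Gateway-down messages that are not preceded by a stop are not listed, and a short delay shows that the two events cluster together, not which one causes the other.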

                before the lighttpd closes the connection:

                
                Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (network_writev.c.112) writev failed: Operation not permitted 14
                Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
                Aug 26 13:51:18 xxx.xxx.xxx.xxx lighttpd[25214]: (connections.c.637) connection closed: write failed on fd 14
                
                

                (Well, no wonder it seems that it "crashes" when your network is down.)

                It was a dirty log, don't remember what I've done at that exact moment. But thanks for the hint, I will try to exclude the networking issue by bringing the pfSense box and attaching it to my workstation directly. Though, we have some 60 Workstations on the switch, no problems. No firewall rules or any limiter entries that could cause problems from pfSense side.

                Thanx for helping me out ;)

                  doktornotor Banned

                  Please, post something useful, not "dirty" logs. The log shows that your network crashes and that's pretty much it. I'd suggest wiping the config and restarting from scratch.

                    marama

                    @doktornotor:

                    Please, post something useful, not "dirty" logs. The log shows that your network crashes and that's pretty much it. I'd suggest wiping the config and restarting from scratch.

                    By "dirty" I meant that I was constantly restarting services (darkstat, captive portal…), so I was not sure whether I had made the logs dirty by my own actions.
                    The installation is from scratch. I only had Darkstat running for a few days and tried to get the captive portal running today. I also have some basic port forwards. No additional services or weird settings. I tried the failover route configuration; it was running on WAN and it was not rocket science, so I had excluded it as a cause of my WebGUI problems, but in order to eliminate possible causes, I removed it too (as you indirectly suggested). I am out of the office now, but I have another CF card that I will copy pfSense to, and I will give it a go from scratch tomorrow. Will let you know how it goes.

                    For me it would be important to know whether I am overstretching the hardware. Should an Alix board be able to handle a 12 Mbit internet connection, some firewall rules, RRD graphs (default ones), captive portal, DHCP server, DNS forwarder, Darkstat monitoring, gateway failover and maybe 5 VLANs? It should be able to handle that easily, right?

                    Anyway, thnx a lot for helping. My main suspect is the gateway I removed, will test tomorrow and do a clean install if necessary. Will let you know how it goes.

                      kejianshi

                      Perhaps you have a hardware problem?  Something about to fail?

                        doktornotor Banned

                        @kejianshi:

                        Perhaps you have a hardware problem?  Something about to fail?

                        Most likely. Though, with statements like "The installation is from scratch" and "My main suspect is the gateway I removed"…  ::)

                          kejianshi

                          doktornotor - You missed your calling as depression counselor…  :D

                            doktornotor Banned

                            @kejianshi:

                            doktornotor - You missed your calling as depression counselor…  :D

                              marama

                              @doktornotor:

                              @kejianshi:

                              Perhaps you have a hardware problem?  Something about to fail?

                              Most likely. Though, with statements like "The installation is from scratch" and "My main suspect is the gateway I removed"…  ::)

                              Hardware… hope not, because it would be difficult to diagnose. I hope it's a topology configuration problem (outside pfSense), but I still don't know. We have a 3-line DSL that has one line down, and we are waiting for the ISP to fix that. But it's on the WAN side, so I don't think it could crash the WebGUI just like that. The thing with the gateway is that I had a gateway on LAN: the main gateway on the WAN side and the failover on the LAN side. I tested the failover and it worked, but I might have concluded too easily that it's the right way to do it. Anyway, I bought another managed switch, so I will be able to get 2 WAN ports and have a clean installation on that side. I think there is not much point in pursuing the WebGUI issues before making sure the environment is the right one. But I'll repeat once more: I've really done nothing "unusual" to pfSense, and I wouldn't expect much from an install from scratch; the few settings I've made shouldn't be crashing the WebGUI.
                              I hope it was the failover gateway configuration causing problems.

                                kejianshi

                                I think having all your gateways on WAN is a good idea…

                                Hope it goes well when you get your new equipment.

                                  marama

                                  @kejianshi:

                                  I think having all your gateways on WAN is a good idea…

                                  Hope it goes well when you get your new equipment.

                                  The ISP fixed the DSL line and I've managed to put the gateway on the WAN side of the pfSense box. 3-4 days running… no problems, so I am optimistic. I am not sure if it was the faulty DSL connection or having the WAN failover gateway on the LAN side, but the WebGUI doesn't seem to crash any more. Thanx for your help everybody!

                                    kejianshi

                                    Good stuff - I'm glad it's working.

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.