Netgate Discussion Forum

    Partial lock-up

    2.0-RC Snapshot Feedback and Problems - RETIRED
    • clarknova

      2.0-BETA5 (amd64)
      built on Sat Jan 29 18:46:16 EST 2011

      Working remotely over HTTPS, I had just created a port forward rule with an associated firewall rule (on WAN) and applied the changes. I then went to the WAN firewall rules page, checked the new rule, and hit the move arrow to move it up. At this point the web UI timed out. I tried reloading the page but it timed out again.

      I had an SSH session open to a host on pfSense's LAN side. This session remained responsive, but DNS resolution from the LAN host was failing, and I could not get a ping response from pfSense from either the LAN host or the remote host (WAN), although normally I can. I tried pinging pfSense's LAN interface by IP address from the LAN host and this failed too. So pfSense continued routing even though it was not otherwise responding on the WAN or LAN interface.

      I called my wife at home and had her check the VGA console, which appeared normal. She hit Enter and the console reloaded. At this point I had her reboot using console option 5, which worked.

      Unbound is installed, but it was more than DNS that failed, so I don't know if it's to blame.

      I have a cron job recording the output of netstat -m to /var/log/netstat-m.log every hour, but the file is missing after the reboot. I'm not sure if that's related but I find it odd.

      I looked in /var/crash, but the only thing there is a 5-byte file, 'minfree', that contains only '2048'.

      Is there something else I can look at for clues to the cause of this? Anything else I can check on the console before rebooting if it happens again?

      db

      • jimp Rebel Alliance Developer Netgate

        Could have been mbuf exhaustion or some other similar issue… things to look for might be:

        netstat -m
        netstat -ni
        top -SH

        See if anything unusual is taking up the CPU time.
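The three commands above could be captured in one pass from the console before rebooting. The following is a minimal sketch, not an official pfSense tool: the function name `capture_diag` and the output path are hypothetical, and the batch-mode flags for top shown in the example call are an assumption to verify against top(1) on the installed release.

```shell
#!/bin/sh
# Hypothetical helper: run each diagnostic command and append its output,
# under a timestamped header, to a single file.
capture_diag() {
    out_file=$1
    shift
    for cmd in "$@"; do
        {
            printf '=== %s: %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd"
            # sh -c lets each argument carry its own flags, e.g. "netstat -m"
            sh -c "$cmd" 2>&1
            printf '\n'
        } >> "$out_file"
    done
}

# Example call (output path and the top flags -b -d 1 are assumptions):
# capture_diag /root/lockup-diag.log "netstat -m" "netstat -ni" "top -SH -b -d 1"
```

Keeping everything in one file with headers makes it easier to compare the state during a hang against a normal baseline taken earlier.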

        Upgrade ASAP to a snap from 2/2 or later though so you can rule out the FTP proxy as a possible cause as well.

        /var/crash only gets data in the case of a kernel panic - not a slowdown or hang.

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • clarknova

          I was recording netstat -m to a log file hourly. Not sure why the file disappeared on reboot, but mbufs were x/7500 minutes before the wheels came off. The "max mbuf clusters" value reported by netstat -m is invariably 32768, and my total was nowhere near that number shortly before the problem occurred.
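The comparison described above (total mbuf clusters against the reported maximum) can be scripted. This is a rough sketch that assumes `netstat -m` prints a line of the form `current/cache/total/max mbuf clusters in use (current/cache/total/max)`; the function name `mbuf_cluster_pct` is hypothetical.

```shell
#!/bin/sh
# Reads `netstat -m` output on stdin and prints total mbuf clusters
# as a whole-number percentage of the maximum.
# Assumed input line format:
#   <current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")            # f[3] = total, f[4] = max
        if (f[4] > 0) printf "%d\n", (f[3] * 100) / f[4]
        exit
    }'
}

# Usage: netstat -m | mbuf_cluster_pct
```

If no matching line is found the function prints nothing, so an empty result means the output format differs from the assumption rather than that usage is zero.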

          I will update to the latest snap tonight and try again. Do you know why my log file (/var/log/netstat-m.log) would have disappeared during the lock-reboot process? This is a full install.

          db

          • jimp Rebel Alliance Developer Netgate

            The log folder is usually kept, I thought… though the actual clog files are reset.

            You might try writing it in /root/ just to see if it makes a difference.
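One way to try this is an hourly logger that writes under /root/ and caps the file size. This is only a sketch under stated assumptions: the function name `log_snapshot`, the script path in the crontab comment, and the 5000-line cap are all hypothetical choices, not anything pfSense ships.

```shell
#!/bin/sh
# Hypothetical hourly logger: appends one timestamped snapshot of a command's
# output to a log file, then keeps only the newest max_lines lines.
log_snapshot() {
    log_file=$1
    max_lines=$2
    shift 2
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$@" 2>&1
    } >> "$log_file"
    # Trim through a temporary file next to the log, then replace the log.
    tail -n "$max_lines" "$log_file" > "$log_file.tmp" && mv "$log_file.tmp" "$log_file"
}

# Assumed crontab entry (script path is hypothetical):
#   0 * * * * /root/log-netstat.sh
# where that script defines the function above and ends with:
#   log_snapshot /root/netstat-m.log 5000 netstat -m
```

If the file under /root/ survives a reboot while the one under /var/log does not, that points at how the log directory is handled at boot rather than at the cron job itself.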

            • _igor_

              I have had the same lock-ups since the Feb 2 update. The web interface completely stops responding. Normally it works again after restarting the webConfigurator twice (!) and waiting about 30 seconds to 1 minute. There are no log entries, not even one recording the restart of lighty. Strange.

              It happens really often: sometimes when opening the log files, sometimes when I try to open a service page, so it can happen on any page I try to load. top shows nothing: 100% idle. Everything else seems to function normally: I can surf the internet and all services seem to work.

              Today when it happened, unbound died at the same moment. That had never happened before, so maybe it was pure coincidence.

              • jimp Rebel Alliance Developer Netgate

                Update to the Feb 3 snapshot and see if it can still be reproduced. I'm not sure if it'll really impact this particular issue, but several other patches were added yesterday evening to fix other issues.

                • mromero

                  On the Feb 3 i386 snapshot I just had a hard lockup.

                  On reboot I noticed a "No Core Dump" message as the boot messages were flying past.

                  Borat still smiling.

                  Upgrading to today's snapshot. Wish me luck.

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.