Netgate Discussion Forum

    PfSense becomes unresponsive occasionally (Alix 2d13, pfSense 2.2.2)

    General pfSense Questions
    • -flo-

      Hi,

      I’m having problems with occasional crashes of my pfSense box (nano install on an Alix 2d13, pfSense 2.2.2). This also happened with previous versions of pfSense, although not as frequently as in recent months. Currently it happens once or twice a week.

      When this happens the pfSense box becomes completely unresponsive: I cannot access the web GUI, connect via ssh, or even ping the box. It does not restart on its own, so I have to power cycle it to force a restart. I cannot find any hints in the system log after the restart, because the log starts with the boot process.

      I have never been able to trace these events to user behavior (specific web pages being browsed to, usage of specific other internet services, times of day, etc.).

      I have only one package installed ("FTP Client Proxy"). I use several VLANs on the LAN and WAN side, the IGMP proxy, no VPN, a simple traffic shaper setup, some NAT rules, captive portal, and of course some firewall rules. I have not been able to trace the events to specific configuration elements or configuration changes.

      I cannot trace the problem to a specific pfSense version. I did a clean install when switching to version 2.2.2 and I had the problem before that, so it should not be caused by leftover configuration junk.

      The system shows a memory usage of typically around or below 50%. I cannot find any suspicious system log entries.

      I don’t even know where I should begin to search for the cause of the problem.

      Does anyone have a hint on how to tackle this?

      • Did anybody out there have similar problems?

      • Can this be due to hardware problems and how can I possibly check this?

      • How can I get diagnostic data for such events?

      • Are any configuration options known to have caused such behavior?

      -flo-

      • hda

        For hardware, first prepare a new CF card with a fresh, basic 2.2.2 install.
        For the currently running box, troubleshoot by exclusion: start by disabling traffic shaping.

        • stephenw10 (Netgate Administrator)

          Are you able to connect on the serial console?

          What sort of traffic shaping are you using?

          Steve

          • -flo-

            It happened again. This time I have some more information. First I extracted some log messages from a syslog server:

            29.06.15 16:00:47 172.27.2.1 Unknown Critical [zone: mbuf] kern.ipc.nmbufs limit reached
            29.06.15 16:01:58 172.27.2.1 Unknown Critical vr1_vlan8: unable to prepend VLAN header

            This is probably incomplete (my syslog server is somehow broken …)

            I checked the RRD graphs and found that the mbufs are probably not the problem. I guess something else consumed the memory. I have attached some graphs. I increased the maximum number of mbufs; this is visible in the graph. The mbuf usage up to midnight before the crash looks normal.

            Apparently my system breaks every 8 or 9 days. I wasn't aware how regularly these breakdowns actually occurred. The most interesting graph seems to be the memory graph, see below. I had the RRD backup set to daily, which is why gaps appear periodically in the graph. The "wired" memory had been increasing continuously before the crash.

            Has this been observed by anyone else before? What can be the cause for this / how can I analyze this?
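One way to quantify a steady rise in "wired" memory like the one described above is to fit a straight line through a few readings and extrapolate. The sketch below is only an illustration: the sample readings and the threshold are made-up placeholder numbers, not values taken from the attached graphs, and real memory growth need not be linear.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(threshold, slope, intercept, today):
    """Days from `today` until the fitted line reaches `threshold`, or None if it is not rising."""
    if slope <= 0:
        return None
    return (threshold - intercept) / slope - today


# HYPOTHETICAL readings: day since boot, wired memory in percent of RAM.
days = [0, 1, 2, 3, 4]
wired_percent = [20, 25, 30, 35, 40]
THRESHOLD = 60  # assumed level at which trouble starts; pure placeholder

slope, intercept = linear_fit(days, wired_percent)
print("growth per day:", slope)
print("days left:", days_until(THRESHOLD, slope, intercept, days[-1]))
```

If the extrapolated time to exhaustion roughly matched the 8 to 9 day crash interval, that would support the suspicion of a slow kernel memory leak rather than a sudden traffic burst.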

            As for the question regarding the kind of traffic shaping: this is a simple setup based on PRIQ for prioritizing VoIP traffic (3 queues only). I have NOT YET started the "problem solving exclusion route", due to lack of time. Also, I experienced this problem before I added traffic shaping, so I don't expect this to help much.

            -flo-

            Attachments: RRD_memory.png, RRD_mbuf.png

            • stephenw10 (Netgate Administrator)

              Running out of mbufs will definitely cause you to not be able to access the box. Increase those before you try anything else:
              https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#mbuf_.2F_nmbclusters
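Before raising mbuf limits on a small box it may help to check how much RAM the new limit could consume in the worst case. The sketch below is rough arithmetic only, assuming the standard FreeBSD sizes of 2048 bytes per mbuf cluster and 256 bytes per mbuf, one mbuf per cluster, and the 256 MB of RAM on an ALIX 2d13. The limits in the loop are illustrative values, not recommendations.

```python
MCLBYTES = 2048  # bytes per standard mbuf cluster on FreeBSD
MSIZE = 256      # bytes per mbuf on FreeBSD
RAM_MIB = 256    # RAM on an ALIX 2d13


def cluster_pool_mib(nmbclusters):
    """Worst-case MiB used if every cluster, with one mbuf each, is allocated."""
    return nmbclusters * (MCLBYTES + MSIZE) / (1024 * 1024)


# Illustrative limits only; pick values that fit the actual hardware.
for limit in (8192, 25600, 131072):
    mib = cluster_pool_mib(limit)
    print(f"nmbclusters={limit}: up to {mib:.1f} MiB ({mib / RAM_MIB:.0%} of RAM)")
```

With these assumptions a limit of 131072 clusters could in theory claim 288 MiB, more than the board has, so a value suited to larger systems should not be copied unchanged to an ALIX.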

              Steve

              • georgeman

                Do you have a WLAN card by chance? ath? And the shaper active on it??

                I had the very same problem a couple of years ago, very hard to debug, but the problem was somehow related to the traffic shaper being active on the ath0 interface. You can check out that post here

                I never found a proper solution, but at least identified the cause

                If it ain't broke, you haven't tampered enough with it

                • -flo-

                  I have traffic shaping in place, but not on a WLAN. My Alix has only the built-in LAN ports.

                  I have now increased the maximum limit of mbufs. In the RRD graphs I have observed an absolutely stable and very low number of mbufs allocated at all times. I'm not expert enough to understand which factors influence mbuf usage, so it's difficult for me to trace an increase to specific behavior of hosts in the network.

                  Because the RRD graphs stop displaying data at midnight before a crash of my pfSense, I have not yet been able to observe the number of used mbufs shortly before a crash. I have now changed the RRD backup cycle to one hour. Maybe next time I can actually see an increase in used mbufs in the RRD graphs (unless it occurs only within minutes before a crash).

                  Are there known typical scenarios that cause mbuf usage to increase dramatically?
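A finer-grained alternative to the RRD graphs is to capture the output of `netstat -m` at short intervals and ship it off the box, so the last readings survive a hang. The sketch below only covers the parsing side and assumes it runs on another host that receives the captured text. The sample output is illustrative, written in the usual FreeBSD format, and is not data from this system.

```python
import re

# Matches lines like "1022/1014/2036/25600 mbuf clusters in use (current/cache/total/max)"
LINE_RE = re.compile(r"^\s*([\d/]+)\s+(.+?)\s+in use\s+\(([\w/]+)\)")


def parse_netstat_m(text):
    """Return {resource name: {field: value}} for every 'in use' line."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not report "in use" counters
        values = [int(v) for v in match.group(1).split("/")]
        fields = match.group(3).split("/")
        if len(values) == len(fields):
            stats[match.group(2)] = dict(zip(fields, values))
    return stats


def cluster_usage_ratio(stats):
    """Fraction of the mbuf cluster limit currently allocated, or None if unknown."""
    clusters = stats.get("mbuf clusters")
    if not clusters or not clusters.get("max"):
        return None
    return clusters["total"] / clusters["max"]


# ILLUSTRATIVE sample, not captured from the box discussed in this thread.
SAMPLE = """\
1024/1786/2810 mbufs in use (current/cache/total)
1022/1014/2036/25600 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

stats = parse_netstat_m(SAMPLE)
print(stats["mbufs"])
print(cluster_usage_ratio(stats))
```

Logging the ratio every minute or so would show whether usage climbs slowly over days or spikes only shortly before the box stops responding.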

                  I rather suspect that something else is eating up memory that has been reserved for mbufs. In other words, something else (another process) has higher priority when requesting memory than the network processes (the mbuf reservation). Is this possible at all in FreeBSD?
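To test the suspicion that something other than mbufs is consuming kernel memory, two snapshots of `vmstat -z` taken some hours apart can be compared to see which kernel memory zones grew. The sketch below assumes the comma-separated FreeBSD output layout (ITEM, SIZE, LIMIT, USED, ...) and approximates each zone's footprint as SIZE times USED. The snapshots and the zone name `example_zone` are hypothetical. Wired memory also includes allocations that do not appear in this listing, so this is a starting point rather than a full accounting.

```python
def parse_vmstat_z(text):
    """Return {zone name: approximate bytes in use} from `vmstat -z` output."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header and blank lines contain no colon
        cols = [c.strip() for c in rest.split(",")]
        if len(cols) < 3 or not all(c.isdigit() for c in cols[:3]):
            continue
        size, _limit, used = (int(c) for c in cols[:3])
        zones[name.strip()] = size * used
    return zones


def top_growth(before, after, n=5):
    """Zones whose approximate usage grew the most between two snapshots."""
    growth = {zone: after[zone] - before.get(zone, 0) for zone in after}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [(zone, delta) for zone, delta in ranked[:n] if delta > 0]


# HYPOTHETICAL snapshots; "example_zone" is a made-up zone name.
BEFORE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

mbuf_cluster:          2048,  25600,    1022,    1014,   50000,   0,   0
example_zone:           512,      0,    1000,      24,    9000,   0,   0
"""
AFTER = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

mbuf_cluster:          2048,  25600,    1030,    1006,   90000,   0,   0
example_zone:           512,      0,   90000,      16,  200000,   0,   0
"""

for zone, delta in top_growth(parse_vmstat_z(BEFORE), parse_vmstat_z(AFTER)):
    print(zone, delta)
```

A zone that keeps growing across several snapshots while mbuf zones stay flat would point away from network buffers and toward whichever subsystem owns that zone.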

                  -flo-

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.