Netgate Discussion Forum

    Increasing mbuf and state table size precede total lockup

    2.0-RC Snapshot Feedback and Problems - RETIRED
    • clarknova

      2.0-BETA4 (i386)
      built on Wed Sep 8 14:34:56 EDT 2010
      FreeBSD 8.1-RELEASE
      Platform nanobsd (2g)
      net5501 on CF (UDMA)

      I was using the Aug 30 snapshot. Up until 2 days ago my mbuf usage, as reported on the dashboard, was fairly steady, usually around x/3000 or so. State table normally ran around 3000/48000.

      Two days ago I noticed mbuf usage and state table size were much higher, and yesterday higher again, around x/12000 and 11000/48000 respectively. CPU, RAM, disk usage appeared normal (20/23/13).

      Late last night (thank goodness) pfsense became unresponsive: no DNS forwarder, no routing, no sshd, no serial console, no ping response, no web UI. It was totally locked up as far as I could tell (I forgot to check for ARP responses, but I doubt there were any). I rebooted it and immediately updated to the latest snapshot.

      I had remote syslogging enabled. These are the final entries immediately before lockup:

      $ grep pfsense /var/log/syslog.1 | less
      
      Sep  8 17:02:01 pfsense kernel: Bump sched buckets to 64 (was 0)
      Sep  8 23:09:51 pfsense ppp: [wan_link0] LCP: no reply to 1 echo request(s)
      Sep  8 23:09:51 pfsense ppp: [wan_link4] LCP: no reply to 1 echo request(s)
      Sep  8 23:09:51 pfsense ppp: [wan_link3] LCP: no reply to 1 echo request(s)
      Sep  8 23:09:51 pfsense ppp: [wan_link1] LCP: no reply to 1 echo request(s)
      Sep  8 23:09:51 pfsense ppp: [wan_link2] LCP: no reply to 1 echo request(s)
      Sep  8 23:09:51 pfsense ppp: [wan_link5] LCP: no reply to 1 echo request(s)
      Sep  8 23:10:01 pfsense ppp: [wan_link4] LCP: no reply to 2 echo request(s)
      Sep  8 23:10:01 pfsense ppp: [wan_link0] LCP: no reply to 2 echo request(s)
      Sep  8 23:10:01 pfsense ppp: [wan_link2] LCP: no reply to 2 echo request(s)
      Sep  8 23:10:01 pfsense ppp: [wan_link1] LCP: no reply to 2 echo request(s)
      Sep  8 23:10:01 pfsense ppp: [wan_link3] LCP: no reply to 2 echo request(s)
      Sep  8 23:10:01 pfsense ppp: [wan_link5] LCP: no reply to 2 echo request(s)
      Sep  8 23:10:11 pfsense ppp: [wan_link0] LCP: no reply to 3 echo request(s)
      Sep  8 23:10:11 pfsense ppp: [wan_link4] LCP: no reply to 3 echo request(s)
      Sep  8 23:10:11 pfsense ppp: [wan_link3] LCP: no reply to 3 echo request(s)
      Sep  8 23:10:11 pfsense ppp: [wan_link1] LCP: no reply to 3 echo request(s)
      Sep  8 23:10:11 pfsense ppp: [wan_link2] LCP: no reply to 3 echo request(s)
      Sep  8 23:10:11 pfsense ppp: [wan_link5] LCP: no reply to 3 echo request(s)
      Sep  8 23:10:21 pfsense ppp: [wan_link4] LCP: no reply to 4 echo request(s)
      Sep  8 23:10:21 pfsense ppp: [wan_link0] LCP: no reply to 4 echo request(s)
      Sep  8 23:10:21 pfsense ppp: [wan_link2] LCP: no reply to 4 echo request(s)
      Sep  8 23:10:21 pfsense ppp: [wan_link1] LCP: no reply to 4 echo request(s)
      Sep  8 23:10:21 pfsense ppp: [wan_link3] LCP: no reply to 4 echo request(s)
      Sep  8 23:10:21 pfsense ppp: [wan_link5] LCP: no reply to 4 echo request(s)
      Sep  8 23:10:31 pfsense ppp: [wan_link0] LCP: no reply to 5 echo request(s)
      Sep  8 23:10:31 pfsense ppp: [wan_link0] LCP: peer not responding to echo requests
      Sep  8 23:10:31 pfsense ppp: [wan_link0] LCP: state change Opened --> Stopping
      Sep  8 23:10:31 pfsense ppp: [wan_link0] Link: Leave bundle "wan"
      
      

      All this tells me is that my MLPPP links all went unresponsive at around the same time. The switch and modems did not go offline, and everything was functioning immediately after power-cycling pfsense, so it's fair to say the problem was with pfsense, or at least that it was the only apparent victim.
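A minimal sketch of how the "all links at the same time" observation could be checked from the remote syslog, assuming the line format shown in the excerpt above. The function names and the regular expression are illustrative, not part of pfsense or mpd.

```python
import re

# Matches lines such as:
#   Sep  8 23:09:51 pfsense ppp: [wan_link0] LCP: no reply to 1 echo request(s)
ECHO_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+ppp:\s+"
    r"\[(?P<link>[^\]]+)\]\s+LCP: no reply to (?P<count>\d+) echo request\(s\)"
)


def first_missed_echo(lines):
    """Map each link name to the timestamp of its first missed LCP echo."""
    first = {}
    for line in lines:
        m = ECHO_RE.match(line.strip())
        if m and m.group("count") == "1" and m.group("link") not in first:
            first[m.group("link")] = m.group("stamp")
    return first


def failed_together(first):
    """True if every link logged its first missed echo at the same timestamp."""
    return len(first) > 0 and len(set(first.values())) == 1


# A few lines copied from the excerpt above.
SAMPLE_LOG = [
    "Sep  8 17:02:01 pfsense kernel: Bump sched buckets to 64 (was 0)",
    "Sep  8 23:09:51 pfsense ppp: [wan_link0] LCP: no reply to 1 echo request(s)",
    "Sep  8 23:09:51 pfsense ppp: [wan_link4] LCP: no reply to 1 echo request(s)",
    "Sep  8 23:09:51 pfsense ppp: [wan_link3] LCP: no reply to 1 echo request(s)",
    "Sep  8 23:10:01 pfsense ppp: [wan_link4] LCP: no reply to 2 echo request(s)",
]

if __name__ == "__main__":
    result = first_missed_echo(SAMPLE_LOG)
    for link, stamp in sorted(result.items()):
        print(link, stamp)
    print("failed together:", failed_together(result))
```

If every link misses its first echo in the same second, that points at something the links share (the host itself, the switch, or the upstream) rather than at any single modem.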

      Today again (on the new snapshot) it appears that mbuf and state table count are steadily increasing, presently:

      
      State table size	 4152/48000
      MBUF Usage	 551 /1155
      
      

      Which is already higher than I'd ever seen prior to 2 days ago.

      Any idea what's going on here, how to diagnose or correct it?

      Thanks.

      db

      • jimp Rebel Alliance Developer Netgate

        The cause of the state table growth can be found by looking at the state table itself and/or Diagnostics > States Summary.

        If something is causing an unusually high traffic load (e.g. a virus), it should stand out in the state summary as one source with many destinations.

        If that is the case, it shouldn't happen quite so easily, but more detail is definitely required to know for sure.
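As a rough illustration of the "one source with many destinations" check, here is a sketch that counts distinct destinations per source from saved state-table text such as the output of `pfctl -ss`. It assumes simplified IPv4 lines of the form `iface proto SRC:port -> DST:port STATE`; the exact layout, especially for NAT-translated states, varies between pf versions, so treat the parsing as an assumption. The sample addresses are made up (documentation ranges).

```python
from collections import defaultdict


def host_of(endpoint):
    """Drop a trailing :port (and any surrounding parentheses) from 'addr:port'."""
    return endpoint.strip("()").rsplit(":", 1)[0]


def destinations_per_source(lines):
    """Map each source address to the set of destination addresses it has states to."""
    fanout = defaultdict(set)
    for line in lines:
        fields = line.split()
        if "->" in fields:
            i = fields.index("->")
            left_is_source = True
        elif "<-" in fields:
            i = fields.index("<-")
            left_is_source = False
        else:
            continue
        if i == 0 or i == len(fields) - 1:
            continue  # arrow with nothing on one side; skip the malformed line
        left, right = fields[i - 1], fields[i + 1]
        src, dst = (left, right) if left_is_source else (right, left)
        fanout[host_of(src)].add(host_of(dst))
    return fanout


def top_sources(lines, n=5):
    """Return up to n (source, distinct destination count) pairs, busiest first."""
    fanout = destinations_per_source(lines)
    ranked = sorted(fanout.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(src, len(dsts)) for src, dsts in ranked[:n]]


# Hypothetical sample lines, not taken from the system in this thread.
SAMPLE_STATES = [
    "all tcp 192.0.2.10:51000 -> 198.51.100.1:80 ESTABLISHED:ESTABLISHED",
    "all tcp 192.0.2.10:51001 -> 198.51.100.2:80 SYN_SENT:CLOSED",
    "all tcp 192.0.2.10:51002 -> 198.51.100.3:80 SYN_SENT:CLOSED",
    "all udp 192.0.2.20:5353 -> 203.0.113.9:53 MULTIPLE:SINGLE",
]

if __name__ == "__main__":
    for src, count in top_sources(SAMPLE_STATES):
        print(src, count)
```

A single source near the top with hundreds or thousands of destinations would be the pattern described above; an even spread across many hosts would suggest the growth has another cause.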

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • clarknova

          Thanks for your reply.

          What about mbuf usage? It has climbed steadily to 1252/1920 at present. If I don't reboot it preemptively, I expect it to lock up again in a week or so.

          db

          • jimp Rebel Alliance Developer Netgate

            mbuf usage is a little harder to judge. Some systems ride close to the max with almost no load (or seem to anyhow) but it doesn't really mean there is a problem.
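Because a single reading is hard to judge, it may help to log the numbers over time instead. The sketch below pulls the mbuf cluster counters out of `netstat -m` text, assuming the FreeBSD line format `current/cache/total/max mbuf clusters in use`. The sample text uses made-up values, and the function names are illustrative.

```python
import re
import subprocess

# Assumed line format from FreeBSD `netstat -m`:
#   1024/1262/2286/25600 mbuf clusters in use (current/cache/total/max)
CLUSTER_RE = re.compile(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def parse_mbuf_clusters(netstat_output):
    """Return the cluster counters as a dict, or None if the line is not found."""
    for line in netstat_output.splitlines():
        m = CLUSTER_RE.match(line.strip())
        if m:
            current, cache, total, maximum = (int(x) for x in m.groups())
            return {"current": current, "cache": cache, "total": total, "max": maximum}
    return None


def read_live():
    """Run `netstat -m` on the local host and parse it (only useful on FreeBSD)."""
    out = subprocess.run(
        ["netstat", "-m"], capture_output=True, text=True, check=True
    ).stdout
    return parse_mbuf_clusters(out)


# Hypothetical sample output, not captured from the system in this thread.
SAMPLE_NETSTAT = """\
1027/1538/2565 mbufs in use (current/cache/total)
1024/1262/2286/25600 mbuf clusters in use (current/cache/total/max)
"""

if __name__ == "__main__":
    print(parse_mbuf_clusters(SAMPLE_NETSTAT))
```

Run from cron every few minutes and appended to a file, readings like these would show whether "current" keeps rising under steady load or merely sits near "total" as described above.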


            • clarknova

              The problem in this case is that pfsense locked up, and the only thing different that I can see is steadily climbing mbuf numbers.

              db

              • clarknova

                This does not appear to be an issue on the September 13 snapshot.

                db

                • clarknova

                  I spoke too soon. Not only did this continue to be an issue on nanobsd/net5501, but I have since changed both hardware and software version, and it is still a problem.

                  I'm now running 2.0-BETA4  (i386)
                  built on Thu Oct 14 01:16:12 EDT 2010
                  FreeBSD 8.1-RELEASE-p1

                  on a SM X7SPA-H (Atom D510, 4GB) and seeing the exact same symptoms: mbuf usage increases steadily until uptime reaches approx 7 days, then total hard lockup.
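To put a number on "increases steadily", one could fit a straight line to logged readings and extrapolate. This is only a sketch: the sample readings and the limit are hypothetical, and nothing in this thread establishes that reaching a particular mbuf limit is what causes the lockup.

```python
def hours_until_limit(samples, limit):
    """Estimate hours from the last sample until the count reaches `limit`.

    `samples` is a list of (hours_since_boot, count) pairs. A least-squares
    line is fitted through them. Returns None when there are too few points
    or the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all readings share one timestamp; no slope can be fitted
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_c - slope * mean_t
    t_limit = (limit - intercept) / slope
    return max(0.0, t_limit - samples[-1][0])


# Hypothetical readings: (hours since boot, mbuf count).
SAMPLE_READINGS = [(0, 100), (10, 200), (20, 300)]

if __name__ == "__main__":
    remaining = hours_until_limit(SAMPLE_READINGS, limit=1000)
    print("estimated hours remaining:", remaining)
```

If the estimate from real readings lands near the observed seven days of uptime, that would support the link between mbuf growth and the lockup; if it is far off, the two may merely coincide.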

                  db

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.