• Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Search
  • Register
  • Login
Netgate Discussion Forum
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Search
  • Register
  • Login

kernel panic since 2.5.2.r.20210615.1851

2.5.2 Release Candidate Snapshots (Retired)
3
18
2.1k
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J
    jimp Rebel Alliance Developer Netgate
    last edited by Jun 17, 2021, 3:32 PM

    The textdump.tar.<n> file has all of the necessary data inside.

    Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

    Need help fast? Netgate Global Support!

    Do not Chat/PM for help!

    1 Reply Last reply Reply Quote 1
    • J
      jimp Rebel Alliance Developer Netgate
      last edited by Jun 18, 2021, 2:09 PM

      Following up on this, we're still not able to reproduce a panic here but if you can get us the backtrace we can try to locate the cause and fix it.

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      M 1 Reply Last reply Jun 20, 2021, 6:57 AM Reply Quote 0
      • M
        mfld LAYER 8 @jimp
        last edited by mfld Jun 20, 2021, 7:00 AM Jun 20, 2021, 6:57 AM

        @jimp Ok I built a new box and made sure it gets load, issue can be replicated there, too.

        alt text

        Have collected data but I cannot DM you. Where shall I send it.

        1 Reply Last reply Reply Quote 0
        • J
          jimp Rebel Alliance Developer Netgate
          last edited by jimp Jun 20, 2021, 3:20 PM Jun 20, 2021, 3:20 PM

          The textdump.tar.X file wouldn't contain anything sensitive, you can post it here. Or open it in something like 7-zip and post the ddb.txt and the panic info from the end of msgbuf.txt at least.

          Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

          Need help fast? Netgate Global Support!

          Do not Chat/PM for help!

          M 1 Reply Last reply Jun 22, 2021, 8:54 AM Reply Quote 0
          • M
            mfld LAYER 8 @jimp
            last edited by Jun 22, 2021, 8:54 AM

            @jimp Sorry about the delay.

            So to recap, the pfctl -ss CPU issue was resolved with this upate. At least I thought it was because it stopped falling over early on. But I now see it is still falling over where it handled the workload fine in 2.4.5-p1. Just since the fix it can handle a bit more before it falls over. It also seems to OOM itself now and then crash.

            I can make it fall over by having a 1GB RAM KVM instance set for 100k states receive UDP traffic like NTP or DNS. Once it reaches 40-50k states the pfctl -ss thing makes a comeback and RAM utilization increases exponentially, no longer in proportion with the number of states and it eventually falls over. This oom situation seems new.

            Here are the files you asked for.

            msgbuf.txt ddb.txt
            version.txt panic.txt

            1 Reply Last reply Reply Quote 0
            • J
              jimp Rebel Alliance Developer Netgate
              last edited by Jun 22, 2021, 1:02 PM

              I opened https://redmine.pfsense.org/issues/12069 for this and we'll look into it shortly.

              I can't reproduce a problem here, though my ability to generate large volumes of states is limited. I tried hitting a VM with 512MB RAM and 200k state table and though I could sort of see an inconsistent slowdown with high numbers of states I haven't been able to make it panic.

              It is possible that it legitimately did run out of RAM to contain the state table in your scenario. 100k states would consume about 100MB or so of kernel memory, which is still limited on a VM with 1GB RAM. Check the values in sysctl vm | grep kmem. Kernel memory is not the same as RAM, only a portion of available RAM can be used directly by the kernel.

              Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

              Need help fast? Netgate Global Support!

              Do not Chat/PM for help!

              M 1 Reply Last reply Jun 22, 2021, 1:58 PM Reply Quote 1
              • M
                mfld LAYER 8 @jimp
                last edited by Jun 22, 2021, 1:58 PM

                @jimp Thank you so much.

                For mild stresstesting of pfSense I usually use free tier loader.io against haproxy-devel or simply use dnsperf against unbound. If I run it repeateadly I can conjure up enough states to make an impact :)

                Bump up ulimit of client machine, then

                resperf-report -d queryfile-example-10million-201202 -C 20000 -s <ip of pfsense>

                Doing this repeatedly in short succession I can get to:

                State table size
                50% (50013/100000)

                At this point system load and memory utilization in top are uncharacteristically high vs. 2.4.5-p1 and things start falling to pieces.

                Running

                sysctl vm | grep kmem

                at that time:

                vm.uma_kmem_total: 528101376
                vm.uma_kmem_limit: 977170432
                vm.kmem_map_free: 449069056
                vm.kmem_map_size: 528101376
                vm.kmem_size_scale: 1
                vm.kmem_size_max: 1319413950874
                vm.kmem_size_min: 0
                vm.kmem_zmax: 65536
                vm.kmem_size: 977170432
                

                Query file for dnsperf for anyone interested is over here

                1 Reply Last reply Reply Quote 1
                • J
                  jimp Rebel Alliance Developer Netgate
                  last edited by Jun 23, 2021, 5:17 PM

                  Using a combination of that dnsperf test, some nmap scans, and setting the firewall to conservative states I was able to get a test system over 100k states, but I haven't been able to make it panic in pfctl yet. I did run it out of RAM and pfctl was killed, and it did lock up after I stopped testing (might be due to heat), so it was definitely getting hit hard.

                  I was able to panic a VM but not in pfctl, it was in ZFS and likely due to the low RAM and all the disk writes from unbound logging all the failed DNS queries.

                  When I had it up around 50k states, pfctl -ss took about 4.5 minutes to finish. After that I never did see it finish until the RAM ran out and the process was killed. So there is definitely a problem it's just not as easy to trigger as it seems.

                  I'm still making adjustments and running some tests here yet. I'll update the Redmine issue with my results soon.

                  Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                  Need help fast? Netgate Global Support!

                  Do not Chat/PM for help!

                  1 Reply Last reply Reply Quote 1
                  • J
                    jimp Rebel Alliance Developer Netgate
                    last edited by Jun 23, 2021, 6:13 PM

                    I was finally able to make it panic on a VM with less RAM. Though it got up to 200k states and stayed there for a while. pfctl -ss was inconsistently slow. Sometimes returns results in a few seconds, other times it takes 30s-5m. Very odd. The system as a whole would become unresponsive near the end, both over the network and on the console.

                    I added notes to https://redmine.pfsense.org/issues/12069 with my findings

                    Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                    Need help fast? Netgate Global Support!

                    Do not Chat/PM for help!

                    M 1 Reply Last reply Jun 24, 2021, 7:32 AM Reply Quote 2
                    • M
                      mfld LAYER 8 @jimp
                      last edited by mfld Jun 24, 2021, 7:33 AM Jun 24, 2021, 7:32 AM

                      @jimp 🤜 🤛

                      Thank you very much!

                      Next up, Regression #11545 🙏 🙏 🙏 🙏 🙏 and life will be worth living again 😁

                      1 Reply Last reply Reply Quote 0
                      • V
                        vjizzle
                        last edited by Jun 24, 2021, 9:35 AM

                        Hi. I upgraded my test vm today to the latest version:

                        2.5.2-RC (amd64)
                        built on Thu Jun 17 17:10:26 EDT 2021
                        FreeBSD 12.2-STABLE

                        All seems well and I am glad that Netgate is actively squashing bugs :). Kernel panic is pretty serious so please let me know if there is anything I can do to test or help. I am running this in a virtual machine so let's squash those bugs! :)

                        1 Reply Last reply Reply Quote 0
                        • J
                          jimp Rebel Alliance Developer Netgate
                          last edited by Jul 1, 2021, 8:52 PM

                          There is a new snapshot up now (2.5.2.r.20210629.1350) which should be much better here. Update and give it a try.

                          We ended up rolling back all those pf changes since they need some work yet, so it's closer to what was in 21.05 and 2.5.1 (but with multi-wan fixed, of course).

                          Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                          Need help fast? Netgate Global Support!

                          Do not Chat/PM for help!

                          M 1 Reply Last reply Jul 2, 2021, 2:23 AM Reply Quote 3
                          • M
                            mfld LAYER 8 @jimp
                            last edited by Jul 2, 2021, 2:23 AM

                            @jimp <3 Love you all. In the beer 🍻 sense, not the romantic 💔 .

                            Turned on my magic traffic faucet and hit a tiny VPS with it:

                            alt text

                            At first glance it seems we are back to previous levels of performance. Strange memory leak is gone (no SWAP is being used)

                            I will go overboard and send ridiculous traffic over the weekend to make it scale states and put it through hell. Will report back.

                            M 1 Reply Last reply Jul 2, 2021, 7:58 AM Reply Quote 1
                            • M
                              mfld LAYER 8 @mfld
                              last edited by Jul 2, 2021, 7:58 AM

                              🔒 Log in to view

                              Zero packets dropped, no panic yet even in the face of wanton abuse :)

                              🔒 Log in to view

                              1 Reply Last reply Reply Quote 3
                              • J
                                jimp Rebel Alliance Developer Netgate
                                last edited by Jul 2, 2021, 12:10 PM

                                You call that wanton abuse? :-)

                                🔒 Log in to view

                                (Not on my equipment, but from our test lab)

                                Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                Need help fast? Netgate Global Support!

                                Do not Chat/PM for help!

                                1 Reply Last reply Reply Quote 8
                                • First post
                                  Last post
                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.