Netgate Discussion Forum

kernel panic since 2.5.2.r.20210615.1851

2.5.2 Release Candidate Snapshots (Retired)
18 Posts 3 Posters 3.3k Views 3 Watching
  • M Offline
    mfld LAYER 8 @jimp
    last edited by mfld Jun 17, 2021, 2:23 PM Jun 17, 2021, 2:21 PM

    @jimp It offered me two files to download, and I have kept both. Which do you need? Is it the ddb.txt from textdump.tar.0?

    • J Offline
      jimp Rebel Alliance Developer Netgate
      last edited by Jun 17, 2021, 3:32 PM

      The textdump.tar.<n> file has all of the necessary data inside.

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      • J Offline
        jimp Rebel Alliance Developer Netgate
        last edited by Jun 18, 2021, 2:09 PM

        Following up on this: we're still not able to reproduce a panic here, but if you can get us the backtrace we can try to locate the cause and fix it.

        • M Offline
          mfld LAYER 8 @jimp
          last edited by mfld Jun 20, 2021, 7:00 AM Jun 20, 2021, 6:57 AM

          @jimp OK, I built a new box and made sure it gets load. The issue can be replicated there, too.

          [image]

          I have collected data but I cannot DM you. Where shall I send it?

          • J Offline
            jimp Rebel Alliance Developer Netgate
            last edited by jimp Jun 20, 2021, 3:20 PM Jun 20, 2021, 3:20 PM

            The textdump.tar.X file wouldn't contain anything sensitive, so you can post it here. Or open it in something like 7-zip and post at least the ddb.txt and the panic info from the end of msgbuf.txt.

            • M Offline
              mfld LAYER 8 @jimp
              last edited by Jun 22, 2021, 8:54 AM

              @jimp Sorry about the delay.

              So to recap, the pfctl -ss CPU issue was resolved with this update. At least I thought it was, because it stopped falling over early on. But I now see it still falls over under a workload that 2.4.5-p1 handled fine. Since the fix it can just handle a bit more before it falls over. It also seems to OOM itself now and then crash.

              I can make it fall over by having a 1GB RAM KVM instance, set for 100k states, receive UDP traffic like NTP or DNS. Once it reaches 40-50k states the pfctl -ss thing makes a comeback and RAM utilization increases exponentially, no longer in proportion to the number of states, and it eventually falls over. This OOM situation seems new.

              Here are the files you asked for.

              msgbuf.txt ddb.txt
              version.txt panic.txt

              • J Offline
                jimp Rebel Alliance Developer Netgate
                last edited by Jun 22, 2021, 1:02 PM

                I opened https://redmine.pfsense.org/issues/12069 for this and we'll look into it shortly.

                I can't reproduce a problem here, though my ability to generate large volumes of states is limited. I tried hitting a VM with 512MB RAM and a 200k state table, and though I could sort of see an inconsistent slowdown with high numbers of states, I haven't been able to make it panic.

                It is possible that it legitimately did run out of RAM to contain the state table in your scenario. 100k states would consume about 100MB or so of kernel memory, which is still limited on a VM with 1GB RAM. Check the values in sysctl vm | grep kmem. Kernel memory is not the same as RAM; only a portion of available RAM can be used directly by the kernel.
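To put the rule of thumb above into numbers, here is a minimal Python sketch. It assumes roughly 1 KB per state, which is only inferred from the "100k states is about 100MB" figure in this post and is not a measured constant. The function names are made up for illustration.

```python
# Rough estimate only: the per-state cost is inferred from the
# "100k states ~ 100MB" rule of thumb, not measured on a real system.
BYTES_PER_STATE = 100 * 1024 * 1024 // 100_000  # about 1 KB per state


def estimated_state_memory(states: int, bytes_per_state: int = BYTES_PER_STATE) -> int:
    """Approximate kernel memory (in bytes) needed to hold `states` pf states."""
    return states * bytes_per_state


def fraction_of_kmem(states: int, kmem_size: int) -> float:
    """Share of the kernel memory limit that the state table would take."""
    return estimated_state_memory(states) / kmem_size


if __name__ == "__main__":
    for n in (50_000, 100_000, 200_000):
        mb = estimated_state_memory(n) / (1024 * 1024)
        print(f"{n:>7} states: roughly {mb:.0f} MB of kernel memory")
```

Passing the `vm.kmem_size` value from `sysctl vm | grep kmem` as `kmem_size` gives a quick idea of whether the state table alone could plausibly exhaust kernel memory.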

                • M Offline
                  mfld LAYER 8 @jimp
                  last edited by Jun 22, 2021, 1:58 PM

                  @jimp Thank you so much.

                  For mild stress testing of pfSense I usually use the free tier of loader.io against haproxy-devel, or simply use dnsperf against unbound. If I run it repeatedly I can conjure up enough states to make an impact :)

                  Bump up the ulimit on the client machine, then run:

                  resperf-report -d queryfile-example-10million-201202 -C 20000 -s <ip of pfsense>

                  Doing this repeatedly in short succession I can get to:

                  State table size
                  50% (50013/100000)

                  At this point system load and memory utilization in top are uncharacteristically high vs. 2.4.5-p1 and things start falling to pieces.
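The "run it repeatedly in short succession" step could be scripted along these lines. This is only a sketch: the command and its flags are copied verbatim from the post above, while the function names, the number of rounds, and the pause between rounds are hypothetical choices, not something from the thread.

```python
import subprocess
import time


def build_resperf_command(server_ip, queryfile="queryfile-example-10million-201202", clients=20000):
    """Build the resperf-report command line exactly as quoted in the post."""
    return ["resperf-report", "-d", queryfile, "-C", str(clients), "-s", server_ip]


def run_repeatedly(cmd, rounds=5, pause_s=1.0, runner=subprocess.run):
    """Run `cmd` several times back to back and collect the exit codes.

    `rounds` and `pause_s` are illustrative defaults (assumptions).
    `runner` is injectable so the loop can be exercised without resperf installed.
    """
    codes = []
    for _ in range(rounds):
        completed = runner(cmd, check=False)
        codes.append(completed.returncode)
        time.sleep(pause_s)
    return codes
```

For example, `run_repeatedly(build_resperf_command("<ip of pfsense>"))` would fire five runs in a row. Watch the state table size on the firewall while it runs.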

                  Running

                  sysctl vm | grep kmem

                  at that time:

                  vm.uma_kmem_total: 528101376
                  vm.uma_kmem_limit: 977170432
                  vm.kmem_map_free: 449069056
                  vm.kmem_map_size: 528101376
                  vm.kmem_size_scale: 1
                  vm.kmem_size_max: 1319413950874
                  vm.kmem_size_min: 0
                  vm.kmem_zmax: 65536
                  vm.kmem_size: 977170432
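
One way to read the numbers above is to compare `vm.kmem_map_size` (kernel memory currently allocated) against `vm.kmem_size` (the limit). The following Python sketch parses the pasted output and computes that ratio. The helper names are made up for illustration, and treating these two values as "used" and "limit" is a simplifying assumption.

```python
SAMPLE = """\
vm.uma_kmem_total: 528101376
vm.uma_kmem_limit: 977170432
vm.kmem_map_free: 449069056
vm.kmem_map_size: 528101376
vm.kmem_size: 977170432
"""


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping only integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def kmem_usage(values):
    """Fraction of the kernel memory limit currently allocated."""
    return values["vm.kmem_map_size"] / values["vm.kmem_size"]


if __name__ == "__main__":
    v = parse_sysctl(SAMPLE)
    used_mib = v["vm.kmem_map_size"] / 2**20
    limit_mib = v["vm.kmem_size"] / 2**20
    print(f"kmem in use: {used_mib:.0f} MiB of {limit_mib:.0f} MiB ({kmem_usage(v):.1%})")
```

In this sample, `vm.kmem_map_size` plus `vm.kmem_map_free` adds up exactly to `vm.kmem_size`, so a little over half of the kernel memory limit was allocated at roughly 50k states.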
                  

                  Query file for dnsperf for anyone interested is over here

                  • J Offline
                    jimp Rebel Alliance Developer Netgate
                    last edited by Jun 23, 2021, 5:17 PM

                    Using a combination of that dnsperf test, some nmap scans, and setting the firewall to conservative states I was able to get a test system over 100k states, but I haven't been able to make it panic in pfctl yet. I did run it out of RAM and pfctl was killed, and it did lock up after I stopped testing (might be due to heat), so it was definitely getting hit hard.

                    I was able to panic a VM, but not in pfctl. It was in ZFS, likely due to the low RAM and all the disk writes from unbound logging all the failed DNS queries.

                    When I had it up around 50k states, pfctl -ss took about 4.5 minutes to finish. After that I never did see it finish before the RAM ran out and the process was killed. So there is definitely a problem; it's just not as easy to trigger as it seems.

                    I'm still making adjustments and running some tests here yet. I'll update the Redmine issue with my results soon.

                    • J Offline
                      jimp Rebel Alliance Developer Netgate
                      last edited by Jun 23, 2021, 6:13 PM

                      I was finally able to make it panic on a VM with less RAM, though it got up to 200k states and stayed there for a while. pfctl -ss was inconsistently slow: sometimes it returned results in a few seconds, other times it took 30s-5m. Very odd. The system as a whole would become unresponsive near the end, both over the network and on the console.
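
For anyone wanting to record how long pfctl -ss takes as the state count climbs, a small timing wrapper like the following Python sketch might help. The function name and the 10 minute timeout are assumptions. It also assumes that plain `pfctl -ss` prints one line per state, so the line count approximates the number of states.

```python
import subprocess
import time


def time_command(cmd, timeout_s=600.0):
    """Run `cmd`, returning (elapsed_seconds, output_line_count).

    The line count is None if the command did not finish within `timeout_s`.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is killed on timeout; report how long we waited.
        return time.monotonic() - start, None
    elapsed = time.monotonic() - start
    return elapsed, len(result.stdout.splitlines())
```

On the firewall itself this would be called as `time_command(["pfctl", "-ss"])`. Logging the result every minute or so during a load test would show whether the slowdown tracks the state count or appears at random.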

                      I added notes to https://redmine.pfsense.org/issues/12069 with my findings

                      • M Offline
                        mfld LAYER 8 @jimp
                        last edited by mfld Jun 24, 2021, 7:33 AM Jun 24, 2021, 7:32 AM

                        @jimp 🤜 🤛

                        Thank you very much!

                        Next up, Regression #11545 🙏 🙏 🙏 🙏 🙏 and life will be worth living again 😁

                        • V Offline
                          vjizzle
                          last edited by Jun 24, 2021, 9:35 AM

                          Hi. I upgraded my test vm today to the latest version:

                          2.5.2-RC (amd64)
                          built on Thu Jun 17 17:10:26 EDT 2021
                          FreeBSD 12.2-STABLE

                          All seems well and I am glad that Netgate is actively squashing bugs :). Kernel panic is pretty serious so please let me know if there is anything I can do to test or help. I am running this in a virtual machine so let's squash those bugs! :)

                          • J Offline
                            jimp Rebel Alliance Developer Netgate
                            last edited by Jul 1, 2021, 8:52 PM

                            There is a new snapshot up now (2.5.2.r.20210629.1350) which should be much better here. Update and give it a try.

                            We ended up rolling back all those pf changes since they need some work yet, so it's closer to what was in 21.05 and 2.5.1 (but with multi-wan fixed, of course).

                            • M Offline
                              mfld LAYER 8 @jimp
                              last edited by Jul 2, 2021, 2:23 AM

                              @jimp <3 Love you all. In the beer 🍻 sense, not the romantic 💔 .

                              Turned on my magic traffic faucet and hit a tiny VPS with it:

                              [image]

                              At first glance it seems we are back to previous levels of performance. The strange memory leak is gone (no swap is being used).

                              I will go overboard and send ridiculous traffic over the weekend to make it scale states and put it through hell. Will report back.

                              • M Offline
                                mfld LAYER 8 @mfld
                                last edited by Jul 2, 2021, 7:58 AM

                                send-a-medical-team-to-engineering.jpg

                                Zero packets dropped, no panic yet even in the face of wanton abuse :)

                                success.PNG

                                • J Offline
                                  jimp Rebel Alliance Developer Netgate
                                  last edited by Jul 2, 2021, 12:10 PM

                                  You call that wanton abuse? :-)

                                  lotsostates.png

                                  (Not on my equipment, but from our test lab)

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.