Netgate Discussion Forum

    Kernel Panic on pfSense+ 24.03-RELEASE

    General pfSense Questions
    • C
      cboenning
      last edited by cboenning

      Hello,

      We recently started seeing kernel panics (Fatal trap 12: page fault while in kernel mode) on our Netgate 1537 instances. We run them as an HA pair, and both show this behaviour. We initially suspected a bad memory module on the primary, so the "usual primary" is currently in Persistent CARP Maintenance mode while the secondary has taken over the CARP IPs and is handling traffic. That does not seem to be the cause, however, as the "usual secondary" shows the same behaviour.

      Both instances were recently upgraded from 23.05.1. The upgrade failed on one of them, so that one was reinstalled from scratch and then upgraded. It shows the same behaviour as the instance that was only upgraded.

      In both cases, the last bit of information in "textdump.tar.N" is the following:

      Fatal trap 12: page fault while in kernel mode
      cpuid = 3; apic id = 03
      fault virtual address	= 0x1c
      fault code		= supervisor read data, page not present
      instruction pointer	= 0x20:0xffffffff80f246e2
      stack pointer	        = 0x0:0xfffffe0084fa7ae0
      frame pointer	        = 0x0:0xfffffe0084fa7b70
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 2 (clock (3))
      rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffffe0084fa7cf8
      rcx: 0000000000000000  r8: 0000000000000578  r9: 0000000000000000
      rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe0084fa7b70
      r10: 0000000000001388 r11: 00000000a36f15dd r12: 0000000000000000
      r13: 0000000000000578 r14: fffff8003e4a4000 r15: 0000000000000034
      trap number		= 12
      panic: page fault
      cpuid = 3
      time = 1723109708
      KDB: enter: panic
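
For anyone reading a dump like this one, here is a minimal Python sketch that pulls out the fields that matter most. It assumes the trap message has been saved as plain text. The helper names `parse_panic` and `summarize` are hypothetical, and the "near NULL" threshold of one 4 KiB page is a rule of thumb, not a kernel-defined limit.

```python
from datetime import datetime, timezone

# Excerpt of the panic message quoted above (values copied verbatim).
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x1c
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f246e2
current process = 2 (clock (3))
time = 1723109708
"""


def parse_panic(text):
    """Collect 'key = value' pairs from a FreeBSD trap message."""
    fields = {}
    for line in text.splitlines():
        # Split on the first '=' only; skip lines without a key.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Turn the raw strings into numbers and a UTC timestamp."""
    fault_addr = int(fields["fault virtual address"], 16)
    # The instruction pointer is printed as selector:offset.
    rip = int(fields["instruction pointer"].split(":")[1], 16)
    when = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)
    return {
        "fault_addr": fault_addr,
        # Assumption (heuristic): an address below one 4 KiB page usually
        # means a structure member was read through a NULL pointer.
        "near_null": fault_addr < 0x1000,
        "rip": rip,
        "utc": when.isoformat(),
    }


summary = summarize(parse_panic(PANIC_TEXT))
print(summary)
```

A fault address as small as 0x1c typically indicates a read through a NULL pointer plus a small member offset, which fits a software fault better than a failing memory module.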
      

      We don't run many packages on them. I think the only additional packages are: acme (0.8_1), frr (2.0.2_3), lldpd (0.9.11_2), node_exporter (0.18.1_3) and zabbix-agent64 (1.0.6).

      As both instances show this behaviour, I'd "rule out" hardware issues. The instances were purchased at the same time, so they are equally old, but in my experience it is unlikely that a defective part such as memory or storage would fail on both. If it were a hardware failure, I'd expect the instance that was primary for an extended period to fail first, not both at the same time.

      I'd appreciate it if someone could give me a hint on what to look out for or how to diagnose the issue further.

      Thank you very much in advance.

      Cheers,
      Christian

      • GertjanG
        Gertjan @cboenning
        last edited by

        @cboenning

        HA setup and both start to 'crash' showing identical crash dumps ?
        I agree with you, and I put my bets on a 'software' issue.

        acme : good news ; that one is just a rather innocent PHP script and one or two small shell scripts. It runs only once a day ; check your cron tasks to see when that is.

        lldpd : don't know what that is. Ditch it ?!
        node_exporter ? a pfSense package ? Does it contain binaries ? If so => remove it for a while.
        zabbix-agent64 : can you live without it for some time ?

        You get my point by now : go bare-bones for a while.
        If it's the FreeBSD kernel by itself that is doing this ... well .....

        @cboenning said in Kernel Panic on pfSense+ 24.03-RELEASE:

        Both instances have recently been updated from 23.05.1

        But why use an old kernel ? Don't you want the more recent one ? ( hint : 24.03 )

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • C
          cboenning @Gertjan
          last edited by

          Hi @Gertjan ,

          I removed the lldpd package which is the one I can live without.
          The others, however (zabbix-agent and node_exporter in particular), are an integral part of our monitoring infrastructure, which I cannot remove for business reasons.

          We are on 24.03-RELEASE (which is when the instances started to misbehave). The mention of 23.05.1 was just a bit of history: before the upgrade, the instances worked flawlessly, serving ~250 OpenVPN users and terminating a good number of IPsec site-to-site tunnels (which we use frr for).

          • GertjanG
            Gertjan @cboenning
            last edited by

            @cboenning said in Kernel Panic on pfSense+ 24.03-RELEASE:

            the mention that we came from 23.05.1

            I read it the other way around, sorry about that.

            Can you post more details about the crash ? The place where it was crashing ?

            • C
              cboenning @Gertjan
              last edited by

              @Gertjan Sure, I'll attach the "ddb.txt" from the textdump.tar

              (I've redacted it for non-kernel processes in the ps output as it contained hostnames but other than that it's complete).

              ddb.txt

              • keyserK
                keyser Rebel Alliance @cboenning
                last edited by

                @cboenning Just an observation: you mentioned you use FRR for IPsec site-to-site tunnels. FYI, there are some major kernel route issues with the FRR package that comes with 24.03:

                https://forum.netgate.com/topic/188603/updating-to-pfsense-24-3-breaks-routing-kernel-routes-now-gone/25

                Could it be the FRR problem that causes Kernel problems in your setup?

                Love the no fuss of using the official appliances :-)

                • C
                  cboenning @keyser
                  last edited by cboenning

                  @keyser I would not want to rule this out. It's the package I was "most afraid of" upgrading, given that it jumped from 7.x to 9.x.

                  We don't do "anything funky", though. It's just a bunch of BGP sessions we're running with Google Cloud VPN. No OSPF/OSPF6, no RIP. In fact, we don't redistribute any routes other than "connected" (so no static, kernel, or ospf/ospfv3 routes). I'll go through the post you mentioned to see if there are any similarities here.

                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    Do you have the backtrace? Can you upload the full crash report(s)?

                    https://nc.netgate.com/nextcloud/s/qcT5RXWeyj2rJX3

                    • C
                      cboenning @stephenw10
                      last edited by

                      @stephenw10 I have uploaded the files as a tar Archive.

                      As a reference:

                      • pfSense-1 is the "usual primary" (currently in persistent CARP maintenance, thus backup), it produced a bunch of crashes out of which one dump was still available.
                      • pfSense-2 is the "current primary" (usually backup), I added 4 dumps to the upload.
                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by stephenw10

                        Ok these are all identical crashes, on both nodes. So that's definitely a software issue.

                        Backtrace:

                        db:1:pfs> bt
                        Tracing pid 2 tid 100097 td 0xfffff80001831740
                        kdb_enter() at kdb_enter+0x33/frame 0xfffffe0084fa28f0
                        panic() at panic+0x43/frame 0xfffffe0084fa2950
                        trap_fatal() at trap_fatal+0x40f/frame 0xfffffe0084fa29b0
                        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0084fa2a10
                        calltrap() at calltrap+0x8/frame 0xfffffe0084fa2a10
                        --- trap 0xc, rip = 0xffffffff80f246e2, rsp = 0xfffffe0084fa2ae0, rbp = 0xfffffe0084fa2b70 ---
                        tcp_m_copym() at tcp_m_copym+0x62/frame 0xfffffe0084fa2b70
                        tcp_default_output() at tcp_default_output+0x1294/frame 0xfffffe0084fa2d60
                        tcp_timer_rexmt() at tcp_timer_rexmt+0x53c/frame 0xfffffe0084fa2dc0
                        tcp_timer_enter() at tcp_timer_enter+0x101/frame 0xfffffe0084fa2e00
                        softclock_call_cc() at softclock_call_cc+0x12e/frame 0xfffffe0084fa2ec0
                        softclock_thread() at softclock_thread+0xe9/frame 0xfffffe0084fa2ef0
                        fork_exit() at fork_exit+0x7f/frame 0xfffffe0084fa2f30
                        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0084fa2f30
                        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
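
To see how a backtrace like this is read, here is a minimal Python sketch that splits it at the trap marker. In a ddb backtrace, the frames printed before the first `--- trap` line belong to the panic handling path, and the first frame after it is where the fault happened. The function name `frames_after_trap` and the regular expression are assumptions made for illustration, and the input is an excerpt of the trace above.

```python
import re

# Excerpt of the backtrace quoted above (copied verbatim).
BACKTRACE = """\
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0084fa28f0
panic() at panic+0x43/frame 0xfffffe0084fa2950
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe0084fa29b0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0084fa2a10
calltrap() at calltrap+0x8/frame 0xfffffe0084fa2a10
--- trap 0xc, rip = 0xffffffff80f246e2, rsp = 0xfffffe0084fa2ae0, rbp = 0xfffffe0084fa2b70 ---
tcp_m_copym() at tcp_m_copym+0x62/frame 0xfffffe0084fa2b70
tcp_default_output() at tcp_default_output+0x1294/frame 0xfffffe0084fa2d60
tcp_timer_rexmt() at tcp_timer_rexmt+0x53c/frame 0xfffffe0084fa2dc0
tcp_timer_enter() at tcp_timer_enter+0x101/frame 0xfffffe0084fa2e00
softclock_call_cc() at softclock_call_cc+0x12e/frame 0xfffffe0084fa2ec0
"""

# Matches lines such as "tcp_m_copym() at tcp_m_copym+0x62/frame 0x...".
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame")


def frames_after_trap(text):
    """Return (function, offset) pairs for frames after the first trap marker."""
    frames = []
    seen_trap = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("--- trap"):
            if seen_trap:
                break  # a second marker ends the interrupted call chain
            seen_trap = True
            continue
        match = FRAME_RE.match(line)
        if seen_trap and match:
            frames.append((match.group(1), int(match.group(2), 16)))
    return frames


frames = frames_after_trap(BACKTRACE)
faulting_function, faulting_offset = frames[0]
print("faulted in:", faulting_function, hex(faulting_offset))
print("called via:", " <- ".join(name for name, _ in frames[1:]))
```

Read this way, the trace shows the fault inside `tcp_m_copym`, reached from the TCP retransmit timer (`tcp_timer_rexmt`) running in the softclock thread. That matches the "clock" process named in the trap message.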
                        

                        Previously that was seen with HAProxy: https://redmine.pfsense.org/issues/15457

                        But you're not running HAProxy.

                        I do note that in each case it appears an OpenVPN instance is unable to service incoming requests:

                        <7>sonewconn: pcb 0xfffff8002a4cd400 (local:/var/etc/openvpn/server2/sock): Listen queue overflow: 2 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
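
To check which socket a message like this refers to, here is a minimal Python sketch that parses the line. The regular expression and the name `parse_overflow` are assumptions made for illustration, based only on the single log line above. The `local:` prefix suggests a UNIX domain socket at that path, not one of the TCP listeners.

```python
import re

# The log line quoted above, split across string literals for readability.
LINE = (
    "<7>sonewconn: pcb 0xfffff8002a4cd400 "
    "(local:/var/etc/openvpn/server2/sock): Listen queue overflow: "
    "2 already in queue awaiting acceptance (1 occurrences), "
    "euid 0, rgid 0, jail 0"
)

# Assumed layout, derived from the one sample line only.
OVERFLOW_RE = re.compile(
    r"sonewconn: pcb (?P<pcb>0x[0-9a-f]+) \((?P<sock>[^)]*)\): "
    r"Listen queue overflow: (?P<queued>\d+) already in queue awaiting acceptance "
    r"\((?P<count>\d+) occurrences\)"
)


def parse_overflow(line):
    """Extract socket family, path and queue depth, or None if no match."""
    match = OVERFLOW_RE.search(line)
    if match is None:
        return None
    # "local:/path" -> family "local", path "/path"
    family, _, path = match.group("sock").partition(":")
    return {
        "family": family,
        "path": path,
        "queued": int(match.group("queued")),
        "occurrences": int(match.group("count")),
    }


event = parse_overflow(LINE)
print(event)
```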
                        

                        Do you have OpenVPN servers running TCP?

                        • C
                          cboenning @stephenw10
                          last edited by

                          @stephenw10 Yes, we're running two OpenVPN servers on TCP. One is a pretty boring 12-client instance, while the other ("server2") is one of our two primary VPN services; the second primary service runs on UDP.

                          The two primary servers each usually serve around 120-150 users throughout the day, though I cannot tell how many users are connected at the moment the unit panics.

                          • stephenw10S
                            stephenw10 Netgate Administrator
                            last edited by

                            Hmm, but at least one of those servers is UDP only?

                            • C
                              cboenning @stephenw10
                              last edited by

                              @stephenw10 yes, we run udp/1194 (server1, 120-150 users), tcp/1194 (server2, 120-150 users) and tcp/1195 (server3, 10-12 users).

                              • stephenw10S
                                stephenw10 Netgate Administrator
                                last edited by

                                Hmm, the OpenVPN message is probably just a symptom then. What do you have listening on TCP? Check with: sockstat -P tcp

                                • C
                                  cboenning @stephenw10
                                  last edited by

                                  @stephenw10 Full output (with redacted IPs) below, but major listening daemons are:

                                  • nginx (80/18288, from pfSense)
                                  • ssh (12689, from pfSense)
                                  • openvpn (1194)
                                  • isc-dhcpd (520; I switched to kea, but it showed the same behaviour, so I reverted)
                                  • node_exporter (9100)
                                  • frr (179 /26xx)
                                  • zabbix-agent (10050)
                                  [24.03-RELEASE][admin@pfsense-2.domain]/root: sockstat -P tcp
                                  USER     COMMAND    PID   FD  PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
                                  root     sshd       12954 4   tcp4   lan:12689       remote:63592
                                  frr      bgpd       93160 15  tcp6   *:2605                *:*
                                  frr      bgpd       93160 16  tcp4   *:2605                *:*
                                  frr      bgpd       93160 20  tcp6   *:179                 *:*
                                  frr      bgpd       93160 21  tcp4   *:179                 *:*
                                  frr      bgpd       93160 23  tcp4   169.254.254.41:48552  169.254.254.42:179
                                  frr      bgpd       93160 24  tcp4   169.254.254.45:61975  169.254.254.46:179
                                  frr      bgpd       93160 25  tcp4   169.254.254.9:45278   169.254.254.10:179
                                  frr      bgpd       93160 26  tcp4   169.254.254.13:9166   169.254.254.14:179
                                  frr      bgpd       93160 27  tcp4   169.254.254.17:8175   169.254.254.18:179
                                  frr      bgpd       93160 28  tcp4   169.254.254.21:5727   169.254.254.22:179
                                  frr      bgpd       93160 29  tcp4   169.254.254.25:40994  169.254.254.26:179
                                  frr      bgpd       93160 30  tcp4   169.254.254.29:6862   169.254.254.30:179
                                  frr      bgpd       93160 31  tcp4   169.254.254.33:50604  169.254.254.34:179
                                  frr      bgpd       93160 32  tcp4   169.254.254.37:1108   169.254.254.38:179
                                  frr      bgpd       93160 33  tcp4   169.254.254.57:51757  169.254.254.58:179
                                  frr      bgpd       93160 34  tcp4   169.254.254.61:20765  169.254.254.62:179
                                  frr      bgpd       93160 35  tcp4   169.254.254.65:6066   169.254.254.66:179
                                  frr      bgpd       93160 36  tcp4   169.254.254.69:52714  169.254.254.70:179
                                  frr      bgpd       93160 37  tcp4   169.254.254.73:39873  169.254.254.74:179
                                  frr      bgpd       93160 38  tcp4   169.254.254.77:55399  169.254.254.78:179
                                  frr      bgpd       93160 39  tcp4   169.254.254.81:48328  169.254.254.82:179
                                  frr      bgpd       93160 40  tcp4   169.254.254.85:45645  169.254.254.86:179
                                  frr      bgpd       93160 41  tcp4   169.254.254.89:8402   169.254.254.90:179
                                  frr      bgpd       93160 42  tcp4   169.254.254.93:11107  169.254.254.94:179
                                  frr      bgpd       93160 43  tcp4   169.254.254.97:27421  169.254.254.98:179
                                  frr      bgpd       93160 44  tcp4   169.254.254.105:28537 169.254.254.106:179
                                  frr      bgpd       93160 45  tcp4   169.254.254.109:29597 169.254.254.110:179
                                  frr      bgpd       93160 46  tcp4   169.254.255.37:48190  169.254.255.38:179
                                  frr      bgpd       93160 47  tcp4   169.254.255.9:54573   169.254.255.10:179
                                  frr      bgpd       93160 48  tcp4   169.254.255.17:4014   169.254.255.18:179
                                  frr      bgpd       93160 49  tcp4   169.254.255.21:45371  169.254.255.22:179
                                  frr      bgpd       93160 50  tcp4   169.254.255.25:15854  169.254.255.26:179
                                  frr      bgpd       93160 51  tcp4   169.254.255.29:43328  169.254.255.30:179
                                  frr      bgpd       93160 52  tcp4   169.254.255.33:8344   169.254.255.34:179
                                  frr      bgpd       93160 53  tcp4   169.254.255.13:57269  169.254.255.14:179
                                  frr      bgpd       93160 54  tcp4   169.254.255.41:32399  169.254.255.42:179
                                  frr      bgpd       93160 55  tcp4   169.254.255.45:62751  169.254.255.46:179
                                  frr      bgpd       93160 56  tcp4   169.254.255.61:49589  169.254.255.62:179
                                  frr      bgpd       93160 57  tcp4   169.254.255.57:30575  169.254.255.58:179
                                  frr      bgpd       93160 58  tcp4   169.254.255.65:17927  169.254.255.66:179
                                  frr      bgpd       93160 59  tcp4   169.254.255.69:52593  169.254.255.70:179
                                  frr      bgpd       93160 60  tcp4   169.254.255.73:26342  169.254.255.74:179
                                  frr      bgpd       93160 61  tcp4   169.254.255.77:23533  169.254.255.78:179
                                  frr      bgpd       93160 62  tcp4   169.254.255.81:6200   169.254.255.82:179
                                  frr      bgpd       93160 63  tcp4   169.254.255.85:1817   169.254.255.86:179
                                  frr      bgpd       93160 64  tcp4   169.254.255.89:55972  169.254.255.90:179
                                  frr      bgpd       93160 65  tcp4   169.254.255.93:40012  169.254.255.94:179
                                  frr      bgpd       93160 66  tcp4   169.254.255.97:20828  169.254.255.98:179
                                  frr      bgpd       93160 67  tcp4   169.254.255.105:5854  169.254.255.106:179
                                  frr      bgpd       93160 68  tcp4   169.254.255.109:37727 169.254.255.110:179
                                  frr      bgpd       93160 69  tcp4   lan:179         lan-bgp-peer:46520
                                  frr      bgpd       93160 70  tcp4   lan:179         lan-bgp-peer:55180
                                  frr      staticd    92209 9   tcp6   *:2616                *:*
                                  frr      staticd    92209 10  tcp4   *:2616                *:*
                                  frr      mgmtd      91678 12  tcp6   *:2623                *:*
                                  frr      mgmtd      91678 13  tcp4   *:2623                *:*
                                  frr      zebra      90512 20  tcp6   *:2601                *:*
                                  frr      zebra      90512 21  tcp4   *:2601                *:*
                                  zabbix   zabbix_age 59718 4   tcp4   *:10050               *:*
                                  zabbix   zabbix_age 59449 4   tcp4   *:10050               *:*
                                  zabbix   zabbix_age 59331 4   tcp4   *:10050               *:*
                                  zabbix   zabbix_age 59154 4   tcp4   *:10050               *:*
                                  zabbix   zabbix_age 58742 4   tcp4   *:10050               *:*
                                  nobody   node_expor 45522 3   tcp4   lan:9100        *:*
                                  nobody   node_expor 45522 7   tcp4   lan:9100        remote:1285
                                  frr      bfdd       42282 17  tcp6   *:2617                *:*
                                  frr      bfdd       42282 18  tcp4   *:2617                *:*
                                  root     openvpn    84817 6   tcp4   carp-wan:1195    *:*
                                  root     openvpn    84817 12  tcp4   carp-wan:1195    remote:54208
                                  root     openvpn    84817 13  tcp4   carp-wan:1195    remote:64293
                                  root     openvpn    84817 14  tcp4   carp-wan:1195    remote:55553
                                  root     openvpn    84817 15  tcp4   carp-wan:1195    remote:49879
                                  root     openvpn    84817 16  tcp4   carp-wan:1195    remote:49727
                                  root     openvpn    84817 17  tcp4   carp-wan:1195    remote:59337
                                  root     openvpn    84817 18  tcp4   carp-wan:1195    remote:52367
                                  root     openvpn    84817 19  tcp4   carp-wan:1195    remote:61691
                                  root     openvpn    84817 20  tcp4   carp-wan:1195    remote:49299
                                  root     openvpn    84817 21  tcp4   carp-wan:1195    remote:55497
                                  root     openvpn    84817 22  tcp4   carp-wan:1195    remote:56152
                                  root     openvpn    84817 23  tcp4   carp-wan:1195    remote:60493
                                  root     openvpn    63853 6   tcp4   carp-wan:1194    *:*
                                  root     openvpn    63853 12  tcp4   carp-wan:1194    remote:50937
                                  root     openvpn    63853 13  tcp4   carp-wan:1194    remote:25223
                                  root     openvpn    63853 14  tcp4   carp-wan:1194    remote:34077
                                  root     openvpn    63853 15  tcp4   carp-wan:1194    remote:8229
                                  root     openvpn    63853 16  tcp4   carp-wan:1194    remote:49925
                                  root     openvpn    63853 17  tcp4   carp-wan:1194    remote:59427
                                  root     openvpn    63853 18  tcp4   carp-wan:1194    remote:19497
                                  root     openvpn    63853 19  tcp4   carp-wan:1194    remote:53176
                                  root     openvpn    63853 20  tcp4   carp-wan:1194    remote:53941
                                  root     openvpn    63853 21  tcp4   carp-wan:1194    remote:30092
                                  root     openvpn    63853 22  tcp4   carp-wan:1194    remote:61351
                                  root     openvpn    63853 23  tcp4   carp-wan:1194    remote:59472
                                  root     openvpn    63853 24  tcp4   carp-wan:1194    remote:17457
                                  root     openvpn    63853 25  tcp4   carp-wan:1194    remote:10610
                                  root     openvpn    63853 26  tcp4   carp-wan:1194    remote:55283
                                  root     openvpn    63853 27  tcp4   carp-wan:1194    remote:43863
                                  root     openvpn    63853 28  tcp4   carp-wan:1194    remote:51742
                                  root     openvpn    63853 29  tcp4   carp-wan:1194    remote:50180
                                  root     openvpn    63853 30  tcp4   carp-wan:1194    remote:26228
                                  root     openvpn    63853 31  tcp4   carp-wan:1194    remote:55189
                                  root     openvpn    63853 32  tcp4   carp-wan:1194    remote:51027
                                  root     openvpn    63853 33  tcp4   carp-wan:1194    remote:20917
                                  root     openvpn    63853 34  tcp4   carp-wan:1194    remote:49952
                                  root     openvpn    63853 35  tcp4   carp-wan:1194    remote:63507
                                  root     openvpn    63853 36  tcp4   carp-wan:1194    remote:50676
                                  root     openvpn    63853 37  tcp4   carp-wan:1194    remote:63995
                                  root     openvpn    63853 55  tcp4   carp-wan:1194    remote:50970
                                  root     openvpn    63853 88  tcp4   carp-wan:1194    remote:54025
                                  root     openvpn    63853 129 tcp4   carp-wan:1194    remote:49206
                                  dhcpd    dhcpd      75672 11  tcp4   lan:36325       pfsense-1:519
                                  dhcpd    dhcpd      75672 12  tcp4   lan:520         *:*
                                  root     nginx      67598 5   tcp4   *:18288               *:*
                                  root     nginx      67598 6   tcp6   *:18288               *:*
                                  root     nginx      67598 7   tcp4   *:80                  *:*
                                  root     nginx      67598 9   tcp6   *:80                  *:*
                                  root     nginx      67341 5   tcp4   *:18288               *:*
                                  root     nginx      67341 6   tcp6   *:18288               *:*
                                  root     nginx      67341 7   tcp4   *:80                  *:*
                                  root     nginx      67341 9   tcp6   *:80                  *:*
                                  root     nginx      67000 5   tcp4   *:18288               *:*
                                  root     nginx      67000 6   tcp6   *:18288               *:*
                                  root     nginx      67000 7   tcp4   *:80                  *:*
                                  root     nginx      67000 9   tcp6   *:80                  *:*
                                  root     sshd       98253 3   tcp6   *:12689               *:*
                                  root     sshd       98253 4   tcp4   *:12689               *:*
                                  ?        ?          ?     ?   tcp4   carp-wan:1194    remote:57205
                                  [24.03-RELEASE][admin@pfsense-2.domain]/root:
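
A long listing like this is easier to scan once it is grouped. Here is a minimal Python sketch that separates listening sockets from connected ones and counts connections per command and local address. It assumes the seven-column layout shown above and uses only a few rows of that output as sample input. The function name `summarize_sockstat` is hypothetical.

```python
from collections import Counter

# A few rows copied from the sockstat output above.
SOCKSTAT = """\
USER     COMMAND    PID   FD  PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
frr      bgpd       93160 21  tcp4   *:179                 *:*
frr      bgpd       93160 23  tcp4   169.254.254.41:48552  169.254.254.42:179
root     openvpn    84817 6   tcp4   carp-wan:1195    *:*
root     openvpn    84817 12  tcp4   carp-wan:1195    remote:54208
root     openvpn    63853 6   tcp4   carp-wan:1194    *:*
root     openvpn    63853 12  tcp4   carp-wan:1194    remote:50937
root     openvpn    63853 13  tcp4   carp-wan:1194    remote:25223
?        ?          ?     ?   tcp4   carp-wan:1194    remote:57205
"""


def summarize_sockstat(text):
    """Return (listeners, connections) keyed by (command, local address)."""
    listeners = set()
    connections = Counter()
    for line in text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) != 7:
            continue  # ignore prompts or malformed rows
        _user, command, _pid, _fd, _proto, local, foreign = cols
        if foreign == "*:*":
            listeners.add((command, local))  # wildcard peer: listening socket
        else:
            connections[(command, local)] += 1
    return listeners, connections


listeners, connections = summarize_sockstat(SOCKSTAT)
for key in sorted(listeners):
    print("listening:", key)
for key, count in sorted(connections.items()):
    print("connected:", key, count)
```

Rows with `?` in the owner columns, like the last one in the listing, are kept as their own group. sockstat prints them for sockets it cannot tie to a process, so they may be worth a closer look when the panic is in the TCP retransmit path.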
                                  
                                  • C
                                    cboenning @cboenning
                                    last edited by

                                    Earlier today I disabled the DHCP Service in pfSense as I can currently live without it.

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Great, thanks. We have some devs engaged on this now; a few users are hitting it.

                                      • C
                                        cboenning @stephenw10
                                        last edited by

                                        @stephenw10 Thank you.

                                        Feel free to contact me privately in case you need additional details or I can provide anything.

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          Bug to track it: https://redmine.pfsense.org/issues/15684
