Netgate Discussion Forum

    Recurring crash 2.4.5-RELEASE-p1

    General pfSense Questions
    • hp_inkjet

      Hi,
      I'm having some troubles on one of my pfSense installations, crashing every ~20h

      The instance giving me problems is the primary of my homelab HA cluster (same problems on the secondary if I poweroff the primary), the hypervisor of choice is Proxmox and I'm using virtio as paravirtualized nic (with checksum & offload disabled as best practice describes)

      Looking at the crash report I can see that the current thread at the moment of crash is the virtio irq thread of one of the nics, but my knowledge in reading the crash log unfortunately ends here

      I've deployed other (4) HA cluster on proxmox in the past e no one showed me this kind of behavior

      Can someone more skilled than me suggest which should be the next step to troubleshoot the issue? I've attached the crash report crash-report.zip
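      For reference, the offload settings mentioned above can be checked and applied from a FreeBSD/pfSense shell. A minimal sketch, assuming the virtio NICs show up as vtnet0/vtnet1 (the interface names are examples; pfSense normally manages these flags via System > Advanced > Networking):

```shell
# Show the current offload flags on a virtio NIC (look for TXCSUM/RXCSUM/TSO/LRO)
ifconfig vtnet0 | grep -i options

# Disable checksum offload, TSO and LRO at runtime (hypothetical interface names)
ifconfig vtnet0 -txcsum -rxcsum -tso -lro
ifconfig vtnet1 -txcsum -rxcsum -tso -lro
```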

    • stephenw10 (Netgate Administrator)

        Ok you have numerous identical crashes there that all look like this:

        db:0:kdb.enter.default>  show pcpu
        cpuid        = 1
        dynamic pcpu = 0xfffffe01967ae580
        curthread    = 0xfffff80004df9620: pid 12 "irq264: virtio_pci2"
        curpcb       = 0xfffffe00f48efcc0
        fpcurthread  = none
        idlethread   = 0xfffff80004975620: tid 100004 "idle: cpu1"
        curpmap      = 0xffffffff834f1c40
        tssp         = 0xffffffff835a3338
        commontssp   = 0xffffffff835a3338
        rsp0         = 0xfffffe00f48efcc0
        gs32p        = 0xffffffff835a9f90
        ldt          = 0xffffffff835a9fd0
        tss          = 0xffffffff835a9fc0
        tlb gen      = 15068852
        db:0:kdb.enter.default>  bt
        Tracing pid 12 tid 100078 td 0xfffff80004df9620
        kdb_enter() at kdb_enter+0x3b/frame 0xfffffe00f48eef30
        vpanic() at vpanic+0x19b/frame 0xfffffe00f48eef90
        panic() at panic+0x43/frame 0xfffffe00f48eeff0
        trap_pfault() at trap_pfault/frame 0xfffffe00f48ef040
        trap_pfault() at trap_pfault+0x49/frame 0xfffffe00f48ef0a0
        trap() at trap+0x29d/frame 0xfffffe00f48ef1b0
        calltrap() at calltrap+0x8/frame 0xfffffe00f48ef1b0
        --- trap 0xc, rip = 0xffffffff80f9214e, rsp = 0xfffffe00f48ef280, rbp = 0xfffffe00f48ef3c0 ---
        pf_test_state_tcp() at pf_test_state_tcp+0x19ae/frame 0xfffffe00f48ef3c0
        pf_test() at pf_test+0x2112/frame 0xfffffe00f48ef5e0
        pf_check_in() at pf_check_in+0x1d/frame 0xfffffe00f48ef600
        pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe00f48ef690
        ip_input() at ip_input+0x412/frame 0xfffffe00f48ef720
        netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe00f48ef770
        ether_demux() at ether_demux+0x15b/frame 0xfffffe00f48ef7a0
        ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe00f48ef800
        netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe00f48ef850
        ether_input() at ether_input+0x26/frame 0xfffffe00f48ef870
        vlan_input() at vlan_input+0x215/frame 0xfffffe00f48ef920
        ether_demux() at ether_demux+0x144/frame 0xfffffe00f48ef950
        ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe00f48ef9b0
        netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe00f48efa00
        ether_input() at ether_input+0x26/frame 0xfffffe00f48efa20
        vtnet_rxq_eof() at vtnet_rxq_eof+0x7ae/frame 0xfffffe00f48efaf0
        vtnet_rx_vq_intr() at vtnet_rx_vq_intr+0x71/frame 0xfffffe00f48efb20
        intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe00f48efb60
        ithread_loop() at ithread_loop+0xe7/frame 0xfffffe00f48efbb0
        fork_exit() at fork_exit+0x83/frame 0xfffffe00f48efbf0
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00f48efbf0
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        db:0:kdb.enter.default>  ps
        

        This is not running a 2.5 development snap so I'm moving it to General for more exposure.

        Steve

        • Gertjan @hp_inkjet

          @hp_inkjet said in Recurring crash 2.4.5-RELEASE-p1:

          virtio irq

          Hi,

          First things first: I'm not an expert.
          Still, I think this advice will be very useful: exclude what isn't really needed.
          You have two choices: go bare metal or change the hypervisor.
          You'll then know whether it's the VM environment or not, and you'll know where to focus.

          No "help me" PMs please. Use the forum; the community will thank you.
          Edit: and where are the logs?

          • stephenw10 (Netgate Administrator)

            What's different about this config compared to the other installs?

            You look to have bridges and TAP interfaces here. Are they common to all sites?
            I assume the bridges are connecting the TAP interfaces to local subnets? Other bridge setups in HA can easily go horribly wrong!

            Do you have any traffic shaping enabled here? It looks very similar to a previous bug that was AltQ-related.

            Steve
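            To rule out AltQ, the queue configuration that pf actually has loaded can be inspected from Diagnostics > Command Prompt or a shell. A minimal sketch:

```shell
# List any ALTQ queues currently loaded into pf;
# no queues in the output means no AltQ shaping is active
pfctl -s queue
```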

            • hp_inkjet @stephenw10

              @stephenw10 First of all, thank you for your feedback. Nothing is really special about this install except for the presence of limiters (up/down on two guest networks).
              The bridges connect two OpenVPN site-to-site (S2S) tunnels to two local networks.

              Could I be affected by the AltQ bug?

              Matteo

              • stephenw10 (Netgate Administrator)

                It would be a new bug if so, because that earlier one was fixed a long time ago: https://redmine.pfsense.org/issues/5473

                Limiters are not AltQ either, but the similarity of the backtrace makes me think something must be the same there.

                Can you remove/disable the limiters long enough to test?

                Steve
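                One way to confirm the limiters are really gone from the running ruleset (and not just disabled in the GUI) is to check pf's rules and the dummynet pipes directly. A minimal sketch; `dnctl` availability depends on the pfSense version:

```shell
# pfSense limiter rules reference dummynet pipes via the dnpipe keyword;
# no matches means no limiter is attached to any rule
pfctl -sr | grep -i dnpipe

# Show configured dummynet pipes/limiters, if the dnctl utility is present
dnctl pipe show
```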

                • hp_inkjet

                  Sure, thank you.

                  I'll report back in a couple of days with the result,
                  Matteo

                  • hp_inkjet

                    After one day the problem presented itself again: pfcrash.zip

                    • stephenw10 (Netgate Administrator)

                      So still the same identical crash.

                      And that was with limiters disabled? And no AltQ shaping?

                      • hp_inkjet

                        Yes, no limiters or AltQ

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.