Recurring crash 2.4.5-RELEASE-p1
-
Hi,
I'm having some trouble with one of my pfSense installations, which crashes roughly every 20 hours. The instance giving me problems is the primary of my homelab HA cluster (the secondary has the same problem if I power off the primary). The hypervisor is Proxmox, and I'm using virtio as the paravirtualized NIC (with checksum and offloading disabled, as best practice recommends).
Looking at the crash report, I can see that the current thread at the time of the crash is the virtio IRQ thread of one of the NICs, but unfortunately my knowledge of reading crash logs ends there.
I've deployed four other HA clusters on Proxmox in the past, and none of them showed this kind of behavior.
Can someone more skilled than me suggest what the next troubleshooting step should be? I've attached the crash report: crash-report.zip
-
OK, you have numerous identical crashes there, and they all look like this:
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe01967ae580
curthread = 0xfffff80004df9620: pid 12 "irq264: virtio_pci2"
curpcb = 0xfffffe00f48efcc0
fpcurthread = none
idlethread = 0xfffff80004975620: tid 100004 "idle: cpu1"
curpmap = 0xffffffff834f1c40
tssp = 0xffffffff835a3338
commontssp = 0xffffffff835a3338
rsp0 = 0xfffffe00f48efcc0
gs32p = 0xffffffff835a9f90
ldt = 0xffffffff835a9fd0
tss = 0xffffffff835a9fc0
tlb gen = 15068852
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100078 td 0xfffff80004df9620
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe00f48eef30
vpanic() at vpanic+0x19b/frame 0xfffffe00f48eef90
panic() at panic+0x43/frame 0xfffffe00f48eeff0
trap_pfault() at trap_pfault/frame 0xfffffe00f48ef040
trap_pfault() at trap_pfault+0x49/frame 0xfffffe00f48ef0a0
trap() at trap+0x29d/frame 0xfffffe00f48ef1b0
calltrap() at calltrap+0x8/frame 0xfffffe00f48ef1b0
--- trap 0xc, rip = 0xffffffff80f9214e, rsp = 0xfffffe00f48ef280, rbp = 0xfffffe00f48ef3c0 ---
pf_test_state_tcp() at pf_test_state_tcp+0x19ae/frame 0xfffffe00f48ef3c0
pf_test() at pf_test+0x2112/frame 0xfffffe00f48ef5e0
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe00f48ef600
pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe00f48ef690
ip_input() at ip_input+0x412/frame 0xfffffe00f48ef720
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe00f48ef770
ether_demux() at ether_demux+0x15b/frame 0xfffffe00f48ef7a0
ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe00f48ef800
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe00f48ef850
ether_input() at ether_input+0x26/frame 0xfffffe00f48ef870
vlan_input() at vlan_input+0x215/frame 0xfffffe00f48ef920
ether_demux() at ether_demux+0x144/frame 0xfffffe00f48ef950
ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe00f48ef9b0
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe00f48efa00
ether_input() at ether_input+0x26/frame 0xfffffe00f48efa20
vtnet_rxq_eof() at vtnet_rxq_eof+0x7ae/frame 0xfffffe00f48efaf0
vtnet_rx_vq_intr() at vtnet_rx_vq_intr+0x71/frame 0xfffffe00f48efb20
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe00f48efb60
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe00f48efbb0
fork_exit() at fork_exit+0x83/frame 0xfffffe00f48efbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00f48efbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> ps
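Every frame line in a ddb `bt` dump follows the same `function() at function+offset/frame address` pattern, so checking whether several crash reports really are "identical" can be scripted. Below is a minimal Python sketch, assuming the backtrace text has been copied out of the crash report as plain text. The helper names `parse_frames`, `faulting_frame` and `signature` are hypothetical and are not part of pfSense or FreeBSD. `faulting_frame` returns the first frame after the first `--- trap` marker, which in this dump is the code that took the page fault (trap 0xc).

```python
import re

# One ddb "bt" frame, e.g. "pf_test() at pf_test+0x2112/frame 0xfffffe00f48ef5e0".
# The "+0x..." offset is optional because the dump also contains
# "trap_pfault() at trap_pfault/frame 0xfffffe00f48ef040".
FRAME_RE = re.compile(r"(\w+)\(\) at \w+(?:\+(0x[0-9a-f]+))?/frame (0x[0-9a-f]+)")


def parse_frames(bt_text):
    """Return (function, offset) pairs in dump order, innermost frame first."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(bt_text)]


def faulting_frame(bt_text):
    """Return the first frame after the first '--- trap' marker, or None."""
    _, sep, after = bt_text.partition("--- trap")
    if not sep:
        return None
    frames = parse_frames(after)
    return frames[0] if frames else None


def signature(bt_text):
    """Function+offset chain; stack frame addresses are deliberately dropped."""
    return tuple(parse_frames(bt_text))


# Abbreviated excerpt of the backtrace from this thread.
sample = (
    "panic() at panic+0x43/frame 0xfffffe00f48eeff0 "
    "trap_pfault() at trap_pfault/frame 0xfffffe00f48ef040 "
    "calltrap() at calltrap+0x8/frame 0xfffffe00f48ef1b0 "
    "--- trap 0xc, rip = 0xffffffff80f9214e, rsp = 0xfffffe00f48ef280, "
    "rbp = 0xfffffe00f48ef3c0 --- "
    "pf_test_state_tcp() at pf_test_state_tcp+0x19ae/frame 0xfffffe00f48ef3c0 "
    "pf_test() at pf_test+0x2112/frame 0xfffffe00f48ef5e0 "
)

if __name__ == "__main__":
    print(faulting_frame(sample))  # ('pf_test_state_tcp', '0x19ae')
```

The signature leaves out the stack frame addresses because they can differ between otherwise identical crashes. The function+offset pairs should only be compared between dumps taken from the same kernel build.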
This is not running a 2.5 development snapshot, so I'm moving it to General for more exposure.
Steve
-
@hp_inkjet said in Recurring crash 2.4.5-RELEASE-p1:
virtio irq
Hi,
First things first: I'm not an expert.
Still, I think this advice will be useful: exclude whatever isn't really needed.
You have two choices: go bare metal or change the hypervisor.
Either way you'll know whether it's the VM environment or not, and you'll know where to focus.
-
What's different about this config compared to the other installs?
It looks like you have bridges and TAP interfaces here. Are those common to all sites?
I assume the bridges are connecting the TAP interfaces to local subnets? Other bridge setups in HA can easily go horribly wrong! Do you have any traffic shaping enabled here? It looks very similar to a previous bug that was AltQ related.
Steve
-
@stephenw10 First of all, thank you for your feedback. Nothing is really special about this install except for the limiters (up/down on 2 guest networks).
The bridges connect 2 OpenVPN site-to-site (S2S) tunnels to 2 local networks. Could I be affected by the AltQ bug?
Matteo
-
It would be a new bug if so, because that other one was fixed a long time ago: https://redmine.pfsense.org/issues/5473
Limiters are not AltQ either, but the similarity of the backtrace makes me think something must be the same there.
Can you remove/disable the Limiters long enough to test?
Steve
-
Sure, thank you.
I'll report back in a couple of days with the results.
Matteo
-
After one day the problem reappeared: pfcrash.zip
-
So still the same identical crash.
And that was with limiters disabled? And no AltQ shaping?
-
Yes, no limiters and no AltQ.