-
Both of my pfSense boxes (2.5.0) have the same problem.
The System - General logs are filling up with "kernel matchaddr failed" errors.
A new entry appears every second.
This started after setting up a WireGuard tunnel from box 1 to box 2.
The connection itself is working perfectly fine, but I hate the fact that this message is flooding my logs. Any ideas how to troubleshoot this? There isn't a separate WireGuard log file as far as I can see.
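There's no dedicated WireGuard log, but the flood rate can at least be measured from a shell. A minimal sketch, assuming 2.5.0's plain-text system log at the default `/var/log/system.log` path (older releases used circular clog files, which would need `clog` instead of plain `grep`):

```shell
# Count how many matchaddr entries have accumulated so far
grep -c 'matchaddr failed' /var/log/system.log

# Show the most recent occurrences with timestamps to confirm the rate
grep 'matchaddr failed' /var/log/system.log | tail -n 5
```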
All the settings in the tunnel setup look fine to me and the connection is working great. Both systems crashed today; here's one of the crash dumps:
Crash report begins. Anonymous machine information:
amd64
12.2-STABLE
FreeBSD 12.2-STABLE d48fb226319(devel-12) pfSense
Crash report details:
No PHP errors found.
Filename: /var/crash/info.0
Dump header from device: /dev/label/swap0
Architecture: amd64
Architecture Version: 4
Dump Length: 157696
Blocksize: 512
Compression: none
Dumptime: Mon Feb 22 14:54:55 2021
Hostname: fire02.crito.info
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-STABLE d48fb226319(devel-12) pfSense
Panic String: page fault
Dump Parity: 4140386836
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt06000014000014014733657 7104 ustarrootwheel
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe007e328380
curthread = 0xfffff8000428c000: pid 0 tid 100033 "if_io_tqg_3"
curpcb = 0xfffff8000428c5a0
fpcurthread = none
idlethread = 0xfffff80004245740: tid 100006 "idle: cpu3"
curpmap = 0xffffffff8368d5a8
tssp = 0xffffffff83717758
commontssp = 0xffffffff83717758
rsp0 = 0xfffffe0025908cc0
kcr3 = 0xffffffffffffffff
ucr3 = 0xffffffffffffffff
scr3 = 0x0
gs32p = 0xffffffff8371df70
ldt = 0xffffffff8371dfb0
tss = 0xffffffff8371dfa0
tlb gen = 72000
curvnet = 0
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100033 td 0xfffff8000428c000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0025908530
vpanic() at vpanic+0x197/frame 0xfffffe0025908580
panic() at panic+0x43/frame 0xfffffe00259085e0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0025908640
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0025908690
trap() at trap+0x286/frame 0xfffffe00259087a0
calltrap() at calltrap+0x8/frame 0xfffffe00259087a0
--- trap 0xc, rip = 0xffffffff80e024b5, rsp = 0xfffffe0025908870, rbp = 0xfffffe0025908880 ---
turnstile_broadcast() at turnstile_broadcast+0x45/frame 0xfffffe0025908880
rw_wunlock_hard() at rw_wunlock_hard+0xab/frame 0xfffffe00259088b0
noise_remote_begin_session() at noise_remote_begin_session+0x4aa/frame 0xfffffe0025908a20
wg_softc_handshake_receive() at wg_softc_handshake_receive+0x60b/frame 0xfffffe0025908b20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe0025908b80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe0025908bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0025908bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025908bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
2778 50539 325 0 S nanslp 0xffffffff836a1041 sleep
97981 1 97981 0 Ss (threaded) dpinger
100165 S uwait 0xfffff80006483900 dpinger
100495 S sbwait 0xfffff8010f720c0c dpinger
100496 S nanslp 0xffffffff836a1040 dpinger
100497 S nanslp 0xffffffff836a1040 dpinger
100498 S accept 0xfffff800c02f0ba0 dpinger
97301 1 97301 0 Ss (threaded) dpinger
100192 S uwait 0xfffff8010f709180 dpinger
100491 S sbwait 0xfffff8010f9a38a4 dpinger
100492 S nanslp 0xffffffff836a1043 dpinger
100493 S nanslp 0xffffffff836a1042 dpinger
100494 S accept 0xfffff800c0320168 dpinger
9240 1 9240 136 Ss select 0xfffff800c0b18ac0 dhcpd
80890 1 80890 0 Ss select 0xfffff8010f08e3c0 bsnmpd
50539 1 325 0 S wait 0xfffff8010f861530 sh
85516 1 85516 0 Ss kqread 0xfffff80006474800 dhcpleases
21287 325 325 0 S accept 0xfffff80006b66168 php-fpm
31159 325 325 0 S accept 0xfffff80006b66168 php-fpm
27761 1 27654 65534 S select 0xfffff8010f768e40 dnsmasq
17894 325 325 0 S accept 0xfffff80006b66168 php-fpm
42894 325 325 0 S accept 0xfffff80006b66168 php-fpm
51244 1 51244 0 Ss+ ttyin 0xfffff8000475b4b0 getty
51054 1 51054 0 Ss+ ttyin 0xfffff800063df0b0 getty
50958 1 50958 0 Ss+ ttyin 0xfffff800063df4b0 getty
50824 1 50824 0 Ss+ ttyin 0xfffff800063df8b0 getty
50813 1 50813 0 Ss+ ttyin 0xfffff800063dfcb0 getty
50730 1 50730 0 Ss+ ttyin 0xfffff800063e00b0 getty
50406 1 50406 0 Ss+ ttyin 0xfffff800063e04b0 getty
50401 1 50401 0 Ss+ ttyin 0xfffff800063e08b0 getty
50131 1 50131 0 Ss+ ttyin 0xfffff800063e0cb0 getty
24247 325 325 0 S accept 0xfffff80006b66168 php-fpm
17810 1 17810 65 Ss select 0xfffff8010f1d5440 dhclient
13796 1 13796 0 Ss select 0xfffff8010f1d5740 dhclient
97169 96763 96763 0 S nanslp 0xffffffff836a1043 minicron
96763 1 96763 0 Ss wait 0xfffff8010f0aba60 minicron
96656 95987 95987 0 S nanslp 0xffffffff836a1041 minicron
95987 1 95987 0 Ss wait 0xfffff8000637e530 minicron
95820 95423 95423 0 S nanslp 0xffffffff836a1042 minicron
95423 1 95423 0 Ss wait 0xfffff80006a39530 minicron
94545 1 94545 0 Ss select 0xfffff80006edf940 syslogd
94338 1 28 0 S+ nanslp 0xffffffff836a1043 php-cgi
46904 1 46904 0 Ss (threaded) ntpd
100117 S select 0xfffff800065769c0 ntpd
45869 45721 45721 0 S kqread 0xfffff800c0443e00 nginx
45769 45721 45721 0 S kqread 0xfffff800c0443400 nginx
45721 1 45721 0 Ss pause 0xfffff80006a355d8 nginx
18911 1 18911 0 Ss bpf 0xfffff80006dbb000 filterlog
6998 1 28 0 S+ (threaded) pcscd
100149 S select 0xfffff8000646f5c0 pcscd
100150 S piperd 0xfffff80006e02000 pcscd
100151 S nanslp 0xffffffff836a1042 pcscd
565 1 565 0 Ss nanslp 0xffffffff836a1043 cron
543 1 543 0 Ss select 0xfffff80006cf1740 devd
366 364 364 0 S kqread 0xfffff8000645d700 check_reload_status
364 1 364 0 Ss kqread 0xfffff800069f4d00 check_reload_status
327 325 325 0 S accept 0xfffff80006b66168 php-fpm
326 325 325 0 S accept 0xfffff80006b66168 php-fpm
325 1 325 0 Ss kqread 0xfffff8000645dd00 php-fpm
67 0 0 0 DL mdwait 0xfffff8000656e000 [md0]
27 0 0 0 DL aldslp 0xffffffff8368dfc0 [ALQ Daemon]
26 0 0 0 DL syncer 0xffffffff836ed508 [syncer]
25 0 0 0 DL vlruwt 0xfffff800045fb530 [vnlru]
24 0 0 0 DL (threaded) [bufdaemon]
100100 D qsleep 0xffffffff836ec810 [bufdaemon]
100106 D - 0xffffffff81e0a900 [bufspacedaemon-0]
100107 D - 0xffffffff81e12bc0 [bufspacedaemon-1]
100112 D sdflush 0xfffff80006571ee8 [/ worker]
23 0 0 0 DL psleep 0xffffffff83715008 [vmdaemon]
22 0 0 0 DL (threaded) [pagedaemon]
100098 D psleep 0xffffffff83709dd8 [dom0]
100104 D launds 0xffffffff83709de4 [laundry: dom0]
100105 D umarcl 0xffffffff811debd0 [uma]
21 0 0 0 DL - 0xffffffff834e5c58 [rand_harvestq]
20 0 0 0 DL pftm 0xffffffff81091e00 [pf purge]
19 0 0 0 DL waiting 0xffffffff836f2878 [sctp_iterator]
18 0 0 0 DL - 0xffffffff836ec21c [soaiod4]
17 0 0 0 DL - 0xffffffff836ec21c [soaiod3]
9 0 0 0 DL - 0xffffffff836ec21c [soaiod2]
8 0 0 0 DL - 0xffffffff836ec21c [soaiod1]
16 0 0 0 DL (threaded) [usb]
100063 D - 0xfffffe0001984440 [usbus0]
100064 D - 0xfffffe0001984498 [usbus0]
100065 D - 0xfffffe00019844f0 [usbus0]
100066 D - 0xfffffe0001984548 [usbus0]
100067 D - 0xfffffe00019845a0 [usbus0]
100070 D - 0xfffffe00003ddcf0 [usbus1]
100071 D - 0xfffffe00003ddd48 [usbus1]
100072 D - 0xfffffe00003ddda0 [usbus1]
100073 D - 0xfffffe00003dddf8 [usbus1]
100074 D - 0xfffffe00003dde50 [usbus1]
100075 D - 0xfffffe002c889cf0 [usbus2]
100076 D - 0xfffffe002c889d48 [usbus2]
100077 D - 0xfffffe002c889da0 [usbus2]
100078 D - 0xfffffe002c889df8 [usbus2]
100079 D - 0xfffffe002c889e50 [usbus2]
7 0 0 0 DL (threaded) [cam]
100061 D - 0xffffffff8339c5c0 [doneq0]
100096 D - 0xffffffff8339c490 [scanner]
6 0 0 0 DL crypto 0xfffff80004381350 [crypto returns 3]
5 0 0 0 DL crypto 0xfffff800043812f0 [crypto returns 2]
4 0 0 0 DL crypto 0xfffff80004381290 [crypto returns 1]
3 0 0 0 DL crypto 0xfffff80004381230 [crypto returns 0]
2 0 0 0 DL crypto_ 0xffffffff83707538 [crypto]
15 0 0 0 DL seqstat 0xfffff80004300088 [sequencer 00]
14 0 0 0 DL (threaded) [geom]
100048 D - 0xffffffff8368c638 [g_event]
100049 D - 0xffffffff8368c640 [g_up]
100050 D - 0xffffffff8368c648 [g_down]
13 0 0 0 DL (threaded) [ng_queue]
100043 D sleep 0xffffffff8330f050 [ng_queue0]
100044 D sleep 0xffffffff8330f050 [ng_queue1]
100045 D sleep 0xffffffff8330f050 [ng_queue2]
100046 D sleep 0xffffffff8330f050 [ng_queue3]
12 0 0 0 WL (threaded) [intr]
100007 I [swi6: Giant taskq]
100010 I [swi5: fast taskq]
100013 I [swi6: task queue]
100034 I [swi3: vm]
100035 I [swi4: clock (0)]
100036 I [swi4: clock (1)]
100037 I [swi4: clock (2)]
100038 I [swi4: clock (3)]
100039 I [swi1: netisr 0]
100040 I [swi1: netisr 1]
100041 I [swi1: netisr 2]
100042 I [swi1: netisr 3]
100062 I [irq268: xhci0]
100068 I [irq19: ahci0]
100069 I [irq18: ehci0 ehci1]
100080 I [irq16: sdhci_pci0]
100081 I [swi0: uart uart]
100089 I [swi1: pf send]
100090 I [swi1: pfsync]
11 0 0 0 RL (threaded) [idle]
100003 Run CPU 0 [idle: cpu0]
100004 Run CPU 1 [idle: cpu1]
100005 Run CPU 2 [idle: cpu2]
100006 CanRun [idle: cpu3]
1 0 1 0 SLs wait 0xfffff8000422c530 [init]
10 0 0 0 DL audit_w 0xffffffff83707b18 [audit]
0 0 0 0 RLs (threaded) [kernel]
100000 D swapin 0xffffffff8368c7e8 [swapper]
100008 D - 0xfffff8000424f900 [thread taskq]
100009 D - 0xfffff8000424f700 [aiod_kick taskq]
100011 D - 0xfffff8000424f200 [kqueue_ctx taskq]
100012 D - 0xfffff8000424f000 [inm_free taskq]
100014 D - 0xfffff8000424ea00 [in6m_free taskq]
100015 D - 0xfffff8000424e800 [linuxkpi_short_wq_0]
100016 D - 0xfffff8000424e800 [linuxkpi_short_wq_1]
100017 D - 0xfffff8000424e800 [linuxkpi_short_wq_2]
100018 D - 0xfffff8000424e800 [linuxkpi_short_wq_3]
100019 D - 0xfffff8000424e800 [linuxkpi_short_wq_4]
100020 D - 0xfffff8000424e200 [linuxkpi_long_wq_0]
100021 D - 0xfffff8000424e200 [linuxkpi_long_wq_1]
100022 D - 0xfffff8000424e200 [linuxkpi_long_wq_2]
100023 D - 0xfffff8000424e200 [linuxkpi_long_wq_3]
100024 D - 0xfffff8000424e200 [linuxkpi_long_wq_4]
100025 D - 0xfffff8000424db00 [softirq_0]
100026 D - 0xfffff8000424d900 [softirq_1]
100027 D - 0xfffff8000424d700 [softirq_2]
100028 D - 0xfffff8000424d500 [softirq_3]
100029 D - 0xfffff8000424d300 [if_config_tqg_0]
100030 D - 0xfffff8000424d100 [if_io_tqg_0]
100031 D - 0xfffff8000421be00 [if_io_tqg_1]
100032 D - 0xfffff8000421bc00 [if_io_tqg_2]
100033 Run CPU 3 [if_io_tqg_3]
100047 D - 0xfffff80004293900 [firmware taskq]
100052 D - 0xfffff80004292200 [crypto_0]
100053 D - 0xfffff80004292200 [crypto_1]
100054 D - 0xfffff80004292200 [crypto_2]
100055 D - 0xfffff80004292200 [crypto_3]
100082 D - 0xfffff80004756200 [mca taskq]
100092 D - 0xfffff800063fb700 [acpi_task_0]
100093 D - 0xfffff800063fb700 [acpi_task_1]
100094 D - 0xfffff800063fb700 [acpi_task_2]
100095 D - 0xfffff800044cd100 [CAM taskq]
100097 D - 0xfffff8000642a700 [mlx4]
db:0:kdb.enter.default> alltrace
...deleted some parts....
ident pfSense
machine amd64
cpu HAMMER
makeoptions WITH_CTF=1
makeoptions DEBUG=-g
...deleted parts....
device virtio_console
msgbuf.txt06000027767014014733657 7666 ustarrootwheelfailed
<4>matchaddr failed
<4>matchaddr failed
... deleted some of these entries........
<4>matchaddr failed
<4>matchaddr failed
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80e024b5
stack pointer = 0x28:0xfffffe0025908870
frame pointer = 0x28:0xfffffe0025908880
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 0 (if_io_tqg_3)
trap number = 12
panic: page fault
cpuid = 3
time = 1614002095
KDB: enter: panic
panic.txt0600001214014733657 7143 ustarrootwheel
page fault
version.txt0600006214014733657 7543 ustarrootwheel
FreeBSD 12.2-STABLE d48fb226319(devel-12) pfSense
-
Is it the same crash backtrace on each system every time?
Do you have any traffic shaping active?
There is only one outstanding (rare) panic I'm aware of with WireGuard but it doesn't match that backtrace.
If you can, please attach the full textdump.tar file(s) from a few crashes here so we can look them over.
-
@jimp Thanks so far! No traffic shaping active, already checked that.
I've attached two files from box NMG and four files from box UDN.
UDN crashed 4 times, the other box 2 times.
The NMG box is also not receiving a DHCP address on the WAN interface after the reboot that follows the crash. I have to disable and re-enable DHCP on the interface to make it work again. Thanks a lot in advance, hope you can find the issue!
[6_1614242663203_UDN_textdump.tar.4](Uploading 100%)
[5_1614242663203_UDN_textdump.tar.3](Uploading 100%)
[4_1614242663203_UDN_textdump.tar.2](Uploading 100%)
[3_1614242663202_UDN_textdump.tar.1](Uploading 100%)
[2_1614242663202_UDN_textdump.tar.0](Uploading 100%)
[1_1614242663202_NMG_textdump.tar.1](Uploading 100%)
[0_1614242663200_NMG_textdump.tar.0](Uploading 100%)
-
Both boxes also have these weird log messages every second. They started right after enabling WireGuard:
Feb 25 09:46:37 kernel matchaddr failed
Feb 25 09:46:36 kernel matchaddr failed
Feb 25 09:46:36 kernel matchaddr failed
Feb 25 09:46:35 kernel matchaddr failed
Feb 25 09:46:35 kernel matchaddr failed
Feb 25 09:46:34 kernel matchaddr failed
Feb 25 09:46:34 kernel matchaddr failed
-
Looks like those uploads didn't complete properly, can you try again? Maybe one at a time to be certain they each work?
You might have to rename them so they are named
textdump.1.tar
instead of
textdump.tar.1
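For a batch of dumps, the rename can be scripted. A quick sketch, assuming the files all sit in one directory and follow the `textdump.tar.N` pattern:

```shell
# Move the trailing number before the .tar extension,
# e.g. textdump.tar.1 -> textdump.1.tar
for f in textdump.tar.*; do
    n="${f##*.}"                    # everything after the last dot: the number
    mv -- "$f" "textdump.${n}.tar"
done
```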
-
OK, all of those did have identical backtraces.
Is there anything special you're doing with these tunnels that might be triggering it? Any services using WireGuard specifically? Or just lots of continued traffic?
I ask because unless we can find a way to reproduce it we can't necessarily be sure we've found and fixed the problem. Thus far I don't think anyone internally has hit this particular panic.
-
I started an issue on Redmine for this crash:
-
@jimp said in pfSense 2.5.0 boxes with WireGuard keep crashing (both!):
OK, all of those did have identical backtraces.
Is there anything special you're doing with these tunnels that might be triggering it? Any services using WireGuard specifically? Or just lots of continued traffic?
I ask because unless we can find a way to reproduce it we can't necessarily be sure we've found and fixed the problem. Thus far I don't think anyone internally has hit this particular panic.
I understand your question, but there is very little traffic on this tunnel. Just some SMB (file sharing) and domain controller logins. UDN has a domain controller and NMG has 3 desktops (domain members). I think there's no more than 1 GB of traffic in a day.
-
Is it all regular L3 traffic from one subnet to another? Or could some of it be trying to send broadcast or multicast on the WireGuard interface?
I wouldn't think so, since it can't be bridged and that would typically involve something like Samba running on the firewall (which it can't do), but it makes me curious.
Also what entries do you have in "Allowed IPs" on both sides? Is it empty? Or do you have the remote subnets listed?
If you have the Allowed IPs list filled in, could something be trying to route across WireGuard that isn't listed in Allowed IPs?
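For reference, the behavior being asked about here is WireGuard's cryptokey routing: a packet routed into the tunnel interface whose destination doesn't match any peer's Allowed IPs is dropped. In plain WireGuard configuration terms (pfSense generates this internally; the subnet, hostname, and port below are hypothetical examples, not the poster's actual values):

```ini
# Hypothetical peer section on box 1. 192.168.2.0/24 stands in for the
# remote LAN behind box 2. Traffic sent into the wg interface with any
# destination outside AllowedIPs fails cryptokey routing and is dropped.
[Peer]
PublicKey = <box 2 public key>
Endpoint = box2.example.net:51820
AllowedIPs = 192.168.2.0/24
```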
-
@jimp Yes, we go from one subnet to another and there's nothing being sent to the WireGuard interfaces. On both sides I've entered the remote subnet as the Allowed IPs. We don't use any subnets other than those entered in the Allowed IPs lists.
-
I'm actually having this exact same issue. I set up WireGuard as an experiment from work to my house to see the difference compared to IPsec. My work router has been crashing much more often than my home router, but they are both crashing with a crash dump similar to this one. I've disabled the interface and WireGuard on both sides and will see if the crashes stop.