24.03 Crashing...?
-
@stephenw10 - Ok, I realized that I had the 192.168.100.0/24 subnet set on a site-to-site OpenVPN connection. It never occurred to me that this would cause an issue (and I can't recall having any; I don't even remember seeing the messages logged). I removed the route on the connection and have not seen the error logged since (it's been an hour or so).
Wonder if there's a way to resolve that, the Starlink modem is at 192.168.100.1 and that's not configurable. Not having the subnet on the VPN connection is not the end of the world, but is inconvenient.
Guess I'll have to wait to see if the crash happens again.
-
Mmm, it shouldn't cause a kernel panic. But it might be part of a combination of things.
-
Welp, even with the subnet off the VPN I am still seeing the two messages:
arpresolve: can't allocate llinfo for 100.64.0.1 on em1
arp: 26:12:ac:1a:80:01 is using my IP address 192.168.100.2 on em1!
Interestingly (and I should have checked this before), the 26:12:ac:1a:80:01 MAC is the Starlink router/modem (which is in bypass mode).
-
I'd guess it uses that IP when it has no uplink so you can configure it.
Seems like you still have that IP address on the firewall though. Try running:
ifconfig
and see if it's still present.
-
Yeah, it's on the Starlink interface (even after a reboot). Which is odd because I have the option set to ignore DHCP leases from 192.168.100.1... Did something change in 24.03 that's causing it to ignore that option? Would explain why I've not seen that IP on the interface before (as far as I recall) and not had any issues with that subnet on the VPN. Not sure it explains the crash...
em1: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    description: WAN_Starlink
    options=4e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
    ether 00:26:55:ec:d9:34
    inet 100.122.205.144 netmask 0xffc00000 broadcast 100.127.255.255
    inet 192.168.100.2 netmask 0xffffff00 broadcast 192.168.100.255
    inet6 fe80::226:55ff:feec:d934%em1 prefixlen 64 scopeid 0x4
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
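The hex netmasks in that ifconfig output are easier to compare once converted to prefix lengths. A minimal sketch using only Python's standard `ipaddress` module; the helper name `parse_inet` is made up for illustration, and the addresses are copied from the output above:

```python
import ipaddress

def parse_inet(addr: str, hex_mask: str) -> ipaddress.IPv4Interface:
    """Combine an address with an ifconfig-style hex netmask such as 0xffffff00."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.ip_interface(f"{addr}/{mask}")

# The two inet entries reported on em1.
dhcp_addr = parse_inet("100.122.205.144", "0xffc00000")
second_addr = parse_inet("192.168.100.2", "0xffffff00")

# 100.64.0.0/10 is the RFC 6598 shared address space, commonly used for CGNAT.
CGNAT = ipaddress.ip_network("100.64.0.0/10")

print(dhcp_addr.network, "inside CGNAT range:", dhcp_addr.ip in CGNAT)
print(second_addr.network, "contains 192.168.100.1:",
      ipaddress.ip_address("192.168.100.1") in second_addr.network)
```

Read this way, the second address puts em1 directly on the same /24 as the modem's 192.168.100.1, which would overlap with any 192.168.100.0/24 route pushed over a VPN.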
-
Nothing changed as far as I know. Is that actually the DHCP server though?
Check /var/db/dhclient.leases.em1
Is that other IP address a VIP on WAN_Starlink?
-
@stephenw10 said in 24.03 Crashing...?:
The 192.168.100.1 address is what all the documentation for pfSense with Starlink tells you to reject leases from because even in bypass mode the modem/router can hand out an IP. /var/db/dhclient.leases.em1 does not seem to show the lease. Although it does have a route to 192.168.100.32.
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 01:44:47;
  rebind 2 2024/5/21 01:46:36;
  expire 2 2024/5/21 01:47:17;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 01:44:53;
  rebind 2 2024/5/21 01:46:42;
  expire 2 2024/5/21 01:47:23;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 01:49:15;
  rebind 2 2024/5/21 01:51:04;
  expire 2 2024/5/21 01:51:45;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 01:53:48;
  rebind 2 2024/5/21 01:55:37;
  expire 2 2024/5/21 01:56:18;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 01:58:30;
  rebind 2 2024/5/21 02:00:19;
  expire 2 2024/5/21 02:01:00;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 02:03:33;
  rebind 2 2024/5/21 02:05:22;
  expire 2 2024/5/21 02:06:03;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 02:07:54;
  rebind 2 2024/5/21 02:09:43;
  expire 2 2024/5/21 02:10:24;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 02:12:21;
  rebind 2 2024/5/21 02:14:10;
  expire 2 2024/5/21 02:14:51;
}
lease {
  interface "em1";
  fixed-address 100.122.205.144;
  next-server 10.10.10.10;
  option subnet-mask 255.192.0.0;
  option routers 100.64.0.1;
  option domain-name-servers 1.1.1.1,8.8.8.8;
  option interface-mtu 1500;
  option dhcp-lease-time 300;
  option dhcp-message-type 5;
  option dhcp-server-identifier 100.64.0.1;
  option classless-routes 32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1;
  renew 2 2024/5/21 02:16:50;
  rebind 2 2024/5/21 02:18:39;
  expire 2 2024/5/21 02:19:20;
}
Clarification on my part, the 100.64.0.1 address is the Starlink gateway address. The public IP is 100.122.205.144. Starlink uses CGNAT.
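For anyone reading the classless-routes line in that lease: assuming dhclient stores the option in the RFC 3442 byte layout (prefix length, then only the significant destination octets, then a four-octet router), the leading 32 would be a prefix length and not part of an address. Under that assumption the first entry reads as 192.168.100.1/32, not a route to 192.168.100.32. A rough decoding sketch; the function name is illustrative only:

```python
import ipaddress

def decode_classless_routes(octets):
    """Decode RFC 3442-style octets into (destination network, router) pairs."""
    routes = []
    i = 0
    while i < len(octets):
        width = octets[i]                      # prefix length in bits
        if not 0 <= width <= 32:
            raise ValueError(f"invalid prefix length {width}")
        n_dest = (width + 7) // 8              # significant destination octets
        dest = octets[i + 1:i + 1 + n_dest]
        router = octets[i + 1 + n_dest:i + 5 + n_dest]
        if len(dest) != n_dest or len(router) != 4:
            raise ValueError("truncated option data")
        padded = list(dest) + [0] * (4 - n_dest)   # pad destination to 4 octets
        network = ipaddress.ip_network(
            ".".join(map(str, padded)) + f"/{width}", strict=False)
        gateway = ipaddress.ip_address(".".join(map(str, router)))
        routes.append((network, gateway))
        i += 5 + n_dest
    return routes

# Option value copied from the lease file above.
raw = "32,192,168,100,1,0,0,0,0,32,34,120,255,244,0,0,0,0,0,100,64,0,1"
routes = decode_classless_routes([int(x) for x in raw.split(",")])
for network, gateway in routes:
    print(network, "via", gateway)
```

In RFC 3442 a router of 0.0.0.0 marks the destination as directly reachable on the link, so this would amount to two on-link host routes (192.168.100.1 and 34.120.255.244) plus a default route via 100.64.0.1.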
-
I'm an idiot. I was looking back through some Starlink pfSense setup info and was reminded that in order to have access to the modem/router in bypass mode you have to set up a VIP on the WAN interface. And you even asked about that... dumb.
Anyway, I removed the VIP so we'll see what happens now. Still, that does not explain the crash. Prior to 24.03 it had been working fine, even with the 192.168.100.0 subnet in the OpenVPN config.
-
Ah so 192.168.100.2 was the VIP? That seems reasonable if the modem is usually at 192.168.100.1. And odd that the modem would start using .2 itself.
You could probably use any IP for the VIP in the subnet there though. So maybe 192.168.100.10. If the modem suddenly starts using that too it would confirm some unexpected behaviour.
-
Yep, 192.168.100.2 was the VIP.
The only reason to set it up that way is to have LAN access to the modem via the app. Once it's set up and connected you can access it without the config on pfSense as long as the dish has connectivity to the Starlink network. I've never actually needed the LAN access so I'll just leave it be for now; it's easy enough to add back in if needed.
No crashes since the last one, I am still seeing
arpresolve: can't allocate llinfo for 100.64.0.1 on em1
in the logs... there are a few posts on here about it and elsewhere. Some people see that logged and it corresponds to an outage on that connection, I don't think I'm seeing that (or at least noticing it). Some suggestions were to change the monitor IP for the connection from the gateway address (100.64.0.1) to something else. Maybe I'll try that.
-
You must be able to ARP for the gateway address in order to send packets to it though. If you're seeing that log entry I would expect to find the entry missing from the ARP table and no traffic possible.
It could be the gateway monitoring pings tripping something on the gateway and causing it to stop responding, though. In that case monitoring some other external IP would prevent that.
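One way to test the ARP theory is to look for the gateway in the ARP table at the moment the message is logged. A small sketch that parses text in the format FreeBSD's `arp -an` normally prints; the sample lines and MAC addresses below are invented for illustration and are not taken from this system:

```python
import re

# Matches lines such as: ? (100.64.0.1) at 02:00:00:00:00:01 on em1 expires in ...
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<ifname>\S+)")

def lookup_arp(arp_output, ip, ifname):
    """Return the MAC recorded for ip on ifname, or None if missing or incomplete."""
    for line in arp_output.splitlines():
        m = ARP_LINE.search(line)
        if m and m["ip"] == ip and m["ifname"] == ifname:
            return None if m["mac"] == "(incomplete)" else m["mac"]
    return None

# Hypothetical sample output, illustration only.
SAMPLE = """\
? (100.64.0.1) at 02:00:00:00:00:01 on em1 expires in 1187 seconds [ethernet]
? (192.168.1.50) at (incomplete) on em0 expired [ethernet]
"""

gateway_mac = lookup_arp(SAMPLE, "100.64.0.1", "em1")
print("gateway resolved" if gateway_mac else "gateway missing from ARP table")
```

On a live system the text could come from running `arp -an` and passing its output to `lookup_arp`. A `None` result for 100.64.0.1 at the time of the log entry would fit the "entry missing" explanation.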
-
I'm seeing a crash every couple weeks ("Fatal trap 12: page fault while in kernel mode") at exactly the same instruction pointer (0xffffffff80f246e2). I'm posting here under the expectation that same rip likely means similar underlying cause.
My setup is somewhat different, however. I'm running in a Proxmox VM with no CARP on the WAN interfaces (but I am using CARP on the internal networks).
I'm also running 24.03-RELEASE (amd64). Other than the occasional crash, the system is normal.
Jun 23 19:25:16 Bouncer kernel: Fatal trap 12: page fault while in kernel mode
Jun 23 19:25:16 Bouncer kernel: cpuid = 1; apic id = 01
Jun 23 19:25:16 Bouncer kernel: fault virtual address = 0x1c
Jun 23 19:25:16 Bouncer kernel: fault code = supervisor read data, page not present
Jun 23 19:25:16 Bouncer kernel: instruction pointer = 0x20:0xffffffff80f246e2
Jun 23 19:25:16 Bouncer kernel: stack pointer = 0x28:0xfffffe005d698ae0
Jun 23 19:25:16 Bouncer kernel: frame pointer = 0x28:0xfffffe005d698b70
Jun 23 19:25:16 Bouncer kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Jun 23 19:25:16 Bouncer kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jun 23 19:25:16 Bouncer kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Jun 23 19:25:16 Bouncer kernel: current process = 2 (clock (1))
Jun 23 19:25:16 Bouncer kernel: rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffffe005d698cf8
Jun 23 19:25:16 Bouncer kernel: rcx: 0000000000000000 r8: 000000000000041c r9: 0000000000000000
Jun 23 19:25:16 Bouncer kernel: rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe005d698b70
Jun 23 19:25:16 Bouncer kernel: r10: 0000000000002014 r11: 000000000000e08c r12: 0000000000000000
Jun 23 19:25:16 Bouncer kernel: r13: 000000000000041c r14: fffff80132b30000 r15: 0000000000000034
Jun 23 19:25:16 Bouncer kernel: trap number = 12
Jun 23 19:25:16 Bouncer kernel: panic: page fault
Jun 23 19:25:16 Bouncer kernel: cpuid = 1
Jun 23 19:25:16 Bouncer kernel: time = 1719195652
Jun 23 19:25:16 Bouncer kernel: KDB: enter: panic
Jun 23 19:25:16 Bouncer kernel: ---<<BOOT>>---
Jun 22 10:57:51 Bouncer kernel: Fatal trap 12: page fault while in kernel mode
Jun 22 10:57:51 Bouncer kernel: cpuid = 2; apic id = 02
Jun 22 10:57:51 Bouncer kernel: fault virtual address = 0x1c
Jun 22 10:57:51 Bouncer kernel: fault code = supervisor read data, page not present
Jun 22 10:57:51 Bouncer kernel: instruction pointer = 0x20:0xffffffff80f246e2
Jun 22 10:57:51 Bouncer kernel: stack pointer = 0x0:0xfffffe005d693ae0
Jun 22 10:57:51 Bouncer kernel: frame pointer = 0x0:0xfffffe005d693b70
Jun 22 10:57:51 Bouncer kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Jun 22 10:57:51 Bouncer kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jun 22 10:57:51 Bouncer kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Jun 22 10:57:51 Bouncer kernel: current process = 2 (clock (2))
Jun 22 10:57:51 Bouncer kernel: rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffffe005d693cf8
Jun 22 10:57:51 Bouncer kernel: rcx: 0000000000000000 r8: 0000000000000157 r9: 0000000000000000
Jun 22 10:57:51 Bouncer kernel: rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe005d693b70
Jun 22 10:57:51 Bouncer kernel: r10: 000000000000201c r11: 000000000000e0c4 r12: 0000000000000000
Jun 22 10:57:51 Bouncer kernel: r13: 0000000000000157 r14: fffff8006047c000 r15: 0000000000000034
Jun 22 10:57:51 Bouncer kernel: trap number = 12
Jun 22 10:57:51 Bouncer kernel: panic: page fault
Jun 22 10:57:51 Bouncer kernel: cpuid = 2
Jun 22 10:57:51 Bouncer kernel: time = 1719078955
Jun 22 10:57:51 Bouncer kernel: KDB: enter: panic
Jun 22 10:57:51 Bouncer kernel: ---<<BOOT>>---
May 16 12:05:02 Bouncer kernel: Fatal trap 12: page fault while in kernel mode
May 16 12:05:02 Bouncer kernel: cpuid = 1; apic id = 01
May 16 12:05:02 Bouncer kernel: fault virtual address = 0x1c
May 16 12:05:02 Bouncer kernel: fault code = supervisor read data, page not present
May 16 12:05:02 Bouncer kernel: instruction pointer = 0x20:0xffffffff80f246e2
May 16 12:05:02 Bouncer kernel: stack pointer = 0x28:0xfffffe005d698ae0
May 16 12:05:02 Bouncer kernel: frame pointer = 0x28:0xfffffe005d698b70
May 16 12:05:02 Bouncer kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
May 16 12:05:02 Bouncer kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
May 16 12:05:02 Bouncer kernel: processor eflags = interrupt enabled, resume, IOPL = 0
May 16 12:05:02 Bouncer kernel: current process = 2 (clock (1))
May 16 12:05:02 Bouncer kernel: rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffffe005d698cf8
May 16 12:05:02 Bouncer kernel: rcx: 0000000000000000 r8: 0000000000000564 r9: 0000000000000000
May 16 12:05:02 Bouncer kernel: rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe005d698b70
May 16 12:05:02 Bouncer kernel: r10: 0000000000002021 r11: 000000000000e0e7 r12: 0000000000000000
May 16 12:05:02 Bouncer kernel: r13: 0000000000000564 r14: fffff8010ff6ba80 r15: 0000000000000034
May 16 12:05:02 Bouncer kernel: trap number = 12
May 16 12:05:02 Bouncer kernel: panic: page fault
May 16 12:05:02 Bouncer kernel: cpuid = 1
May 16 12:05:02 Bouncer kernel: time = 1715886225
May 16 12:05:02 Bouncer kernel: KDB: enter: panic
May 16 12:05:02 Bouncer kernel: ---<<BOOT>>---
Apr 29 11:50:16 Bouncer kernel: Fatal trap 12: page fault while in kernel mode
Apr 29 11:50:16 Bouncer kernel: cpuid = 2; apic id = 02
Apr 29 11:50:16 Bouncer kernel: fault virtual address = 0x1c
Apr 29 11:50:16 Bouncer kernel: fault code = supervisor read data, page not present
Apr 29 11:50:16 Bouncer kernel: instruction pointer = 0x20:0xffffffff80f246e2
Apr 29 11:50:16 Bouncer kernel: stack pointer = 0x28:0xfffffe005d693ae0
Apr 29 11:50:16 Bouncer kernel: frame pointer = 0x28:0xfffffe005d693b70
Apr 29 11:50:16 Bouncer kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Apr 29 11:50:16 Bouncer kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Apr 29 11:50:16 Bouncer kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Apr 29 11:50:16 Bouncer kernel: current process = 2 (clock (2))
Apr 29 11:50:16 Bouncer kernel: rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffffe005d693cf8
Apr 29 11:50:16 Bouncer kernel: rcx: 0000000000000000 r8: 0000000000000564 r9: 0000000000000000
Apr 29 11:50:16 Bouncer kernel: rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe005d693b70
Apr 29 11:50:16 Bouncer kernel: r10: 0000000000002021 r11: 000000000000e0e7 r12: 0000000000000000
Apr 29 11:50:16 Bouncer kernel: r13: 0000000000000564 r14: fffff800510b0000 r15: 0000000000000034
Apr 29 11:50:16 Bouncer kernel: trap number = 12
Apr 29 11:50:16 Bouncer kernel: panic: page fault
Apr 29 11:50:16 Bouncer kernel: cpuid = 2
Apr 29 11:50:16 Bouncer kernel: time = 1714416542
Apr 29 11:50:16 Bouncer kernel: KDB: enter: panic
Apr 29 11:50:16 Bouncer kernel: ---<<BOOT>>---
Apr 26 14:18:52 Bouncer kernel: Fatal trap 12: page fault while in kernel mode
Apr 26 14:18:52 Bouncer kernel: cpuid = 0; apic id = 00
Apr 26 14:18:52 Bouncer kernel: fault virtual address = 0x1c
Apr 26 14:18:52 Bouncer kernel: fault code = supervisor read data, page not present
Apr 26 14:18:52 Bouncer kernel: instruction pointer = 0x20:0xffffffff80f246e2
Apr 26 14:18:52 Bouncer kernel: stack pointer = 0x0:0xfffffe005d69dae0
Apr 26 14:18:52 Bouncer kernel: frame pointer = 0x0:0xfffffe005d69db70
Apr 26 14:18:52 Bouncer kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Apr 26 14:18:52 Bouncer kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Apr 26 14:18:52 Bouncer kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Apr 26 14:18:52 Bouncer kernel: current process = 2 (clock (0))
Apr 26 14:18:52 Bouncer kernel: rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffffe005d69dcf8
Apr 26 14:18:52 Bouncer kernel: rcx: 0000000000000000 r8: 0000000000000546 r9: 0000000000000000
Apr 26 14:18:52 Bouncer kernel: rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe005d69db70
Apr 26 14:18:52 Bouncer kernel: r10: 000000000000201c r11: 000000000000e0c4 r12: 0000000000000000
Apr 26 14:18:52 Bouncer kernel: r13: 0000000000000546 r14: fffff800059ce540 r15: 0000000000000034
Apr 26 14:18:52 Bouncer kernel: trap number = 12
Apr 26 14:18:52 Bouncer kernel: panic: page fault
Apr 26 14:18:52 Bouncer kernel: cpuid = 0
Apr 26 14:18:52 Bouncer kernel: time = 1714166019
Apr 26 14:18:52 Bouncer kernel: KDB: enter: panic
Apr 26 14:18:52 Bouncer kernel: ---<<BOOT>>---
I've saved the last two crash reports if that helps.
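To check that every panic really shares one instruction pointer and fault address, something like the following could be run over the saved log. It is only a sketch: the regular expression assumes the syslog line layout shown above, the function name is made up, and the embedded sample is an abbreviated copy of the two most recent panics.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Pull out only the fields of interest from each kernel log line.
FIELD = re.compile(
    r"kernel: (fault virtual address|instruction pointer|time)\s*=\s*(\S+)")

def summarize_panics(log_text):
    """Return one dict of extracted fields per 'Fatal trap' block."""
    panics = []
    for line in log_text.splitlines():
        if "Fatal trap" in line:
            panics.append({})              # a new panic block starts here
        m = FIELD.search(line)
        if m and panics:
            panics[-1][m.group(1)] = m.group(2)
    return panics

# Abbreviated copy of the two most recent panics from the log above.
SAMPLE = """\
Jun 23 19:25:16 Bouncer kernel: Fatal trap 12: page fault while in kernel mode
Jun 23 19:25:16 Bouncer kernel: fault virtual address = 0x1c
Jun 23 19:25:16 Bouncer kernel: instruction pointer = 0x20:0xffffffff80f246e2
Jun 23 19:25:16 Bouncer kernel: time = 1719195652
Jun 22 10:57:51 Bouncer kernel: Fatal trap 12: page fault while in kernel mode
Jun 22 10:57:51 Bouncer kernel: fault virtual address = 0x1c
Jun 22 10:57:51 Bouncer kernel: instruction pointer = 0x20:0xffffffff80f246e2
Jun 22 10:57:51 Bouncer kernel: time = 1719078955
"""

panics = summarize_panics(SAMPLE)
by_rip = Counter(p["instruction pointer"] for p in panics)
print(by_rip)
for p in panics:
    # The 'time' field is a Unix timestamp recorded at panic time.
    when = datetime.fromtimestamp(int(p["time"]), tz=timezone.utc)
    print(when.isoformat(), p["fault virtual address"])
```

A single key in `by_rip` would back up the "same rip" observation, and the decoded `time` values give the moment of each panic rather than the moment the lines were written to the log.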
VM config:
- 3GB RAM, balloon=0
- Processors: 1 socket, 4 cores, kvm64
- BIOS: UEFI
- Machine: q35
- 3 network devices:
  - Two are virtio
  - One is a PCI passthrough of an Intel(R) I219-V SPT-H(2)
This system is part of an HA pair so the reboot doesn't have a big impact on me, but page faults always feel like good things to chase down.
The other system in the HA pair is also a VM under Proxmox and has not crashed since the last release was installed.
Let me know if I can help further.
-
Same backtrace?
Are you also running HAProxy? If so update the HAProxy package. There is an update to address a known bug causing that panic.
-
@stephenw10, damn you're fast. Yes, I'm also running HAProxy, and yes, there was a pending update I hadn't noticed. I have now applied it. I guess the only thing to do now is wait to see if the problem is resolved. Thanks so much for the response!
-
No worries, let us know if that fixes it.