Kernel Panic on nanobsd - Fatal Double Fault

gerdesj

(I started a prior thread thinking that IPSEC was the cause but I really have no idea, so starting a new thread)

My recently updated box is panicking after anywhere between 10 minutes and several hours of uptime. It's running the current snap, i386. I've managed to pass the serial console through to a Linux VM and captured the console output via GNU screen.

I've attached the output from the console - I've not a clue what it means :-[

Cheers
Jon
pfSense-2.2-Pre-Crash-14-Jan-2200.txt
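
For anyone skimming a long console capture like the one attached, here is a minimal Python sketch that pulls out the lines mentioning a panic or double fault, plus a little surrounding context. The keyword pattern and the function name `find_panic_lines` are assumptions made for illustration; they are not taken from the attached log.

```python
import re
from typing import List, Tuple

# Assumed keywords; adjust to whatever the console actually printed.
PATTERN = re.compile(r"panic|double fault", re.IGNORECASE)


def find_panic_lines(text: str, context: int = 2) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for matching lines plus context."""
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if PATTERN.search(line):
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            keep.update(range(first, last))
    # Line numbers are 1-based, as a text editor would show them.
    return [(i + 1, lines[i]) for i in sorted(keep)]
```

To use it, read the capture into a string (for example with `open(path, errors="replace").read()`, since serial captures can contain stray bytes) and print the returned pairs.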

eri--

It seems like you are doing IPsec on WAN and sending the same traffic over IPsec on LAN.
Is that the case?

Anyway, it seems like a driver or hardware issue to me.

gerdesj

@ermal:

It seems like you are doing IPsec on WAN and sending the same traffic over IPsec on LAN.
Is that the case?

Anyway, it seems like a driver or hardware issue to me.

I am doing IPSEC on WAN, which is PPPoE over an 802.1Q VLAN on re2. I have multiple LANs, one of which is really a DMZ with external addresses, again using 802.1Q. I have multiple IPSEC P2s to enable me to get to that lot from the office.

My previous ISP gave me a single /32 for WAN and then a separate /29 for DMZ/LAN. My current one only gives out a single /29 (or /28 if I want). I put one address on WAN, another on DMZ, and then route the remainder for externally facing systems (SIP has never been easier!) This all worked OK up to and including 2.1.

I suppose that lot has exercised a code path in a driver somewhere. I'm probably not the only person in the world with an APU 1C based system, so perhaps others might see this panic eventually. I think a hardware fault is a bit unlikely. Incidentally, my switch hasn't seen any frame errors on that port.

I'll reconfigure my WAN to remove the dot1Q (because it's actually unnecessary now) to start with and see how it goes from there.

Cheers
Jon

gerdesj

I disabled IPSEC and got 24 hours uptime, so wherever the bug is, it's almost certainly caused by IPSEC.

I then removed the Phase 2 that connects my external allocation to the other end (this worked OK under 2.1). I don't actually need it nowadays anyway.

I then decided to do some stress testing:

iperf3 across the tunnel with 2 lots of client/server pairs, one TCP and the other UDP. I fired the contents of /dev/random from my laptop through dd over ssh to dd and a file at the other end. I set up several lots of flood pings at each end, some across the IPSEC tunnels (I have 2) and some intra VLANs. At my end (the 2.2-pre) I also did a download of a large ISO. I have 80/20Mb FTTC (UK) links at each end, so I tried to thrash these as best I could, bearing in mind the relative bandwidths.
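
A rough Python sketch of how a mix like that could be scripted is below. It only builds the command lines and does not run anything. The peer address, the second port (5202), the 20M UDP rate and the function names are all assumptions for illustration; the exact invocations used in the test above are not given in the post.

```python
import shlex
from typing import List


def build_stress_commands(peer: str, seconds: int = 3600) -> List[List[str]]:
    """Build (but do not run) argv lists for a few load generators."""
    return [
        # TCP iperf3 client against a server on the default port.
        ["iperf3", "-c", peer, "-t", str(seconds)],
        # UDP iperf3 client against a second server on an assumed port.
        ["iperf3", "-c", peer, "-u", "-b", "20M", "-p", "5202",
         "-t", str(seconds)],
        # Flood ping; this normally needs root privileges.
        ["ping", "-f", peer],
    ]


def as_shell(commands: List[List[str]]) -> List[str]:
    """Render argv lists as safely quoted shell command strings."""
    return [shlex.join(cmd) for cmd in commands]
```

Each argv list could be handed to `subprocess.Popen` to run the generators in parallel, with a matching `iperf3 -s` listening on each port at the far end.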

I left that lot running for over an hour and neither end skipped a beat. Bear in mind that the other end is a (pfSense 2.1) VM with over 50 IPSEC P2s and 20 odd OpenVPNs some site to site and around 15 VLANs, oh and four WANs. There is a monitoring box watching >500 systems across the country with >1500 services (Icinga) and it didn't see anything untoward. The pfSense there is also running Snort, sending Netflow data to a collector and a few other things and it's one of several VMs in a VMware cluster on not very modern gear (Dell PE2950).

If this APU 1C job is still up tomorrow, I'll consider the problem "works for me", but there is still a bug somewhere. You shouldn't get a panic through normal usage.

Cheers
Jon

PS I'm pretty impressed at how all of this lot behaved whilst testing. I saw rather a lot of graphs on many systems go up to near line speed for a fair amount of time. Some of them running Windows ::)

gerdesj

Bummer, another crash after all that thrashing with random streams, flood pings and whatever. When it goes quiet, it dies!

Attached is the latest console output on the off chance that someone sees something. I'll re-arrange the WAN interfaces tomorrow to be more as it should be - bin the defunct one.

pfSense-2.2-Pre-Crash-16-Jan-0015.txt

eri--

There is something in your configuration doing loops or similar, from what I can tell.

gerdesj

@ermal:

There is something in your configuration doing loops or similar, from what I can tell.

Thanks for having a look. I am dismantling leftover bits of config one by one. I recreated my WAN link from scratch last night but to no avail.

Ho hum, one change at a time - test - make another change, wash ... rinse ... repeat. Perhaps I'll start from scratch on a new CF card and then I can diff the configs if that works.
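
Diffing two saved configs can be done with any diff tool; as a minimal sketch in Python, using only the standard library's `difflib`, it might look like the following. The function name `diff_configs` and the default file labels are hypothetical and only there for illustration.

```python
import difflib
from typing import List


def diff_configs(old_text: str, new_text: str,
                 old_name: str = "old/config.xml",
                 new_name: str = "new/config.xml") -> List[str]:
    """Return unified-diff lines between two config dumps."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,   # labels only; nothing is read from disk here
        tofile=new_name,
        lineterm="",         # inputs were split without line endings
    ))
```

Read each exported config into a string, pass both in, and print the result line by line; an empty list means the two configs are identical.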

gerdesj

Oh well, I ran out of things to try to get IPSEC working reliably.

I've given up and used OpenVPN instead - same functionality but the system stays up. 5 1/2 hours uptime so far, which is way more than I managed with IPSEC on 2.2 on this system. I can live with that workaround for now.

Cheers
Jon