Enable IPSEC - crash within 10-30 mins



  • I upgraded my long-running home nanobsd i386 box from 2.1.5 to 2.2 (it's currently on Sunday's snapshot).  It is linked to my office via IPSEC, where the other end is still on 2.1.5 (VMware VM, amd64).

    It has been crashing regularly today and I found that if I disabled the tunnel at the office end then it runs fine.  Enable the tunnel and it crashes.  So thus far I am 99% certain that IPSEC is the problem.

    I've enabled logging to syslog in case anything useful turns up but to get to the logs I need the tunnel up, or wait until I get home!  In the brief windows I had to read them, there has been nothing too obvious.

    I'm not sure how to proceed further here.  I've scanned through this forum but I can't see anything too similar.

    My first job is to really confirm that IPSEC is crashing the box but it doesn't have a screen or a hard disc.  I think I can get a serial console working but from memory I think I installed it without ever seeing the console.  I just wrote the card on my laptop, booted it and connected to the default IP.  Will I see a kernel panic or something useful if I can get at the console?

    Cheers
    Jon



  • Please try a snapshot that will come later on to have all final fixes.



  • After trying a few changes with no success I installed the snapshot built on Mon Jan 12 03:27:47 CST 2015, which was produced mere hours after the one I had installed as my initial foray into 2.2.

    2 hours uptime now with IPSEC enabled so it appears fixed.  I see another snap has recently appeared, think I'll pop that on later this evening once things have settled down.

    [Updated] - oh well it lasted about four hours and then crashed again.  I've updated to the latest today - built on Mon Jan 12 11:47:27 CST 2015.



  • Define "crashes", what happens exactly? Do you get a crash report?



  • @cmb:

    Define "crashes", what happens exactly? Do you get a crash report?

    No crash report, and by "crash" I mean it seems to simply reboot itself.  This box is in my attic and has no screen, and I don't really want to sit watching it on the end of a serial cable on the off chance it goes when I'm up there.  I could wire the console up to something to record it, but the box tries to boot off my USB to serial converter if it is plugged in before the CF gets mounted.  Mind you, I haven't tried this with the new BSD.

    I am at a bit of a loss as to how to diagnose this.  I'd like to see a real error of some sort that I could get down to trying to resolve but I'm flying blind.  I don't get anything useful in the remote syslogs to indicate to me a pending fault.

    I have various systems (Linux and Windows VMs) available near it with a fair amount of disc which could receive data if you can give me a hint as to how to capture it.

    I'm under the impression that nanobsd keeps the disc read-only all the time apart from writing config changes, so it would not be able to write a report/core in the event of a crash.  Is there a way around this?

    According to Icinga it's "vanished" twice today so the later snapshots seem to give me more uptime but not the legendary >1000 days that one of my pfSense installs managed before a power cut took them out 8)

    The only thing I see with kern.crit is this, but I think it's from boot up rather than just before crashing, judging by when Icinga marked it as down:

    Jan 13 19:11:11 10.200.200.1 kernel: done.
    Jan 13 19:11:11 10.200.200.1 kernel: done.
    Jan 13 19:11:11 10.200.200.1 kernel:
    Jan 13 19:11:12 10.200.200.1 kernel:
    Jan 13 19:11:13 10.200.200.1 kernel: ..
    Jan 13 19:11:13 10.200.200.1 kernel: 0 addresses deleted.
    Jan 13 19:11:13 10.200.200.1 kernel: done.
    Jan 13 19:11:15 10.200.200.1 kernel: done.
    Jan 13 19:11:24 10.200.200.1 kernel: done.
    Jan 13 19:11:25 10.200.200.1 kernel: done.
    Jan 13 19:11:25 10.200.200.1 kernel: done.
    Jan 13 19:11:25 10.200.200.1 kernel: done.
    Jan 13 19:11:26 10.200.200.1 kernel: ..
    Jan 13 19:11:26 10.200.200.1 kernel: ..
    Jan 13 19:11:27 10.200.200.1 kernel: 0 addresses deleted.
    Jan 13 19:11:27 10.200.200.1 kernel: done
    Jan 13 19:11:46 10.200.200.1 kernel: done.
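
    To check whether a burst of messages like this lines up with boot rather than with the moments before a crash, the remote syslog can be grouped into bursts and compared with the times Icinga reported the box down. A minimal Python sketch, assuming the "Mon DD HH:MM:SS host message" layout shown above; parse_line, group_bursts, the assumed year and the five-minute gap are all illustrative choices, not anything pfSense provides:

```python
from datetime import datetime, timedelta


def parse_line(line, year=2015):
    """Split 'Mon DD HH:MM:SS host message' into (timestamp, host, message).

    Returns None for lines that do not match that layout. The year is an
    assumption, because this syslog format does not carry one.
    """
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None
    month, day, clock, host = parts[:4]
    message = parts[4] if len(parts) > 4 else ""
    try:
        # %b expects English month abbreviations in the default C locale.
        stamp = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None
    return stamp, host, message


def group_bursts(stamps, gap=timedelta(minutes=5)):
    """Group timestamps into bursts separated by more than `gap`.

    Returns a list of (first, last, count) tuples, oldest burst first.
    """
    bursts = []
    for stamp in sorted(stamps):
        if bursts and stamp - bursts[-1][1] <= gap:
            first, _, count = bursts[-1]
            bursts[-1] = (first, stamp, count + 1)
        else:
            bursts.append((stamp, stamp, 1))
    return bursts
```

    Fed the lines above, this should report a single burst, since they all fall within the same minute. Each burst's start time can then be compared by hand with the down and up times recorded by Icinga.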

    Cheers
    Jon



  • Your 2.2 side, that's not a VM from the sounds of it? What's the hardware?



  • @cmb:

    Your 2.2 side, that's not a VM from the sounds of it? What's the hardware?

    It's one of these: http://linitx.com/product/linitx-apu-1c-3nicusbrtc-pfsense-embed-firewall-kit-red/14094 .  It has three Realtek gigabit NICs and the processor reports as AMD G-T40E Processor 2 CPUs: 1 package(s) x 2 core(s).  It's been running versions older than 2.2 for about 2 years now.



  • Pretty sure Ermal found and fixed the issue you're seeing there. The snapshot that's building right now will have that change, please try upgrading once it's available.



  • @cmb:

    Pretty sure Ermal found and fixed the issue you're seeing there. The snapshot that's building right now will have that change, please try upgrading once it's available.

    Good grief - you guys must be psychic to diagnose that with what I've provided so far!

    I have this one available at the moment: Tue Jan 13 09:03:18 CST 2015, I'm on GMT so that would imply 13:00 ish here and it's now 21:30 so I'll wait for the next one rather than shoeshine my CF card.

    Thanks
    Jon



  • @gerdesj:

    Good grief - you guys must be psychic to diagnose that with what I've provided so far!

    Just so happened someone else reported what sounds like the same thing, and got us a crash report. So no, not psychic. :)

    @gerdesj:

    I have this one available at the moment: Tue Jan 13 09:03:18 CST 2015, I'm on GMT so that would imply 13:00 ish here and it's now 21:30 so I'll wait for the next one rather than shoeshine my CF card.

    Yeah it's still building, should be done within a few minutes from the time of this post.



  • @cmb:

    Yeah it's still building, should be done within a few minutes from the time of this post.

    It's getting hard to tell the difference between the "crashing" I was experiencing and the newish problem with IPSEC tunnels failing to start at all.  I think the latter should be fixed now that I have resaved the config using the button at the top of the IPSEC page, which I suspect writes out the results of some changes in the code.

    I see rather a lot of snaps have come out recently and I've been installing them so let's see how it goes. My CF card is getting a good writing to.  Let's call this solved for now and I'll start a new thread with a more useful set of diags if I still have issues.

    Thanks for your comments and hard work, cmb and ermal.
    Jon

    [EDIT] Another crash.  I've managed to boot this box with a serial to USB converter (Prolific) attached without it hanging, so that's an improvement over older versions.  I've passed the converter through to a Linux VM in VMware and connected it to "screen" with logging switched on.  I can then watch that via ssh and screen on my laptop downstairs as well.  With luck I'll have something better to report back with, now that I can see the console at last without having to sit in the attic!
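
    Once screen has been logging for a while, the capture can be searched for signs of a panic instead of being read end to end. A rough Python sketch, assuming the log is a plain text file (screen names it screenlog.0 by default for the first window); the marker strings are ones a FreeBSD kernel commonly prints on a panic and are an illustrative guess, not an exhaustive or pfSense-specific list, and find_panics and scan_file are hypothetical helper names:

```python
import re
from collections import deque

# Assumed markers: illustrative strings commonly seen in FreeBSD panics.
PANIC_MARKERS = re.compile(r"panic:|Fatal trap|Fatal double fault", re.IGNORECASE)


def find_panics(lines, context=5):
    """Return (line_number, preceding_lines, matching_line) for each match.

    `context` is how many lines before each match are kept, so the
    messages leading up to a suspected panic are visible too.
    """
    recent = deque(maxlen=context)  # rolling window of earlier lines
    hits = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if PANIC_MARKERS.search(line):
            hits.append((number, list(recent), line))
        recent.append(line)
    return hits


def scan_file(path, context=5):
    """Scan a captured console log on disk for the assumed panic markers."""
    # errors="replace" because a serial capture can contain stray bytes.
    with open(path, "r", errors="replace") as handle:
        return find_panics(handle, context)
```

    Calling scan_file("screenlog.0") from the directory where screen was started would list each suspect line along with the few lines before it. An empty result only means none of the assumed markers appeared, not that the box did not crash.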

