Random reboots pfSense 2.1.5 VM [SOLVED]



  • I'm experiencing random reboots on my pfSense firewall, which runs as a VM under vSphere 5.5.
    Has anyone else experienced this problem?

    Earlier this year I got random kernel panics, so there actually was a log (this happened about once a month on average).
    Now there is nothing: it just dies, starts itself up again, and everything is fine for anywhere between 8 hours and 1 week.

    After the kernel panics I opened a support ticket. They couldn't see anything obvious at the time, but recommended that I move pfSense from x86 to x64 to better match the real hardware.
    I tried this but wasn't successful so I went back to x86.

    Some info:
    pfSense 2.1.0 x86 (since upgraded to 2.1.5 x86)
    vSphere 5.5.0, 1623387
    HP ProLiant DL385 G7 (latest FW)
    Broadcom NC382i + Intel 82580 (total 8 NICs)
    VM Hardware version 8
    ^ everything on vSphere HCL

    Thank you!



  • My config is similar and I've never seen that.  I would recommend:

    1.  Make a backup of your config via Diagnostics - Backup/Restore
    2.  Install pfSense 2.1.5-i386
    3.  Restore your backup file

    Make sure you snap your VM before doing any of this.  Note that upgrading from i386 to x64 is not recommended, from what I recall.  You can't take an i386 backup file and use it to restore on an x64 config.


  • Rebel Alliance Global Moderator

    there is no reason to run x64 of pfsense unless you are giving it more than 4GB of ram, etc.

    You are on an old version of pfSense; 2.1.5 i386 is what I would be on.  Your ESXi is OLD: the current build is 2143827, which came out 10/15. Your build is the original release of Update 1, and there have been about 6 patches since then.

    Why do you run hardware version 8, and not 9 or 10?  I can understand not going to 10 if you're using the free version and the client to manage your ESXi host.  But it's easy enough to move to 9: just upgrade to 10 and then back it off to 9 in the vmx file.



  • Mysterious.
    I don't see any reason to go to x64 either if memory isn't an issue. And my box uses ~150-200MB.
    The ESXi version is "a bit behind". Rolling 3.5 is OLD :) I will upgrade, but it takes some planning when 250 users need their connection.
    Same with pfSense itself of course. And I can't see anything in the release notes of the newer ESXi releases or pfSense versions that mentions random reboots.
    HW version 8 or 9 is king of the hill since it works with everything! But upgrading wouldn't hurt I guess.

    So you say that the following steps would probably solve the random reboots:
    Going to ESXi 5.5-2143827
    Upgrading to pfSense 2.1.5
    Updating vHW to 10



  • don't upgrade to HW10 if you run the FREE esxi hypervisor ( you won't be able to edit the VM after update to 10)

    Do you have a virtual CD drive on the pfSense VM? If yes ==> remove it and restart the VM.
    see related post here: https://forum.pfsense.org/index.php?topic=82849.0



  • I know, but I'm not.
    I was able to catch some interesting stuff today when this happened:

    Nov 26 08:43:16 kernel: ZFS storage pool version 28
    Nov 26 08:43:16 kernel: ZFS filesystem version 5
    Nov 26 08:43:16 kernel: in /boot/loader.conf.
    Nov 26 08:43:16 kernel: Consider tuning vm.kmem_size and vm.kmem_size_max
    Nov 26 08:43:16 kernel: ZFS WARNING: Recommended minimum kmem_size is 512MB; expect unstable behavior.
    Nov 26 08:43:16 kernel: add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
    Nov 26 08:43:16 kernel: ZFS NOTICE: Prefetch is disabled by default on i386 -- to enable,
    Nov 26 08:43:16 kernel: WARNING: / was not properly dismounted
    Nov 26 08:43:16 kernel: Trying to mount root from ufs:/dev/da0s1a
    Nov 26 08:43:16 kernel: SMP: AP CPU #2 Launched!
    Nov 26 08:43:16 kernel: SMP: AP CPU #4 Launched!
    Nov 26 08:43:16 kernel: SMP: AP CPU #3 Launched!
    Nov 26 08:43:16 kernel: SMP: AP CPU #5 Launched!
    Nov 26 08:43:16 kernel: SMP: AP CPU #1 Launched!
    Nov 26 08:43:16 kernel: da0: 51200MB (104857600 512 byte sectors: 255H 63S/T 6527C)
    Nov 26 08:43:16 kernel: da0: Command Queueing enabled
    Nov 26 08:43:16 kernel: da0: 320.000MB/s transfers (160.000MHz DT, offset 127, 16bit)
    Nov 26 08:43:16 kernel: da0: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device
    Nov 26 08:43:16 kernel: da0 at mpt0 bus 0 scbus0 target 0 lun 0

    Could it be a lead? Afaik, ZFS isn't used on pfSense.



  • that warning is "normal" … i see it on all my installs
    check the virtual cd-drive ;)



  • @heper:

    that warning is "normal" … i see it on all my installs
    check the virtual cd-drive ;)

    Will do that at lunch time :) thanks for the suggestion. Let's see what happens!



  • Upgraded pfSense to 2.1.5 and removed virtual CD.

    Just had another reboot, so that didn't help :( This is starting to be pretty critical.
    I wonder if x64 will help. But I've never seen it be "needed" on other systems (like Linux and Windows).

    The ONLY time I've seen firewalls restart like this was on a virtual Clavister. That was due to a bug in the AES hardware encryption/decryption when VPN connections were made.

    Can this be something related to that? The last few times, I've noticed that an OpenVPN connection was made just before the restarts. I will keep an eye on the log.
    Reboots have never happened at night/production time.



  • Things to consider with HW.

    CPUs need microcode updates to fix bugs in the CPU, which are normally delivered via OS updates like Windows Update and some Linux updates.

    If you don't have an OS VM running on your machine which can update the CPU, can you check whether your CPU needs an update and, if so, whether it has been patched? https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=14303

    Likewise, if you are running Intel chips capable of supporting AMT, can you be sure no one is messing with your machine via AMT? The OOB channel makes for a great back door into people's systems, irrespective of their firewall and other security constraints, be it in a data centre or an office block.

    It could be a simple RAM chip failing if the system is old, or it could be sectors dropping out on the drive which might be causing problems. That's the problem with random reboots: it's not always obvious what's at fault.

    There are also "magic" packets which can be sent out and which can mess with some machines, if someone was "playing" with your system. But in all honesty it's probably a simple bit of hardware failure causing the random reboots: maybe a RAM chip not seated properly, maybe some dust build-up shorting something (these things are like vacuum cleaners in the wrong places).



  • I run the latest firmwares from HP. And AMD doesn't have AMT.
    The server is pretty well isolated, so I'm as sure as I can be that this isn't someone messing with it… it should produce some log indication of this as well.

    I've handled loads of servers, and I've never seen one that wasn't able to report errors in registered ECC memory.
    The RAID0 should be able to handle disk failures.

    I don't believe that this is something hardware related.


  • Rebel Alliance Global Moderator

    @heper:

    don't upgrade to HW10 if you run the FREE esxi hypervisor ( you won't be able to edit the VM after update to 10)

    Actually this is no longer true..

    Current build of client/esxi allows to edit vmx-10

    I'm not saying updating fixes the problem, but not running current versions is a support issue.  What is the first thing any support tells you when you call ;)  If you want to track down a bug, why would you track it down on old versions?




  • Current build of client/esxi allows to edit vmx-10

    Thanks for the tip.  I had no idea.  I was still on the original 5.5.0 release of vi-client.  Thank the stars that I don't have to use their annoying web client.  I know the web client is the future, but I find it a PITA to use.



  • @johnpoz:

    there is no reason to run x64 of pfsense unless you are giving it more than 4GB of ram, etc.

    Not true. There likely aren't any functional differences in that case, but 32 bit is a dying breed, every 64 bit capable system should be running 64 bit. FreeNAS and Dragonfly both just put out their last releases with 32 bit support. We'll stop putting out 32 bit releases before too long, maybe a year or two down the road. We do much more testing on 64 bit than 32, and 64 is more widely used, so less chance of issues there. I'm not aware of any architecture-specific issues in 2.2, but if there are any, they're likely 32 bit only.

    @KOM:

    2.  Install pfSense 2.1.5-i386

    No, don't do that, use 64 bit.

    @KOM:

    You can't take an i386 backup file and use it to restore on an x64 config.

    Yes you can, there is nothing architecture-specific in most all configs. The only thing that can be architecture-specific is if you manually set your auto-update URL. Just going to System>Firmware, Updater Settings tab, and verifying you don't have "Use an unofficial server for firmware upgrades" checked will ensure that's not an issue.



  • Yes you can, there is nothing architecture-specific in most all configs.

    Good news.  I was repeating something I had heard from someone else here many months ago.

    Yes you can, there is nothing architecture-specific in most all configs.

    Oh?  Then why is upgrading from 32 to 64 bit not supported?



  • Because that's a re-install anyway.



  • Yea, I'll give x64 another try. But I don't know when I can have that service window.

    Now it crashed again. And yet again this seems to be related to OpenVPN.

    I had an OpenVPN connection at 09:45, and just seconds later it crashed. So this must be OpenVPN related.
    Should I open a ticket?

    EDIT: And again: OpenVPN connection at 10:12, crash right after.
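
One way to test the suspected OpenVPN link is to line up connection times against reboot times. Below is a minimal Python sketch, assuming both sets of timestamps have already been pulled out of the logs. The function name, the 60-second window, the date, and the exact reboot times are made up for illustration; the thread only gives 09:45 and 10:12 for the connections.

```python
from datetime import datetime, timedelta


def reboots_after_vpn(vpn_connects, reboots, window=timedelta(seconds=60)):
    """Return (reboot, connect) pairs where the most recent VPN connect
    happened at most `window` before the reboot."""
    pairs = []
    for reboot in reboots:
        # All connects at or before this reboot
        earlier = [c for c in vpn_connects if c <= reboot]
        if earlier and reboot - max(earlier) <= window:
            pairs.append((reboot, max(earlier)))
    return pairs


# Hypothetical data: the date and the reboot seconds are placeholders.
day = datetime(2014, 12, 1)
connects = [day.replace(hour=9, minute=45), day.replace(hour=10, minute=12)]
reboots = [
    day.replace(hour=9, minute=45, second=20),
    day.replace(hour=10, minute=12, second=15),
    day.replace(hour=14),  # a reboot with no VPN connect right before it
]

for reboot, connect in reboots_after_vpn(connects, reboots):
    print(f"reboot at {reboot:%H:%M:%S} followed VPN connect at {connect:%H:%M:%S}")
```

If most reboots show up in the result and only a few do not, that supports the OpenVPN theory without proving it; a reboot that is missing from the list (like the 14:00 placeholder) would point to a second cause.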



  • did you enable hardware crypto by any chance?

    i vaguely remember I once tried this setting on esxi and it resulted in "fatal trap xxx"



  • @heper:

    did you enable hardware crypto by any chance?

    i vaguely remember I once tried this setting on esxi and it resulted in "fatal trap xxx"

    Yep, HW crypto was enabled (BSD Cryptodev engine).
    I've disabled it now, hope it helps.

    But I love HW decryption :(

    Let's see the result.
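
To double-check that the engine is really off in the generated OpenVPN config, something like the following could be run against the config text. This is a rough sketch: it assumes the "BSD cryptodev engine" GUI choice ends up as an `engine cryptodev` line (OpenVPN's `engine` option), and the function name and sample config are hypothetical.

```python
def find_engine_directives(config_text):
    """Return the engine names set by `engine` lines in an OpenVPN config.

    An `engine` line without a name is reported as an empty string.
    """
    engines = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        parts = line.split()
        if parts and parts[0] == "engine":
            engines.append(parts[1] if len(parts) > 1 else "")
    return engines


# Hypothetical config excerpt, not taken from the thread.
sample = """dev ovpns1
engine cryptodev
# engine example-disabled
cipher AES-128-CBC
"""
print(find_engine_directives(sample))
```

An empty list after the change would suggest that OpenVPN is no longer being told to use a hardware engine.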



  • I think we can mark this as solved for now  ;D
    Since HW-crypto for OpenVPN was turned off, I've not had a single reboot.

    I'd call this a bug.

    Thanks all!


  • Netgate Administrator

    What crypto hardware were you using (or trying to use)? Is ESXi presenting some virtual hardware to the OS that it thinks it can use with the crypto framework?

    I don't think you've lost anything.  ;)

    Steve



    This just seems to have started again.
    And I'm not ready to do a 2.2 upgrade just yet. There seem to be too many IPsec-related issues, and IPsec is very important here.

    I can't find anything related to the reboots this time. See the log (newest entries first); 09:48 is the last entry before the reboot:

    Jan 30 11:47:24 kernel: Features2=0x80802001<SSE3,CX16,POPCNT,HV>
    Jan 30 11:47:24 kernel: Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
    Jan 30 11:47:24 kernel: Origin = "AuthenticAMD" Id = 0x100f91 Family = 10 Model = 9 Stepping = 1
    Jan 30 11:47:24 kernel: CPU: AMD Opteron™ Processor 6128 (1999.86-MHz 686-class CPU)
    Jan 30 11:47:24 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
    Jan 30 11:47:24 kernel: root@pf2_1_1_i386.pfsense.org:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_SMP.8 i386
    Jan 30 11:47:24 kernel: FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:25:41 EDT 2014
    Jan 30 11:47:24 kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
    Jan 30 11:47:24 kernel: The Regents of the University of California. All rights reserved.
    Jan 30 11:47:24 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    Jan 30 11:47:24 kernel: Copyright (c) 1992-2012 The FreeBSD Project.
    Jan 30 11:47:24 syslogd: kernel boot file is /boot/kernel/kernel
    Jan 30 09:48:59 lighttpd[35746]: (connections.c.137) (warning) close: 13 Connection reset by peer
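
When the log is long, picking out the last entry before each boot by eye gets tedious. Here is a small Python sketch of how that could be automated, assuming the log is listed newest first (as above) and that the `syslogd: kernel boot file is` line marks the start of a boot. The function name is hypothetical.

```python
def last_entries_before_boot(lines_newest_first, marker="kernel boot file is"):
    """For each boot marker in a newest-first log, return the line that
    follows it in the list, i.e. the last entry written before that boot."""
    found = []
    for i, line in enumerate(lines_newest_first):
        if marker in line and i + 1 < len(lines_newest_first):
            found.append(lines_newest_first[i + 1])
    return found


# The two relevant lines from the excerpt above.
log = [
    "Jan 30 11:47:24 syslogd: kernel boot file is /boot/kernel/kernel",
    "Jan 30 09:48:59 lighttpd[35746]: (connections.c.137) (warning) close: 13 Connection reset by peer",
]
for entry in last_entries_before_boot(log):
    print(entry)
```

Run over a longer log, this would give one "last words" line per reboot, which makes it easier to see whether the same service keeps showing up just before the box dies.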


  • Netgate Administrator

    Did you ever try a 64bit install?

    Steve



    Tried, yes. But I didn't get WAN access to work, so I reverted to x86 after 45 minutes.
    But I still haven't been able to find anything that should relate to random reboots on x86 FreeBSD on x86-compatible hardware and an x86-compatible hypervisor.



  • @GroundX:

    Try, yes. But didn't get WAN-access to work so reverted to x86 after 45mins.

    In a different VM I'm guessing? Sounds like the usual circumstance when changing the MAC on the WAN NIC: the upstream ARP cache needs to be flushed (or the modem rebooted if it's cable or DSL service).

    @GroundX:

    But I still haven't been able to find anything that should relate to random reboots on x86 FreeBSD on x86 compatible hardware and x86 compatible hypervisor.

    The only circumstance where I'm aware of that happening is where the OS in the VM is 32 bit, but the VM is set to 64 bit at the hypervisor level.

    Regardless, you're best off with 64 bit. Guessing you just need to make sure to either keep the WAN MAC the same, or do whatever you need to do for your type of Internet service to switch the MAC.



    Yes, new VM ofc. Before I tried the change last time, I talked to my ISP about the ARP cache and they told me "You can just change firewall, nothing needs to be done from our side".

    This VM is ofc set to x86 on the hypervisor, and I hate spoofing L2 addresses, which, according to the ISP, shouldn't matter anyway.


  • Rebel Alliance Global Moderator

    Who said anything about spoofing the MAC? If you change the MAC of the device connected to the modem, you quite often have to power cycle the modem to clear its old cache so it works with the new MAC.  Or you can do what I do and use the same MAC on your different copies of your pfSense VM (never on at the same time), which lets you keep the same public IP.  This allows me to test with different versions, etc.  Even if I'm playing with a different distro, I set the same MAC on the WAN interface so I keep the public IP.