ESXi 6.5.0 Guest OS errors…



  • I know this is a bit outside the realm of the pfSense project, but suddenly today, after updating to the snapshot built Fri Oct 06 15:36:09 CDT 2017 (it has probably been a week since the last time I updated), I started noticing a message in ESXi 6.5 regarding the Guest OS not being correct.  ESXi doesn't have any options for "FreeBSD 11.1-RELEASE-p1".

    Is this something that can be resolved in pfSense, or is this something introduced with FreeBSD that is going to drive me up a wall?  It seems highly unlikely that ESXi is going to be updated to reflect every OS patch/version of FreeBSD.

    Also, I thought 2.4.0 was going to be FreeBSD 11.0 and not 11.1, and that 2.4.1 would be 11.1?


  • Rebel Alliance Global Moderator

    what build version of esxi 6.5 are you running?

    I am running
    built on Tue Oct 03 13:49:31 CDT 2017
    FreeBSD 11.1-RELEASE-p1

    Which is 11.1, and I'm not seeing any such issue. I will update to the latest snap.. But my guess is you're running an OLD build of 6.5.. Current is build 6765664, which came out Oct 5th..  I am running the previous build, 6.5.0 Update 1 (Build 5969303).

    What VM version are you running: 8, 11, 13?


  • Rebel Alliance Global Moderator

    Ok updated to current snap

    2.4.0-RC (amd64)
    built on Fri Oct 06 15:36:09 CDT 2017
    FreeBSD 11.1-RELEASE-p1

    The system is on the latest version.
    Version information updated at Sat Oct 7 5:01:59 CDT 2017

    Not seeing this in my ESXi 6.5.. Do you have the open-vm-tools package installed?



  • ESXi Build 5310538, guess I will update that when I am able and see how it goes.

    Strange, this has never happened before, even on severely outdated builds, until the more recent pfSense snapshots…  Yes, Open-VM-Tools is installed.

    Will post back soon, hopefully...  Rather dislike touching that ESXi install.

    UPDATE #1:
    OK, I am back...  No dice... Updated and made sure everything is on the latest code from what I can tell.

    ESXi 6.5.0 Update 1 (Build 6765664)
    All three PFSense virtual machines are on VM Version 13
    PFSense has been updated to 2.4.0.r.20171007.0850
    Open-VM-Tools is 10.1.0,1
    Guest OS is set to "Other > FreeBSD (64bit)"

    ESXi is seeing that the OS reports back as "FreeBSD 11.1-RELEASE-p1" instead of "FreeBSD(64bit)"....  This is a rather recent change (it wasn't happening on a pfSense snapshot from a week or so ago) and it affects 3 different virtual machines, so why would this impact just me and not others?

    Update #2:
    Have been noticing performance issues with all three of those VMs now: various glitches where a VM hangs purely at random before it has even mounted the file system.  Set the Guest OS to Other 64bit and that seems to have gone away; however, the message is still there.







  • Rebel Alliance Developer Netgate

    I'm on 6.5.0 (Build 5310538) and though I do see that guest OS difference message, I don't have any problems with VMs underperforming or hanging.



  • I am on ESXi 6.5.0  (Build 6765664).  I have to use the old kernel (option 5 in the boot loader menu).  I am currently on the newest snapshot, built on Mon Oct 09 17:58:12 CDT 2017.  The following picture shows what happens if I try to boot normally.  The system hangs at "Stopped at vga_bitblt_text…"




  • Same here, ESXi 6.5.0 latest patch level, and getting same message with 2.4.0 RC builds. Interestingly the VM sometimes doesn't boot (same error as mentioned above by jasonsansone), however a reboot usually helps…


  • Rebel Alliance Developer Netgate

    I just updated my lab box to 6.5.0 Update 1 (Build 6765664) and all of my 2.4/2.4.1 VMs powered up fine. No errors.



  • Any other log, pic, or info I can provide to help?


  • Rebel Alliance Developer Netgate

    Was this a new 2.4.0 install or an upgrade? If it was a new install, what filesystem and partition options were used?



  • I installed 2.4 RC new when it was first released and then restored a config in order to take advantage of ZFS. It’s ZFS on BIOS.  I have updated to each successive snapshot since the initial RC release.


  • Rebel Alliance Developer Netgate

    I have a mix here, some upgrades, some new installs, most on UFS, one or two on ZFS, I think they're all on GPT/BIOS though. None have issues.

    Any special hardware involved in the hypervisor? Passing anything through? Any special options enabled?



  • Nothing special. It’s a Lenovo Thinkstation C20. Dual Xeon Quad Core with 48GB ECC DDR3. No pci passthrough although VT-d is fully enabled for other guest VM’s. No tweaks or special options in the guest or host.


  • Rebel Alliance Developer Netgate

    Looks like it could be a FreeBSD issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220923 and/or https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217282

    Seems intermittent and doesn't appear to happen to everyone, not many hits out there for it.

    Maybe something in the video/console settings of the virtual hardware itself.


  • Rebel Alliance Global Moderator

    @Jimp

    So you stated "I'm on 6.5.0 (Build 5310538) and though I do see that guest OS difference message"

    Then you updated to "6.5.0 Update 1 (Build 6765664)"  Did this make the OS message go away?


  • Rebel Alliance Developer Netgate

    @johnpoz:

    @Jimp

    So you stated "I'm on 6.5.0 (Build 5310538) and though I do see that guest OS difference message"

    Then you updated to "6.5.0 Update 1 (Build 6765664)"  Did this make the OS message go away?

    No, that message is still there, and there is no OS option in the ESX settings for the VM to make it match. Not sure if that's something that FreeBSD or VMware is going to have to fix.

    I should also add that the exact same message shows up on a FreeBSD 11.1 VM, so it isn't specific to pfSense. It does go away if you stop the tools guestd daemon, so maybe an update to the open-vm-tools package will eventually fix it.



  • I don't think that warning really matters.  VMware is simply not properly identifying the Guest OS, but you already manually selected FreeBSD, so the drivers and hardware emulation are accurate.  The same occurs when I update to brand new releases of macOS or Ubuntu builds.


  • Rebel Alliance Global Moderator

    I have not seen this OS message.. Curious why some people are and I am not.. Guess I could try a clean install.. I am running current open-vm-tools..

    I am not using zfs that is for sure.. But I could try installing pfsense clean and freebsd clean and see if I can get it to come up..



  • Here is the warning



  • Rebel Alliance Global Moderator

    Yeah I do not get that.. Wow 8 cpus for your pfsense - bit of overkill ;) heheeh



  • @johnpoz:

    Yeah I do not get that.. Wow 8 cpus for your pfsense - bit of overkill ;) heheeh

    Smoke'm if you got'em?


  • Rebel Alliance Global Moderator

    I just bumped pfsense to

    2.4.0-RC (amd64)
    built on Mon Oct 09 17:58:12 CDT 2017
    FreeBSD 11.1-RELEASE-p1

    And now I am getting it.. Hmmmmmm?

    edit:
    I just closed that notice with its little x, and then logged out of ESXi, then back in.. And it doesn't seem like it's coming back??  Huh… Wonder if it only comes up if you're watching pfSense boot on ESXi?



  • @jimp:

    Looks like it could be a FreeBSD issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220923 and/or https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217282

    Seems intermittent and doesn't appear to happen to everyone, not many hits out there for it.

    Maybe something in the video/console settings of the virtual hardware itself.

    I'm seeing this vga_bitblt_text() bug as well in VMs.

    I had no success varying most of the guest VM VGA configuration: RAM from 4 MB to 256 MB, with or without 3D support, etc.

    It occurs in VMs with both legacy bios and EFI bios.

    On the bright side, it is intermittent – about 1/4 of reboots.  It seems to occur early enough that file system damage doesn't seem to be an issue when invoking a reboot in the debugger.



  • Yeup the hangs I mentioned earlier are from that vga_bitblt_text() issue…  Tried all sorts of things...

    One VM is an upgrade from 2.3.x, another was installed using a snapshot image a while back, accepting the defaults all the way through.  The third one is a copy of the first VM.

    The hanging at boot is very random for me, but thankfully it happens before disk access occurs from the looks of things.

    For the most part, the message is an annoyance in my log files, a very big one. The real issue now seems to be vga_bitblt_text() causing a hang, which seems to be out of pfSense's hands... :(

    Going to have to find a snapshot and use that one till I know this is fixed, as this (the hanging issue more specifically now) is in a production environment (I know it shouldn't be...), and we kill everything at night to save money on the light bill and everything auto-starts in the mornings.

    @johnpoz:

    I just closed that notice with its little x, and then logged out of ESXi, then back in.. And it doesn't seem like it's coming back??  Huh… Wonder if it only comes up if you're watching pfSense boot on ESXi?

    If you switch views a few times, or log in a day later it usually pops right back up, and when viewing your VM's it will keep bouncing between Warning and Normal on the status for it.

    I wonder if whoever maintains the VM Tools Package for PFSense could make modifications to report the "proper" OS?


  • Rebel Alliance Global Moderator

    "we kill everything at night to save money on the light bill and everything auto-starts in the mornings."

    Wow… what a BAD idea that is!  So did you do the math on that?  What exactly is your host drawing at night when cpus are all idle anyway?

    So how many watts is this host drawing?  Now how much do you pay per kWh? Do the math for it being off at night.. What do you save, $1.20 a month? ;)  You eat up 10 years' worth of savings with one "oh shit, this didn't boot"..



  • The closet pulls ~650w minimum at all times, under load, the closet will reach about 1300w, it's not just the host that gets killed at night either.  Switches (that's about 140w alone), 4 satellite modems (between 20w and 45w each depending on weather conditions, signal quality and other factors), and other devices all get powered off shortly after the host shuts down and back on automatically when the host starts back up.

    It also keeps the AC from having to run as much at night even though it's easier to cool at night, it still saves on the bill.

    No need to do the math, real world testing I average between $20 to $45 lower on the bill on months that I kill everything.  While that doesn't seem like a lot per month, over the course of a year, that's a nice savings that can go to other things :)
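For anyone who does want to run the numbers, here is a rough sketch of the arithmetic being argued over. Only the ~650 W minimum draw comes from this thread; the hours off per night, the price per kWh, and the helper name `monthly_savings` are assumptions made for illustration, and the estimate ignores any cooling savings.

```shell
#!/bin/sh
# Rough estimate of money saved by powering equipment off overnight.
# monthly_savings is a hypothetical helper; every input except the 650 W
# figure quoted in the thread is an assumption.
#   $1 = watts drawn while idle, $2 = hours off per night,
#   $3 = price per kWh,          $4 = nights per month
monthly_savings() {
    awk -v w="$1" -v h="$2" -v r="$3" -v d="$4" \
        'BEGIN { printf "%.2f\n", w * h * d / 1000 * r }'
}

# Example: 650 W idle draw, 8 hours off per night (assumed),
# $0.12 per kWh (assumed), 30 nights.
monthly_savings 650 8 0.12 30
```

With those assumed inputs the estimate is 156 kWh, or about $18.72 a month; plug in your own rate and hours to see where your setup lands.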


  • Rebel Alliance Developer Netgate

    @C0RR0SIVE:

    I wonder if whoever maintains the VM Tools Package for PFSense could make modifications to report the "proper" OS?

    The pfSense package only uses the FreeBSD port. This issue also affects FreeBSD, so any fix needs to happen on FreeBSD and then it will make its way into our package after.

    Maybe it just hasn't shown up yet, but I used "Upgrade VM Compatibility" to upgrade two VMs to "VM version 13", or ESX 6.5 level, and they no longer report the version mismatch warning.


  • Rebel Alliance Developer Netgate

    @jimp:

    @C0RR0SIVE:

    I wonder if whoever maintains the VM Tools Package for PFSense could make modifications to report the "proper" OS?

    The pfSense package only uses the FreeBSD port. This issue also affects FreeBSD, so any fix needs to happen on FreeBSD and then it will make its way into our package after.

    Maybe it just hasn't shown up yet, but I used "Upgrade VM Compatibility" to upgrade two VMs to "VM version 13", or ESX 6.5 level, and they no longer report the version mismatch warning.

    Spoke too soon, it just hadn't shown back up yet. Disregard.


  • Rebel Alliance Developer Netgate

    Anyone that can reproduce that vga_bitblt_text() crash reliably (I still can't), try adding this to your /boot/loader.conf.local:

    debug.debugger_on_panic=0
    

    It won't stop the crash but it should allow the VM to restart itself automatically if it happens, rather than sitting at a debug prompt. Though that would also stop it from gathering panic data if you have an actual crash later. Could add a tunable in the GUI to set it back to 1 at boot time which should be late enough to work around that.
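A minimal sketch of adding that line without creating duplicates if it is run more than once. The helper name `ensure_loader_tunable` is made up for this example (it is not a pfSense or FreeBSD tool), and it assumes the file holds plain `name=value` lines.

```shell
#!/bin/sh
# Sketch: add or update one "name=value" tunable in a loader.conf-style file.
# ensure_loader_tunable is a hypothetical helper name.
#   $1 = file path, $2 = tunable name, $3 = value
ensure_loader_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every line except an existing assignment of this tunable.
        # index() does a literal match, so dots in the name are not wildcards.
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    else
        : > "$tmp"
    fi
    # Append the new assignment and swap the file into place.
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Usage on the firewall itself (run as root):
#   ensure_loader_tunable /boot/loader.conf.local debug.debugger_on_panic 0
```

After the next reboot, `sysctl debug.debugger_on_panic` should report the value that was set.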


  • Rebel Alliance Developer Netgate

    Another possible workaround is to set kern.vty=sc in /boot/loader.conf.local since this appears to be a race condition in the VT console, according to the FreeBSD bug report.

    I did manage to make one of my VMs crash once, finally.

    I opened a bug report for it here: https://redmine.pfsense.org/issues/7925


  • Rebel Alliance Global Moderator

    "While that doesn't seem like a lot per month, over the course of a year, that's a nice savings that can go to other things :)"

    Are you paying this bill?  Is this your house or a place of business?  Let's call it $50 a month savings.. Or $600 a year… How much does it cost the company when something doesn't come back up and you have people sitting at their desk not able to work?

    "Switches (that's about 140w alone),"

    What switches are you running that pull 140watts? You running POE?  A cisco 3850-48 doesn't even pull 140W under full load.. This doesn't sound like a home setup...


  • Rebel Alliance Developer Netgate

    FYI- There was a patch for the vga/vt race crash in FreeBSD-CURRENT, so we brought it in. If we end up having to rebuild 2.4.0 it may end up in the release; otherwise it will be in 2.4.1, which will be very close behind (a week or two at most).



  • The kern.vty=sc workaround worked for me.

    I rebooted a dozen times and didn't see the panic, which is definitely an improvement.

    Thanks so much for the workaround and for incorporating the fix!



  • Smoke'm if you got'em?

    This is a tangent from the original post, but you can actually get worse performance with additional cores.  The way VMware does its scheduling, it will only give a VM a time slice if there are as many cores free as the max defined, regardless of what the VM is currently using.  If you have a 12-core box and have defined pfSense as using 8 of those cores, then it's only going to get time when 8 cores are free – even if pfSense is only using 2 for example.  Depending on load from other VMs, this can delay a VM's scheduling of CPU time in a detrimental way.


  • Rebel Alliance Developer Netgate

    @zxvv:

    The kern.vty=sc workaround worked for me.

    I rebooted a dozen times and didn't see the panic, which is definitely an improvement.

    Thanks so much for the workaround and for incorporating the fix!

    Great news! That's a much better workaround than letting it crash and reboot.



  • @KOM:

    Smoke'm if you got'em?

    This is a tangent from the original post, but you can actually get worse performance with additional cores.  The way VMware does its scheduling, it will only give a VM a time slice if there are as many cores free as the max defined, regardless of what the VM is currently using.  If you have a 12-core box and have defined pfSense as using 8 of those cores, then it's only going to get time when 8 cores are free – even if pfSense is only using 2 for example.  Depending on load from other VMs, this can delay a VM's scheduling of CPU time in a detrimental way.

    Understood, and thank you. I have 16 cores and 48GB RAM for three VMs: FreeNAS, an Ubuntu web server, and pfSense. The web traffic is minimal and FreeNAS is more static storage. pfSense pulls the greatest load through VPN traffic.  Really, it’s massive overkill for all guests.  Our encryption settings for a site-to-site trunk are maxed out, so I provide the CPU for OpenVPN.



  • jimp, thanks for looking into the crash so much as well as reporting it :)

    So, back to the original posting/question…  I guess we will have to wait for FreeBSD to fix the issue then, thankfully the original issue isn't detrimental.

    Once again, thanks to all, it's much appreciated. :)



  • @jimp:

    Another possible workaround is to set kern.vty=sc in /boot/loader.conf.local since this appears to be a race condition in the VT console, according to the FreeBSD bug report.

    I did manage to make one of my VMs crash once, finally.

    I opened a bug report for it here: https://redmine.pfsense.org/issues/7925

    This bug also seems to be avoided by using safe mode during install, for those that may be creating a new VM from the ISO.

    Interrupt the boot with <space>, then 6 4 1 1.

    As you said, to make the workaround persist after the install is complete, invoke a shell and run:

    echo set kern.vty=sc >> /boot/loader.conf.local


  • Rebel Alliance Developer Netgate

    @zxvv:

    This bug also seems to be avoided by using safe mode during install, for those that may be creating a new VM from the ISO.

    Interrupt the boot with <space>, then 6 4 1 1.

    As you said, to make the workaround persist after the install is complete, invoke a shell and run:

    echo set kern.vty=sc >> /boot/loader.conf.local

    If you are going to that trouble, drop to a loader prompt and run:

    set kern.vty=sc
    boot
    

    Also you don't need "set" in /boot/loader.conf.local, just "kern.vty=sc"
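If the earlier echo command was already run, the file will contain a line starting with "set". A minimal sketch for stripping that prefix so the file holds plain `name=value` lines; the helper name `strip_set_prefix` is made up for this example, and it writes through a temporary file rather than relying on `sed -i`, whose syntax differs between FreeBSD and GNU sed.

```shell
#!/bin/sh
# Sketch: remove a leading "set " from every line of a loader.conf-style file.
# strip_set_prefix is a hypothetical helper name.
#   $1 = file path
strip_set_prefix() {
    file=$1
    tmp="${file}.tmp.$$"
    # Lines without a leading "set " pass through unchanged.
    sed 's/^[[:space:]]*set[[:space:]][[:space:]]*//' "$file" > "$tmp" &&
        mv "$tmp" "$file"
}

# Usage on the firewall itself (run as root):
#   strip_set_prefix /boot/loader.conf.local
```

Once the VM has rebooted, `sysctl kern.vty` should show `sc` if the tunable was picked up.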



  • @jasonsansone:

    I don't think that warning really matters.  VMware is simply not properly identifying the Guest OS, but you already manually selected FreeBSD, so the drivers and hardware emulation are accurate.  The same occurs when I update to brand new releases of macOS or Ubuntu builds.

    I arrived here via Google looking for the answer to this problem and wanted to comment on this because it does matter.

    In ESXi you can't do snapshots (or backups via tools like Veeam) unless ESXi thinks the guest is in a consistent state. As long as this message appears, ESXi thinks the guest is inconsistent, so no snapshots/backups.