ESXi 6.5.0 Guest OS errors…
-
I have not seen this OS message.. Curious why some people are seeing it and I am not.. Guess I could try a clean install.. I am running the current open-vm-tools..
I am not using ZFS, that is for sure.. But I could try installing pfSense clean and FreeBSD clean and see if I can get it to come up..
-
Here is the warning
-
Yeah I do not get that.. Wow 8 cpus for your pfsense - bit of overkill ;) heheeh
-
"Yeah I do not get that.. Wow 8 cpus for your pfsense - bit of overkill ;) heheeh"
Smoke'm if you got'em?
-
I just bumped pfsense to
2.4.0-RC (amd64)
built on Mon Oct 09 17:58:12 CDT 2017
FreeBSD 11.1-RELEASE-p1
And now I am getting it.. Hmmmmmm?
edit:
I just closed that notice with its little x, then logged out of ESXi and back in.. And it doesn't seem like it's coming back?? Huh… Wonder if it only comes up if you're watching pfSense boot on ESXi?
-
Looks like it could be a FreeBSD issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220923 and/or https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217282
Seems intermittent and doesn't appear to happen to everyone, not many hits out there for it.
Maybe something in the video/console settings of the virtual hardware itself.
I'm seeing this vga_bitblt_text() bug as well in VMs.
I had no success varying most of the guest VM's VGA configuration: video RAM from 4 MB to 256 MB, with or without 3D support, etc.
It occurs in VMs with both legacy BIOS and EFI firmware.
On the bright side, it is intermittent – about 1/4 of reboots. It seems to occur early enough that file system damage doesn't seem to be an issue when invoking a reboot in the debugger.
-
Yeup the hangs I mentioned earlier are from that vga_bitblt_text() issue… Tried all sorts of things...
One VM is an upgrade from 2.3.x, another was installed using a snapshot image a while back, accepting the defaults all the way through. The third one is a copy of the first VM.
The hanging at boot is very random for me, but thankfully it happens before disk access occurs from the looks of things.
For the most part, the message is an annoyance in my log files, a very big one. The real issue now seems to be vga_bitblt_text() causing a hang, which is out of pfSense's hands it seems... :(
I'm going to have to find a snapshot and use that one until I know this is fixed. This (the hanging issue more specifically now) is in a production environment (I know it shouldn't be...), and we kill everything at night to save money on the light bill and everything auto-starts in the mornings.
"I just closed that notice with its little x, then logged out of ESXi and back in.. And it doesn't seem like it's coming back?? Huh… Wonder if it only comes up if you're watching pfSense boot on ESXi?"
If you switch views a few times, or log in a day later, it usually pops right back up, and when viewing your VMs the status will keep bouncing between Warning and Normal for it.
I wonder if whoever maintains the VM Tools package for pfSense could make modifications to report the "proper" OS?
-
"we kill everything at night to save money on the light bill and everything auto-starts in the mornings."
Wow… what a BAD idea that is! So did you do the math on that? What exactly is your host drawing at night when the CPUs are all idle anyway?
So how many watts is this host drawing? Now how much do you pay per kWh? Do the math for it being off at night.. What do you save, $1.20 a month? ;) You eat up 10 years' worth of savings with one "oh shit, this didn't boot"..
-
The closet pulls ~650w minimum at all times; under load it will reach about 1300w. It's not just the host that gets killed at night either. Switches (that's about 140w alone), 4 satellite modems (between 20w and 45w each depending on weather conditions, signal quality and other factors), and other devices all get powered off shortly after the host shuts down, and come back on automatically when the host starts back up.
It also keeps the AC from having to run as much at night. Even though it's easier to cool at night, it still saves on the bill.
No need to do the math: in real-world testing I average between $20 and $45 lower on the bill in months that I kill everything. While that doesn't seem like a lot per month, over the course of a year, that's a nice savings that can go to other things :)
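
For anyone who does want to sanity-check the numbers, here is a rough Python sketch. Only the ~650w idle draw comes from the figures above; the 10 hours off per night, the $0.12 per kWh rate, and the 30-night month are assumptions picked for illustration, and the function name is made up. It ignores any AC savings.

```python
def monthly_savings(watts_off, hours_off_per_night, rate_per_kwh, nights=30):
    """Estimate dollars saved per month by powering a load off overnight."""
    # Energy not consumed, in kWh: kW * hours per night * number of nights.
    kwh_saved = watts_off / 1000 * hours_off_per_night * nights
    return kwh_saved * rate_per_kwh


if __name__ == "__main__":
    # 650 W is the idle figure quoted above; the other inputs are assumed.
    estimate = monthly_savings(650, 10, 0.12)
    print(f"Estimated savings: ${estimate:.2f} per month")
```

With those assumed inputs it works out to roughly $23 a month, which is at least in the same ballpark as the $20 to $45 seen on the bill. Plug in your own rate and hours.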
-
"I wonder if whoever maintains the VM Tools package for pfSense could make modifications to report the 'proper' OS?"
The pfSense package only uses the FreeBSD port. This issue also affects FreeBSD, so any fix needs to happen on FreeBSD and then it will make its way into our package after.
Maybe it just hasn't shown up yet, but I used "Upgrade VM Compatibility" to upgrade two VMs to "VM version 13", or ESXi 6.5 level, and they no longer report the version mismatch warning.
-
Spoke too soon, it just hadn't shown back up yet. Disregard.
-
Anyone that can reproduce that vga_bitblt_text() crash reliably (I still can't), try adding this to your /boot/loader.conf.local:
debug.debugger_on_panic=0
It won't stop the crash but it should allow the VM to restart itself automatically if it happens, rather than sitting at a debug prompt. Though that would also stop it from gathering panic data if you have an actual crash later. Could add a tunable in the GUI to set it back to 1 at boot time which should be late enough to work around that.
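
If you would rather script that change than edit the file by hand, here is a minimal Python sketch. It assumes simple key=value lines like the one above, and the helper name set_tunable is made up for illustration; it is not part of pfSense or FreeBSD. It only transforms text, so reading and writing /boot/loader.conf.local (as root) is left to you.

```python
def set_tunable(conf_text, key, value):
    """Return conf_text with key=value set exactly once.

    An existing line for the same key is replaced in place, later
    duplicates of that key are dropped, and other lines are kept as-is.
    """
    new_line = f"{key}={value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Compare only the part before the first "=" so values are ignored.
        name = line.split("=", 1)[0].strip()
        if name == key:
            if not replaced:
                out.append(new_line)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    existing = "debug.debugger_on_panic=1\n"
    print(set_tunable(existing, "debug.debugger_on_panic", "0"), end="")
```

Running it twice gives the same result, so it should be safe to call from a setup script without piling up duplicate lines.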
-
Another possible workaround is to set kern.vty=sc in /boot/loader.conf.local since this appears to be a race condition in the VT console, according to the FreeBSD bug report.
I did manage to make one of my VMs crash once, finally.
I opened a bug report for it here: https://redmine.pfsense.org/issues/7925
-
"While that doesn't seem like a lot per month, over the course of a year, that's a nice savings that can go to other things :)"
Are you paying this bill? Is this your house or a place of business? Let's call it $50 a month savings.. Or $600 a year… How much does it cost the company when something doesn't come back up and you have people sitting at their desks not able to work?
"Switches (that's about 140w alone),"
What switches are you running that pull 140 watts? Are you running PoE? A Cisco 3850-48 doesn't even pull 140W under full load.. This doesn't sound like a home setup...
-
FYI - There was a patch for the vga/vt race crash in FreeBSD-CURRENT, so we brought it in. If we end up having to rebuild 2.4.0 it may end up in the release, otherwise it will be in 2.4.1, which will be very close behind (a week or two at most).
-
The kern.vty=sc workaround worked for me.
I rebooted a dozen times and didn't see the panic, which is definitely an improvement.
Thanks so much for the workaround and for incorporating the fix!
-
"Smoke'm if you got'em?"
This is a tangent from the original post, but you can actually get worse performance with additional cores. The way VMware does its scheduling, it will only give a VM a time slice if there are as many cores free as the max defined, regardless of what the VM is currently using. If you have a 12-core box and have defined pfSense as using 8 of those cores, then it's only going to get time when 8 cores are free – even if pfSense is only using 2 for example. Depending on load from other VMs, this can delay a VM's scheduling of CPU time in a detrimental way.
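
To make the scheduling argument concrete, here is a toy Python model of the strict case described above, where a VM only runs on a tick when at least as many cores are free as it has vCPUs. The free-core numbers and the function name are invented for illustration. As far as I know, recent ESXi versions use a relaxed form of co-scheduling, so the real penalty should be smaller than this suggests.

```python
def runnable_fraction(free_cores_per_tick, vcpus):
    """Fraction of ticks on which a VM with `vcpus` vCPUs could be scheduled,
    assuming it needs that many free physical cores at the same time."""
    if not free_cores_per_tick:
        return 0.0
    runnable = sum(1 for free in free_cores_per_tick if free >= vcpus)
    return runnable / len(free_cores_per_tick)


if __name__ == "__main__":
    # Hypothetical free-core counts on a 12-core host over ten scheduler ticks.
    free = [3, 9, 5, 12, 2, 8, 4, 10, 6, 7]
    print("8 vCPUs:", runnable_fraction(free, 8))
    print("2 vCPUs:", runnable_fraction(free, 2))
```

With these made-up numbers the 8-vCPU VM is eligible on 4 of the 10 ticks (0.4), while a 2-vCPU VM is eligible on all of them (1.0).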
-
Great news! That's a much better workaround than letting it crash and reboot.
-
@KOM:
Understood, and thank you. I have 16 cores and 48GB RAM for three VMs: FreeNAS, an Ubuntu web server, and pfSense. The web traffic is minimal and FreeNAS is more static storage. pfSense pulls the greatest load through VPN traffic. Really, it's massive overkill for all guests. Our encryption settings for a site-to-site trunk are maxed out, so I provide the CPU for OpenVPN.
-
jimp, thanks for looking into the crash so much as well as reporting it :)
So, back to the original posting/question… I guess we will have to wait for FreeBSD to fix the issue then. Thankfully the original issue isn't detrimental.
Once again, thanks to all, it's much appreciated. :)