Pfsense on ESXi - SCSI Error?

PRNOHFT

Hi guys,

I'm running pfSense 2.2.6 on ESXi 5.5 with 4GB RAM and 4 CPUs allocated. It runs in a school and supports a medium-sized user base (say about 1000, though not all at the same time). Things were going fine and dandy, but lately I've been getting intermittent reboots from pfSense. I checked the logs and found something along the lines of a SCSI error. I also had freeradius on it, but I decided to run freeradius on a separate VM so as not to stress the guy out. This afternoon it restarted again, but this time it brought down freeradius with it. I'm now in the midst of reinstalling freeradius, but I can't gamble on this happening again. I did some searching and found that it's an Open/FreeBSD issue that has yet to be resolved?

Hope you guys can provide some info.

cmb

What specifically is the error?

A SCSI error inside a VM, if it's the one I'm thinking of, is most often what happens when the host loses connectivity to the shared storage where the VM is located. Akin to yanking the hard drive out of a physical machine.

PRNOHFT

@cmb

I can't replicate that error (nor do I want to!), and I seem to have missed the window to grab the logs. For now, it looks something like this:

Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 04 6e d3 e2 00 00 40 00
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(6). CDB: 0a 00 a7 d1 01 00
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command

(I stole it off the web).

Normally, after about 4 retries, pfSense reboots by itself.
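
For anyone who wants to check how often this shows up before a reboot, here is a rough Python sketch that counts CAM status errors and retries per device in log lines shaped like the ones above. The function name summarize_cam_errors and the regex are my own assumptions based purely on that sample format, not anything pfSense ships, and it assumes the log has already been exported as plain text.

```python
import re
from collections import Counter

# Matches FreeBSD CAM kernel lines shaped like the sample above, e.g.
#   Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
# The pattern is an assumption derived from that sample only.
CAM_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+kernel:\s+"
    r"\((?P<dev>[^)]+)\):\s+(?P<msg>.*)$"
)


def summarize_cam_errors(lines):
    """Return (errors, retries): per-device counts of CAM status lines
    and of 'Retrying command' lines."""
    errors = Counter()
    retries = Counter()
    for line in lines:
        m = CAM_LINE.match(line.strip())
        if not m:
            continue  # not a CAM line, skip it
        dev, msg = m.group("dev"), m.group("msg")
        if msg.startswith("CAM status:"):
            errors[dev] += 1
        elif msg.startswith("Retrying command"):
            retries[dev] += 1
    return errors, retries


# The sample lines quoted in this thread.
SAMPLE = [
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 04 6e d3 e2 00 00 40 00",
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error",
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy",
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command",
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): WRITE(6). CDB: 0a 00 a7 d1 01 00",
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error",
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy",
    "Sep 19 09:21:33 srv03 kernel: (da0:mpt0:0:0:0): Retrying command",
]

errors, retries = summarize_cam_errors(SAMPLE)
print(dict(errors), dict(retries))
```

Feeding it a real exported log instead of SAMPLE (for example, the lines of an open file) should show whether the errors cluster right before each reboot.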

cmb

Yeah, that's the error I was referring to. If that VM's on shared storage, it's likely some kind of problem between the ESX host and the shared storage. If you're not up to date on patch levels on ESX, it might be an issue in the hypervisor that has since been fixed. Could be a hardware issue with the server if it's on local disk.

Every time I've seen that reported, it's been a host issue.