pfSense 2.1.3 + ESXi 5.5 = reboot needed after every shutdown of pfSense



  • Hey.

    Just migrated pfSense to my ESXi host, with a dual Intel NIC passed through to pfSense. Everything else works flawlessly so far, but I need to reboot pfSense once more every time the HOST or the PFSENSE/GUEST has been shut down and powered back on. I can reproduce it every time by rebooting the HOST or by halting the PFSENSE/GUEST. If I don't do this, WAN doesn't get a lease from my ISP and LAN doesn't give IPs to any client on my LAN (I can't even ping it).

    What should I do? Is this "bug" normal? Any fix or workaround?

    Thanks!



  • 32 views, 0 replies? Seriously? :|



  • Some more information might help.

    Hardware details?  Any special reason for passing through the NICs?



  • I use 2.1.3 under ESXi 5.5U1.  I'm not sure if what you're seeing could be classified as a problem.  When it comes to running pfSense (or any router) in a virtual machine, you don't want to suspend and resume since a lot of the router metrics & tables are in RAM.  Do you have an ISP-supplied router/modem in series after the pfSense instance?  Does your ISP really assign you a new IP address every time your router boots up?



  • CPU Intel Xeon E3-1270 v3 -> 1 core, 2 threads exclusively for pfsense
    Motherboard Supermicro X10SL7-F -> dual gbe Intel® i210AT passthrough'd to pfsense
    Modem (ONLY a modem!) always gives pfSense the same WAN IP, which is the IP I see on www.whatsmyip.org, so it's not a router! With a separate pfSense on a bare-metal box (Intel D2500CC) with dual NICs it works flawlessly, but I'm going for virtualization + passthrough now.

    I chose passthrough for performance + latency and will NOT do it any other way (unless you have some good reasons I don't know about yet, but stick with passthrough please).

    Host is ESXi 5.5 Update 1 with the latest patch (1.6.2014) installed
    Guest is the latest pfSense 2.1.3, 64-bit

    Anything else?

    EDIT: for the management network and the other VMs I use a quad NIC, so the onboard dual NICs are exclusively used by pfSense
    EDIT2: while we're on hardware info… I don't think it's hardware related but software related...?



  • If you're concerned about performance then I don't know why you're goofing around with hypervisors.  Just run it on the bare metal.

    I'm still not sure what your problem is.  You say that if you (I assume?) suspend your pfSense VM instance, then when you resume it, it no longer has the IP address it used to have?  It no longer has Internet access until you reboot?  You say your modem gives you the same WAN IP, so I'm not sure why you're asking about dynamic leases.  Set your pfSense WAN to the IP address that your modem would normally serve it.

    Am I missing something?



  • @KOM:

    If you're concerned about performance then I don't know why you're goofing around with hypervisors.  Just run it on the bare metal.

    I'm still not sure what your problem is.  You say that if you (I assume?) suspend your pfSense VM instance, then when you resume it, it no longer has the IP address it used to have?  It no longer has Internet access until you reboot?  You say your modem gives you the same WAN IP, so I'm not sure why you're asking about dynamic leases.  Set your pfSense WAN to the IP address that your modem would normally serve it.

    Am I missing something?

    There are 3 possibilities:

    1. Power on with ESXi (autostart)

    2. Shutdown in pfSense (halt - number 6) and then power on in ESXi

    3. Reboot in pfSense (reboot - number 5)

    1) and 2) are identical: does NOT get WAN IP and does NOT serve IPs on LAN

    3) when it reboots everything works like it should!

    As far as I understand, that means that when pfSense first initializes the NICs after a cold start, they somehow don't work properly (software related? maybe bad drivers?). But when it does a normal reboot (that is, a self-reboot, number 5!), the NICs are already initialized and they work flawlessly, with WAN + LAN IPs! I'm no developer of ANY kind, but I think I'm kind of right? What I really don't understand is how I can be the only one on the planet with this problem. I already googled and found nothing!

    I'm not goofing around with hypervisors, I'm doing passthrough which is the same as bare metal and I know what I'm doing.

    Thanks so far!
    Waiting for a reply



  • Have you tried using a static IP address for your pfSense WAN, the same IP that your modem gives it dynamically?

    You said it works when bare metal and you're concerned about performance and latency.  Can I ask why you insist on virtualizing it when it seems there is no obvious advantage and a big disadvantage for you?



  • @KOM:

    Have you tried using a static IP address for your pfSense WAN, the same IP that your modem gives it dynamically?

    You said it works when bare metal and you're concerned about performance and latency.  Can I ask why you insist on virtualizing it when it seems there is no obvious advantage and a big disadvantage for you?

    Let's assume it would work with a static IP on WAN, but what about the LAN side? The LAN side doesn't serve IPs either! It needs a reboot to serve IPs!

    First advantage would be to sell that box and have some extra cash in my pocket and second advantage is to take care of only 1 box and not 2. Less hardware, less disks to fail for exact same purpose and actually better performance (Xeon vs Atom and Server vs Desktop NICs).

    Waiting for more advice



  • Perhaps DHCP server on LAN may not work if WAN is undefined?  I don't know, but it might be a safety mechanism so that DHCP doesn't give out IP addresses in what might end up being a LAN range?  No idea for sure and I'm just guessing.  Anything in your DHCP logs under Status - System Logs - DHCP tab?

    OK, so you're moving from pfSense on a desktop-class machine to an instance under ESXi?  I'm still trying to figure out the big picture what you're really trying to do



  • @KOM:

    Perhaps DHCP server on LAN may not work if WAN is undefined?  I don't know, but it might be a safety mechanism so that DHCP doesn't give out IP addresses in what might end up being a LAN range?  No idea for sure and I'm just guessing.  Anything in your DHCP logs under Status - System Logs - DHCP tab?

    OK, so you're moving from pfSense on a desktop-class machine to an instance under ESXi?  I'm still trying to figure out the big picture what you're really trying to do

    I don't think such a mechanism would exist in professional-grade software like pfSense… and honestly I don't want to set a static IP where there shouldn't be one.. I mean, would you do it?

    The big picture is to get everything under the same roof, because the NICs and the CPU are powerful enough..

    But let's get back on the subject. I still CAN NOT believe I'm the only one with this problem and that still no one knows how to fix it. Come on, I'm doing passthrough, which is kinda the same as bare metal! I'm not doing rocket science!


  • Rebel Alliance Global Moderator

    I am running pfSense 2.1.3 on ESXi 5.5u1 and not having any issues.  I reboot my host all the time; it has multiple VMs on it that autostart with the host.  I shut down the guests when I shut down the host, and then the VMs restart after the host has been up for a couple of minutes.  That's the autostart setting in ESXi.

    I am curious what performance you're looking to get that makes you think you need to go passthru.  How much do you need pfSense to handle as far as bandwidth goes, and what are the connections?

    So you're saying that even if you shut down pfSense (guest) and then start the guest, it doesn't get an IP on WAN and LAN is not even up?  Then you reboot the guest and it works?  Or do you have to reboot the host?

    So does pfSense think the NIC is working - what does pfSense show for the NIC?  Nothing in the log on pfSense?  It just doesn't pass any traffic?

    Are you running vmtools on pfSense?  Is pfSense the 32 or 64-bit version?



  • @johnpoz:

    I am running pfSense 2.1.3 on ESXi 5.5u1 and not having any issues.  I reboot my host all the time; it has multiple VMs on it that autostart with the host.  I shut down the guests when I shut down the host, and then the VMs restart after the host has been up for a couple of minutes.  That's the autostart setting in ESXi.

    I am curious what performance you're looking to get that makes you think you need to go passthru.  How much do you need pfSense to handle as far as bandwidth goes, and what are the connections?

    First, it's not about the bandwidth. It's purely about knowing that I haven't ruined anything in any way by going from bare metal to fully virtualized.
    My WAN is 120 Mbit/s down, with somewhat heavy usage from many devices.

    So you're saying that even if you shut down pfSense (guest) and then start the guest, it doesn't get an IP on WAN and LAN is not even up?

    When ESXi autostarts after a power loss, or when I simply reboot it, it autostarts pfSense. pfSense then does not get or give any IP on the WAN/LAN side, even though the menu says LAN: 192.168.1.1/24.

    The interfaces are up, I think, because I can assign igb1/igb0 in the menu as normal.
    So yes.

    Then you reboot the guest and it works?  Or do you have to reboot the host?

    If I now reboot the guest it works as it should (WAN from ISP, LAN DHCP server).

    So does pfsense think the nic is working - what does pfsense show for the nic?  Nothing in the log on pfsense?  It just doesn't pass any traffic?

    Are you running vmtools on pfsense?  Is pfsense 32 or 64bit version?

    Where and how do I check the logs? It's pfSense 2.1.3 64-bit with open-vm-tools installed, under ESXi 5.5 U1 with the latest patch (1.6.2014).



  • You may not be the only one in the world.

    This thread was about what seems to be the same problem with similar hardware.

    Unfortunately, no resolution posted.



    Perhaps there is an issue with passthrough & FreeBSD 8.3 & ESXi.

    Since current development is focused on FreeBSD 10, I wouldn't get my hopes up that this gets fixed soon.
    FreeBSD 10 has para-virtualized driver support built in, and thus would no longer require the use of legacy stuff. Performance should go up dramatically.

    On that hardware, getting around 1 Gbit/s with the legacy virtual drivers should not be a problem (I run around 10 boxes on similar hardware).
    My advice: stop using the passthrough and go legacy-virtual ;)



  • While we're hopefully still searching for an answer… Off topic: is pfSense 2.2 in its current state considered as stable and secure as 2.1.3?


  • Rebel Alliance Global Moderator

    Yeah, you only have a 120mbit connection - there is not going to be an issue handling that with a virtual NIC vs passthru..  Your thinking is wrong on the performance..  If you're worried about virtual performance, then you shouldn't be running virtual at all, since you can't get over the mindset and use it the way it's designed.

    My file storage box is a VM; its NIC (vmxnet3) is connected to the vswitch that connects to the physical world with cheap NICs, on an N40L box..  And I get great performance to and from the real-world network.

    here is my workstation to my VM storage box.

    ------------------------------------------------------------
    Client connecting to storage.local.lan, TCP port 5001
    TCP window size:  256 KByte

    [344] local 192.168.1.100 port 52507 connected with 192.168.1.8 port 5001
    [ ID] Interval      Transfer    Bandwidth
    [344]  0.0-10.0 sec  1.06 GBytes  912 Mbits/sec

    Why should I be worried about performance on that??

    phy box - switch – phy nic (N40L) -- vswitch - vm nic (storage vm)
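
A quick sanity check on the iperf figures above: the reported bandwidth can be recomputed from the transfer size and the interval. This is only a minimal sketch. It assumes iperf's usual unit convention (GBytes counted in powers of 1024, Mbits in powers of 1000), and the function name is made up for illustration. Because the printed transfer size is rounded to 1.06 GBytes, the result should land close to, but not exactly on, the reported 912 Mbits/sec.

```python
def iperf_mbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert a transfer size to a rate.

    Assumes 1 GByte = 1024**3 bytes and 1 Mbit = 10**6 bits.
    """
    total_bits = gbytes * 1024**3 * 8
    return total_bits / seconds / 1e6


# Figures taken from the iperf report line: 1.06 GBytes in 10.0 sec.
rate = iperf_mbits_per_sec(1.06, 10.0)
print(f"{rate:.0f} Mbits/sec")

# For scale: the 120 Mbit/s WAN mentioned in the thread as a share of that rate.
print(f"WAN is about {100 * 120 / rate:.0f}% of the measured LAN rate")
```

On those assumptions, the virtual NIC path measured here has several times the headroom of the 120 Mbit/s WAN. That says nothing about latency, which this calculation does not cover.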



  • @johnpoz:

    Yeah, you only have a 120mbit connection - there is not going to be an issue handling that with a virtual NIC vs passthru..  Your thinking is wrong on the performance..  If you're worried about virtual performance, then you shouldn't be running virtual at all, since you can't get over the mindset and use it the way it's designed.

    My file storage box is a VM; its NIC (vmxnet3) is connected to the vswitch that connects to the physical world with cheap NICs, on an N40L box..  And I get great performance to and from the real-world network.

    here is my workstation to my VM storage box.

    ------------------------------------------------------------
    Client connecting to storage.local.lan, TCP port 5001
    TCP window size:  256 KByte

    [344] local 192.168.1.100 port 52507 connected with 192.168.1.8 port 5001
    [ ID] Interval      Transfer    Bandwidth
    [344]  0.0-10.0 sec  1.06 GBytes  912 Mbits/sec

    Why should I be worried about performance on that??

    phy box - switch – phy nic (N40L) -- vswitch - vm nic (storage vm)

    Again, it's not only about the throughput, it's about LATENCY and MANY connections at the same time from many devices, where latencies play a role.

    Stick to the subject please. I want and I need passthrough.

    So now this is a "known" bug or something. Maybe a driver fault? Can I update the drivers somehow?

    Would something in System > Advanced > Networking solve it?

    Any other suggestions?


  • Rebel Alliance Global Moderator

    Latency, really? On a freaking LAN, what could it possibly be, .001 seconds? You're nuts if you think latency is going to be an issue, physical vs virtual. You're causing yourself grief for no reason.



  • Hopefully found the solution:

    System > Advanced > Networking

    Check Enable device polling
    Uncheck everything below
    Reboot

    Survived for 3 host reboots already
    2x Intel i210AT on Supermicro X10SL7-F



    It isn't quite solved yet. I always have at least 50-51% CPU usage inside pfSense. That means one of the cores is always maxed out. Does anyone have any idea what these switches do and why they kind of fixed this?

    EDIT: And that is on an idle connection and pfsense idling!



  • @devianceluka:

    It isn't quite solved yet. I always have at least 50-51% CPU usage inside pfSense. That means one of the cores is always maxed out. Does anyone have any idea what these switches do and why they kind of fixed this?

    EDIT: And that is on an idle connection and pfsense idling!

    Are you sure you're reading the graph right? On the performance tab of vSphere? It can be confusing: on the left side it says percent, on the right side it says MHz. The graph scales with the maximum usage, so if your max was 200 MHz and you are using 100 MHz right now, it will say 50%? My current usage is 62 MHz with max usage at 1291 MHz. The max is so high on mine because every hour a 2-minute script runs that uses a lot of processor.
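
The question about the chart scale comes down to "percent of what?". The sketch below works through the 100 MHz / 200 MHz example and the 62 MHz / 1291 MHz reading from this post. Whether the vSphere chart really scales against the observed maximum is posed as a question in the post, not confirmed here. The 3500 MHz per-vCPU clock (the nominal clock of the E3-1270 v3) and the 2-vCPU allocation are assumptions for illustration only, and the names are invented.

```python
def percent_of(value_mhz: float, reference_mhz: float) -> float:
    """Return value_mhz as a percentage of reference_mhz."""
    return 100.0 * value_mhz / reference_mhz


# Assumed allocation: 2 vCPUs at a nominal 3500 MHz each (illustrative only).
ASSUMED_VCPU_MHZ = 3500
ASSUMED_CAPACITY_MHZ = 2 * ASSUMED_VCPU_MHZ

# Example from the post: 100 MHz now, against an observed peak of 200 MHz.
relative_to_peak = percent_of(100, 200)
relative_to_capacity = percent_of(100, ASSUMED_CAPACITY_MHZ)

# The other reading from the post: 62 MHz now, 1291 MHz peak.
reading_vs_peak = percent_of(62, 1291)

# What a genuine 50% of two vCPUs would mean: one vCPU fully busy.
one_vcpu_busy = percent_of(ASSUMED_VCPU_MHZ, ASSUMED_CAPACITY_MHZ)

print(f"100 MHz: {relative_to_peak:.1f}% of peak, {relative_to_capacity:.1f}% of capacity")
print(f"62 MHz: {reading_vs_peak:.1f}% of peak")
print(f"one busy vCPU: {one_vcpu_busy:.1f}% of capacity")
```

Under these assumptions the same 100 MHz reads as 50% on one scale and under 2% on the other, so checking the MHz axis rather than the percent axis is the quickest way to tell whether a vCPU is really saturated.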



  • I have the same problem with pfSense 2.1.5 and ESXi 5.5 with Intel I350 NICs.
    It works when you reboot the VM.

    On the latest pfSense 2.2 snapshot it seems to be working correctly without the need to reboot.



  • http://christopher-technicalmusings.blogspot.com/2012/12/passthrough-pcie-devices-from-esxi-to.html

    Try that. It seems to be a BSD/ESXi issue. Found it while dealing with a similar problem.



  • Did you ever get farther on this?

    I'm going to try device polling before the next reboot and see if that helps me. The tickboxes below… I can't see how those would affect the lack of connectivity after the "first" reboot. It's 100% reliable: every SECOND reboot is fine. I am sure all VMware and passed-through Intel NICs support polling fine.

    I agree 100% that you need passthrough; virtual NICs are just not good enough for replacing bare metal intelligently. Even when I only had 200MB in, I could see a huge loss on the ESX NIC.... I even played with the 3 different driver options you can pass to hack toward pfSense, and it was always lossy. That can't happen with voice and other stuff.

    The dual reboot thing makes me wonder if it's a slice thing. I know the flash installs to two slices... and I seem to remember reading that they alternate at every reboot. Any comments on that?

    Next week I will have one wan on gigabit/300 and the other at 200/30. Of course you need a good intel card for those, and to be smart to even see the throughput behind pfsense.

    I think I ordered a quad ET 82576, my dual ET plus single 82574 CT pass through fine and dandy to the two wans which still are just 300/300 and 200/30.

    ESX 5.1 is on x9scm-f e3-1230 32GB running lots of PCI passthrough to other stuff too.

    pfSense 2.2's limiters are busted, so 2.1.5 is best for me for now.

    I may get around to trying a 2.2 pfsense and see if reboot works. Last time I tried the upgrade it broke everything, which I later found out was just because 2.2 busted limiters.