pfSense VM as filtering bridge: issues moving to new ESXi platform
-
Long post here, apologies in advance!
I have a single subnet on which I've been using pfSense 2.5.1 as a filtering bridge to split it into 3 "segments" of sorts: LAN, DMZ, WAN. Note that regardless of "segment", all systems are in the same subnet here; pfSense only filters traffic in the 3 directions (LAN <-> DMZ, LAN <-> WAN and DMZ <-> WAN).
Currently pfSense runs as a VM on IBM custom ESXi 5.5 (the free vSphere Hypervisor license, no vCenter). pfSense uses three vNICs: vmxnet0, 1 and 2, corresponding to LAN, DMZ and WAN respectively.
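For context, on the pfSense side this all boils down to a plain FreeBSD if_bridge with the three vNICs as members and the management IP sitting on the bridge itself. pfSense builds this from the GUI of course; the sketch below is just the rough shell-level equivalent, and the interface names and IP are placeholders, not my real values:

```python
# Rough equivalent of the pfSense filtering bridge at the FreeBSD level
# (pfSense does this via Interfaces > Bridges). Interface names and the
# management IP below are placeholders.
import subprocess

def ifconfig(*args):
    subprocess.run(["ifconfig", *args], check=True)

ifconfig("bridge0", "create")
for member in ("vmx0", "vmx1", "vmx2"):   # LAN, DMZ, WAN vNICs
    ifconfig("bridge0", "addm", member)
# the management IP is assigned to the bridge interface itself
ifconfig("bridge0", "inet", "192.0.2.10/24")
```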
Some details:
- There are two physical NICs, vmnic0 and vmnic1
- vmnic0 connects to an untagged port on my physical managed switch (LAN traffic is untagged)
- vmnic1 connects to a tagged port on my physical switch, carrying traffic for a DMZ vlan and WAN vlan
Connections are as follows:
vmxnet0 (LAN) <-> pfsense-lan portgroup (untagged)* <-> vSwitch0** <-> vmnic0
vmxnet1 (DMZ) <-> pfsense-dmz portgroup (DMZ vlan)* <-> vSwitch1** <-> vmnic1
vmxnet2 (WAN) <-> pfsense-wan portgroup (WAN vlan)* <-> vSwitch1** <-> vmnic1
*: On each port group, promiscuous mode is overridden to Accept, which I believe is needed for the bridge to actually see the traffic (see the sketch below).
**: On each vSwitch, MAC Address Changes and Forged Transmits were both set to Accept for some reason. To be honest I'm afraid to disable them (I'll explain why shortly).
vSwitch1 also hosts two more port groups: dmz-hosts (normal server VMs) on the DMZ VLAN and wan-hosts (same) on the WAN VLAN.
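In case it helps to see it concretely, this is roughly how one of those port-group overrides could be expressed with pyVmomi (I actually did everything through the UI; the host name, credentials and VLAN ID below are placeholders, and I've collapsed the vSwitch-level and port-group-level settings into a single port-group override for brevity):

```python
# Sketch only: recreate the pfsense-dmz port-group override with pyVmomi.
# Host, credentials and VLAN ID are placeholders; I did this in the UI.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

si = SmartConnect(host="esxi55.example.lan", user="root", pwd="changeme", sslContext=ctx)
try:
    # standalone host: datacenter -> compute resource -> single host
    host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
    net_sys = host.configManager.networkSystem

    policy = vim.host.NetworkPolicy(
        security=vim.host.NetworkPolicy.SecurityPolicy(
            allowPromiscuous=True,   # the bridge must see frames addressed to other MACs
            macChanges=True,
            forgedTransmits=True))   # bridged frames leave with "foreign" source MACs

    spec = vim.host.PortGroup.Specification(
        name="pfsense-dmz", vlanId=20,      # 20 = placeholder DMZ VLAN ID
        vswitchName="vSwitch1", policy=policy)
    net_sys.UpdatePortGroup(pgName="pfsense-dmz", portgrp=spec)
finally:
    Disconnect(si)
```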
As is, the thing seems to work like a charm. All three interfaces of the pfSense box are bridged and management is performed on the combined OPT interface.
Truth is, I've only recently started reading up on and expanding my knowledge of ESXi and VLANs. If I had known more, I'd probably have combined the vmnics into a failover setup to have some sort of redundancy.
Anyway, this IBM server has entered its 14th year of operation. I got a Lenovo ThinkSystem and installed the Lenovo custom ESXi 7.0 U1 image on it, with the idea of moving my VM infrastructure (incl. pfSense) to the new workhorse. This Lenovo has no fewer than 8 GbE ports (wow), so with the little knowledge I had of ESXi, and after some reading, I decided to do things a bit differently:
- I installed a couple of managed switches for the sake of having some sort of "high availability". Both switches were connected to the upstream switch.
- Of the 8 physical NICs, I connected half (vmnic0, 2, 4, 6) to the first switch and the other half (vmnic1, 3, 5, 7) to the second. On ESXi I assigned the first group to vSwitch0 as active NICs and the other half as standby NICs (see the sketch after this list).
- I configured all physical ports on both switches to carry both untagged traffic and tagged (DMZ and WAN) traffic. The upstream links to the main switch were trunked, carrying the untagged traffic as well as the DMZ/WAN VLANs. All switches (the upstream one and the two new switches on the ESXi side) have RSTP enabled.
- This was quite a different setup from the one I had up and running successfully, in that a single vSwitch was employed.
- I then created three port groups with promiscuous mode enabled, mirroring the ones from my 5.5 setup: pfsense-lan, pfsense-wan, pfsense-dmz
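For clarity, the active/standby uplink split on vSwitch0 mentioned above would look roughly like this as a pyVmomi sketch (again, I configured it in the host UI; the connection details are placeholders, only the vmnic numbering matches my setup):

```python
# Sketch: apply the active/standby uplink split to vSwitch0 with pyVmomi.
# Connection details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

si = SmartConnect(host="esxi70.example.lan", user="root", pwd="changeme", sslContext=ctx)
try:
    host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
    net_sys = host.configManager.networkSystem

    vsw = next(v for v in net_sys.networkInfo.vswitch if v.name == "vSwitch0")
    spec = vsw.spec                      # keep numPorts / bridge / MTU as they are
    spec.policy.nicTeaming = vim.host.NetworkPolicy.NicTeamingPolicy(
        policy="loadbalance_srcid",      # "route based on originating virtual port"
        nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
            activeNic=["vmnic0", "vmnic2", "vmnic4", "vmnic6"],
            standbyNic=["vmnic1", "vmnic3", "vmnic5", "vmnic7"]))
    net_sys.UpdateVirtualSwitch(vswitchName="vSwitch0", spec=spec)
finally:
    Disconnect(si)
```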
I did not move any other VMs to the new server. I wanted this to be the first one, routing traffic.
I installed pfSense on the new Lenovo, gave it a LAN IP (initially connecting only vmxnet0/LAN) and restored a fresh configuration backup saved from my working pfSense VM on 5.5. I then shut down the old VM, connected all interfaces on the new VM and rebooted the latter. Nothing worked. From its ESXi console I could see it was up and running and the bridge had the correct IP, but pinging any system (LAN, DMZ, WAN) failed. From ESXi I disconnected its interfaces and fired up the old VM to handle traffic normally again.
At this point I wasn't sure whether this (having all port groups on the same vSwitch) was a design error. I did notice, though, that on the new ESXi installation both newly created and existing vSwitches had MAC Address Changes and Forged Transmits set to Reject. This was different from my 5.5 ESXi.
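For reference, this is roughly how the effective per-port-group policy could be dumped with pyVmomi, i.e. what the host actually applies once the vSwitch defaults and any port-group overrides are combined (connection details are placeholders; I checked it in the UI myself):

```python
# Sketch: print the effective (computed) security policy of each port group,
# combining vSwitch defaults with port-group overrides. Placeholders throughout.
import ssl
from pyVim.connect import SmartConnect, Disconnect

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

si = SmartConnect(host="esxi70.example.lan", user="root", pwd="changeme", sslContext=ctx)
try:
    host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
    for pg in host.configManager.networkSystem.networkInfo.portgroup:
        sec = pg.computedPolicy.security
        print(f"{pg.spec.name:15s} vlan={pg.spec.vlanId:4d} "
              f"promisc={sec.allowPromiscuous} macChanges={sec.macChanges} "
              f"forged={sec.forgedTransmits}")
finally:
    Disconnect(si)
```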
So I took the plunge and set both of these options to Accept on all 3 pfSense-related port groups. Again I shut down the old VM, connected the interfaces and started up the new one and... total blackout. I lost all connectivity everywhere! I had to physically go to the console and reboot the new ESXi node (thankfully the new pfSense VM was not set to start at boot). LAN connectivity was restored, I then fired up the old VM, and everything went back to normal.
I'm a bit clueless (and afraid) about how to proceed. I'm still unsure whether having all port groups on the same vSwitch is a bad idea or not. I could certainly try to mimic the old setup by allocating the physical NICs differently, i.e. dedicating different NICs and a second vSwitch to the DMZ/WAN traffic. The problem is that I intend to move to vMotion by buying another physical host (meaning I'd have to grab 2 of the NICs for that), add a separate iSCSI VLAN (another 2 NICs allocated for it), etc. That would leave me with only two NICs for handling LAN/DMZ/WAN traffic, and those two would most likely end up in an active/standby configuration, meaning a single active NIC doing all of it.
Any ideas or pointers as to what I might be doing wrong would be really appreciated. And again, thank you for taking the time to read this!