CARP + promiscuous mode and accurate bandwidth monitoring.

  • Here's the setup:

    2 ESXi boxes (5.0), each with a pfsense VM.  Physical networking: each ESXi host has 2 NICs teamed to a switch port group, with the group in trunk mode.  Virtual networking: various untagged switch port groups, plus one tagged trunk port group in promiscuous mode for the pfsense VMs.  The pfsense VMs each have a single virtual NIC attached to the trunked port group, with tagged interfaces set up for all the subnets.

    All this make sense so far?  CARP seems to work great, except that it occasionally fails over for no apparent reason.  Otherwise, all is well on that front.  The issue we're having is that the pfsense interfaces on both VMs see ALL the traffic for whatever VLAN they're on, and graph it as such.  This makes it pretty difficult to get a good fix on our actual bandwidth usage since, for example, 2 hosts on, say, VLAN 56 talking to each other will still show up as firewall traffic on the RRD graphs.

    The second issue:

    Our ISP is also using a pair of routers and some form of failover to provide redundancy.  This means we have two switch ports connected to them, one to each router.  The problem is that all of our traffic on the internet VLAN is still being metered going out the port to the non-active router, and they're billing us accordingly.  Basically, all of our traffic is getting counted twice.  Not sure why that would be, except that I guess their devices are in promiscuous mode as well?  This just started about a month ago; before that, the graph for the failover switch port was flat.

    The internet VLAN has: 2x pfsense VMs, 2x ISP routers (make/model unknown), 1 Cisco ASA.  Because the pfsense boxes can see (and are graphing) all of the ASA traffic, I'm assuming this is the case with the standby ISP router as well...

    Anyway.  Are my assumptions about why we're metering traffic without it actually passing through the firewall correct?  And can I do anything about that?


  • FYI, this problem was due to a firmware bug in our switch that caused it to stop learning MAC addresses after 49.5 days.
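
    For anyone who lands here with the same symptoms, here is a rough Python sketch of why a switch that stops learning MAC addresses would produce them. It is a toy model of a single VLAN, not the behaviour of any particular switch firmware: the LearningSwitch class, the port names and the MAC strings are all made up for illustration, and the bug is modelled simply as learning being switched off with an empty table. When the destination MAC is not in the table, the frame is flooded out every other port in the VLAN, so the firewalls and the standby ISP router all get a copy.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    """Toy model of one VLAN on a MAC-learning switch (hypothetical, not vendor code)."""
    ports: list
    learning_enabled: bool = True
    mac_table: dict = field(default_factory=dict)   # MAC address -> port
    frames_out: dict = field(default_factory=dict)  # port -> frames sent out that port

    def forward(self, in_port, src_mac, dst_mac):
        # A healthy switch records which port each source MAC lives on.
        if self.learning_enabled:
            self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            targets = [p for p in self.ports if p != in_port]
        else:
            # Known destination: send out exactly one port.
            targets = [out_port] if out_port != in_port else []

        for p in targets:
            self.frames_out[p] = self.frames_out.get(p, 0) + 1
        return targets


def simulate(learning_enabled):
    """Send 100 frames each way between the ASA and the active ISP router."""
    sw = LearningSwitch(
        ports=["fw1", "fw2", "isp_active", "isp_standby", "asa"],
        learning_enabled=learning_enabled,
    )
    for _ in range(100):
        sw.forward("asa", "mac-asa", "mac-isp")
        sw.forward("isp_active", "mac-isp", "mac-asa")
    return sw.frames_out


if __name__ == "__main__":
    print("learning on: ", simulate(True))
    print("learning off:", simulate(False))
```

    With learning on, only the very first frame is flooded, so fw1, fw2 and isp_standby each see a single frame. With learning off, every frame is flooded, so those three ports see all 200 frames even though none of the traffic is addressed to them. That matches both the inflated RRD graphs on the firewalls and the traffic metered on the standby router's port.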