Large Deployment Hardware Config Advice

Slackmaster

Hello,

I am currently running 1.2.3 on a Dell 2850 under VMware ESXi 3.5. My internet connection is 100 Mbps and I have about 3000 users behind it. Internet performance is fine, but the CPU on my pfSense VM stays pegged at around 100% during the day. This is a problem for me because I would like to add ntop and also use the traffic shaper and captive portal. I have not been able to set the NICs to e1000, as some have done for better performance.

I want to purchase a new Dell server and run pfSense on bare metal for maximum performance, but I'm not sure about hardware support. I have been looking at the HCL, and it looks like 1.2.3 does not support the new Dell PERC controllers, and 2.0 support is still spotty as well. I'm also hesitant to move to 2.0 in a heavy production environment at this point, just because it's new.

Has anyone successfully loaded pfSense on bare metal on a new Dell server with the latest PERC controllers? If so, what version, and what hoops had to be jumped through?

I've read in this forum about others running on Citrix XenServer and doing PCI passthrough for the NICs. I would be willing to consider XenServer, but can PCI passthrough work as well with ESX/ESXi?

Thanks.

Guest

I would say save the money you'd spend on a high-end server: build two dual/quad-core boxes, configure CARP, and skip the RAID controller.

If it is only acting as an internet gateway with services, you'd be surprised at how far a little will take you.

http://www.pfsense.org/index.php?option=com_content&task=view&id=52&Itemid=49

PS - I haven't found a post or user that's running ntop on pfSense 2.0. I would dedicate an old P3/P4 with Linux to it if you have a managed switch, though bandwidthD comes close.

Slackmaster

Thanks for the reply.

Actually, I just found out that I've inherited a Dell PE R710 to do the job, so that is what I have to work with. I haven't booted it up yet to see what RAID controller is in it; hopefully it is the one that is compatible with 1.2.3.

jasonlitka

@Slackmaster:

Thanks for the reply.

Actually, I just found out that I've inherited a Dell PE R710 to do the job, so that is what I have to work with. I haven't booted it up yet to see what RAID controller is in it; hopefully it is the one that is compatible with 1.2.3.

Doubt it. 2.0 should work with the PERC 6i but I don't believe any release of pfSense will cover the H-series adapters.

EDIT: Looked it up, the R710 also has the SAS 6/iR as an option. This is a rebadged LSI 1068 and should work under 1.2.3.

Slackmaster

I ended up loading ESX 4.1 on the R710 and configuring the VM with e1000 NICs, and it runs GREAT. Very happy with the performance so far.

the.it.dude

Slackmaster,

I have (or will have) a similar number of clients at our school district. As per my other posts, I have had problems getting 2.0 loaded and/or running on both an R410 and an R510. I had considered running in a VM, but decided that it did not seem like a smart idea to run our Internet-facing firewall within a VM (for security reasons and because of the added management complexity). It seems you have been running that way for some time. I'm interested in hearing about your experiences (or anyone else's).

Thanks,

Jeff

cmb

The firewalls in front of the servers hosting this and all our other sites are running in ESXi, primary on one server and secondary on another. The only real security risk I would be concerned about is misconfiguration. Historically there have not been any security vulnerabilities in ESX that would impact such a configuration, though of course something new is always possible, so keep up on the latest updates and info. I expect that, after using ESX for about 7 years, if it hasn't been a problem already it's not likely to be in the future. We push about 3-4 TB/month, typically around 1000-4000 pps and somewhere between 2 and 100 Mbps (we have a 100 Mb connection to the provider). That's generally comparable to the load 3000 clients are going to generate, though that can vary a lot from one environment to another (a typical business with well-controlled PCs will be less than or comparable to that; something like a dorm of college kids will be well over that).
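As a rough sanity check on those figures, a monthly transfer volume can be converted to an average line rate. This is only a sketch: it assumes decimal terabytes and a 30-day month, neither of which is stated in the post, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(terabytes_per_month: float, days: int = 30) -> float:
    """Average rate in Mbit/s for a monthly volume given in decimal TB.

    Assumptions (not from the post): 1 TB = 1e12 bytes, 30-day month.
    """
    bits = terabytes_per_month * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


for tb in (3, 4):
    print(f"{tb} TB/month is roughly {average_mbps(tb):.1f} Mbit/s on average")
```

Under those assumptions the average lands toward the low end of the quoted 2-100 Mbps range, which fits a link that is mostly idle with bursts toward line rate.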

Our Netherlands colo runs the same way, though much smaller. Everything there is contained in a single ESX server, including a CARP pair of firewalls. That provides no hardware redundancy, but it lets us upgrade without downtime (we upgrade our production 2.0 boxes quite a bit). It's not a lot more than a mirror, and we can live without it if it dies.

I've done a lot of deployments entirely contained in ESXi, wouldn't hesitate to run that way.

the.it.dude

cmb,

Good Info!

I had tried loading up 2.0 in ESXi. I discovered that pfSense always showed the link status as UP when running in a VM. (Not sure I liked this) I assumed installing the Open VM Tools package would correct this. However, after installing that package and then rebooting, I lost all connectivity to pfSense. Do you use the VM Tools?

Thanks,

Jeff

cmb

We don't use open-vm-tools or any other type of tools on any of our installs: development, testing, or production. We may have it on a couple of test systems, but we almost never use any tools.

the.it.dude

cmb,

One more question if you don't mind…

What's your reasoning for running within ESXi instead of on bare metal? Is it for better support on newer hardware? Or for easier/quicker upgrades/recovery? Has that extra layer of management caused you any grief?

OK, sorry. That was 4 questions :-)

Jeff

tacfit

I'd be interested in the answer to those questions as well. My assumption is that any layer of abstraction would slow things down to some degree.

jwelter99

We've tried ESXi 4.1 for production pfSense and often had unexplained latency of up to 200 ms. Moving to physical hardware fixed this. I'd like to revisit an ESXi deployment if we could solve this issue.

CMB - are you using dedicated NICs on a dedicated vswitch for public and CARP? Or are you using VLANs off a NIC group? We were running dual 10 GbE per host and using VLANs for the CARP and public networks, which I suspect might have been our issue.

cmb

@the.it.dude:

What's your reasoning for running within ESXi instead of on bare metal? Is it for better support on newer hardware? Or for easier/quicker upgrades/recovery? Has that extra layer of management caused you any grief?

We used to run on physical hardware (we pushed roughly the same loads through a VIA 1 GHz), but we have several beefy ESX boxes, which are considerably faster than anything we would dedicate to a firewall. No need to dedicate physical hardware when you have virtual resources available, especially when even in a VM it'll be faster than anything physical we have available to dedicate to that purpose. Snapshots are nice to have available too, though I don't recall the last time I used one on our production firewalls. In most other circumstances, easier/quicker recovery would be one of my primary reasons, but it's so easy to switch between physical hardware that it isn't really a consideration with pfSense. We don't own any hardware that isn't compatible on the bare metal, though we're always at least one server generation behind the most recent as we get whatever people will give us and what we can buy on a limited budget.

None of it has caused any grief. The primary risk for grief in a scenario like ours (aside from user error) is if you have only one firewall, or you end up with both firewalls vMotioned to the same host, and you lose the host they're on without them being migrated to an available host. Then you're stuck and locked out, and you have to get access to the inside of the network to recover (which isn't as big a deal in some circumstances, but in a remote datacenter it's a very serious problem). The same applies if you have some other single point of failure between them, for example they reside on the same SAN and that craps out. I've seen physical firewalls with drives so dead you can't read anything on the disk, yet they still pass traffic perfectly fine (you're not going to be able to change the config, though). ESX gets very unhappy if it loses a VM's datastore, rendering the VM completely unavailable from what I've seen.

@jwelter99:

CMB - are you using dedicated NICs on a dedicated vswitch for public and CARP? Or are you using VLANs off a NIC group? We were running dual 10 GbE per host and using VLANs for the CARP and public networks, which I suspect might have been our issue.

The public/outside side of the firewalls is on its own physical network, with everything else on VLANs. Internal is all gigabit, external is 100 Mb. Servers behind the virtual firewalls are a mix of physical servers with FreeBSD on bare metal, running a bunch of jails, and a variety of virtual servers on ESX.

the.it.dude

Well, I just did some throughput testing…

I had 2 Vostro laptops running iperf on either side of my R410. With pfSense installed on the bare hardware (note: I got it working by trying a different keyboard), I averaged about 140 Mbit/sec. With pfSense running in a VM on ESXi 4.1 using e1000, I averaged about 85 Mbit/sec. This number did not change with the Open VM Tools installed, and pfSense still showed all the interfaces as up even when a cable was not plugged in.

That is too significant of a performance hit for my comfort. I won't be running in a VM. At least now I know...

Thanks for everyone's input!

Jeff

cmb

That's not right; there's probably a general problem with your testing methodology. One, a Pentium III can push more than 140 Mbps, so something is seriously not right if that's all you're getting through bare metal. Two, a PowerEdge 2650 (and probably something even slower than that, but that's the slowest I've used) can push more than 85 Mbps through a VM in ESX, and an R710 is vastly faster; that's 4 generations newer.

There will be a difference in the maximum throughput achievable on bare metal vs. a VM, but I would expect it to be several Gbps on bare metal on a server of that spec (from testing I've done on similarly specced servers), and maybe around 500 Mbps in a VM. Unless you need extremely high throughput (pps more than bps), you won't see any difference.
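To illustrate why packet rate matters more than bit rate, here is a minimal sketch of how many packets per second it takes to fill a link at different packet sizes. The packet sizes are illustrative picks, not figures from this thread, and the calculation ignores Ethernet framing overhead (preamble and inter-frame gap), so real maximums are somewhat lower.

```python
def packets_per_second(mbps: float, packet_bytes: int) -> float:
    """Packets per second needed to carry `mbps` Mbit/s at a fixed packet size.

    Simplification: ignores Ethernet preamble and inter-frame gap.
    """
    return mbps * 1e6 / (packet_bytes * 8)


# Illustrative packet sizes (assumed, not taken from the thread).
for size in (64, 512, 1500):
    pps = packets_per_second(100, size)
    print(f"100 Mbit/s at {size}-byte packets is about {pps:,.0f} pps")
```

The same bit rate costs the firewall far more per-packet work when the traffic is made of small packets, which is why a bulk-transfer test alone says little about small-packet load.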

the.it.dude

cmb,

The limitation is probably in the endpoints (Vostro 1520 laptops). Using the same iperf test ("iperf -s" on one laptop and "iperf -c x.x.x.x -t 30" on the other) with a crossover cable between the 2 laptops only yields an average of 172 Mbits/sec. I did the same test through the following firewall distros: Vyatta Core, Astaro Essentials, and Untangle. Here are those results:

Laptop -> Laptop via Crossover cable

172 Mbits/sec

Astaro

Intel -> Intel = 165 Mbits/sec
Intel -> Broadcom = 147 Mbits/sec
Broadcom -> Intel = 132 Mbits/sec
Broadcom -> Broadcom = 140 Mbits/sec

Vyatta

Intel -> Broadcom = 114 Mbits/sec

Untangle

Intel -> Intel = 165 Mbits/sec
Intel -> Broadcom = 160 Mbits/sec
Broadcom -> Intel = 200 Mbits/sec
Broadcom -> Broadcom = 200 Mbits/sec

Note: the NICs in use also made a difference. When I did the pfSense test, I was going from Intel -> Broadcom.
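One way to read these numbers is as a percentage of the laptop-to-laptop crossover result. The sketch below does that for the Intel -> Broadcom runs listed above plus the two pfSense figures quoted earlier in the thread. It is only a comparison aid: the labels and the helper name are made up here, and the NIC pairing for the pfSense runs is taken from the note above.

```python
BASELINE_MBPS = 172  # laptop -> laptop via crossover cable

# Figures as reported in the thread (Mbit/s).
results = {
    "pfSense, bare metal": 140,
    "pfSense, ESXi 4.1 VM (e1000)": 85,
    "Astaro, Intel -> Broadcom": 147,
    "Vyatta, Intel -> Broadcom": 114,
    "Untangle, Intel -> Broadcom": 160,
}


def percent_of_baseline(mbps: float, baseline: float = BASELINE_MBPS) -> float:
    """Throughput expressed as a percentage of the crossover baseline."""
    return 100 * mbps / baseline


for name, mbps in results.items():
    pct = percent_of_baseline(mbps)
    print(f"{name:32s} {mbps:4d} Mbit/s  {pct:5.1f}% of baseline")
```

Since some Untangle runs exceeded the crossover figure, the baseline itself is evidently noisy, so small differences between distros should not be over-read.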