Ah, forgot to respond on this one… three things:
1) I had messed around a bunch with the same firewall pair before I started performance testing, and I suspect things were a little dirty under the hood. I ended up nuking the firewalls back to the base state, and things looked better, but not perfect.
2) My Apache settings were a little weak. I made sure I was logging to /dev/null, bumped the Apache thread count up (I was using the worker MPM), and kept an eye on vmstat on the system. It was surprisingly easy to overload the system I was using.
3) ab never really panned out for me; I had a hard time getting it to scale. I switched to curl-loader (http://curl-loader.sourceforge.net/), run from multiple machines against multiple Apache servers behind pfSense. The documentation was a bit sparse, but the results were more consistent and I could crush the servers behind pfSense. Ironically, I wasn't able to max out pfSense itself; I would have needed a few more servers behind it. I think I was doing about 20,000 connection attempts per second when I had to stop. The requests were pulling a tiny "Hello World" HTML file, so this was mostly opening and closing sockets with very little data in between. I think my firewalls were at about 55-60% CPU. I also did a bandwidth test where I pulled a 50K file over and over again and was able to max out the gigabit link without pfSense breaking a sweat, but that's really more a test of the NIC than of the software anyway.
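
To put the two tests in perspective, here is a rough back-of-the-envelope sketch in Python. It is not something I ran during the testing, and it rests on assumptions: the "gig link" is taken as 1 Gbit/s, "50K" as 50 KiB, protocol overhead (TCP, HTTP headers, Ethernet framing) is ignored, and the 100-byte size for the "Hello World" page is purely hypothetical. The function names are made up for illustration.

```python
# Back-of-the-envelope numbers only; every constant here is an assumption.
LINK_BITS_PER_SEC = 1_000_000_000  # assumed: "gig link" = 1 Gbit/s line rate
BULK_FILE_BYTES = 50 * 1024        # assumed: "50K file" = 50 KiB
TINY_FILE_BYTES = 100              # hypothetical size of the "Hello World" page
CONN_ATTEMPTS_PER_SEC = 20_000     # rough figure quoted in the post


def fetches_per_sec_to_fill_link(link_bits_per_sec, payload_bytes):
    """Upper bound on fetches/sec needed to saturate the link (no overhead)."""
    return link_bits_per_sec / 8 / payload_bytes


def payload_mbit_per_sec(requests_per_sec, payload_bytes):
    """Payload throughput in Mbit/s for a given request rate (no overhead)."""
    return requests_per_sec * payload_bytes * 8 / 1_000_000


bulk_rate = fetches_per_sec_to_fill_link(LINK_BITS_PER_SEC, BULK_FILE_BYTES)
tiny_mbit = payload_mbit_per_sec(CONN_ATTEMPTS_PER_SEC, TINY_FILE_BYTES)

print(f"Bulk test: about {bulk_rate:.0f} fetches/sec fill the link")
print(f"Tiny-file test: about {tiny_mbit:.0f} Mbit/s of payload")
```

Under those assumptions, filling the link with the 50K file takes only around 2,400 fetches per second, while 20,000 tiny requests per second carry very little payload. That is consistent with the bandwidth test mostly exercising the NIC and the small-file test mostly exercising connection setup and teardown.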