Ah, there we go, that's about what I'd expect to see.
One of the interesting things about iperf is that it's deliberately designed to be single-threaded. Running multiple parallel streams with the '-P' switch doesn't change that; you are still running one iperf process. But, as you already tried, it does mean you can run several instances at once to test combinations of CPU cores and streams. You still see a better result with -P because each stream is a separate connection, so the firewall and the NICs can spread that traffic across multiple queues, and therefore multiple CPU cores.
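If you want to script the "multiple processes" comparison, here is a minimal sketch. It assumes iperf3 (not iperf2) on the client, and one iperf3 server instance per port on the far end, because each iperf3 server only runs one test at a time. It pins each client process to its own core with iperf3's -A option (Linux, FreeBSD and Windows only) and sums the received throughput from the JSON output. The function names, port range and test length are just illustrative choices.

```python
import json
import subprocess
from typing import Dict, List


def build_client_cmds(server: str, procs: int, streams: int = 1,
                      base_port: int = 5201, seconds: int = 10) -> List[List[str]]:
    """Return one iperf3 client command line per process.

    Process i connects to base_port + i and is pinned to CPU core i.
    """
    cmds = []
    for i in range(procs):
        cmds.append([
            "iperf3", "-c", server,
            "-p", str(base_port + i),   # separate server instance per client
            "-P", str(streams),         # parallel streams inside this one process
            "-t", str(seconds),         # test duration in seconds
            "-A", str(i),               # CPU affinity for this client process
            "-J",                       # JSON output so results can be summed
        ])
    return cmds


def total_bits_per_second(results: List[Dict]) -> float:
    """Sum received throughput across the JSON results of all clients."""
    return sum(r["end"]["sum_received"]["bits_per_second"] for r in results)


def run_clients(server: str, procs: int, streams: int = 1) -> float:
    """Start all clients at once, wait for them, and return aggregate bit/s."""
    cmds = build_client_cmds(server, procs, streams)
    running = [subprocess.Popen(c, stdout=subprocess.PIPE, text=True) for c in cmds]
    results = [json.loads(p.communicate()[0]) for p in running]
    return total_bits_per_second(results)
```

On the server side you'd start matching listeners first (for example 'iperf3 -s -p 5201 -D', 'iperf3 -s -p 5202 -D' and so on). Then comparing something like run_clients("192.0.2.1", procs=4, streams=1) against run_clients("192.0.2.1", procs=1, streams=4), where 192.0.2.1 is a placeholder address, shows how much of the difference comes from extra client processes versus extra flows through the firewall.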
Steve