Gateway load balancing seems to work well. I have two PIA VPN tunnels configured on an SG-3100. I have them both as part of a gateway group in tier 1, and my test machine matches a firewall rule that sends all traffic to that gateway group by default.
When running a Speedtest, the download test uses both tunnels - one openvpn process on each CPU. During the upload test, it only uses one of the tunnels. If I have the gateway group prefer one tunnel over the other, the download test only uses that tunnel and not the other, and the upload behavior doesn't change. I was able to confirm that by watching top from a console and looking at the bandwidth monitor.
I managed to pull down 60 mbit over OpenVPN doing it this way a few times, but on average it was about 50 mbit. I know there's more throughput available here given the hardware specs, so I need to figure out the best encryption algorithm to use. I want to try a real bench test to take the intertubes variable out of the equation to see how this really works.