Those words are misleading IMHO - substitute "down" for "exhausted" and that is what it does.
You get load balancing by putting multiple gateways in the same tier. New connections are then distributed among the gateways of that tier in the gateway group (only the ones that are up).
For failover, put gateways in different tiers. The Tier 1 gateways are used exclusively at first, and only when all Tier 1 gateways are down is Tier 2 used, and so on.
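To make the tier logic concrete, here is a minimal Python model of it. This is only an illustration, not pfSense's actual implementation. The `Gateway` class and the `active_tier`/`assign` functions are hypothetical names, and the sketch assumes plain round-robin among the up gateways of the lowest-numbered tier that still has at least one gateway up.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Gateway:
    # Hypothetical model of one gateway in a gateway group.
    name: str
    tier: int
    up: bool = True


def active_tier(gateways):
    """Return the up gateways in the lowest-numbered tier that has any gateway up."""
    up = [g for g in gateways if g.up]
    if not up:
        return []
    best = min(g.tier for g in up)
    return [g for g in up if g.tier == best]


def assign(gateways, n_connections):
    """Hand out new connections round-robin across the active tier only."""
    pool = active_tier(gateways)
    if not pool:
        raise RuntimeError("no gateway in the group is up")
    rr = cycle(pool)
    return [next(rr).name for _ in range(n_connections)]


if __name__ == "__main__":
    group = [Gateway("WAN1", 1), Gateway("WAN2", 1), Gateway("WAN3", 2)]
    print(assign(group, 4))  # both Tier 1 gateways share the load
    group[0].up = False
    group[1].up = False
    print(assign(group, 2))  # all Tier 1 down, so Tier 2 takes over
```

Note that nothing in this model looks at how busy a gateway is. A gateway only drops out of the rotation when it is marked down, which matches the behaviour described above.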
As you imply, it would also be nice to use a Tier 1 gateway and, when it appears to be saturated with traffic, put new connections onto a Tier 2 gateway. There is no functionality to do that.
If you have multiple Tier 1 gateways with different bandwidths, you can set different weights in each gateway's advanced parameters so that the system allocates more or fewer client connections to particular gateways (rather than just balancing evenly).
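One simple way to picture what a weight does is weighted round-robin, where a gateway with weight 3 appears three times in the rotation for every one appearance of a weight-1 gateway. The sketch below is a hypothetical Python model of that idea. The `weighted_pool`/`assign_weighted` names and the gateway names are made up for illustration, and the code is not how pfSense itself is implemented.

```python
from itertools import cycle


def weighted_pool(weights):
    """Expand {gateway_name: weight} into a rotation list.

    Each gateway appears `weight` times, so it receives that share of new connections.
    """
    pool = []
    for name, weight in weights.items():
        if weight < 1:
            raise ValueError(f"weight for {name} must be >= 1")
        pool.extend([name] * weight)
    return pool


def assign_weighted(weights, n_connections):
    """Assign new connections by cycling through the weighted rotation."""
    rr = cycle(weighted_pool(weights))
    return [next(rr) for _ in range(n_connections)]


if __name__ == "__main__":
    # Hypothetical: a fast line weighted 3 and a slow line weighted 1.
    print(assign_weighted({"WAN_FAST": 3, "WAN_SLOW": 1}, 8))
```

Over any whole number of rotations the fast line gets three times as many connections as the slow one. Simply repeating entries clusters them (three fast in a row, then one slow); smoother interleaving schemes exist, but the long-run ratio is the same.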