What you describe is closer to the truth, but I think there is a little confusion in both areas.
Traffic shaping does not happen on the interface where traffic enters; it happens on the interface where it leaves. That is a fact of life: the shaper cannot control how fast packets arrive, only how fast it sends them back out, so the outbound side is the only place traffic can be limited. So downloads are limited when they leave on LAN, and uploads are limited when they leave on WAN.
Content is not "cached" in any way. When some packets are dropped, the sender has to resend them, and eventually the sending side throttles itself back. That combination of dropped packets and sender-side throttling is what effectively limits the traffic.
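If it helps to see the drop/throttle interaction in action, here is a rough toy model in Python. It is only a sketch under simplifying assumptions, not how any real shaper or TCP stack is implemented: the names `simulate`, `link_rate` and `queue_limit` are made up for illustration, time moves in fixed ticks, and the sender is a crude "add one when all is well, halve on loss" stand-in for real congestion control.

```python
def simulate(link_rate=10, queue_limit=20, ticks=200):
    """Toy model of a sender feeding a drop-tail queue on the outbound interface.

    link_rate   -- packets the interface may send per tick (the shaping limit)
    queue_limit -- packets the queue can hold before it starts dropping
    Returns (delivered, dropped) packet counts.
    """
    send_rate = 1   # packets the sender offers per tick
    queue = 0       # packets currently waiting on the outbound interface
    delivered = 0
    dropped = 0

    for _ in range(ticks):
        # Packets reach the outbound queue; whatever does not fit is dropped.
        accepted = min(send_rate, queue_limit - queue)
        lost = send_rate - accepted
        queue += accepted
        dropped += lost

        # The interface drains at most link_rate packets per tick.
        sent = min(queue, link_rate)
        queue -= sent
        delivered += sent

        # Sender reaction: back off after a loss, otherwise probe for more.
        if lost:
            send_rate = max(1, send_rate // 2)
        else:
            send_rate += 1

    return delivered, dropped


if __name__ == "__main__":
    delivered, dropped = simulate()
    print(f"delivered={delivered} dropped={dropped}")
```

Nothing is stored for later on the sender's behalf: packets that do not fit are simply thrown away, and the sender slowing down in response is what keeps the flow at or below `link_rate` over time.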