Thanks for the reply, CMB.
I have some new information to add to this. Here's our setup: we have 2 identical SuperMicro servers, one in production and one as a backup. No CARP, as we have another issue/bug we are troubleshooting there as well. Currently, the production box is set to "Round Robin" for NAT, and the backup box is set to "Round Robin with Sticky", and both are running fine.
We changed the production box from Round Robin to Round Robin with Sticky. The server was fine for about 30 seconds, and then locked up the exact same way we saw before. All interfaces stayed up and everything looked fine, but the box could not be administered and no traffic was actually passing. We cut over to the backup (which had been running fine with Sticky), and we saw the exact same thing. Everything worked fine for about 30 seconds, and then poof, the box explodes. We had to reboot the server, disconnect all traffic-bearing interfaces (so it wouldn't immediately lock up again), and revert the config to get things back up and running.
So it seems that changing the config isn't a problem until traffic actually starts using the new NAT option. Has anyone seen anything like this before? Is this a software bug, or does it seem more like a hardware incompatibility?
For those who are curious: we are trying the Sticky option because of possible issues with client devices that have multiple sessions NATed to multiple public IP addresses. When we statically set those clients to a single NAT IP address, those problems clear up. So we were hoping that the Sticky option for NAT might alleviate these issues on a wide scale.
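
To illustrate the behavior we are after, here is a rough Python sketch of the difference between the two modes. It is only a conceptual model, not how the firewall implements NAT. The pool addresses, client addresses, and function names are all made up for the example, and it ignores any expiry of the sticky mapping.

```python
from itertools import cycle

# Hypothetical public NAT pool (documentation addresses, not our real ones).
POOL = ["198.51.100.1", "198.51.100.2", "198.51.100.3"]


def round_robin(pool):
    """Return a picker that hands out the next pool address for every new session."""
    addresses = cycle(pool)
    return lambda client_ip: next(addresses)


def round_robin_sticky(pool):
    """Return a picker that pins each client to the first address it was given."""
    addresses = cycle(pool)
    assigned = {}

    def pick(client_ip):
        # Only advance the rotation the first time we see this client.
        if client_ip not in assigned:
            assigned[client_ip] = next(addresses)
        return assigned[client_ip]

    return pick


def simulate(picker, sessions):
    """Map each client to the set of public addresses its sessions ended up using."""
    seen = {}
    for client_ip in sessions:
        seen.setdefault(client_ip, set()).add(picker(client_ip))
    return seen


# Two hypothetical clients opening sessions in interleaved order.
SESSIONS = ["10.0.0.5", "10.0.0.6", "10.0.0.5", "10.0.0.5", "10.0.0.6"]

if __name__ == "__main__":
    print("round robin:", simulate(round_robin(POOL), SESSIONS))
    print("sticky:     ", simulate(round_robin_sticky(POOL), SESSIONS))
```

In this toy model, plain round robin spreads the sessions from 10.0.0.5 across two public addresses, while the sticky variant keeps every session from a given client on one address, which is the same effect we get today by statically assigning those clients.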
Thanks!
Josh