Big crash after setting up CARP addresses



  • I'm afraid I don't have any debugging info, but here's the gist of what happened today. Hopefully it'll ring a bell or something.

    I set up a bunch of proxy ARP VIPs with 1:1 NAT (and appropriate firewall rules), but they weren't working. So I switched a couple to CARP, and that worked. I then changed the rest of them (30+ entries) to CARP. I hit "apply" when it showed up at the top, and the box spontaneously rebooted.

    When it tried to come back up, it would print "Waiting for final CARP interface bringup…. done." once, then a " done." on another line, then one more "Waiting for final CARP interface bringup...." and hang. The box locked up and did not respond to the keyboard. I tried booting a few more times and it always hung.

    I booted from a FreeSBIE live CD and didn't find anything telling in the logs. I pulled a copy of the config and did notice two things about my CARP entries:

    1. I fat-fingered and had a few duplicate VHIDs.
    2. I used a /32 netmask, when apparently I'm supposed to use the netmask of the network (/24).

    Those things shouldn't cause such a catastrophic failure, though, right?
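
For what it's worth, the first of those mistakes is easy to spot mechanically before applying. This is only a rough sketch: it assumes the CARP entries have already been pulled out of the config into (address, VHID) pairs, and that VHIDs are meant to be unique across the VIPs being checked. The function name and the sample data are made up for illustration and are not part of pfSense.

```python
from collections import defaultdict


def find_duplicate_vhids(entries):
    """Return {vhid: [addresses]} for every VHID that appears more than once.

    `entries` is a hypothetical list of (address, vhid) pairs extracted
    from the config by some other means.
    """
    by_vhid = defaultdict(list)
    for address, vhid in entries:
        by_vhid[vhid].append(address)
    # Keep only the VHIDs shared by two or more addresses.
    return {vhid: addrs for vhid, addrs in by_vhid.items() if len(addrs) > 1}


# Made-up sample data (documentation address range), not my real config.
example = [("192.0.2.10", 1), ("192.0.2.11", 2), ("192.0.2.12", 2)]
print(find_duplicate_vhids(example))
```

An empty result means no VHID was reused.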

    Version was 20081026 snapshot.



  • I see that in the past there have been crashes related to CARP, especially when using incorrect subnets. That was supposed to have been fixed a while ago. Perhaps the bug is back?



  • Update:

    I reinstalled with today's snapshot, fixed the netmasks and duplicate VHIDs in the config file, and reloaded the config file. It booted, but SLOWLY through the "carp bringup" section.

    This time I installed the debug kernel and remote syslogging.



  • CARP bringup can take up to 120 seconds; this is normal.



  • Any idea about the spontaneous reboot? Should bad netmasks be able to cause that? Were there any commits in the past few days that might have fixed it? (I brought it back up on a newer snapshot.)

    Thanks.



  • I used to get those reboots when configuring a CARP interface on 1.2 with a Broadcom NIC on an IBM xSeries, but only at the configuration stage …



  • Running Intel NICs and a Dell server here.

    The reboot happened right after I hit apply, having added 30+ CARP IPs. I wonder if so many at once triggered something nasty?

    Or, if it is the wrong-subnet thing, could a sanity check be added to verify the subnet of the CARP IP against that of the interface?
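
To make the request concrete, here is roughly the kind of check I have in mind. It is a sketch in Python using the standard `ipaddress` module, not anything taken from the pfSense code; the function name and the sample addresses are hypothetical. It assumes the rule is simply that the CARP IP must fall inside the interface's network and carry the same prefix length.

```python
import ipaddress


def check_carp_vip(vip_cidr, iface_cidr):
    """Return a list of problems found; an empty list means the VIP passed.

    Both arguments are strings in address/prefix form, e.g. "192.0.2.50/24".
    """
    vip = ipaddress.ip_interface(vip_cidr)
    iface = ipaddress.ip_interface(iface_cidr)
    problems = []
    # The VIP address itself must belong to the interface's network.
    if vip.ip not in iface.network:
        problems.append(f"{vip.ip} is not inside {iface.network}")
    # The VIP should use the network's mask, not /32.
    if vip.network.prefixlen != iface.network.prefixlen:
        problems.append(
            f"VIP mask /{vip.network.prefixlen} differs from "
            f"interface mask /{iface.network.prefixlen}"
        )
    return problems


# Made-up addresses: a /32 VIP on a /24 interface, as in my broken config.
print(check_carp_vip("192.0.2.50/32", "192.0.2.1/24"))
```

A check like this could refuse the entry, or at least warn, before anything is applied to the interface.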

