2.0.3 RELEASE bugs

jwelter99

Have been running 2.0.3 PRERELEASE built on Thu Feb 21 18:49:39 EST 2013 FreeBSD 8.1-RELEASE-p13 for some time without any issues. Upgraded to 2.0.3 RELEASE Friday night.

This is an HA pair.

Couple of problems since:

1. Changes made on the primary will not sync to the secondary. For example, trying to sync the rate limiters causes "A communications error" on the primary, and on the secondary a message about not being able to delete the IPFW limiter. I suspect this is because it's syncing 'rules' that need the limiter, so the delete fails because the limiter is in use.
2. REBOOT from the DIAGNOSTICS menu doesn't work. It does a HALT instead. I had to go to the co-location facility because of this and found the system sitting in the HALT state. I then rebooted it and tried REBOOT again, only to end up HALTed again.
3. Something has changed in the rate limiters. I have not narrowed it down yet, but it seems that traffic is ending up in the wrong queues at times. I am digging into this deeper, but currently #1 is making this difficult to sort out.

I've had great success with 2.0.1, bad luck with 2.0.2, good with the 2.0.3 PRERELEASE, and now back to bad luck on 2.0.3 RELEASE.

Not sure what people are running in production that's stable and problem free?

jimp

I haven't seen any reports of those errors or anything similar from customers or other testing.

#2 most certainly does not happen for me. I reboot my VMs all the time that way on 2.0.3 and they always come back up. They don't halt. Only halt halts.

For "rate limiters" are you referring to Limiters, or the shaper in general?

Nothing changed in limiters that would affect how traffic is queued in them.

If you remove a limiter, the others get renumbered and it messes up your rules: they still refer to the old numbers, so whichever rule used the highest-numbered limiter no longer refers to any limiter that actually exists. That's a known issue with all of 2.0.x and has been fixed on 2.1.
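
To make the renumbering problem concrete, here is a minimal Python sketch. It is not pfSense code: the numbering scheme (limiters numbered 1..N by position), the limiter names, and the rule names are all assumptions made up for illustration. It only shows how rules that store a limiter number can end up pointing at the wrong limiter, or at none, once a limiter is removed.

```python
# Hypothetical model, NOT pfSense's implementation: limiters are numbered
# by their position in the list, and rules store that number.

def number_limiters(names):
    """Assign numbers 1..N by list position (assumed numbering scheme)."""
    return {i + 1: name for i, name in enumerate(names)}

def resolve(rules, table):
    """Map each rule to the limiter its stored number currently points at."""
    return {rule: table.get(num) for rule, num in rules.items()}

# Made-up limiter and rule names.
limiters = ["lan_down", "lan_up", "guest_down"]
rules = {"lan_out": 2, "guest_in": 3}  # rule name -> limiter number

before = number_limiters(limiters)

# Remove the first limiter; the remaining ones shift down by one,
# but the numbers stored in the rules are not updated.
limiters.remove("lan_down")
after = number_limiters(limiters)

print(resolve(rules, before))
print(resolve(rules, after))
```

In this model, after the removal the rule `lan_out` silently points at a different limiter, and `guest_in` (which used the highest number) points at nothing.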

ZPrime

Reboot works fine on the hardware I've used (Soekris boxes).

Can't speak to #1 or #3. However, I am seeing a very annoying "hang" on a machine if it can't get to the NTP server(s) during boot. 2.0.2 didn't do this, but 2.0.3 blocks the boot process while it tries to talk to NTP servers, with "Error : hostname nor servname provided, or not known".

You can't even Ctrl+C to skip that portion of the init. It just cancels the entire init script and then starts it over after you hit Enter a few times.

In practical use it isn't a huge problem because most systems will have internet access to hit an NTP server… but when setting up a test box or setting something up that is not in situ, it's irritating to have to sit and wait for it to fail (which takes a long time).

jimp

The NTP issue has been discussed in other threads already, complete with a fix.