Random reboots, kernel panics, instability on 2.0 BETA5 (OK with ACPI disabled)
I know this topic has been discussed a lot lately.
If it matters, I'm seeing a reboot every 20 minutes or so on every snapshot currently available, but only when I enable IPsec, even with no VLANs configured. It does not happen with the November 2010 snapshots, only with the current ones.
I think there are numerous reports of this, with people providing backtraces and screenshots:
Given this many reports, what exactly is needed to get a fix? I am finding BETA5 pretty much unusable in its current state. I am not using older Realtek cards or chipsets. I removed the VLANs. I tried six different BETA5 builds, from mid-December through January 23, to try to isolate the problem. The forum has other posts with similar issues that are not included above.
I do not see the reboots when the system is simply idling with only the LAN port enabled.
Threads on the FreeBSD forums about the same issue in FreeBSD 8.1 suggest that hyperthreading is the main cause. In my tests with pfSense 2.0 BETA5 installed as a guest OS, if I boot pfSense with ACPI disabled and then enable both network cards and an IPsec connection, the connection stays relatively stable and working.
My question is: what needs to be done in the build as a whole to keep others from hitting the same problem? I doubt that running a developer kernel to provide backtraces is really necessary if someone can determine what changed in the kernel between November and January.
I have been looking through the FreeBSD forums, which suggest the issue is hyperthreading being enabled in the BIOS. So yes, this relates to the hardware and chipsets mentioned in the various threads. In that case, my overall question is: what is the best way to address this?
I am assuming the issue is in the FreeBSD kernel and was introduced recently. Is that assumption wrong?
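For anyone who wants to try the same workaround without choosing the ACPI option from the boot menu on every boot, here is a minimal, untested sketch of loader tunables. Assumptions: `hint.acpi.0.disabled` is the hint the FreeBSD boot menu's "ACPI disabled" choice sets; `machdep.hyperthreading_allowed` is assumed to be available as a loader tunable on the FreeBSD 8.x base; and `/boot/loader.conf.local` is assumed to be where local tunables survive pfSense upgrades. Turning hyperthreading off in the BIOS, as the FreeBSD threads suggest, may be the cleaner option where the BIOS offers it.

```
# /boot/loader.conf.local (assumed location for local loader tunables)
# Boot with ACPI disabled, same effect as the "ACPI disabled" boot menu choice
hint.acpi.0.disabled="1"
# Assumed FreeBSD 8.x tunable: do not schedule on hyperthreaded logical CPUs
machdep.hyperthreading_allowed="0"
```

It may be worth testing the two settings one at a time, to see which one actually makes the difference on a given box.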
I think the point of the backtraces is to determine with some certainty whether all these panics are related. There are similarities between the reports, but also differences. For example, many people report using OpenVPN, while you are using IPsec. I use neither.
Most folks with the panics are using the em driver, but at least one user has reported panics lately using the vr driver.
Many people report panics several times per day, but mine happen after 6-8 days of uptime.
Many people can reproduce their panic reliably, while for others it is more random.
Many people are seeing more problems lately. I've had the same issue since at least August.
So yeah, there are still a lot of unknowns here based on what I've seen in the forum, and hopefully a sampling of backtraces will help sort out what is what.
I had a lot more problems with older snapshots than I do with the one from late Thursday.
The real issues are:
- The em driver is problematic with certain chipsets; others are fine. Apparently none of the devs has one of the bad ones, so it's hard to track down since we can't reproduce it. The problematic chipsets appear to be very early/old em chips that were moved to a separate "legacy em" ("lem") code path.
- The FTP proxy still needs some work and can cause hangs.
- The PPTP proxy may still cause hangs (I haven't seen any more confirmations of this since the latest patches went in).
- The CARP code can cause a panic. We're tracking down a CARP issue (Bug #910), and even though a custom kernel had been working, the snapshots can still panic for some users.
- Only one person has reported a vr panic. I can't replicate it on my ALIX and there have been no other reports, so it's hard to say whether that's really a software issue at this point.
These issues have mostly affected a limited number of users, though the FTP one is a bit more common. On the snapshot I've been running I can no longer hang my box (it was perfectly repeatable last week), but others report that it still locks theirs up.
I started on a late December build. I'm using the full install with the embedded kernel option, on an ALIX with a Microdrive. The only problem I had originally was with pass-through PPTP client connections "hanging." That was supposed to have been fixed in a 1/19 build (I think), so I upgraded. The PPTP hangs seemed to be fixed, but then the router began randomly rebooting. Since then I have tried a couple of newer snapshots, but the reboots continue. I'm now using my backup router (DD-WRT on a WRT-54GL) until I can rebuild the ALIX router with pfSense 1.2.3.