Has ufs_dirbad Boot Loop Been Fixed?



  • Dear All,

    I know pfSense has had an issue with kernel panics (ufs_dirbad) after UFS file system corruption caused by, e.g., power outages. Has this been fixed? I'm running 2.3.2-RELEASE-p2 and just experienced it. A manual fsck helped. I'd really like to avoid doing a ZFS install. Do the 2.4.x versions address this issue?

    Thank you very much in advance.


  • Rebel Alliance Netgate Administrator

    The bigger issue is why the firewall lost power and rebooted (and you really need to update).



  • It's installed in a very remote area - middle of the jungle, basically, without any options to provide UPS. It's a complicated story. Power supply will be intermittent at this location. I would appreciate if someone could comment on the ufs_dirbad boot loop issue.


  • LAYER 8 Netgate

    I would not install pfSense there unless you have a way to get console access if necessary.

    2.4.x is better, but it is still a live FreeBSD filesystem. You don't want to be yanking the power, ZFS or not.



  • What's interesting is that we have about 100 firewalls deployed, and they worked fine with a read-only NanoBSD image on CF cards. Problems started when we began using SD cards, and now they occur even with industrial-grade SSDs in read-write mode with normal images. Is there a way, perhaps, to put the file system in read-only mode? Or a way to run an automatic fsck -y at boot instead of ending up in a kernel panic/reboot loop?



  • @maximusatov said in Has ufs_dirbad Boot Loop Been Fixed?:

    It's installed in a very remote area - middle of the jungle, basically, without any options to provide UPS.

    Well, if you can run networking hardware there, you can also run a UPS. It doesn't have to be a big one; it just needs to power the pfSense device long enough for a clean shutdown.


  • LAYER 8 Netgate

    It does do an fsck at boot. It passes. The kernel still panics when it is mounted.

    You might look at moving /var to a ram disk. That should minimize writes to the filesystem at least somewhat.

    https://www.netgate.com/docs/pfsense/install/upgrading-64-bit-nanobsd-2-3-to-2-4.html#use-ram-disks

    Of course a read-only filesystem would not have these issues. It's read-only.

    NanoBSD distributions of pfSense are dead, as you know.
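
As a rough way to confirm that the RAM disk setting suggested above actually took effect, the output of `mount` can be checked for /tmp and /var. The sketch below is only an illustration: it assumes FreeBSD-style `mount` lines of the form `device on mountpoint (fstype, options)` and assumes RAM disks show up either as `/dev/mdN` devices or as `tmpfs`. The sample text and all function names are hypothetical, not taken from pfSense.

```python
import re

# Assumption: each line of `mount` output looks like
#   /dev/md0 on /tmp (ufs, local)
# The device names in SAMPLE_MOUNT_OUTPUT are illustrative only.
MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<mountpoint>\S+) \((?P<options>[^)]*)\)"
)

SAMPLE_MOUNT_OUTPUT = """\
/dev/ada0s1a on / (ufs, local, noatime, soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /tmp (ufs, local)
/dev/md1 on /var (ufs, local)
"""


def parse_mounts(mount_output):
    """Map each mountpoint to a (device, fstype) tuple."""
    mounts = {}
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a mount line
        options = [o.strip() for o in match.group("options").split(",")]
        # The first entry inside the parentheses is the filesystem type.
        mounts[match.group("mountpoint")] = (match.group("device"), options[0])
    return mounts


def is_ram_backed(device, fstype):
    """Assumed heuristic: memory disks are /dev/mdN devices or tmpfs mounts."""
    return fstype == "tmpfs" or re.match(r"^/dev/md\d+", device) is not None


def ram_disk_report(mount_output, paths=("/tmp", "/var")):
    """Return {path: True/False} saying whether each path is RAM-backed."""
    mounts = parse_mounts(mount_output)
    return {p: (p in mounts and is_ram_backed(*mounts[p])) for p in paths}
```

On a live system the text could come from `subprocess.run(["mount"], capture_output=True, text=True).stdout`. A path reported as `False` is either not a separate mount or is still on the boot disk, so writes to it still reach the UFS filesystem.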



  • Also, while an fsck can restore the integrity of a filesystem, it cannot always repair the data itself. So while the system may once again be able to mount that filesystem, there is no guarantee that all of the files still contain usable data. So it's not a solution for your problem.



  • Thank you, all. Derelict, thanks a lot for the RAM disk suggestion. Will try that.


  • Netgate Administrator

    What are you actually running on?

    There are some pretty nice dc-dc UPSs for small 12v systems.

    I second that RAM disk suggestion, I use that on most systems I have installed.

    Steve



  • It's a mini-PC, 12V DC input. Will try to look for DC-DC UPSs, thank you very much!


  • Rebel Alliance Developer Netgate

    UPS is the best, naturally. ZFS + RAM Disks for /tmp and /var is the most robust solution and wouldn't hurt to do that in addition to the UPS.

    NanoBSD wasn't actually read-only since 2.3.1-RELEASE. On 2.3.x the rw/ro switch caused all kinds of problems so we left it rw. After that point it was essentially not much different than a traditional install + /tmp and /var in RAM.



  • Thanks, jimp. I'll start using UFS+RAM disks by default for the next 20 units. Will update this topic if any of these units fail.



  • Here's an update... Got one unit, enabled RAM disk, booted it. Simulated an unstable power environment by simply unplugging the power cord from the device a number of times, and got to this state:

    [Image: Sentinel problem.jpeg]

    So, RAM disks are not helping. Any other ideas on how to get pfSense to perform well in unstable power environments?

    P.S. Guys, please don't suggest UPS or other means to stabilize power. Let's assume by default that the power is unstable. That's a given for this task. Many firewalls do perform well in this environment without breaking their file systems. NanoBSD-based pfSense performed without any issues.


  • LAYER 8 Netgate

    @maximusatov said in Has ufs_dirbad Boot Loop Been Fixed?:

    NanoBSD-based pfSense performed without any issues.

    As has been stated, NanoBSD is dead. I don't see how continuing to state the obvious is going to result in a solution to your problem.

    What happened after that?

    You simply cannot pull the power on a read-write UFS filesystem without at least prompting an fsck. The issue is whether or not the fsck results in a bootable system or panics.


  • Rebel Alliance Developer Netgate

    @maximusatov said in Has ufs_dirbad Boot Loop Been Fixed?:

    Guys, please don't suggest UPS or other means to stabilize power. Let's assume by default that the power is unstable.

    A UPS is the answer. Full stop. If you have unstable power, use a UPS. You can get dirt cheap UPS units that would cover a firewall for a significant amount of time. Coupled with a package like apcupsd or nut that can trigger a clean shutdown, it's a perfect solution.
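
To illustrate the kind of decision a UPS monitoring package makes before triggering a clean shutdown, here is a minimal sketch that parses `key: value` text in the format printed by NUT's `upsc` client and checks the `ups.status` flags (`OL` for on line power, `OB` for on battery, `LB` for low battery). The function names and the 30 percent threshold are hypothetical, and this is not a replacement for the shutdown handling that apcupsd or nut already provide.

```python
def parse_upsc(output):
    """Parse 'key: value' lines (as printed by `upsc`) into a dict of strings."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values, min_charge=30.0):
    """Decide whether a clean shutdown should start now.

    min_charge is a hypothetical threshold in percent, chosen for illustration.
    """
    flags = values.get("ups.status", "").split()
    if "OB" not in flags:
        return False  # not on battery (or status unknown): keep running
    if "LB" in flags:
        return True  # the UPS itself reports a low battery
    try:
        charge = float(values.get("battery.charge", ""))
    except ValueError:
        return False  # charge missing or unreadable: wait for the LB flag
    return charge < min_charge
```

For example, `should_shut_down(parse_upsc("ups.status: OB LB"))` evaluates to `True`, while a status of `OL` never triggers a shutdown regardless of the reported charge.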

    Moving the goalposts isn't going to get you a better solution here.

    ZFS helps, since it's a bit more resilient, but even that isn't perfect. RAM disks do help, but again, they're not perfect. NanoBSD is no different than using RAM disks. It had not been set read-only in years.

    Locking this since it's just going in circles.

