[solved] 2.2.3 nanobsd - packages reinstall after upgrade totally screwed

cmb

I doubt that commit was directly related, as it'd likely break package reinstalls in all circumstances for everyone. From what I've seen, reinstalls are no more problematic than they have ever been (not that that's great, I know; it's something we'll be working on for 2.3 while switching to pkg).

There is some odd stuff there though. What packages do you have on there? 32 or 64 bit?

doktornotor

Alix. Executive summary after I fixed the mess:


Package Name      - After reboot   - After Reinstall packages  - After reinstall one by one
Service Watchdog  - Broken GUI     - Broken GUI                - Works
Notes             - Broken GUI     - Broken GUI                - Works
Cron              - Broken GUI     - Broken GUI                - Works
gwled             - Broken GUI     - Broken GUI                - Works
RRD Summary       - 404 not found  - Broken GUI                - Works
Shellcmd          - Broken GUI     - Works                     - Works
SSHDCond          - Broken GUI     - Works                     - Works
System Patches    - 404 not found  - Works                     - Works

A special case:

NUT - failed completely multiple times: fetch failures, and a missing inc file that made the deinstall fail, so the subsequent reinstall got aborted. It took about 5 attempts to get it reinstalled.

P.S. Also wondering what's this:

Jun 25 10:30:02	php-fpm[44952]: /pkg_mgr_install.php: Reference 1000 is going negative, not doing unreference.

phil.davis

Reference 1000 is going negative, not doing unreference.

That is from util.inc function refcount_unreference($reference)

The shared memory location keeps count of the number of things that have the file system mounted RW while they do their install, write conf files, make changes or whatever. It should count up when a bunch of stuff is happening at once, then when it transitions from 1->0 (and running nanoBSD…) the file system gets set back to RO.

Going negative indicates that there is a mismatched extra call to /etc/rc.conf_mount_ro somewhere, or something similar. All code that needs the file system mounted RW for changes should call rc.conf_mount_rw near the start, then rc.conf_mount_ro after making all the changes. Whatever code paths/error conditions/options there are in an installer it needs to always do the same number of RW calls on the way in as RO calls on the way out.
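
For anyone following along, the counting described above can be sketched roughly like this in Python. This is only an illustration and not the actual util.inc code: the class name, method names and the in-memory counter are all made up for the example (the real thing uses shared memory and actually remounts the disk).

```python
from contextlib import contextmanager


class MountRefCount:
    """Toy model of a reference-counted RW/RO mount (hypothetical names)."""

    def __init__(self):
        self.count = 0      # number of callers that currently need RW
        self.mode = "ro"    # simulated mount mode
        self.warnings = []  # messages recorded for mismatched calls

    def mount_rw(self):
        # The first caller in switches the file system to RW.
        if self.count == 0:
            self.mode = "rw"
        self.count += 1

    def mount_ro(self):
        # An RO call without a matching RW call would take the count
        # negative: refuse to decrement and record a warning instead.
        if self.count == 0:
            self.warnings.append("reference is going negative, not doing unreference")
            return
        self.count -= 1
        # Only the 1 -> 0 transition switches back to RO.
        if self.count == 0:
            self.mode = "ro"

    @contextmanager
    def writable(self):
        # Pairs every RW call with exactly one RO call, even on errors.
        self.mount_rw()
        try:
            yield self
        finally:
            self.mount_ro()
```

An installer that wraps all of its changes in something like `with fs.writable():` cannot leave the count unbalanced, whichever error path it takes. A stray extra `fs.mount_ro()` is what would produce the warning.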

Perhaps there is a package that has something mismatched in its installer, because I remember looking at all the base system RW and RO calls a while ago and confirming that they all seemed to be in happily-matched pairs.

cmb

The removal of the forcesync patch means it takes much longer to go rw->ro, with the difference most pronounced on an ALIX. That might be somehow related in that circumstance. I'm way too sleep deprived right now to look into that further, but it might help others who are interested in digging into it.

xbipin

I don't mean to hijack this thread, but I use cron, lightsquid and squid3. The first two reinstall fine, but squid3 now won't go past extracting on a full install.

Beginning package installation for squid3 .
Downloading package configuration file… done.
Saving updated package information... done.
Downloading squid3 and its dependencies...
Checking for package installation...
Downloading https://files.pfsense.org/packages/10/All/squid-3.4.10_2-i386.pbi ... (extracting)

doktornotor

@cmb:

The removal of the forcesync patch means it takes much longer to go rw->ro, with the diff on an ALIX most pronounced.

So, you removed what? This? Since vfs.forcesync seems to be still set to 1 on the upgraded system in System Tunables.

@xbipin:

I don't mean to hijack this thread

Yeah, so please don't and post to the proper forum section.

xbipin

lightsquid also same

Beginning package installation for Lightsquid .
Downloading package configuration file… done.
Saving updated package information... done.
Downloading Lightsquid and its dependencies...
Checking for package installation...
Downloading https://files.pfsense.org/packages/10/All/lightsquid-1.8_2-i386.pbi ... (extracting)

doktornotor

Dude, this thread is about nanobsd – which your box apparently is NOT. Plus, it's not about squid* failing to extract. Please leave it alone.

jimp

Yeah, that's the patch that was removed. At least the part of it that actually affected the filesystem.

I'd be interested to know if the problem could be repeated with the disk set to be permanently rw (Diag > NanoBSD, check the box and save)

The switch to RO is a crutch/safety net/etc. to give people a warm and fuzzy feeling about disk writes. In truth, aside from maybe a stray package here and there, things won't write to the disk willy-nilly, so leaving it rw is reasonably safe. All of the volatile things on NanoBSD are held in RAM disks anyhow. The real danger is with a full install: constant writes to /tmp and logs, which happen in RAM on NanoBSD.

doktornotor

@jimp:

Yeah, that's the patch that was removed. At least the part of it that actually affected the filesystem.

Hmmm… Seems to have pretty bad performance impact on these poor Alix boxes (even with UDMA) :(

@jimp:

I'd be interested to know if the problem could be repeated with the disk set to be permanently rw (Diag > NanoBSD, check the box and save)

Will that stick/work on the post-upgrade reboot? I can do that if that's the case. There are still lots of boxes left to upgrade where I'd love to avoid this screw-up.

jimp

It varies from CF to CF – I have one CF that is really obnoxious to use with it, on the order of 45s-1m delays on each save. A different card is only ~3-4s, barely worse than without the patch.

If you set the flag in your config then it is consulted before any potential RO switch done at any time, essentially making conf_mount_ro() a no-op.
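
As a rough illustration of that behaviour (a sketch only, not the real pfSense function): the config key `nanobsd_always_rw` and the `remount_ro` callback below are hypothetical stand-ins, since the thread does not say how the flag is stored.

```python
def conf_mount_ro(config, remount_ro):
    """Switch to RO unless the (hypothetical) permanent-RW flag is set.

    Returns True if the remount was performed, False if it was skipped.
    """
    # Consult the flag first: with it set, the call is effectively a no-op.
    if config.get("nanobsd_always_rw", False):
        return False
    remount_ro()
    return True
```

With the flag set, every caller that asks for RO simply gets its request ignored, so the slow rw->ro transition never happens.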

doktornotor

@jimp:

It varies from CF to CF – I have one CF that is really obnoxious to use with it, on the order of 45s-1m delays on each save. A different card is only ~3-4s, barely worse than without the patch.

Well, the cards are what PC Engines sells. This: http://www.pcengines.ch/cf2slc.htm

@jimp:

If you set the flag in your config then it is consulted before any potential RO switch done at any time, essentially making conf_mount_ro() a no-op.

Sounds good.

jimp

I have had no luck with those cards over the years. They've all died on me fairly soon.

The "good" card I'm using at the moment is a Sandisk. The one that is awful is a Kingston. Though I have another Kingston that is OK, so… shrug

Generally speaking, the faster the card the less likely you will be to see problems.

doktornotor

@jimp:

I have had no luck with those cards over the years. They've all died on me fairly soon.

Hmmm… interesting. Out of some ~20, only one is dead here so far. Granted, there's not much done with them. Those Alixes are replacements for shitty ISP-supplied junk, with writing done when new pfS versions come out, pretty much nothing to write home about in between.

Kingston, I won't touch. Can't even count how many SD cards have died on me in phones. The higher the class, the shittier the product. Some of them even DoA. >:( >:( >:(

xbipin

I have 3 Alix boxes and 4 full installs, 1 of them running nanobsd, and in general I see some issues with package reinstalls.

The first Alix with nanobsd had the cron package, and that reinstalled fine on reboot after the upgrade.
The second nanobsd machine is a full PC with nanobsd on it, and it runs squid and cron only. On reboot cron installed fine but squid got stuck on extracting, so I aborted it and upgraded again, and the next time it installed squid just fine and all errors vanished.

I don't know why, but extracting gets stuck for some packages and works fine for others.

doktornotor

@jimp:

I'd be interested to know if the problem could be repeated with the disk set to be permanently rw (Diag > NanoBSD, check the box and save)

Well, good news is that this totally avoided any issue described in the OP on two Alix boxes… 8)

jimp

@doktornotor:

@jimp:

I'd be interested to know if the problem could be repeated with the disk set to be permanently rw (Diag > NanoBSD, check the box and save)

Well, good news is that this totally avoided any issue described in the OP on two Alix boxes… 8)

That is good news. I added a note to the changelog doc about that yesterday, I may add a note to the upgrade guide as well.

robi

@jimp:

Yeah, that's the patch that was removed. At least the part of it that actually affected the filesystem.

Guys, it really sucks now.
Every config action is delayed by 4 seconds now, which is very counterproductive. I'm using brand new SanDisk CF cards, 2015 model, on a Supermicro A1SRi-2758F with a CF-to-SATA adapter.
/etc/rc.conf_mount_rw followed by /etc/rc.conf_mount_ro also takes 4 to 5 seconds.
Previously it was working in an instant.
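
For anyone who wants to compare cards, a delay like this could be measured with a small Python helper along these lines. It is only a sketch: the two script paths are the ones quoted above, `time_rw_ro_cycle` would have to be run as root on a NanoBSD box with Python available (an assumption), and the helper names are made up.

```python
import subprocess
import time


def time_commands(commands):
    """Run each command in order and return the total wall-clock seconds."""
    start = time.monotonic()
    for cmd in commands:
        # check=True raises CalledProcessError if a command exits non-zero.
        subprocess.run(cmd, check=True)
    return time.monotonic() - start


def time_rw_ro_cycle():
    # One full rw -> ro round trip using the scripts named in this thread.
    return time_commands([["/etc/rc.conf_mount_rw"], ["/etc/rc.conf_mount_ro"]])
```

Calling `time_rw_ro_cycle()` a few times and comparing the numbers between cards would show whether a box is in the "few seconds" or the "close to a minute" category.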

I use the NanoBSD version exclusively in all kinds of setups (Jetway systems, SuperMicro and various thin clients) and have never had any boot problems whatsoever.

Can you please consider putting back the patch with a configurable option/system tunable? Because I definitely vote to keep using it.

I see the option of keeping it RW all the time, but NanoBSD exists exactly because of the super-great capability to keep the system RO, and we should really keep relying on that professional feature, as an extra security measure.

doktornotor

@robi:

@jimp:

Yeah, that's the patch that was removed. At least the part of it that actually affected the filesystem.

Guys, it really sucks now.
Every config action is delayed by 4 seconds now, which is very counterproductive. I'm using brand new SanDisk CF cards, 2015 model, on a Supermicro A1SRi-2758F with a CF-to-SATA adapter.
/etc/rc.conf_mount_rw followed by /etc/rc.conf_mount_ro also takes 4 to 5 seconds.
Previously it was working in an instant.

I use the NanoBSD version exclusively in all kinds of setups (Jetway systems, SuperMicro and various thin clients) and have never had any boot problems whatsoever.

Can you please consider putting back the patch with a configurable option/system tunable? Because I definitely vote to keep using it.

I see the option of keeping it RW all the time, but NanoBSD exists exactly because of the super-great capability to keep the system RO, and we should really keep relying on that professional feature, as an extra security measure.

Consider yourself lucky with those 4 seconds. It's virtually minutes for some people, plain unusable without switching to permanent RW. Plus this - https://redmine.pfsense.org/issues/4803 – dunno how exactly this helped with filesystem corruption, appears the cure is worse than the disease.

robi

Oh shit.

Put back the patch, please, please…