Upgrade to 2.1.2: Stuck on 2.1

mickrussom

I tried all the remediations suggested.

1)
dd if=/dev/ad0 of=/tmp/mbr_part_bkup.img bs=512 count=1
dd of=/dev/ad0 if=/tmp/mbr_part_bkup.img bs=512 count=1

still "corrupt"

2)
[2.1-RELEASE][admin@pg-router-5]/root(24): fdisk -B -b /boot/boot0 ad0

******* Working on device /dev/ad0 *******
parameters extracted from in-core disklabel are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 3854529 (1882 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 751/ head 15/ sector 63
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 3854655, size 3854529 (1882 Meg), flag 0
beg: cyl 752/ head 1/ sector 1;
end: cyl 479/ head 15/ sector 63
The data for partition 3 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 7709184, size 102816 (50 Meg), flag 0
beg: cyl 480/ head 0/ sector 1;
end: cyl 581/ head 15/ sector 63
The data for partition 4 is:
<UNUSED>
Do you want to change the boot code? [n] y

We haven't changed the partition table yet. This is your last chance.
parameters extracted from in-core disklabel are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Information from DOS bootblock is:
1: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 3854529 (1882 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 751/ head 15/ sector 63
2: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 3854655, size 3854529 (1882 Meg), flag 0
beg: cyl 752/ head 1/ sector 1;
end: cyl 479/ head 15/ sector 63
3: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 7709184, size 102816 (50 Meg), flag 0
beg: cyl 480/ head 0/ sector 1;
end: cyl 581/ head 15/ sector 63
4: <UNUSED>
Should we write new partition table? [n] y

3)
[2.1-RELEASE][admin@pg-router-5]/root(23): gpart bootcode -b /boot/mbr ad0
gpart: table 'ad0' is corrupt: Operation not permitted

4)
[2.1-RELEASE][admin@pg-router-5]/root(26): gpart recover ad0
gpart: recovering 'ad0' failed: Function not implemented

5)
[2.1-RELEASE][admin@pg-router-5.]/root(27): boot0cfg -v -s 1 ad0
# flag start chs type end chs offset size
1 0x80 0: 1: 1 0xa5 751: 15:63 63 3854529
2 0x00 752: 1: 1 0xa5 479: 15:63 3854655 3854529
3 0x00 480: 0: 1 0xa5 581: 15:63 7709184 102816

version=2.0 drive=0x80 mask=0xf ticks=182 bell=# (0x23)
options=packet,update,nosetdrv
volume serial ID a8a8-a8a8
default_selection=F1 (Slice 1)

[2.1-RELEASE][admin@pg-router-5.]/root(28): fdisk -a /dev/ad0
******* Working on device /dev/ad0 *******
parameters extracted from in-core disklabel are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 3854529 (1882 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 751/ head 15/ sector 63
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 3854655, size 3854529 (1882 Meg), flag 0
beg: cyl 752/ head 1/ sector 1;
end: cyl 479/ head 15/ sector 63
The data for partition 3 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 7709184, size 102816 (50 Meg), flag 0
beg: cyl 480/ head 0/ sector 1;
end: cyl 581/ head 15/ sector 63
The data for partition 4 is:
<UNUSED>
Partition 1 is marked active
Do you want to change the active partition? [n]

We haven't changed the partition table yet. This is your last chance.
parameters extracted from in-core disklabel are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=7745 heads=16 sectors/track=63 (1008 blks/cyl)

Information from DOS bootblock is:
1: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 3854529 (1882 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 751/ head 15/ sector 63
2: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 3854655, size 3854529 (1882 Meg), flag 0
beg: cyl 752/ head 1/ sector 1;
end: cyl 479/ head 15/ sector 63
3: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 7709184, size 102816 (50 Meg), flag 0
beg: cyl 480/ head 0/ sector 1;
end: cyl 581/ head 15/ sector 63
4: <UNUSED>
Should we write new partition table? [n] y

I still have this issue:
[2.1-RELEASE][admin@router-5]/root(22): gpart status
Name Status Components
ad0s1 CORRUPT ad0
ad0s2 CORRUPT ad0
ad0s3 CORRUPT ad0
ad0s1a OK ad0s1
ad0s2a OK ad0s2

(33): gpart show
=> 63 7806897 ad0 MBR (3.7G) [CORRUPT]
63 3854529 1 freebsd [active] (1.9G)
3854592 63 - free - (31k)
3854655 3854529 2 freebsd (1.9G)
7709184 102816 3 freebsd (50M)

=> 0 3854529 ad0s1 BSD (1.9G)
0 16 - free - (8.0k)
16 3844433 1 !0 (1.9G)
3844449 10080 - free - (4.9M)
Is there a way to fix this? Seems I need to repair whatever it is that gpart looks at if this is ever going to be upgradeable again.
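One thing that can be checked from the numbers already posted is whether every slice actually fits on the card. The sketch below is only arithmetic on the fdisk and gpart show output above; the function names are made up for illustration, and whether an overrun like this is what gpart's integrity check objects to is an assumption, not something confirmed here.

```sh
#!/bin/sh
# Total sectors implied by the first line of "gpart show" above:
# the usable area starts at sector 63 and is 7806897 sectors long.
DISK_SECTORS=$((63 + 7806897))

# slice_end START SIZE: print the first sector after the slice.
slice_end() {
    echo $(($1 + $2))
}

# slice_overrun START SIZE DISK: print how many sectors the slice
# extends past the end of the disk (0 if it fits).
slice_overrun() {
    end=$(slice_end "$1" "$2")
    if [ "$end" -gt "$3" ]; then
        echo $(($end - $3))
    else
        echo 0
    fi
}

# Slice number, start and size, copied from the fdisk output above.
for slice in "1 63 3854529" "2 3854655 3854529" "3 7709184 102816"; do
    set -- $slice
    echo "slice $1: ends at $(slice_end "$2" "$3"), overrun $(slice_overrun "$2" "$3" "$DISK_SECTORS") sectors"
done
```

With these figures, slices 1 and 2 fit, while slice 3 ends at sector 7812000 on a disk of 7806960 sectors, which is 5040 sectors (five cylinders of 1008 blocks) past the end.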

There was an upgrade to 2.1 before this, all upgrades were done via the GUI.

Would this work?

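DISK=ad0

# diskinfo's fourth field is the media size in sectors; stop 131072 sectors short of the end
offset=$(diskinfo $DISK | awk '{ print $4 - 131072 }')
# zero the first 64k of the disk
dd if=/dev/zero of=/dev/$DISK bs=64k count=1
# zero from $offset to the end; bs matches the 512-byte sector size so seek counts sectors
dd if=/dev/zero of=/dev/$DISK bs=512 seek=$offset

gpart create -s gpt ${DISK}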

mickrussom

@jimp:

Can someone try setting this tunable:
sysctl kern.geom.part.check_integrity=0
And then perform an upgrade, see if that lets it get by.

I tried this; it came back with 2.1 again.

[2.1-RELEASE][admin@pg-router-5.]/root(4): sysctl -a | grep -i geom
kern.geom.part.check_integrity: 0

[2.1-RELEASE][admin@pg-router-5.]/root(5): exit

[2.1-RELEASE][admin@pg-router-5.]/root(2): exit
exit
*** Welcome to pfSense 2.1-RELEASE-nanobsd (i386) on pg-router-5 ***

WAN (wan) -> em5 -> v4/DHCP4: 192.168.7.139/24
LAN (lan) -> em4 -> v4: 192.168.14.1/24
OPT1 (opt1) -> em2 -> v4: 192.168.16.1/24
OPT2 (opt2) -> em0 ->
OPT3 (opt3) -> em1 -> v4: 192.168.17.1/24
OPT4 (opt4) -> em3 -> v4: 192.168.15.1/24
OPT5 (opt5) -> em0_vlan20 ->

0) Logout (SSH only) 8) Shell
1) Assign Interfaces 9) pfTop
2) Set interface(s) IP address 10) Filter Logs
3) Reset webConfigurator password 11) Restart webConfigurator
4) Reset to factory defaults 12) pfSense Developer Shell
5) Reboot system 13) Upgrade from console
6) Halt system 14) Disable Secure Shell (sshd)
7) Ping host 15) Restore recent configuration

Enter an option: 13

Starting the pfSense console firmware update system..

1) Update from a URL
2) Update from a local file
Q) Quit

Please select an option to continue: 2

Enter the complete path to the .tgz or .img.gz update file: /up/pfSense-2.1.4-RELEASE-4g-i386-nanobsd-upgrade.img.gz

One moment please…

Broadcast Message from admin@pg-router-5.
(no tty) at 22:54 PDT...

NanoBSD Firmware upgrade in progress...

Broadcast Message from admin@pg-router-5.
(no tty) at 22:54 PDT...

Installing /up/pfSense-2.1.4-RELEASE-4g-i386-nanobsd-upgrade.img.gz.

One moment please...

Broadcast Message from admin@pg-router-5.
(no tty) at 22:54 PDT...

NanoBSD Firmware upgrade in progress...

Broadcast Message from admin@pg-router-5.
(no tty) at 22:54 PDT...

Installing /up/pfSense-2.1.4-RELEASE-4g-i386-nanobsd-upgrade.img.gz.

...

NanoBSD Firmware upgrade is complete. Rebooting in 10 seconds.

...........Done. Rebooting...

*** Welcome to pfSense 2.1-RELEASE-nanobsd (i386) on pg-router-5 ***

mickrussom

I noticed that the /boot/mbr and /boot/pmbr files are different. Is that correct?

[2.1-RELEASE][admin@pg-router-5.]/boot(17): md5 mbr
MD5 (mbr) = db3f526667d01f5851ef3d0ddafb86db
[2.1-RELEASE][admin@pg-router-5.]/boot(18): md5 pmbr
MD5 (pmbr) = 6daee450f256507904e0aebe78187cf6

Also, from the gpart man page (I'm not sure what CORRUPT means, even after reading this):

RECOVERING
The GEOM PART class supports recovering of partition tables only for GPT.
The GPT primary metadata is stored at the beginning of the device. For
redundancy, a secondary (backup) copy of the metadata is stored at the
end of the device. As a result of having two copies, some corruption of
metadata is not fatal to the working of GPT. When the kernel detects
corrupt metadata, it marks this table as corrupt and reports the problem.
destroy and recover are the only operations allowed on corrupt tables.

If the first sector of a provider is corrupt, the kernel can not detect
GPT even if the partition table itself is not corrupt. The protective
MBR can be rewritten using the dd(1) command, to restore the ability to
detect the GPT. The copy of the protective MBR is usually located in the
/boot/pmbr file.

If one GPT header appears to be corrupt but the other copy remains
intact, the kernel will log the following:

GEOM: provider: the primary GPT table is corrupt or invalid.
GEOM: provider: using the secondary instead -- recovery strongly advised.

or

GEOM: provider: the secondary GPT table is corrupt or invalid.
GEOM: provider: using the primary only -- recovery suggested.

Also gpart commands such as show, status and list will report about
corrupt tables.

If the size of the device has changed (e.g., volume expansion) the
secondary GPT header will no longer be located in the last sector. This is
not a metadata corruption, but it is dangerous because any corruption of
the primary GPT will lead to loss of the partition table. This problem
is reported by the kernel with the message:

GEOM: provider: the secondary GPT header is not in the last LBA.

This situation can be recovered with the recover command. This command
reconstructs the corrupt metadata using known valid metadata and
relocates the secondary GPT to the end of the device.

NOTE: The GEOM PART class can detect the same partition table visible
through different GEOM providers, and some of them will be marked as
corrupt. Be careful when choosing a provider for recovery. If you choose
incorrectly you can destroy the metadata of another GEOM class, e.g.,
GEOM MIRROR or GEOM LABEL.

Any help recovering ad0 would be appreciated.

jimp

Despite hacking and slashing at things in various ways I have yet to see any installation actually recover from this condition without reflashing the CF card (or using a new CF card)

trunix

@trunix:

jimp, I've got the same problem on a 4gb CF. Output of fdisk -p /dev/ad0:

/dev/ad0

g c7745 h16 s63
p 1 0xa5 63 3854529
p 2 0xa5 3854655 3854529
a 2
p 3 0xa5 7709184 102816

I've tried method #1 and #2, but neither worked. The output of fdisk -if /tmp/fdisk_bkup.txt /dev/ad0 from method #2 is below in case it's notable. I didn't get any errors from method #1, the system just booted back into 2.1 on the same slice. The same thing happened after method #2. I'm also not able to switch the bootup slice for whatever reason.

fdisk: WARNING line 2: number of cylinders (7745) may be out-of-range
(must be within 1-1024 for normal BIOS operation, unless the entire disk
is dedicated to FreeBSD)
******* Working on device /dev/ad0 *******
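The cylinder warning also explains the odd-looking CHS values earlier in the thread, where slice 2 begins at cyl 752 but ends at cyl 479. An MBR entry has only 10 bits for the cylinder number, and the figures posted are consistent with fdisk showing the real cylinder modulo 1024. That reading is inferred from the numbers rather than confirmed by anyone here, and the function names below are made up for illustration.

```sh
#!/bin/sh
# Geometry from the fdisk output: 16 heads, 63 sectors per track.
HEADS=16
SPT=63

# lba_to_cyl LBA: cylinder number that contains the given sector.
lba_to_cyl() {
    echo $(($1 / ($HEADS * $SPT)))
}

# mbr_cyl LBA: the same cylinder reduced to the 10 bits an MBR entry holds.
mbr_cyl() {
    cyl=$(lba_to_cyl "$1")
    echo $(($cyl % 1024))
}

# Last sector of slice 2, then first and last sector of slice 3.
for lba in 7709183 7709184 7811999; do
    echo "sector $lba: cylinder $(lba_to_cyl $lba), stored as $(mbr_cyl $lba)"
done
```

This gives cylinders 7647, 7648 and 7749, stored as 479, 480 and 581, matching the "end: cyl 479", "beg: cyl 480" and "end: cyl 581" lines. The CHS fields are therefore just wrapped, not damaged.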

This system and CF card have been in stable operation for a while now, and I've successfully installed all the updates from 2.0.1 to 2.1. I never got a chance to install 2.1.1, and I've had similar problems attempting to install 2.1.2.

I was onsite and got the opportunity to re-image the CF card for this build in mid-May to 2.1.3. Last week on a whim I decided to give the 2.1.4 update a shot. It's located a few states away so remote updates are definitely handy. I'm happy to report the update went fine, pfBlocker and the few other packages were reinstalled without issue. Whatever problem I had with 2.1 was solved with 2.1.3.

robi

Yep, one can safely re-flash the same card.

mickrussom

@jimp:

Despite hacking and slashing at things in various ways I have yet to see any installation actually recover from this condition without reflashing the CF card (or using a new CF card)

Any ideas what caused it? I was on a 2.0.x release, upgraded to 2.1, and then it got bonked. I guess I need a reflash. Is there a howto for preparing the CF in another machine, so I can just go to the DC, swap and restore?

mickrussom

@jimp:

Despite hacking and slashing at things in various ways I have yet to see any installation actually recover from this condition without reflashing the CF card (or using a new CF card)

Does the replacement CF have to be 4GB, or can it be 16GB?

I have one of these:
SDCFXPS-016G

Would that work? Also, how do I install this from another machine?

stephenw10

You can use that card or any card bigger than 4GB. Seems like a bit of a waste though, that's an expensive CF card.

Write the Nano image to the card as described here:
https://doc.pfsense.org/index.php/Installing_pfSense#Writing_the_image

Remember to back up your config file first.
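A minimal sketch of that backup step, assuming the config lives at /cf/conf/config.xml (the usual location on pfSense, but check your own install). The function name backup_config and the destination directory are made up for illustration.

```sh
#!/bin/sh
# backup_config SRC DESTDIR: copy SRC into DESTDIR under a timestamped
# name, verify the copy byte for byte, and print the new path.
backup_config() {
    src=$1
    destdir=$2
    dest="$destdir/config-$(date +%Y%m%d-%H%M%S).xml"
    cp -p "$src" "$dest" || return 1
    cmp -s "$src" "$dest" || return 1
    echo "$dest"
}

# Example call (both paths are assumptions, adjust as needed):
# backup_config /cf/conf/config.xml /tmp
```

Whatever directory is used, copy the resulting file off the box before pulling the card, since the point is to have it available after the reflash.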

Steve

mickrussom

@jimp:

Despite hacking and slashing at things in various ways I have yet to see any installation actually recover from this condition without reflashing the CF card (or using a new CF card)

Is there any way to upgrade the install in-place (kernel + userland) and just keep the corrupted labels for the time being?

jimp

@mickrussom:

@jimp:

Despite hacking and slashing at things in various ways I have yet to see any installation actually recover from this condition without reflashing the CF card (or using a new CF card)

Is there any way to upgrade the install in-place (kernel + userland) and just keep the corrupted labels for the time being?

Not any way that would be feasible/workable/supportable.

People have tried it, but it's not something I'd recommend or for which I'd provide any guidance.

luckman212

On my stuck unit I wound up just taking it apart and re-flashing the CF with a fresh 2.1.5 - problem solved.

pguthrie

Since there does not appear to be a fix for this yet, could someone with a valid partition table (e.g. on 2.1.5) on a 2G nanobsd post the results of a "gpart show", so I can compare with the invalid one.

Thank you!

jimp

@pguthrie:

Since there does not appear to be a fix for this yet, could someone with a valid partition table (e.g. on 2.1.5) on a 2G nanobsd post the results of a "gpart show", so I can compare with the invalid one.

I've tried comparing them before and saw no differences, and overwriting a bad one with a good one didn't appear to make a difference. Behavior like that is what pushed me even more strongly toward the conclusion that something on the card itself was to blame and not the actual partition table.

jcpolo

I have the same issue upgrading from any 2.x version to any newer 2.x version. For example, I have a 2.1 install that I wanted to upgrade to 2.1.5 but couldn't.

It wasn't until I started looking in Diagnostics: NanoBSD: view upgrade log that I put it together.

Bootup
Bootup slice is currently: ad0s1

NanoBSD Firmware upgrade in progress…

Installing /root/latest.tgz.
SLICE 2
OLDSLICE 1
TOFLASH ad0s2
COMPLETE_PATH ad0s2a

It appears that the slice that the auto upgrade utility (or manual) is upgrading is not the boot slice that is booting up. How do I change this? Clicking on change boot slice at the top does nothing.

doktornotor

@jcpolo:

It appears that the slice that the auto upgrade utility (or manual) is upgrading is not the boot slice that is booting up. How do I change this? Clicking on change boot slice at the top does nothing.

Of course! That is by design. You don't kill a working one under your hands.

Other than that - why's this thread even going? Do a fresh install on an empty drive/CF or whatnot and restore the config! 5 minutes job. Instead of debugging screwed partitioning for years. WTF really.

phil.davis

On nanoBSD the upgrade is supposed to write to the opposite boot slice. When all the commands to the opposite boot slice have succeeded then the upgrade script will switch the selected boot slice and initiate a reboot.
Being stuck means that something went wrong in setting up the opposite boot slice, and the upgrade aborted itself.
I typed more here: https://forum.pfsense.org/index.php?topic=87292.msg481424#msg481424
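The slice selection described above can be sketched in a few lines. This is not the actual pfSense upgrade script, just an illustration of the "write to the other slice" rule; the function name other_slice is made up, and the TOFLASH and COMPLETE_PATH labels are borrowed from the upgrade log quoted earlier.

```sh
#!/bin/sh
# other_slice BOOTSLICE: print the slice an upgrade would write to,
# i.e. the one the system is NOT currently booted from.
other_slice() {
    case "$1" in
        *s1) echo "${1%s1}s2" ;;
        *s2) echo "${1%s2}s1" ;;
        *)   echo "unexpected boot slice: $1" >&2; return 1 ;;
    esac
}

# Booted from ad0s1, as in the upgrade log.
boot=ad0s1
target=$(other_slice "$boot")
echo "TOFLASH $target"
echo "COMPLETE_PATH ${target}a"
```

For a system booted from ad0s1 this prints ad0s2 and ad0s2a, the same targets the log shows. The upgrade writing to the non-booted slice is therefore expected; the failure is in what happens to that slice afterwards.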

pguthrie

Other than that - why's this thread even going? Do a fresh install on an empty drive/CF or whatnot and restore the config! 5 minutes job. Instead of debugging screwed partitioning for years. WTF really.

Except for those of us who have remote firewalls where it is not a 5 minute job and costs money to go out there. WTF really.

doktornotor

You'll need to send someone on site, or ship a replacement box to the site. How many more years do you intend to wait for a nonexistent fix?

kejianshi

Don't give up… Keep debugging it. Never admit defeat!

All the people trying to update to 2.1.2 are depending on you. :P