Upgrade from 2.1.5 to 2.2 fails on Alix2d2 Board



  • After upgrading from 2.1.5 to 2.2, via either auto update or manual update, the appliance reboots but then stops at:

    (aprobe0:ata0:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
    (aprobe0:ata0:0:1:0): CAM status: Command timeout
    (aprobe0:ata0:0:1:0): Error 5, Retries exhausted

    Any ideas on how to remedy this? At the console I have to reboot into the 2.1.5 version to get it to boot.


  • Netgate Administrator

    Which install type? What are you booting from?

    Maybe related to this:
    https://doc.pfsense.org/index.php/DMA_and_LBA_Errors#Other_Errors

    Steve



  • It's a nanobsd upgrade on a compact flash card. It could be related to the above; however, the 2.1.5 version boots without this problem or any related errors.



  • pfSense 2.1.5 is on top of FreeBSD 8.3
    pfSense 2.2 is on top of FreeBSD 10.1

    The possible disk issues in the link were introduced in about FreeBSD 9.2

    So yes, some hardware might run pfSense 2.1.5 fine, but have a problem with pfSense 2.2 because of this change/regression in FreeBSD.


  • Netgate Administrator

    So I just updated my home box, which is running 32-bit Nano, and it failed to reboot with a similar error:

    (ada0:ata0:0:0:0): READ_DMA. ACB: c8 00 6e 7e 77 40 00 00 00 00 01 00
    (ada0:ata0:0:0:0): CAM status: Command timeout
    (ada0:ata0:0:0:0): Error 5, Retries exhausted
    ata0: DMA limited to UDMA33, controller found non-ATA66 cable
    

    Edit: Actually could be completely unrelated!

    The box I have, like the Alix, cannot boot if DMA is enabled, and it looks like it might be. The method of disabling DMA changed in 2.2. It's in the upgrade guide, which of course I only skimmed through!  ::)
    https://doc.pfsense.org/index.php/Upgrade_Guide#Disk_Driver_Changes

    Try interrupting the boot and entering that hint, or adding it to /boot/loader.conf.local so it will be copied across if you update again.

    I have several identical boxes running 2.2 (RC) without any issue, but they are all fresh installs.

    Steve
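
A minimal sketch of adding a boot-time setting to a loader config file without duplicating it on repeated runs. The tunable shown, `hint.ata.0.mode` set to `PIO4`, is an assumption based on the FreeBSD ata(4) channel hints rather than something stated in this thread; check the upgrade guide linked above for the exact line your hardware needs. `set_loader_tunable` is a hypothetical helper name, and the demo writes to a scratch file, not to the live /boot/loader.conf.local.

```shell
#!/bin/sh
# Append KEY="VALUE" to a loader config file unless KEY is already set there.
# Usage: set_loader_tunable FILE KEY VALUE   (hypothetical helper)
set_loader_tunable() {
    file=$1
    key=$2
    value=$3
    # Create the file if it does not exist yet.
    [ -e "$file" ] || : > "$file"
    # index() does a literal match, so the dots in the key are not wildcards.
    # Comment lines start with '#', so they never match at position 1.
    if awk -v k="${key}=" '
        { sub(/^[ \t]+/, "") }
        index($0, k) == 1 { found = 1 }
        END { exit !found }
    ' "$file"; then
        echo "already set: ${key}"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
        echo "added: ${key}"
    fi
}

# Demo on a scratch file. The key and value are assumed, not from the thread.
scratch=$(mktemp)
set_loader_tunable "$scratch" hint.ata.0.mode PIO4
set_loader_tunable "$scratch" hint.ata.0.mode PIO4   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```

On a real box the first argument would be /boot/loader.conf.local, edited before the upgrade so the setting is already in place on the first boot of the new version.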


  • Rebel Alliance Developer Netgate

    We have tested quite a bit on ALIX 2D3 and 2D13 units but it's probable that the 2d2 board and/or BIOS are quirky in some other way.

    Disabling DMA would be the first step, then maybe write caching. The sysctl knobs changed for both.


  • Netgate Administrator

    I assumed that upgrading a 32-Nano (non-VGA) to 2.2 would carry the disabled DMA with it. Is that not the case?

    Steve


  • Rebel Alliance Developer Netgate

    @stephenw10:

    I assumed that upgrading a 32-Nano (non-VGA) to 2.2 would carry the disabled DMA with it. Is that not the case?

    The OIDs changed and few systems needed them, so they were omitted
    https://redmine.pfsense.org/issues/4203

    If it's a widespread issue, we can look into some form of upgrade code for that in 2.2.1.


  • Netgate Administrator

    Ah, OK. It's more than 12 days since I updated my old test box, so I guess that explains it. Or have they never been there? Hmm, I'll have to check.

    Steve


  • Rebel Alliance Developer Netgate

    @stephenw10:

    Ah, OK. It's more than 12 days since I updated my old test box, so I guess that explains it. Or have they never been there? Hmm, I'll have to check.

    The new IDs were never there on 2.2



  • Just an update. Still no luck and I have tried:

    1. Enabling/Disabling DMA
    2. Enabling/Disabling Write Caching
    3. Enabling/Disabling ACPI

    Looks like the Alix2 series may not work with 2.2


  • Netgate Administrator

    @jimp:

    The new IDs were never there on 2.2

    Hmm, definitely have to check what I had running then.  ::)

    What's the difference between the 2D2 and 2D3 then? They look to be identical apart from the extra NIC. Are you running the latest bios?

    Steve


  • Rebel Alliance Developer Netgate

    Two things to look at on the unit are definitely:

    1. The BIOS version
    2. The BIOS options

    I don't recall all of the differences between the 2d2 and 2d3; PC Engines has a list on their site somewhere.

    It could also be the CF card. Wouldn't hurt to try a fresh image on a new card.


  • Netgate Administrator

    @jimp:

    The new IDs were never there on 2.2

    The old DMA loader options were still there in June 2014 and working fine.
    Probably should have tested a newer snap huh.  ::)
    Edit: Sep 8th snap also boots fine. I guess those are before 10.1 dropped.
    Edit: Ok wait, I have a box here running a snap from Dec 3rd. Old loader options are still present in loader.conf. Boots fine.  :-\

    Apologies for going somewhat off topic.

    Steve


  • Rebel Alliance Developer Netgate

    The old loader options do nothing though, never have on 10.x. They were in the file but had no effect, since they referred to sysctl OIDs that no longer existed.
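
A rough sketch of checking a loader config for leftover settings that, per the post above, no longer have any effect on 10.x. The three names matched here (`hw.ata.ata_dma`, `hw.ata.atapi_dma`, `hw.ata.wc`) are an assumption about which old knobs are meant; the thread does not list them. `find_stale_ata_tunables` is a hypothetical helper, and the demo runs on a scratch file.

```shell
#!/bin/sh
# Print, with line numbers, every line that sets one of the assumed
# pre-10.x ATA knobs. Exits non-zero when none are found (grep's behaviour).
find_stale_ata_tunables() {
    # -n: show line numbers; -E: extended regex; dots are escaped to match literally.
    grep -nE '^[[:space:]]*hw\.ata\.(ata_dma|atapi_dma|wc)=' "$1"
}

# Demo on a scratch file standing in for /boot/loader.conf.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
autoboot_delay="5"
hw.ata.ata_dma="0"
hw.ata.atapi_dma="0"
hint.ata.0.mode="PIO4"
EOF
find_stale_ata_tunables "$scratch"
rm -f "$scratch"
```

Any line this reports is harmless but misleading: it looks as if DMA or write caching is disabled when the running kernel ignores it.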


  • Netgate Administrator

    Hmm. Odd then. The Dec 3rd snap boots no problems and it's a 10.1 build. 2.2 release in the same box - no go.
    I thought I hadn't been quite as far behind the curve as June!  ;)

    Thanks Jim

    Steve



  • I am having the exact same problem with a VIA Eden-V4/C7 board  http://www.ibt.ca/v2/items/fwa7304/

    I tried to upgrade from 2.1.5 to 2.2 and this failed.  A fresh install also failed with the same READ_DMA errors that the OP had.  This unit also requires that DMA be disabled or the boot will hang.  I am running the 4G nanobsd image on a CF card.

    So it seems I am stuck on 2.1.5 until this gets resolved.


  • Netgate Administrator

    You can just add the loader values to disable DMA. For example:
    https://forum.pfsense.org/index.php?topic=20095.msg480824#msg480824

    Steve



  • I had the same issue with an ALIX 2D13. The upgrade from 2.1.5 to 2.2 would complete, but the system would not boot up properly. I see the same error message on the serial console. I am certain my issue was with the CF card. I was able to upgrade without issue after replacing the CF card. To be sure, I loaded the new CF card with 2.1.5, restored the config, then upgraded to 2.2. The same process works with the new CF card and failed with the old CF card.

    new CF card is SanDisk Ultra 8GB 50MB/s, brand new.
    old CF card is SanDisk Ultra II 8GB 15MB/s, been using it for at least 2 years.

    I don't know what's the issue with the old CF card. I am able to format it in Windows and read/write to it. Not sure if there are any tools I can use to do error checking on it.
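
One simple check, sketched here as an assumption rather than a recommended tool, is to read the whole card from start to finish and see whether any read fails. `read_check` is a hypothetical helper; the demo reads a scratch file, and on a real system the argument would be the card's device node (for example the `ada0` disk seen in the boot log, as `/dev/ada0`).

```shell
#!/bin/sh
# Read SRC end to end and discard the data. dd exits non-zero on a read error.
# Usage: read_check SRC   (hypothetical helper)
read_check() {
    if dd if="$1" of=/dev/null bs=65536 2>/dev/null; then
        echo "read OK: $1"
    else
        echo "read FAILED: $1"
        return 1
    fi
}

# Demo on a 1 MiB scratch file instead of a real device.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=65536 count=16 2>/dev/null
read_check "$scratch"
rm -f "$scratch"
```

A clean pass only shows that every sector can be read. It says nothing about how the card behaves in a particular transfer mode, so a card can pass this check and still hit command timeouts at boot.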



  • @stephenw10:

    You can just add the loader values to disable DMA. For example:
    https://forum.pfsense.org/index.php?topic=20095.msg480824#msg480824

    Steve

    Thanks stephenw10!!

    This worked perfectly.  I simply added /boot/loader.conf.local with the DMA disable string before doing the upgrade via the web interface, and everything went smoothly.



  • @robinxyz:

    I had the same issue with an ALIX 2D13. The upgrade from 2.1.5 to 2.2 would complete, but the system would not boot up properly. I see the same error message on the serial console. I am certain my issue was with the CF card. I was able to upgrade without issue after replacing the CF card. To be sure, I loaded the new CF card with 2.1.5, restored the config, then upgraded to 2.2. The same process works with the new CF card and failed with the old CF card.

    new CF card is SanDisk Ultra 8GB 50MB/s, brand new.
    old CF card is SanDisk Ultra II 8GB 15MB/s, been using it for at least 2 years.

    I don't know what's the issue with the old CF card. I am able to format it in Windows and read/write to it. Not sure if there are any tools I can use to do error checking on it.

    I had a similar issue. I tried a few times to upgrade from 2.1.5 to the 2.2.x versions, and every attempt failed with the same errors as earlier in the thread.  I did a clean install and changed the boot loader configs, and it still wouldn't boot.  Both of the cards I tested were 2GB SanDisk cards rated at 15MB/s, and regardless of the DMA and other settings from the upgrade docs they wouldn't boot more than halfway, failing with the same error.  Both cards worked fine with 2.1.5.  I ended up ordering a couple of 8GB cards from Amazon rated at 50MB/s and now it's all going again.

    To test the upgrades (all the troubleshooting) and finally get it installed without interrupting my "users" too much, I set up a virtual machine for BSD.  I added a serial port and a second virtual NIC and set the VM to use the CF card as a physical disk (after writing the image).  In Windows I had to use a utility (Named Pipe TCP Proxy) http://shvechkov.tripod.com/nptp.html and PuTTY set to access the named pipe to configure the adapters, etc.  Once I got the virtual image up and running I backed up the config from my production box and restored it to the VM, shut down both, swapped the cards, and all is golden again.

    TL;DR: the CF cards with slower speeds seem to have an issue with the 2.2.x updates on the Alix platform.  Disabling DMA doesn't fix it.  Perhaps forcing a slower PIO mode might resolve it, but I didn't have the patience to test that far; a $10 upgraded CF card resolved the issue for me.  I still have DMA disabled.


  • Netgate Administrator

    Which 2.2.X version did you try? There were some changes to Nano that made some cards significantly slower.
    https://doc.pfsense.org/index.php/2.2.3_New_Features_and_Changes#Security.2FErrata_Notices

    Steve



  • Had the same issues with 2.2.1 through 2.2.4.  Only swapping to a newer card fixed it.  Like I said, forcing the loader to use a slower PIO mode may have worked, but I was having trouble entering commands over the serial console (for some reason only with 2.2.x), and loading the image in VMware to save/restore configs was making the troubleshooting a hassle.