pfSense 1.2.3 RC3 locking up at boot (ad0: TIMEOUT - WRITE retrying)



  • I finally got pfSense 1.2.3 installed on my 4GB Kingston CF using the 2GB nanobsd version (for whatever reason the 4GB version would not work for me).

    I configured pfSense and everything was working great, a large improvement over monowall (except the boot up time).  Then in the middle of the night the system must have rebooted and it never came online after that.

    So this morning I connected to the console and did a hard reboot on the system to troubleshoot.  I came up with two things that I'm hoping someone can help me out with:

    1. I see a lot of "disk is dirty.  fsck -y"  errors.  I am just assuming this is a minor issue that gets taken care of when the gateway is improperly shut down.

    2. After pfSense loaded the captive portal, it stopped for a minute and then came up with this error:
      ad0: TIMEOUT - WRITE retrying
      I waited 5 minutes and nothing happened, so I rebooted the system and the error came up again.  I then disconnected the vr0 (LAN) and vr1 (WAN) cables and rebooted the system, and this time it worked.  Go figure.

    My issue is that I want to use this in a production environment at several locations where there are no IT personnel.  If any issue arises, the staff at the location are normally instructed to unplug and replug the gateway, which wouldn't be any good if the boot is timing out.

    Any help would be greatly appreciated.



  • Alix box? Generally if you get a timeout on ad0, it's time to get a new CF card.



  • Yeah, it's an Alix box.

    I'm noticing a pattern with pfSense and Kingston CF Elite cards.  I just tried to install pfSense on another box and am having mounting issues.

    In any case, these are new cards and work well with Monowall.  Is there a way for me to test the integrity of the cards and, if there is a fault in a card, is there a way for me to mark bad sectors so they are not used?

    It looks like MicroDrives may be the better option, but I would love to maintain a low power consumption.
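
    On the integrity question above, one possible approach is a destructive write/read-back test: fill the card with a known pattern, then read it back and note which blocks differ. The following is only a minimal sketch under stated assumptions. The function names are made up for illustration, the card is assumed to show up as a raw device path on the test machine (the `/dev/sdX` in the usage note is a placeholder, not a real path), and running it erases everything on the target. It does not mark bad sectors; it only reports where mismatches occur.

```python
import hashlib
import os
from typing import List

BLOCK_SIZE = 1024 * 1024  # 1 MiB per test block (an arbitrary choice)


def pattern_block(index: int, size: int = BLOCK_SIZE) -> bytes:
    """Return a deterministic pseudo-random block derived from its index."""
    return hashlib.shake_256(str(index).encode()).digest(size)


def write_pattern(path: str, total_bytes: int) -> None:
    """DESTRUCTIVE: overwrite the first total_bytes of path with the pattern."""
    with open(path, "r+b") as dev:
        written = 0
        index = 0
        while written < total_bytes:
            size = min(BLOCK_SIZE, total_bytes - written)
            dev.write(pattern_block(index, size))
            written += size
            index += 1
        dev.flush()
        os.fsync(dev.fileno())  # push the data out of the OS write buffers


def verify_pattern(path: str, total_bytes: int) -> List[int]:
    """Read the pattern back and return byte offsets of blocks that differ."""
    bad_offsets = []
    with open(path, "rb") as dev:
        offset = 0
        index = 0
        while offset < total_bytes:
            size = min(BLOCK_SIZE, total_bytes - offset)
            if dev.read(size) != pattern_block(index, size):
                bad_offsets.append(offset)
            offset += size
            index += 1
    return bad_offsets
```

    A hypothetical run would be `write_pattern("/dev/sdX", card_size)` followed by `verify_pattern("/dev/sdX", card_size)`, where `card_size` is the card's capacity in bytes. Re-inserting the card between the two calls is probably wise, since otherwise the reads may be served from the operating system's cache rather than from the card itself.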



  • I have bunches of Alixes running off cheap CF cards. I assume you've got the latest BIOS, as nano is flaky with versions older than 0.99h. If those are fancy high-speed cards, you might want to try going into the BIOS and turning off UDMA or some such.
    From the problem reports I've seen here with microdrives, I would personally stay far away. Only certain brands seem to work. That's just what I remember; you could search for more info.



  • You are right: I updated the BIOS to 0.99h, otherwise the system wouldn't get past the boot option screen.  I guess I got the fancy high-speed cards; they are Kingston CompactFlash Elite Pro 133X (www.kingston.com/flash/cf_elite.asp).

    I'll try and disable UDMA, but there isn't really much else for me to try out in the almost feature-free BIOS.

    Funny how Monowall runs without a hitch, and pfSense has all these… quirks.  The cost of progress I suppose.



  • I've had some 133x rated cards and they didn't have problems. I was thinking of those 300x cards that say UDMA. How'd you write the image, physdiskwrite from a Windows box? I have some vague recollection of someone getting pre-formatted cards and they had to blow away the partition or format the card before writing the image.



  • @dotdash:

    How'd you write the image, physdiskwrite from a Windows box?

    Yeah, I used physdiskwrite in a Windows XP environment.  My cards didn't come pre-formatted, and I don't think I ever formatted them to FAT32 or NTFS.  When I got them I used them for M0n0wall, but recently wanted to convert to pfSense.

    I'll get a cheap CF card and see how that works.  I'm not out too much since I got these CF cards from Woot (not refurbs).

    I still need to try turning off UDMA (if it is on), but I guess I'll try writing pfSense through Ubuntu and see if that works any differently.  I'll post back what I find out.
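
    When writing the image from a Linux box, it may help to confirm that what landed on the card matches the image file, which separates a bad write from a later runtime fault. Below is a rough sketch of that idea, not a replacement for physdiskwrite or any official procedure. It assumes the image has already been decompressed, and the function names and the paths in the usage note are hypothetical.

```python
import hashlib
import os
from typing import Tuple

CHUNK = 4 * 1024 * 1024  # copy in 4 MiB pieces (an arbitrary choice)


def write_image(image_path: str, device_path: str) -> Tuple[int, str]:
    """DESTRUCTIVE: copy the image onto the device.

    Returns (bytes_written, SHA-256 hex digest of the image data).
    """
    digest = hashlib.sha256()
    total = 0
    with open(image_path, "rb") as src, open(device_path, "r+b") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
            total += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the data has left the OS buffers
    return total, digest.hexdigest()


def readback_digest(device_path: str, length: int) -> str:
    """Return the SHA-256 hex digest of the first `length` bytes of the device."""
    digest = hashlib.sha256()
    remaining = length
    with open(device_path, "rb") as dev:
        while remaining > 0:
            chunk = dev.read(min(CHUNK, remaining))
            if not chunk:
                break  # device is shorter than expected
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

    For example, `size, expected = write_image("pfsense-nano.img", "/dev/sdX")` followed by `readback_digest("/dev/sdX", size) == expected` would indicate the card returned the same bytes that were written (both names are placeholders). A mismatch would point at the card or the reader rather than at pfSense.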


  • Rebel Alliance Developer Netgate

    I believe pfSense on nano already sets the sysctls to disable DMA.



  • i encountered the same error and it is reproducible.
    my hardware is an alix 2d3 board and the software is pfsense 1.2.3-rc* on various cf-cards. always a new install.

    at first boot try to reassign the interfaces and (most times) you get that error.
    alix 2d3 has got vr0 - vr2. at bootup they are vr0 -> lan and vr1 -> wan. i tried to set vr0 -> wan and vr2 -> lan. it also happens with snapshots.
    there is no problem if i restore a config file and reboot (maybe because the settings are applied after the reboot).



  • @superwutze:

    i encountered the same error and it is reproducible.
    my hardware is an alix 2d3 board and the software is pfsense 1.2.3-rc* on various cf-cards. always a new install.

    at first boot try to reassign the interfaces and (most times) you get that error.

    1. I've got a half-dozen 2c3/2d3's running 1.2.3 RCs and I have never seen that error. I've got a bunch of different brands of cheap CF cards: A-Data, Kingston, Transcend, Lexar.
    2. Why would it be assigning interfaces? Nano defaults to vr0 and vr1. A restore will prompt you to re-assign if the interfaces are different. The OP concerns nano, so if you are trying to run the full install off a CF you should not be hijacking this thread.
      If you are having ad0 timeout errors on a nano 1.2.3 RC build, post more details on your setup and what steps can reproduce the problem.


  • to get more into detail:

    1. write the image (of course embedded nano, why else would i write in this thread) to the cf
    2. insert cf into alix 2d3
    3. attach serial console
    4. boot alix2d3
    5. when the menu appears hit 1 and enter and reassign the interfaces
    6. in 9 out of 10 times that procedure leads (on various alix 2d3 (0.99h) with various cf-cards) to the mentioned 'ad0 timeout'. tested with 1.2.3-rc3 and various snapshots.

    or in short:
    at first boot try to reassign the interfaces and (most times) you get that error.
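
    Since the error only shows up in roughly 9 out of 10 attempts, it can help to capture the serial console output to a file and count the timeout lines per run. This is a small sketch of such a counter. It assumes the message looks like the line quoted in this thread (`ad0: TIMEOUT - WRITE retrying`) and loosely accepts other device and command names; the function name and the log file name in the usage note are hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches lines such as "ad0: TIMEOUT - WRITE retrying"; the command name
# (WRITE, READ, ...) is matched loosely because only WRITE appears in the thread.
TIMEOUT_RE = re.compile(r"\b(ad\d+): TIMEOUT - (\w+) retrying")


def count_timeouts(lines: Iterable[str]) -> Dict[str, int]:
    """Count disk timeout messages per (device, command) pair."""
    counts: Dict[str, int] = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            key = f"{match.group(1)} {match.group(2)}"
            counts[key] = counts.get(key, 0) + 1
    return counts
```

    With the console session saved to, say, `console.log`, calling `count_timeouts(open("console.log", errors="replace"))` gives a per-device tally that can be compared between cards, boards, and cabling setups.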


  • Rebel Alliance Developer Netgate

    I've tried this on my 2d3 a few times in a row and had no problem. No errors at all.

    And this is using a well-worn Sandisk 4GB CF running a 2GB nano image.



  • I tried this several times on a 2d13 and couldn't reproduce the error. I even grabbed a 2c3 that came in for a refresh and tested it, after updating the BIOS to 99h. No problems. I used yesterday's snap of 1.2.3, but I don't think anything has changed for a week or so.
    I followed your procedure and on re-assign tried reversing the order, just adding vr2 as OPT, etc. Didn't get any errors.
    For reference, here are my BIOS settings (mostly default)
    PC Engines ALIX.2 v0.99h
    640 KB Base Memory
    261120 KB Extended Memory

    01F0 Master 044A CF 1GB
    Phys C/H/S 1966/16/63 Log C/H/S 983/32/63

    BIOS setup:

    *9* 9600 baud (2) 19200 baud (3) 38400 baud (5) 57600 baud (1) 115200 baud
    *C* CHS mode (L) LBA mode (W) HDD wait (V) HDD slave (U) UDMA enable
    (M) MFGPT workaround
    (P) late PCI init
    *R* Serial console enable
    (E) PXE boot enable
    (X) Xmodem upload
    (Q) Quit

    @superwutze:

    (of course embedded nano, why else would i write in this thread)

    No offense to you, but if you read through the forum, you will see that it is unwise to assume a newbie user knows where to post their question, forum etiquette, etc. That's why I asked.



  • @dotdash:

    No offense to you, but if you read through the forum, you will see that it is unwise to assume a newbie user knows where to post their question, forum etiquette, etc. That's why I asked.

    no offense taken, and no offense to all of you either, but for my last reply i tried it once more and the error was there again. i'll give it some in-depth research next week when i'm again at the office. there are still some variables to investigate (i.e. i always had vr0 and vr1 connected to the same switch). i'll post any suspicions.



  • I have what sounds like the same Kingston CF, and using the 4 GB image works perfectly, as it does on about 40 other assorted CF cards. The looping fsck is bad; that would indicate file system corruption. I've only seen that on images where the build process went awry, and that hasn't happened for months.

