HUGE BUG(s) in "Developers" package, bricks system + some CD installer problems



  • Why hello there! Pardon my use of CAPS, but this is a HUGE problem.

    Tonight, I decided to switch from Smoothwall back to pfSense, because Smoothwall is so godforsaken unstable on my system, or at least its WAN interface ("red", as they call it) is. It wouldn't stay connected for more than 24 hours and required a power cycle of the modem (no other solution would work, wtf?) to get back online. I think it was somehow crashing my modem, maybe with its PPPoE implementation.

    Anyway, before today's install, I was running pfSense on the Embedded platform, because I wanted to do without the power-guzzling, unnecessary video card, and I liked the idea of using a CF card. Now, after trying Smoothwall and dedicating a 20 GB hard drive to the "box", I've moved on to an embedded/standard hybrid: keeping the headless operation but losing the CF restrictions (the lack of packages, in particular).

    I installed the Embedded kernel during setup. I had the IDE channels backwards, was too lazy to switch them back, and figured that, like Windows, the boot loader would cope with the new drive assignment. Nope! The first problem was that it couldn't find the root partition and asked me to tell it where to find it. Huh? I had no idea. After some trial and error, I figured out that ad0s1a is the right partition for its new location (ad2s1a was what the installer wrote). Then came the quest of figuring out where the fuck to change that setting, because evidently (as I learned on the second reboot) the working answer wasn't written back to any configuration file. I solved that problem by editing /etc/fstab (which I had no idea existed; I had to Google various things to figure it out) to point to the right devices.
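    For reference, the kernel's mountroot> prompt accepts a one-off answer in the form ufs:/dev/ad0s1a, while the permanent fix is the device column of /etc/fstab, which the FreeBSD loader uses to locate the root file system. Below is a minimal shell sketch of that edit, assuming the installer wrote ad2 device names and the disk now shows up as ad0. fix_fstab_disk is a hypothetical helper, not something pfSense ships; it keeps a .bak copy before rewriting anything.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the disk name in an fstab file, e.g. ad2 -> ad0.
# Assumes device names of the form /dev/<disk>s<slice><partition>.
fix_fstab_disk() {
    fstab=$1    # path to the fstab file, normally /etc/fstab
    old=$2      # disk name the installer wrote, e.g. ad2
    new=$3      # disk name the kernel now uses, e.g. ad0

    # Keep a copy so a bad edit can be undone from a LiveCD shell.
    cp -p "$fstab" "$fstab.bak" || return 1

    # Match only "/dev/<old>s" so /dev/ad20s... or other text is left alone.
    sed "s|/dev/${old}s|/dev/${new}s|g" "$fstab.bak" > "$fstab"
}

# Usage on the box itself (root must be mounted read-write):
#   fix_fstab_disk /etc/fstab ad2 ad0
```

Writing the result from the backup copy with a plain redirect avoids relying on sed -i, whose syntax differs between BSD and GNU sed.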

    Finally, everything's up and working. I restored my configuration XML from the old CF card installation (which I had booted and backed up beforehand). After the unexpected reboot (normally "may need to be rebooted" doesn't mean "will reboot immediately"), it came back and I went for the package manager. I installed Squid right off the bat. It worked OK, though I was worried about what would happen if the HTTP connection to the status page broke. OK, I got lucky that time.

    I went for the Developer package so I could try installing/building/running things that the pfSense developers seem to religiously abhor, like Samba or a BitTorrent client, which would fit nicely on my network (where "speed and functionality" take priority over "security"; it's a home network typically run by a Netgear router, for f*ck's sake!). So I clicked it to have it installed. It started installing… very, very slowly. It stalled at the last of these lines and took forever to do whatever it was doing (which I'm guessing was downloading):

    Downloading package configuration file... done.
    Saving updated package information... done.
    Loading package configuration... done.
    Configuring package components...
    	Additional files... developer_pkg.tgz 0...100%
    

    (0...100%, that is: it crawled slowly through the whole range.)
    The INSTANT it hit 100%, my pfSense box went apeshit! I keep the serial console open, and, well, all hell broke loose.

    # pid 329 (logger), uid 0: exited on signal 10 (core dumped)
    pid 1815 (pinger), uid 62: exited on signal 10
    pid 726 (routed), uid 0: exited on signal 10 (core dumped)
    pid 232 (sshlockout_pf), uid 0: exited on signal 10 (core dumped)
    pid 507 (php), uid 0: exited on signal 10 (core dumped)
    pid 970 (check_reload_status), uid 0: exited on signal 10 (core dumped)
    pid 4320 (sleep), uid 0: exited on signal 10 (core dumped)
    pid 501 (php), uid 0: exited on signal 10 (core dumped)
    pid 721 (dhcpd), uid 1002: exited on signal 10
    pid 1437 (sh), uid 0: exited on signal 10 (core dumped)
    pid 4343 (sleep), uid 0: exited on signal 10 (core dumped)
    pid 938 (sh), uid 0: exited on signal 10 (core dumped)
    pid 820 (miniupnpd), uid 0: exited on signal 10 (core dumped)
    pid 973 (minicron), uid 0: exited on signal 10 (core dumped)
    pid 802 (ntpd), uid 0: exited on signal 10 (core dumped)
    pid 801 (ntpd), uid 123: exited on signal 10
    pid 2242 (tcsh), uid 0: exited on signal 10 (core dumped)
    pid 2226 (sh), uid 0: exited on signal 10 (core dumped)
    pid 2225 (sh), uid 0: exited on signal 10 (core dumped)
    pid 1795 (login), uid 0: exited on signal 10
    pid 4360 (sleep), uid 0: exited on signal 10 (core dumped)
    pid 1675 (sh), uid 0: exited on signal 10 (core dumped)
    pid 549 (dnsmasq), uid 65534: exited on signal 10
    pid 822 (cron), uid 0: exited on signal 10 (core dumped)
    /libexec/ld-elf.so.1: /lib/libc.so.6: invalid file format
    pid 231 (sshd), uid 0: exited on signal 10
    pid 493 (php), uid 0: exited on signal 10 (core dumped)
    

    It stalled at various points, and I was about to hit Reset. Finally, it seemed to hang for good at "pid 822 (cron)…", at which point I tried pressing Power to shut it off, and when that failed, I hit Reset. Looking back at the screen, I saw the last three lines above. When it finally "rebooted" (note the quotes), I was greeted with this bit of text about halfway through the boot process:

    
    ad0: DMA limited to UDMA33, device found non-ATA66 cable
    ad0: 19881MB <Maxtor 6E020L0 NAR61590> at ata0-master UDMA33
    Trying to mount root from ufs:/dev/ad0s1a
    WARNING: / was not properly dismounted
    /libexec/ld-elf.so.1: /lib/libc.so.6: invalid file format
    Enter full pathname of shell or RETURN for /bin/sh:
    

    YES! It corrupted my hard drive! And this isn't the first time I've run into unbelievably touchy hard drive corruption, either: power off a Smoothwall box without shutting it down cleanly and guess what? It becomes corrupted. Same here, I guess: instant corruption, just add the reset button. What gives?

    So... given the sequence of events past and present, I grab the nearest item in my room and start beating the shit out of it. WHY! CAN'T! WINDOWS! BE! A! GATEWAY! BOX! Windows is so STABLE, so CLEAN, so RELIABLE, and yet NOBODY'S written any SOFTWARE to make it a full-featured internet gateway! Those were just a few of the things I shouted at the top of my lungs while losing my last remaining grip on sanity... plus, I'm getting over some generic illness that already has me miserable. And it's 4am. And tomorrow there'll be at least 2 people wondering why their internet "is so slow" or "not working", as usual.

    Okay... so I tear my pfSense box out of its cozy corner, slam a video card back in, grudgingly put back my last remaining CD-ROM drive (pot-tweaked to barely read CD-RWs), and boot that godforsaken LiveCD again. The first time around, I let it load and then hit the Shell key. The second time around, I just Ctrl+C'd the pfSense loading procedure, which seemed to work great and dumped me at a "#" prompt. This is a re-enactment of what followed:

    # ls /dev
    lots    of    stuff
    ad0    more  stuff
    ad0s1  more  stuff
    ad0s1a more  stuff
    ad0s1b more  stuff
    ad0s2c more  stuff
    # fsck --help
    Unknown switch - --. LOL you are so not smart, figure out this list of syntax goo for yourself
    Usage: fsck [-asdflksdgkjfhdsgldsfjhsdklfjb] [device]
    Good day to you!
    # fsck /dev/ad0s1a
    Checking /dev/ad0s1a...
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Oops! It looks like this file is messed up. Fix it? [y/n] y
    Something involving your bitmap is messed up. Fix it? [y/n] y
    Something involving blocks is messed up. Fix it? [y/n] y
    Something involving something else I didn't even bother reading is messed up. Fix it? [y/n] y
    OK, all fixed, and I told your drive it's clean too. Have fun, and good luck next time!
    # mount --help
    Lol Wut? Go fsck yourself.
    Usage: mount [here] [there]
    # mount /dev/ad0s1a /mnt
    # pwd
    /root
    # cd /var
    # cp *.* /mnt/var
    # reboot
    
    

    Without man pages, life is a living hell. I can't even --help these things? What gives? I just had to "wing it" on almost all those commands, guessing at their syntax.
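    For anyone else stuck at that prompt: FreeBSD's tools generally don't take --help, but an option they don't recognize usually makes them print a short usage line. The sketch below spells out real syntax for the repair, under stated assumptions: the root slice is /dev/ad0s1a and the damaged file is /lib/libc.so.6 (both taken from the boot messages above), and the LiveCD carries a copy of that file matching the installed version. restore_file is a hypothetical helper, and this is one possible approach, not a record of what fixed the box above; if the package replaced other base files too, each would need the same treatment.

```shell
#!/bin/sh
# Repair steps from the LiveCD "#" prompt (sketch; see assumptions above).

# 1. Check the unmounted root slice; -y answers "yes" to every repair
#    prompt instead of typing y over and over.
check_root() {
    fsck -y "$1"
}

# 2. Mount the repaired slice under the LiveCD's tree.
mount_root() {
    mount "$1" "$2"
}

# 3. Hypothetical helper: copy a known-good file over its damaged twin,
#    keeping the broken copy next to it as <name>.broken.
restore_file() {
    src_root=${1%/}   # root of the good system ("/" on the LiveCD)
    dst_root=${2%/}   # mount point of the damaged disk, e.g. /mnt
    rel=$3            # path relative to both roots, e.g. lib/libc.so.6

    if [ -e "$dst_root/$rel" ]; then
        mv "$dst_root/$rel" "$dst_root/$rel.broken" || return 1
    fi
    if ! cp -p "$src_root/$rel" "$dst_root/$rel"; then
        # Put the old file back rather than leave nothing in its place.
        mv "$dst_root/$rel.broken" "$dst_root/$rel" 2>/dev/null
        return 1
    fi
}

# Usage:
#   check_root /dev/ad0s1a
#   mount_root /dev/ad0s1a /mnt
#   restore_file / /mnt lib/libc.so.6
#   umount /mnt && reboot
```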

    The issue is that I shouldn't have to dig in and recover my system from a failed package install triggered by what looked like an innocent install link. Especially for something that's beta, or unsupported, or anything like that, the installer should be MUCH more careful: extract to a temporary folder, move the files into place only if it thinks that's safe, back up what it changes and fall back on failure, etc. And above all, why doesn't the system run fsck during boot when it detects the drive was FSCK'D UP? I've now run into that on both the Smoothie and pfSense!
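    The "extract to a temporary folder, back up, then move" idea can be sketched in a few lines of shell. staged_extract is a hypothetical helper, not how the pfSense package manager actually works, and it assumes a .tgz whose paths are relative to the target root. It only makes the change reversible: a corrupt or truncated download fails before any live file is touched, and overwritten files are saved in a backup directory whose path is printed on success. It cannot tell whether the new files (a libc built for another release, say) are compatible with the running system; that would need separate checks.

```shell
#!/bin/sh
# Hypothetical staged installer: unpack, back up, then copy into place.
staged_extract() {
    archive=$1        # package tarball (.tgz)
    dest=${2%/}       # root the files belong under, e.g. /
    stage=$(mktemp -d) || return 1
    backup=$(mktemp -d) || { rm -rf "$stage"; return 1; }

    # 1. Unpack somewhere harmless; a corrupt or truncated .tgz fails here,
    #    before any live file has been touched.
    if ! tar -xzf "$archive" -C "$stage"; then
        rm -rf "$stage" "$backup"
        return 1
    fi

    # 2. Best-effort backup of every existing file the package would overwrite.
    (cd "$stage" && find . -type f) | while IFS= read -r f; do
        if [ -e "$dest/$f" ]; then
            mkdir -p "$backup/$(dirname "$f")"
            cp -p "$dest/$f" "$backup/$f"
        fi
    done

    # 3. Copy the staged tree into place (-R without -p, so existing
    #    directories under $dest keep their own permissions); if that
    #    fails, copy the backed-up originals back.
    if ! cp -R "$stage/." "$dest/"; then
        cp -R "$backup/." "$dest/"
        rm -rf "$stage"
        echo "install failed; originals restored from $backup" >&2
        return 1
    fi

    rm -rf "$stage"
    echo "$backup"    # keep this path for a manual rollback later
}

# Usage (hypothetical path):
#   bk=$(staged_extract /tmp/developer_pkg.tgz /) || echo "not installed"
```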

    Oh... and after I restarted Firefox on my PC and reloaded my tabs, it started the whole process over again, because the package to be installed is stored in the URL. So guess what? After fixing my pfSense box, I get hit again. It starts downloading the package and I know there's no way to stop it (hitting Stop would just break the HTTP connection, not the install), so I'm just screwed. I wait for the inevitable and watch the serial console. Sure enough, that confirms it's reproducible: just set up a pfSense box with the Embedded kernel and try installing the Developers package. Works every time.

    Needless to say, I'm not just bitching about shit that doesn't work. I'd like to offer my help testing patches or fixes or whatever, providing input, trying things on my system, etc. Usually when I write a long, bitchy post like this, I get nothing but "nobody's forcing you to use pfSense" (which is a lie and a totally bogus statement, since the whole POINT of freeware is to be better than the "other guys", get lots of users, and be the overall attractive option... anywho). I'd like to help SOLVE these problems, not just for me, but for other people who are less vocal about their issues.

    So anywho, thanks for reading. I'm still enjoying pfSense as long as I can avoid the flame-filled IRC channel and that darn "Developers" package ;)



  • What I can recommend, if you don't want to go through this hassle, is to use prebuilt packages.
    Take a look at the -r option of pkg_add, and if you need a man page, go to http://www.freebsd.org/cgi/man.cgi. That should save you from your linuxism ;) of using --help, which most BSD tools don't support; give them an option they don't know and they usually print a short usage line instead :S.

    Anyway, good to hear you weren't put off and stayed on board.

    Just some thoughts to let people know how to avoid some headaches, same as your post.

    Regards.

