pfSense Crashed



  • Got a call today, after a year of stable(ish) operation, that the network I have a 2.1 beta pfSense on was down. Here's the crash report. Any ideas? Thanks!

    Crash report begins.  Anonymous machine information:
    
    i386
    8.3-RELEASE-p7
    FreeBSD 8.3-RELEASE-p7 #1: Thu Apr 25 21:20:25 EDT 2013     root@snapshots-8_3-i386.builders.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.8
    
    Crash report details:
    
    Filename: /var/crash/bounds
    1
    
    Filename: /var/crash/info.0
    Dump header from device /dev/ad4s1b
      Architecture: i386
      Architecture Version: 1
      Dump Length: 71680B (0 MB)
      Blocksize: 512
      Dumptime: Fri Apr 26 08:06:13 2013
      Hostname: glacierfire.glaciercamp
      Magic: FreeBSD Text Dump
      Version String: FreeBSD 8.3-RELEASE-p7 #1: Thu Apr 25 21:20:25 EDT 2013
        root@snapshots-8_3-i386.builders.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.8
      Panic String: sbdrop
      Dump Parity: 2828757601
      Bounds: 0
      Dump Status: good
    
    Filename: /var/crash/minfree
    2048
    
    Filename: /var/crash/textdump.tar.0
    ddb.txt
    db:0:kdb.enter.default>  run lockinfo
    db:1:lockinfo> show locks
    No such command
    db:1:locks>  show alllocks
    No such command
    db:1:alllocks>  show lockedvnods
    Locked vnodes
    db:0:kdb.enter.default>  show pcpu
    cpuid        = 0
    dynamic pcpu = 0x312b80
    curthread    = 0xc70178a0: pid 0 "em1 taskq"
    curpcb       = 0xed99fd80
    fpcurthread  = none
    idlethread   = 0xc6d94000: tid 100006 "idle: cpu0"
    APIC ID      = 0
    currentldt   = 0x50
    db:0:kdb.enter.default>  bt
    Tracing pid 0 tid 100061 td 0xc70178a0
    kdb_enter(c0f8d1f0,c0f8d1f0,c0f93959,ed99f99c,0,...) at kdb_enter+0x3b
    panic(c0f93959,c78eda00,0,0,0,...) at panic+0x102
    sbdrop_internal(c7b33284,185,313,1,ed99f9e8,...) at sbdrop_internal+0x27e
    tcp_do_segment(c80e5d20,34,35,0,2,...) at tcp_do_segment+0x19a7
    tcp_input(c7f8e700,14,c702c800,1,0,...) at tcp_input+0xd17
    ip_input(c7f8e700,c7f8e700,10,c7f73e00,100,...) at ip_input+0x13a
    netisr_dispatch_src(1,0,c7f8e700,ed99fbcc,c0b5cc6f,...) at netisr_dispatch_src+0x71
    netisr_dispatch(1,c7f8e700,0,ed99fbdc) at netisr_dispatch+0x20
    ether_demux(c702c800,c7f8e700,3,0,3,...) at ether_demux+0x19f
    ether_input(c702c800,c7f8e700,c776f100,0,0,...) at ether_input+0x174
    lem_rxeof(0,c7065100,ed99fca0,c7065100,ed99fc90,...) at lem_rxeof+0x20d
    lem_handle_rxtx(c7072000,1,ed99fccc,c0aa32a5,c7065100,...) at lem_handle_rxtx+0x4f
    taskqueue_run_locked(c7065100,c7065118,c0f826cf,0,ed99fcf0,...) at taskqueue_run_locked+0x6d
    taskqueue_thread_loop(c70765fc,ed99fd28,30646870,0,c70765fc,...) at taskqueue_thread_loop+0x44
    fork_exit(c0ad7d00,c70765fc,ed99fd28) at fork_exit+0x87
    fork_trampoline() at fork_trampoline+0x8
    --- trap 0, eip = 0, esp = 0xed99fd60, ebp = 0 ---
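
When comparing reports like the one above, the first field worth quoting is the panic string from /var/crash/info.N (here: sbdrop). A minimal sketch for pulling it out, assuming the info file carries the same "Panic String:" line shown in this report. panic_string is a hypothetical helper name, not something shipped with pfSense, and it is written for /bin/sh:

```shell
# Print the panic string from a FreeBSD textdump info file.
# Assumes a line of the form "  Panic String: <text>", as in the report above.
panic_string() {
    sed -n 's/^[[:space:]]*Panic String:[[:space:]]*//p' "$1"
}
```

Usage would be something like `panic_string /var/crash/info.0`, which makes it easy to tell at a glance whether two boxes died in the same place.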
    
    


  • Please try the snapshot coming later today.



  • @ermal:

    Please try the snapshot coming later today.

    Okay, thanks, because not only has it apparently crashed, pfSense is also totally non-functional up there. I can get in from my end fine, but no luck for them!



  • Yeah, mine is in a crash loop too … I will have to rebuild when I get home. Running a basic config off an old CD.

    Update: Meant to add that it seems to be only the i386 machines that are affected.



  • Are you running CP (Captive Portal) by any chance?



  • I am not running CP. I am running squid, rrd summary, git and pfBlocker. I am also running traffic shaping and powerd. I just completed an upgrade from last Friday's snapshot to this morning's latest.



  • I'm not running CP either. I got the system up and running by rolling back to a snap from a few days ago. While it was up but not working, I saw a large number of "No Traffic" states.



  • Got the same problem here with that snapshot (i386), and I'm not running anything special: just a basic config + OpenVPN.



  • I'm seeing crashes on an AMD64 machine as well.



  • Yep, crashed here too after upgrading at 5 PM EST.



  • Me too!!

    from 2.1-BETA1 (i386) built on Thu Apr 25 09:08:19 EDT 2013
    to 2.1-BETA1 (i386) built on Thu Apr 25 20:52:41 EDT 2013

    Stays up for about 4 minutes, then crashes and resets automatically.

    Crash dump submitted from the GUI, hope it got there.

    Thanks for pfSense!



  • same here.

    2.1-BETA1 (i386)
    built on Thu Apr 25 20:52:41 EDT 2013
    FreeBSD 8.3-RELEASE-p7

    reboots about every 5-10 minutes.

    I happened to catch this message right before reboot in case it matters:

    panic: hfsc_dequeue

    I also submitted a crash report.



  • I noticed some stability issues a few weeks back and made a post, but everyone was saying there was no problem… I don't know if it was related to the OP's issue or the problems others are having here.

    Different symptoms, but a couple of weeks back it locked up 3 or 4 times over the course of 2 weeks or so. Knock on wood, it has been stable since, other than a stuck wireless beacon and errors, which I'll try to figure out at some point. The wireless functions despite the errors, but when the beacon gets stuck it requires a reboot.



  • I'm seeing very similar crashes (and one complete freeze with only "em0: discard frame without header" on the console) with the Thu Apr 25 21:20:47 EDT 2013 build, which at the moment is the latest one. I just updated from a version from Tuesday where I did not see that problem. Crashes uploaded from GUI.

    /wj



  • Another failure mode is the text "ipsec_filter: m_pullup failed" on the console and then a hard hang, no crash or reboot. There seems to be something seriously wrong with this kernel.

    /wj



  • I'm getting the same failure as well, but there is no automatic reboot. The system just hangs after a few hours and I have to reset it manually using the hardware reset switch.



  • I'm guessing at this point we can be pretty confident something just went awfully, horribly wrong :) Thanks pfSense team for all your hard work, it's times like these that we appreciate how much goes into making a project like this happen!



  • Same here… on an Alix board, looking at the console, the system is stuck in a boot loop, restarting at random points... :S



  • OK, I'm posting this in case it helps someone. The easiest way I found to recover pfSense to a working state was restoring a full backup from an image I made some time ago.
    Just connect to the console, press Ctrl+Break while the system is initializing, and run:

    /etc/rc.restore_full_backup /root/pfSense-full-backup-20130219-2240.tgz
    (just replace pfSense-full-backup-20130219-2240.tgz with the filename of your full backup).

    Then wait for the process to complete, and when it's done run: reboot

    Finally, you can restore the latest configuration from the web admin interface.

    Ciao,
    Michele
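
With the network down it is easy to forget the exact backup filename. A minimal sketch for finding the newest full backup, assuming the backups sit in /root and follow the pfSense-full-backup-YYYYMMDD-HHMM.tgz naming seen in the post above. latest_full_backup is a hypothetical helper, not part of pfSense, and it is written for /bin/sh (if the console shell is tcsh, start sh first):

```shell
# Print the newest full backup in a directory (default: /root).
# Assumes names like pfSense-full-backup-YYYYMMDD-HHMM.tgz, so the
# lexically last name is also the most recent backup.
latest_full_backup() {
    dir="${1:-/root}"
    newest=""
    for f in "$dir"/pfSense-full-backup-*.tgz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        newest="$f"               # globs expand in sorted order
    done
    # Fail (non-zero status) when no backup was found.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The result could then be handed to the restore script from the post, for example `backup=$(latest_full_backup /root) && /etc/rc.restore_full_backup "$backup"`.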



  • Crashes here also, and I guess no new snaps are being generated either, as the builder seems to be down.



  • I was also stuck in a panic loop on the latest snapshot.  Disconnecting from external networks prevented random rebooting as the panics were on network events.  I am rolling back at present to pfSense-Full-Update-2.1-BETA1-i386-20130420-1706 hoping it's stable.

    A crash dump was successfully saved and has been submitted to the developers.



  • For mine, I had the same panics on network activity. It would not stay up long enough to download a previous firmware version and apply it before it crashed.

    It would also not recover from what I thought was a full backup.

    I finally managed it by downloading the firmware to a web server nearby. I also had to boot the pfSense box with option 2, disabling ACPI.

    I then chose to upgrade the firmware by URL and entered the nearby server's address. Back online now.

    There may have been an easier way of doing it, but that's what worked for me.

    pfSense still rocks!



  • @monkfish:

    pfSense still rocks!

    Always rocks! ;)



  • Found myself today with an Alix crashing after the upgrade.

    For those who run 'embedded': just switch to the second slice, which should hold the previously installed build.

    and yeah… pfSense rocks  ;D



  • Yeah… it rocks... most of the time. Yesterday's build though sucks rocks.

    What happened? Must have been something pretty fundamental, but nothing significant shows up in commit history!



  • @jcyr:

    Yeah… it rocks... most of the time. Yesterday's build though sucks rocks.

    What happened? Must have been something pretty fundamental, but nothing significant shows up in commit history!

    While I certainly share your frustration, as many of us are running 2.1 nightlies on production systems for driver reasons, I think saying it "sucks rocks" is unfair. These are nightlies we're running. They're not even alphas (yes, I know they're labeled "beta", but they're not betas in the traditional sense of a beta that's been slightly tested and expected to mostly work - they're automatic nightly builds). Nightlies can be broken in very fundamental ways; it's the nature of software development.

    The pfSense team is doing great work, and I'm especially appreciative of the members of the team who are friendly and helpful on these forums to all of us running pfSense. When 2.1 is out, or even an actual beta/RC, then we'll have builds we can use and not run the risk of them fundamentally trashing the whole system. Until then, they're nightlies, for better or for worse. I'm glad the team's made them available for widespread community testing.



  • @mdima:

    OK, I'm posting this in case it helps someone. The easiest way I found to recover pfSense to a working state was restoring a full backup from an image I made some time ago.
    Just connect to the console, press Ctrl+Break while the system is initializing, and run:

    /etc/rc.restore_full_backup /root/pfSense-full-backup-20130219-2240.tgz
    (just replace pfSense-full-backup-20130219-2240.tgz with the filename of your full backup).

    Then wait for the process to complete, and when it's done run: reboot

    Since I always check the "make full backup" box when upgrading, that was an almost painless process to get things back working.
    Almost painless, because I didn't know how, until I read this here, but of course, my net was down, so…

    ...and also because my box needs to be opened up and a ribbon connector must be plugged onto the MB for me to hook up a screen and keyboard, because that's just a "for install only" afterthought. Time to make a mod to the case, maybe.

    In any case, things are up and running again, as this post shows. Will someone deposit a message here when a new build is available that works?
    Given the hoops I have to jump through when things are busted in this particular way, I'm not keen on installing random builds until I know this particular issue is fixed.



  • @markuhde:

    While I certainly share your frustration, as many of us are running 2.1 nightlies on production systems for driver reasons, I think saying it "sucks rocks" is unfair. These are nightlies we're running. They're not even alphas (yes, I know they're labeled "beta", but they're not betas in the traditional sense of a beta that's been slightly tested and expected to mostly work - they're automatic nightly builds). Nightlies can be broken in very fundamental ways; it's the nature of software development.

    The pfSense team is doing great work, and I'm especially appreciative of the members of the team who are friendly and helpful on these forums to all of us running pfSense. When 2.1 is out, or even an actual beta/RC, then we'll have builds we can use and not run the risk of them fundamentally trashing the whole system. Until then, they're nightlies, for better or for worse. I'm glad the team's made them available for widespread community testing.

    Lighten up guys. My comment was only an indication of surprise at the catastrophic outcome in applying this last nightly 'alpha', 'beta' or whatever. I had been lulled into complacence by the general quality of these releases and was taken off guard by this one.

    One thing that would have facilitated recovery is a boot option allowing the selection and re-installation of one of the previously created backup tarballs. Sure, it's easily done manually, but it's difficult to look up how when you've lost all Internet connectivity.



  • Yes, let's relax a bit, everybody… we all know that all the pfSense guys are doing an AWESOME job; let's just all remember that version 2.1 is still in the development phase. For example, I run it at home, but in the office I run the stable 2.0.X version, and it's rock solid.

    Since version 2.1 is still in development, this can happen. We all have to remember that on each update and have a backup plan, like trying a restore of a previous config or full backup while things are working (because when things are not working it could be too late), keeping a stable backup disk/flash/microdrive to use in case of emergency, or having two or more firewalls to update at different times, and so on.

    Btw, let's wait for the pfSense staff to produce a stable snapshot with all the latest updates and fixes.

    Michele



  • @mdima:

    Since version 2.1 is still in development, this can happen. We all have to remember that on each update and have a backup plan, like trying a restore of a previous config or full backup while things are working (because when things are not working it could be too late), keeping a stable backup disk/flash/microdrive to use in case of emergency, or having two or more firewalls to update at different times, and so on.

    No, I totally agree; that's all I meant, in a very light-hearted way :D



  • @mdima:

    Since version 2.1 is still in development, this can happen. We all have to remember that on each update and have a backup plan, like trying a restore of a previous config or full backup while things are working (because when things are not working it could be too late), keeping a stable backup disk/flash/microdrive to use in case of emergency, or having two or more firewalls to update at different times, and so on.

    Fully understood, but also one of the reasons why it would be great if the non-embedded version of pfSense could also have two slices, or a recovery partition that one can boot into to restore a previous backup, maybe even automatically if a system crashes more than X times within timespan t.

    Otherwise, particularly if a unit is in a remote location, it can be a true PITA to even restore a backup that does exist.
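
The "more than X crashes within timespan t" rule suggested above could look roughly like this. It is only a sketch of the counting logic, not an existing pfSense feature: recent_crashes and crash_loop_detected are hypothetical names, and where the crash timestamps come from (for example the modification times of the /var/crash/info.N files) is left as an assumption.

```shell
# Count how many crash timestamps (epoch seconds) fall within the last
# WINDOW seconds before NOW.
# usage: recent_crashes NOW WINDOW TS...
recent_crashes() {
    now="$1"; window="$2"; shift 2
    count=0
    for ts in "$@"; do
        age=$((now - ts))
        if [ "$age" -ge 0 ] && [ "$age" -le "$window" ]; then
            count=$((count + 1))
        fi
    done
    printf '%s\n' "$count"
}

# Succeed when more than LIMIT crashes happened inside the window.
# usage: crash_loop_detected LIMIT NOW WINDOW TS...
crash_loop_detected() {
    limit="$1"; shift
    [ "$(recent_crashes "$@")" -gt "$limit" ]
}
```

A boot script could call `crash_loop_detected 3 "$(date +%s)" 3600 ...` with the collected timestamps and fall back to a restore only when it succeeds. Keeping the decision separate from the restore itself makes the rule easy to try out without touching the disk.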



  • I just upgraded 3 instances to the Thursday build, and I've started experiencing this as well.

    They're all configured differently, and the only things in common are:

    • OpenVPN client export
    • iPerf
    • They have OpenVPN configured (but disabled on one of them)

    It seems that messing with the firmware upgrade page is a sure-fire way to trigger a crash.  One of them seems stable as long as I don't log into it.

    The problem only started happening after a few hours of working on (2 of) the boxes, then all of a sudden all 3 started acting unstable.  I don't think it's just the webGUI; I had a number of crashes while launching daemons from SSH as well.  Right now I have a ping -t running against one and I have stopped attempting to log into the GUI, and it seems stable.

    Not sure how helpful that is.



  • Um, me again…

    I think anybody choosing to run the DEVELOPMENT version should be doing so in the strict knowledge and understanding that there could be issues. I do.

    Ensure your backup/DR/contingency plan is robust; that's basic. Carry out your own testing before releasing to a production or working rig; that's basic as well.

    I don't think it's right, however "light-hearted" it seems, to criticise the project for a broken DEVELOPMENT version. But there we go. That's my opinion.

    Constructively - would the developers perhaps consider PULLING the broken release to keep more people from being affected?



  • In my case it doesn't reboot: it simply freezes and refuses to accept any input. Strangely, it was working fine yesterday. Not that it matters much: I've been using it on my secondary (backup) firewall to provide IPv6 connectivity. My primary one is still on 2.0.X.

    It would probably be a good idea to pull the broken release.



  • Re-installed image from Mon Apr 22 04:52:47 EDT 2013  … to get back online for now.
    That works so far  ;)

    I will follow this thread to see when it is safe to do another update to the latest snapshot.

    Thanks, Stefan



  • The April 24th snaps are safe; I've been using one for the past few days.



  • @monkfish:

    I don't think it's right, however "light-hearted" it seems, to criticise the project for a broken DEVELOPMENT version. But there we go. That's my opinion.

    Constructively - would the developers perhaps consider PULLING the broken release to keep more people from being affected?

    Umm, I DIDN'T criticise it (since I was the one who said what I said was light-hearted). I was saying how great pfSense was. I was light-heartedly criticising those of us who got burned by this snap :) (I'm in that group, the snaps have been so close to production-ready that I thought nothing of clicking the upgrade button. My fault, it's development)



  • In my case, attempting to upgrade thru the GUI triggered reboots 90% of the time.  If anyone has run into this, the following should fix the problem (run from SSH / console):

    For i386:

    8
    fetch http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/i386/pfSense_HEAD/updates/pfSense-Full-Update-2.1-BETA1-i386-20130423-1530.tgz
    exit
    13
    2
    /root/pfSense-Full-Update-2.1-BETA1-i386-20130423-1530.tgz
    y
    
    

    For amd64:

    8
    fetch http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/amd64/pfSense_HEAD/updates/pfSense-Full-Update-2.1-BETA1-amd64-20130423-0841.tgz
    exit
    13
    2
    /root/pfSense-Full-Update-2.1-BETA1-amd64-20130423-0841.tgz
    y
    
    

    This has been tested on a remote system over SSH, and works fine.

    I should add that if you are on the affected version from Thursday, you need to downgrade ASAP; on one of my (virtual) systems the problem progressed until the VM no longer booted up.  It seems to get worse, probably due to repeated dirty unmounts of the filesystem.
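
The bare digits in the two listings above are console menu choices (8 opens a shell; 13 followed by 2 appears to select a console upgrade from a local file). Since the box may crash mid-download, it can be worth checking the archive before handing it to the updater. A minimal sketch, assuming the URL pattern of the two links above holds for other builds, which is not guaranteed. snapshot_url and archive_ok are hypothetical helper names, written for /bin/sh:

```shell
# Build the update URL for a 2.1-BETA1 snapshot, following the pattern of
# the two links above. arch is i386 or amd64; stamp looks like 20130423-1530.
snapshot_url() {
    arch="$1"; stamp="$2"
    printf 'http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/%s/pfSense_HEAD/updates/pfSense-Full-Update-2.1-BETA1-%s-%s.tgz\n' \
        "$arch" "$arch" "$stamp"
}

# Succeed only if the file is a readable gzip-compressed tar archive, so a
# truncated or corrupt download is caught before the upgrade starts.
archive_ok() {
    tar -tzf "$1" >/dev/null 2>&1
}
```

For example, `fetch "$(snapshot_url i386 20130423-1530)"` followed by `archive_ok pfSense-Full-Update-2.1-BETA1-i386-20130423-1530.tgz && echo good` before leaving the shell and continuing with menu option 13.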



  • @limecat

    I can confirm the "getting worse" part, and that the GUI access makes it even less stable.
    However there seem to be other things, maybe VPN related, that made my system not stable enough to be recoverable from the CLI, because by the time it was up and I did an slogin, it was about to crash.

    So the only way I could do this was to disconnect all network cables (physically), so there were no network events (packets, pings, VPN down, etc.) and then do a restore of a full backup from the physical console, which was a bit tricky due to the kind of hardware I use. (No keyboard and video port on the outside of the case).

    So your suggestion may not work for everyone.



  • If you are in that state where it won't boot, this should work.  I have used it to remotely restore 3 boxes so far, and it seems to work well.

    Get an ISO of a "good" version (2.0.3, or 2.1 as of April 23).  Boot it up and select "recovery".  Pick your drive, and continue.
    You will need to re-assign your interfaces to their adapters; don't worry about getting all of them correct, as we will restore the config.
    Once you are at the standard "menu", run the following:

    8
    cp /tmp/hdrescue/cf/conf/config.xml /cf/conf/config.xml
    cp /tmp/hdrescue/cf/conf/config.xml /conf/config.xml
    rm /tmp/config.cache
    exit
    

    Your config should now be loaded.  Manually assign the proper IPs to your interfaces, and you should have proper web GUI access again.  Log in, and make a backup of your config.

    Continue with the installation, which should preserve your now in-memory configuration.

    I HIGHLY recommend that you confirm that A) the downloaded configuration is correct and B) cat /cf/conf/config.xml shows your configuration.  Make SURE you have backups before proceeding with the install, which will involve unmounting and wiping your existing partition.
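
The copy step in the recovery procedure above can be made a little safer by verifying each copy before the old partition is wiped. A minimal sketch, assuming the rescued config is at /tmp/hdrescue/cf/conf/config.xml as in the post. copy_config is a hypothetical helper, not a pfSense tool, and it is written for /bin/sh:

```shell
# Copy a rescued config.xml to each destination and verify every copy
# byte-for-byte. Fails without copying if the source is missing or empty.
# usage: copy_config SRC DEST...
copy_config() {
    src="$1"; shift
    [ -s "$src" ] || { echo "missing or empty: $src" >&2; return 1; }
    for dest in "$@"; do
        cp "$src" "$dest" || return 1
        cmp -s "$src" "$dest" || { echo "copy differs: $dest" >&2; return 1; }
    done
}
```

With the paths from the post this would be `copy_config /tmp/hdrescue/cf/conf/config.xml /cf/conf/config.xml /conf/config.xml && rm /tmp/config.cache`, so the cache is only cleared once both copies are known to be good.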

