PFSense 2.0 Beta5 1/19 build system locks up

  • I was running a 2.0 build from 1/7/11 with no stability problems, then upgraded to the builds from 1/19/11. Both builds caused stability issues: the system would lock up after a few hours. After a reboot the system would work fine again for a while. I'm fairly certain this is a bug in pfSense, but I was asked to post this in the forums instead of in the bug tracker.

    Any idea what may have been happening at the time of the lockup?

    I'm running the i386 build from "Wed Jan 19 02:10:47 EST 2011" on my home router and it's been solid for around 24 hours now, and I beat it up pretty harshly.

  • The first two times I was out of the office and only a couple people were web browsing and receiving email (exchange server).  Today it locked up right after starting an ftp download.

    FTP may be somewhat more likely, there have been changes to the in-kernel FTP proxy over the past few days.

    Can you replicate it by accessing an FTP site? Or is it hit and miss?

    Someone else noticed that two FTP connections to the same site, one right after another, might be to blame.

    Also, you can try to disable the FTP proxy and see if you can reproduce the lockup:

    System > Advanced, System Tunables page. Add one that sets debug.pfftpproxy to 1.
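    For anyone who prefers the shell to the GUI, the same tunable can probably be flipped with sysctl. This is only a sketch: it assumes debug.pfftpproxy is writable at runtime, which this thread does not confirm, and a value set this way will not survive a reboot the way a System Tunables entry does. The set_pfftpproxy helper and the SYSCTL override are hypothetical names used for illustration.

```shell
# Hypothetical helper: set debug.pfftpproxy to 0 or 1 from a shell.
# Assumption: the tunable can be changed at runtime with sysctl(8).
# SYSCTL may be overridden (for example SYSCTL=echo) for a dry run.
set_pfftpproxy() {
    case "$1" in
        0|1)
            # Unquoted on purpose so an override like "echo" is run as a command.
            ${SYSCTL:-sysctl} "debug.pfftpproxy=$1"
            ;;
        *)
            echo "usage: set_pfftpproxy 0|1" >&2
            return 2
            ;;
    esac
}
```

    Running "SYSCTL=echo set_pfftpproxy 1" only prints the assignment that would be made, which is a cheap way to check it before touching a production box.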

    One more thing to try if none of that helps, you could switch to a debug kernel:

  • I have had two lockups since the 1/19 update as well. These are the first lockups I have had since I started using the beta over a month ago. I thought this was related to the PPTP VPN fix, but it was suggested that this was a separate issue. I have not used FTP since this update and am still experiencing lockups.

    Can you still try to disable the FTP proxy to see if it happens?

    You may not know you're using FTP, but there could be something running somewhere on your network that fetches files/updates/packages by FTP that isn't as obvious.

    OK, it's definitely FTP… I just reproduced it on my home router.

  • OK.  I set debug.pfftpproxy to 1 and I can no longer connect to a PPTP VPN.  If I set the flag to 0, I can connect, set it back to 1 and cannot connect.  I'm not sure why or how these are related, but they appear to be.

    I checked with Ermal (the author of the FTP and PPTP proxies) and he said at the moment that does disable them both, though it didn't seem to be entirely intentional.

    He's working on replicating the issue and fixing it now.

  • I've also had the lockup occur while using PPTP VPN. I believe it was the PPTP connections that triggered the other lockups I reported, not the web browsing.

    It may be then, but I have no PPTP activity on my network and yet it still locked up, and it locked up precisely when I did a LIST on an FTP site.

    A couple more patches hit the repo this afternoon so the next snapshot will be worth trying, but it probably won't be uploaded until later tonight/early tomorrow.

    FYI- On today's snap, I can no longer induce the lockup with FTP. Haven't tried PPTP yet though.

    I still get failed LIST command using passive FTP (and active even more so) but the router isn't seizing, which is a good thing…

  • I just updated to today's snapshot. I should know within a couple of hours if I am still getting the PPTP lockup.

    Looks like there is a solid lead on the remaining FTP issue(s), so a fix should be coming along before too long.

  • I'm using 2.0-BETA5 (i386) built on Fri Jan 21 09:14:03 EST 2011 and still experiencing the lockups immediately after starting an FTP transfer using FileZilla.

    I'm on the one from overnight:
    2.0-BETA5 (i386)
    built on Thu Jan 20 20:59:52 EST 2011

    Can you try to load that one and see if there is any difference?

    I tried both active and passive FTP and I was able to use both, but very spotty results.

    There are more FTP changes to come though, so it may not really be fixed quite yet. It probably won't be on a snap from today, but maybe one from tomorrow, depending on when the patch goes in.

  • Very UNHAPPY to report my first lockup with PFSENSE since we've been using it.

    Something is broken on today's snapshot. Had to power cycle to get it up again. One step forward two steps backward  ;D

    Does anyone know if this bug also occurs in the 64-bit version? We are going back to the 64-bit version on a backup box with a week-old snapshot and not updating until a fix is confirmed.

  • I can try that specific build. Is there a way to "upgrade" to it, or do I need to reflash?
    Another update: I'm still on 2.0-BETA5 (i386) built on Fri Jan 21 09:14:03 EST 2011, and the FTP issue seems to occur only when transferring multiple files. Today I was able to transfer 226 files (out of a 5000-file batch) before it locked up.

    Just try the most recent build instead. A firmware update should work, no need to redo the whole thing.
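    Since the lockup seems to depend on how many files go through, a small loop run from a client behind the firewall can make the "how many transfers before it hangs" number repeatable across snapshots. This is a rough sketch, not something from the thread: count_transfers and FETCH_CMD are hypothetical names, and the default command assumes a FreeBSD client with fetch(1) available.

```shell
# Hypothetical helper: read one URL per line from stdin, fetch each in turn,
# stop at the first failure or timeout, and print how many succeeded.
# Assumption: FreeBSD fetch(1) is available; override FETCH_CMD otherwise.
count_transfers() {
    count=0
    while IFS= read -r url; do
        # Unquoted on purpose so the command and its flags split into words.
        # stdin is redirected so the fetch tool cannot eat the URL list.
        if ${FETCH_CMD:-fetch -q -T 30 -o /dev/null} "$url" </dev/null; then
            count=$((count + 1))
            echo "ok $count: $url" >&2
        else
            echo "failed after $count transfers: $url" >&2
            break
        fi
    done
    echo "$count"
}
```

    Feed it a file of URLs, for example "count_transfers < urls.txt", where urls.txt is a list you build for your own FTP server. On a client without fetch, something like FETCH_CMD="curl -s -m 30 -o /dev/null" should work the same way. The 30-second timeout is arbitrary; it is only there so the loop stops instead of hanging forever when the router seizes.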

  • I am also experiencing lockups. We do not use FTP, just normal ingress and egress traffic. Both cluster nodes hang randomly. We use an internal server for OpenVPN, so pfSense only handles port forwards for it. I experienced this with the 01-24-2011 snapshot; I am upgrading to today's 01-26-2011 snapshot and will let you know if this solves the problem. The symptoms: things work great, then a box just freezes, with no console errors, and stays hung. Are there logs I can look at or forward to the list?

    Not really, when it gets stuck there really isn't anything that can be done.

    FTP and PPTP are the main causes of the hang, or were. My router hasn't hung up in days (I'm on a snap from late on the 20th). The most recent snapshots should behave better all around, but there is still a panic out there that some people are hitting.

  • OK, so now I have attempted to upgrade the master node to the latest snapshot, and it seems stuck in the update process. A ps -ef on the box shows:

       26  v0  Is+   0:00.01  sh /etc/rc autoboot
      254  v0  I+    0:04.28  /usr/local/bin/php -f /etc/rc.bootup
    10734  v0  SN+   0:00.50  /bin/sh /var/db/rrd/
    24545  v0  SN+   0:00.00  sleep 60
    26261  v0  S+    0:00.14  /usr/sbin/tcpdump -s 256 -v -l -n -e -ttt -i pflog0
    26317  v0  S+    0:00.17  logger -t pf -p
    46810  v0  I+    0:00.04  /usr/sbin/pkg_delete -x libwww-5.4.0_4
    47319  v0  I+    0:00.00  /bin/rm -r /var/db/pkg/libwww-5.4.0_4
    24777  0   R+    0:00.00  ps -ef
    55403  0   Is    0:00.00  -sh (sh)
    56612  0   I     0:00.00  /bin/sh /etc/rc.initial
    60290  0   S     0:00.02  /bin/tcsh

    It has been like this for an hour or so. It is still serving up 6 CARP VIPs, so I'm not sure if I should just reboot the box. The slave node has not yet been upgraded.

    Looks like it's still running a package upgrade. Does the console not tell you what it's doing?

    (As an aside, on FreeBSD and thus pfSense, use ps uxawww)

  • Yes, my Linux underwear is showing :)  At any rate, I hooked up a console to the master, hit enter a couple of times, and the box automagically finished the upgrade process. The box was not asleep, as I could ssh into it, but only console activity made it actually complete. Strange. I am updating the slave node now.

  • I installed 2.0-BETA5 (i386) built on Wed Jan 26 10:45:46 EST 2011 yesterday afternoon. By the time anyone tried to connect this morning it had locked up. I'm not sure if this is caused by the FTP/PPTP issue or not, but I will test an FTP transfer this morning and see if we can trigger another lockup.

  • I just tried ftp transfer again and was able to reproduce the lockup.

  • Yes, the lockup still persists for me as well. The master node was up for around 10 hours when I last checked and then locked up sometime during the night. Thankfully the slave took over at that point; it has an uptime of around 18 hours right now. Both have 2.0-BETA5 (i386) built on Wed Jan 26 09:44:03 EST 2011 installed.

    You might want to try the kernel from the Kernel Panic thread that Ermal built with a custom fix (still in testing).

    cd /root/
    cp kernel_new.gz /boot/kernel/kernel.gz

    Then reboot.
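    If you want a way back in case the test kernel misbehaves, it may be worth keeping a copy of the kernel you are replacing before the cp. The following is only a sketch under assumptions: install_test_kernel and the .bak suffix are made-up names, and the paths you pass in are the ones from the steps above.

```shell
# Hypothetical helper: copy NEW over TARGET, saving the old TARGET first.
# The ".bak" suffix is an arbitrary choice for this sketch.
install_test_kernel() {
    new_kernel=$1
    target=$2
    backup="$target.bak"

    if [ ! -f "$new_kernel" ]; then
        echo "test kernel not found: $new_kernel" >&2
        return 1
    fi
    # Save the kernel being replaced; never overwrite an earlier backup,
    # since that one is the last kernel known to boot.
    if [ -f "$target" ] && [ ! -e "$backup" ]; then
        cp -p "$target" "$backup" || return 1
    fi
    cp "$new_kernel" "$target"
}
```

    With the paths from above that would be "install_test_kernel /root/kernel_new.gz /boot/kernel/kernel.gz", followed by the reboot. Going back then means copying kernel.gz.bak over kernel.gz again, which still requires getting the box to a shell somehow.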

  • doing that now, will also upgrade to Jan-27 snap

  • Well, OK, that kernel locks up on boot. The last console message is "Vip14: 2 link states coalesced". I am rebooting again.

  • It locks up on boot with the new kernel. Is there any way to get this back, or is a reinstall necessary?

    You might be able to unplug all the network interfaces, boot it up, and then switch back to the SMP kernel:

  • OK, booting in safe mode I was able to get back to the SMP kernel. Does the new dev kernel have any other dependencies? I am running the Jan-26 snap and upgrading to Jan-27 right now. As I understand from the other kernel panic thread, this new kernel will write a crash dump to the filesystem for debugging, so it is probably worthwhile for me to run it; I just need to know the order in which to do things. Thanks!

    Actually, the crash dumps aren't in the current snapshots; they're in the ones building now (there was one more thing I had to fix).

    The kernel.gz referenced in the other thread doesn't require anything other than being on a recent snapshot.

  • ok, will wait for next snapshot then :)

  • Still freezes when I try to connect to FTP.

    2.0-BETA5 (i386)
    built on Thu Jan 27 20:55:04 EST 2011

  • Can you please try a snapshot from 28-JAN?

