pfSense 2.0 Beta5 1/19 build: system locks up
-
Not really, when it gets stuck there really isn't anything that can be done.
FTP and PPTP are the main causes of the hang, or were. My router hasn't hung up in days (I'm on a snap from late on the 20th). The most recent snapshots should behave better all around, but there is still a panic out there that some people are hitting.
-
Ok, so now I have attempted to upgrade the master node to the latest snapshot, and it seems stuck in the update process. A ps -ef on the box shows:
PID TT STAT TIME COMMAND
26 v0 Is+ 0:00.01 sh /etc/rc autoboot
254 v0 I+ 0:04.28 /usr/local/bin/php -f /etc/rc.bootup
10734 v0 SN+ 0:00.50 /bin/sh /var/db/rrd/updaterrd.sh
24545 v0 SN+ 0:00.00 sleep 60
26261 v0 S+ 0:00.14 /usr/sbin/tcpdump -s 256 -v -l -n -e -ttt -i pflog0
26317 v0 S+ 0:00.17 logger -t pf -p local0.info
46810 v0 I+ 0:00.04 /usr/sbin/pkg_delete -x libwww-5.4.0_4
47319 v0 I+ 0:00.00 /bin/rm -r /var/db/pkg/libwww-5.4.0_4
24777 0 R+ 0:00.00 ps -ef
55403 0 Is 0:00.00 -sh (sh)
56612 0 I 0:00.00 /bin/sh /etc/rc.initial
60290 0 S 0:00.02 /bin/tcsh
It has been like this for an hour or so, it is still serving up 6 CARP VIPs, not sure if I should just reboot the box. The slave node has not yet been upgraded.
-
Looks like it's still running a package upgrade. Does the console not tell you what it's doing?
(As an aside, on FreeBSD and thus pfSense, use ps uxawww)
-
yes my Linux underwear is showing :) At any rate I hooked up a console to the master, hit enter a couple of times and the box automagically finished the upgrade process. This box was not asleep as I could ssh into it, but only console activity made it actually complete. Strange. I am updating the slave node now.
-
I installed 2.0-BETA5 (i386) built on Wed Jan 26 10:45:46 EST 2011 yesterday afternoon. By the time anyone tried to connect this morning it had locked up. I'm not sure if this is caused by the FTP/PPTP issue or not, but I will test an FTP transfer this morning and see if we can trigger another lockup.
-
I just tried ftp transfer again and was able to reproduce the lockup.
-
yes, lockup still persists with me as well. The master node was up for around 10 hours last I checked and then locked up sometime during the night. Thankfully the slave took over at that point, it has an uptime of around 18 hours right now. Both have 2.0-BETA5 (i386) built on Wed Jan 26 09:44:03 EST 2011 installed.
-
You might want to try the kernel that ermal made with a custom fix (in testing) for the Kernel Panic thread.
cd /root/
fetch http://files.pfsense.org/kernel_new.gz
cp kernel_new.gz /boot/kernel/kernel.gz
Then reboot.
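Since this is a test kernel, it may be worth keeping a copy of the current one before overwriting it. A minimal sketch, assuming a hypothetical helper named swap_kernel and a backup file name of kernel.gz.bak (neither comes from pfSense itself):

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): keep a copy of the current
# kernel before overwriting it with a test build.
# Usage: swap_kernel <new-kernel.gz> <kernel-dir>
swap_kernel() {
    new="$1"
    dir="$2"
    # Refuse to continue if the downloaded file is missing or empty.
    [ -s "$new" ] || { echo "missing or empty: $new" >&2; return 1; }
    # Preserve the existing kernel so it can be copied back later.
    # The .bak name is an assumption chosen for this sketch.
    if [ -f "$dir/kernel.gz" ]; then
        cp -p "$dir/kernel.gz" "$dir/kernel.gz.bak" || return 1
    fi
    cp "$new" "$dir/kernel.gz"
}

# On the firewall this would be called as:
#   swap_kernel /root/kernel_new.gz /boot/kernel
```

If the test kernel misbehaves, copying kernel.gz.bak back over kernel.gz should restore the previous one, provided the box can still be booted far enough to reach a shell.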
-
doing that now, will also upgrade to Jan-27 snap
-
well ok, that kernel locks up on boot, last console message is "Vip14: 2 link states coalesced" I am rebooting again
-
locks up on boot using new kernel, any way to get this back or is a reinstall necessary?
-
You might be able to unplug all the network interfaces, boot it up, and then switch back to the SMP kernel:
http://doc.pfsense.org/index.php/Switching_Kernels
-
ok, booting in safe mode I was able to get back to the smp kernel. Does the new dev kernel rely on any other dependencies? I am running Jan-26 snap, upgrading to Jan-27 right now. As I understand in the other kernel panic thread this new kernel will put a crash dump for debugging on the filesystem, so it is probably worthwhile for me to do, just need to know the order in which to do things. Thanks!
-
Actually the crash dumps aren't in the current snapshots, they're in the ones building now (there was one more thing I had to fix)
The kernel.gz referenced in the other thread doesn't require anything other than being on a recent snapshot.
-
ok, will wait for next snapshot then :)
-
Still freezes when I try to connect to FTP.
2.0-BETA5 (i386)
built on Thu Jan 27 20:55:04 EST 2011
-
Can you please try a snapshot from 28-JAN?
-
Freezing again!
2.0-BETA5 (i386)
built on Fri Jan 28 05:30:15 EST 2011
-
Just to chime in, I had a freezing system that I upgraded yesterday that was previously on a snapshot from November. When I brought it up to date it started freezing, sometimes in 20 min or less, sometimes not for over an hour. I had to revert as it was production. Unfortunately I'm not sure what the cause is, it's a reasonably large network, though I would think it unlikely that FTP was happening at that specific time (could be wrong). They do not use PPTP at all, it's not configured or enabled. However, they use IPsec site-to-site (with more configured, but only one site-to-site tunnel was up), and also use one VLAN, which are the "nonstandard" things I can think of. It's a P4 2.4 box with the i386 single-processor kernel and it was installed on a Promise mirror at the time (I rebuilt it using a GEOM mirror when I reverted though).
Another system, a VM this time on ESXi 4.0, started locking up as soon as it went into production after several hours. I had to revert to the old solution (Endian) there temporarily. That was at our main office, also production, and hard to know what was going on on the network; we do computer repair so at any given time our customer computers can be spewing who-knows-what :-)
-
BTW can all here execute sysctl debug.pfftpproxy=1 and report if the hangs continue?