Kernel panic 4-5 Nov (i386)

m4rcu5

Unfortunately I can only confirm the kernel panic (page fault). It happens when booting, just after the interface configuration.
I am reinstalling at this moment to see if I can get the system working again.

-m4rcu5

jimp

@m4rcu5:

That sounds more like the amd64 problem, this is on i386.

pakjebakmeel

OK, got it. I couldn't invoke the auto updater as there is no newer snapshot, but after randomly clicking around in the web interface for a minute it went down halfway through loading "Diagnostics -> ARP Table".


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00

fault virtual address	= 0x10317
fault code			= supervisor read, page not present
instruction pointer	= 0x20:0xc095f46b
stack pointer		= 0x28:0xe2e21bc4
frame pointer		= 0x28:0xe2e21bc8
code segment		= base 0x0, limit 0xfffff, type 0x1b
				= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enables, resume, IOPL = 0
current process	= 0 (ath0 taskq)
trap number		= 12

panic: page fault
cpuid = 0

Cannot dump. Device not defined or unavailable.

Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
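
Since several posters in this thread compare panics by fault code and process name, a small parser can make that comparison less error-prone. The following is only a sketch in Python: the function names `parse_trap` and `signature` are hypothetical, and the sample text is an excerpt of the trap message above with tabs replaced by spaces.

```python
import re

# Excerpt of the trap message posted above (assumption: tabs became spaces).
SAMPLE = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x10317
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc095f46b
current process = 0 (ath0 taskq)
trap number = 12
"""


def parse_trap(text):
    """Collect 'label = value' lines of a trap message into a dict."""
    fields = {}
    for line in text.splitlines():
        # A field line starts with a label made of letters and spaces.
        # Continuation lines that begin with '=' are skipped on purpose.
        m = re.match(r"^\s*([A-Za-z][A-Za-z ]*?)\s*=\s*(.+?)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def signature(fields):
    """Return (trap number, fault code, process name) for comparing reports."""
    proc = re.search(r"\((.*)\)", fields.get("current process", ""))
    return (
        fields.get("trap number"),
        fields.get("fault code"),
        proc.group(1) if proc else None,
    )


print(signature(parse_trap(SAMPLE)))
```

Two reports with the same signature tuple share the trap number, fault code and process name, which is roughly what "the exact same fault code on the same process" means; it does not prove they share a root cause.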

m4rcu5

@jimp:

That sounds more like the amd64 problem, this is on i386.

jimp, I use the i386 image on an Intel Core 2 Duo.

I have the exact same fault code on the same process (only on interface em0) as pakjebakmeel. This time it happened when booted from the live CD.

-m4rcu5

pakjebakmeel

For me it only happens when navigating the webGUI. If I leave the GUI alone and generate massive traffic, the box is rock solid.

Michael Sh.

Hi jimp,
Sorry for the screenshots, but I use real hardware (an ASUS Pundit) and have no serial console. Before my first post I waited at most 5 minutes for the panic. On the latest snapshot it takes longer, although as far as I know nothing differs between the snapshots.
This panic appeared immediately after running a fetch command on the pfSense host.

pf_panic.jpg_thumb

systat_vm.PNG_thumb
wan.svg.txt
lan.svg.txt

pakjebakmeel

Hi all,

Is this bug still present or can it be fixed by upgrading to a newer snapshot? 8)

Thanks!

jimp

It's still there. I just restarted the builder again to see if the fixes checked in yesterday made a difference. It should be done in a while but I'd still wait for an all-clear.

jimp

Current snap is OK:

2.0-BETA4 (amd64)
built on Tue Nov 9 17:26:01 UTC 2010

Michael Sh.

pfSense-Full-Update-2.0-BETA4-20101109-1641.tgz

panic.jpg_thumb

jimp

Ah, well I was hoping it may be a similar issue to amd64. Looks like it may be different.

Michael Sh.

pfSense-Full-Update-2.0-BETA4-20101110-0504.tgz
Same thing, just from a different angle. It seems to me that traffic generated by the router itself is the cause of the crash. Without that traffic the router runs longer, but any attempt to fetch something from the router always causes a panic.

jimp

I've been sitting here furiously loading GUI pages on my poor little ALIX running a snapshot from today and though it's gotten slow at times, I have yet to see a panic.

Is there anything else people in this thread might have in common? What kind of setups do you all have? Can you give a general idea of things that are in use? (Multi-wan, IPsec, OpenVPN, PPPoE, 3G, wireless, etc)

FisherKing

Not to cause you more frustration, jimp, but I'm also seeing this GUI / kernel panic.

Running 2010-11-09 (i386) on a PIII.
Reset to factory defaults yesterday & did a basic install
I've got 2 dual port intel nics.
fxp0 = WAN PPPoE
fxp0 = Opt3 DHCP private (10.x.x.x)
fxp1 = Opt1 DMZ static public
fxp2 = LAN static private (172.x.x.x)
fxp3 = Opt2 static private (172.y.y.y)

I'm running the DHCP server on LAN and Opt2.
I'm logging in under a 2nd administrator account.
I'm running Captive Portal on Opt2 w/ local auth
Firewall allows web, mail, dns traffic through to the public IPs on the DMZ
I have freeswitch installed on the box, but it isn't the pfSense package. pfSense doesn't know it's there.

How does this match up with what the rest of you have?

FisherKing

I've noticed the following on the kernel panic screen.

Cannot dump. Device not defined or unavailable.

Would the dump be helpful in diagnosing this? Is there a place where we can find directions on how to give it a dump device? Is it possible to dump to the local disk, or to a USB memory stick?

I do have a null modem cable & could dump to the serial port / windows terminal app if that's how it's done, but I'd need directions for that also.

eri--

Could it be your hardware?
Can you please install the dev kernel on it?

jimp

In case someone missed it earlier in the thread, here are instructions for installing a dev kernel:

http://doc.pfsense.org/index.php/Switching_Kernels

Afterwards, capture the panic message and also the output of typing "bt" at the debugger prompt.

Michael Sh.

Hi!
snapshot pfSense-Full-Update-2.0-BETA4-20101110-1837.tgz
I switched to the dev kernel, but it does not work at all (it cannot finish booting). I am attaching dmesg for the hardware configuration.
I rolled back to the 13 Oct snapshot, because a non-working PPTP (MPD) is critical for me.

panic1.jpg_thumb

panic2.jpg_thumb

panic3.jpg_thumb
dmesg.boot.txt

eri--

Can you please try setting net.inet.tcp.syncookies=0 and see if you still get panic?

Michael Sh.

pfSense-Full-Update-2.0-BETA4-20101111-2017.tgz
I tried setting net.inet.tcp.syncookies=0 in /boot/loader.conf.local, and nothing changed.
The dev kernel cannot finish the init scripts and panics with “syncache: mbuf to small”.
The uniprocessor kernel works for some time, but panics in interrupt routines (most often the Ethernet bfe0 WAN interrupt, but sometimes the clock interrupt).
I can post screenshots if they would be helpful.
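
An entry in /boot/loader.conf.local is only read at boot, and it only changes a sysctl if the kernel picks that name up from the loader environment; whether that is the case for net.inet.tcp.syncookies on this snapshot is not confirmed in this thread. One way to rule that out is to read the value back at runtime with `sysctl -n`. The sketch below does this from Python; the helper names `read_sysctl` and `syncookies_disabled` are hypothetical, and it assumes a Python interpreter is available on the box, which a stock pfSense install may not have.

```python
import subprocess


def read_sysctl(name, run=subprocess.check_output):
    """Return the current value of a sysctl as a stripped string.

    `run` is injectable (assumption for illustration) so the function can
    be exercised on a machine that has no `sysctl` binary.
    """
    out = run(["sysctl", "-n", name])  # -n prints the value without the name
    return out.decode().strip()


def syncookies_disabled(run=subprocess.check_output):
    """True if net.inet.tcp.syncookies currently reads back as 0."""
    return read_sysctl("net.inet.tcp.syncookies", run) == "0"
```

On the firewall itself, `syncookies_disabled()` with no arguments would run the real `sysctl` command. If it returns False after a reboot, the loader.conf.local entry did not take effect, and the earlier test would not actually have run with syncookies off.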