Crashing on new hardware with Fatal trap 12: page fault while in kernel mode
-
Have you considered loading a 64-bit Ubuntu release on it and running prime95 in torture-test mode, to check whether some part of your hardware is simply flaky, before you spend too much time figuring out if it's a pfSense issue?
I would torture the heck out of the processor and RAM to be sure it's not just a piece of flaky hardware.
If you load prime95 and it crashes or the threads stop, you will have to look no further.
If it has no problems, then you can be pretty sure it's a pfSense issue.
-
Thanks kejianshi, that's what I'm going to do first. Maybe you're right and it's more than the RAM I already replaced. The machine froze several times without reason during a Debian 7.0 setup now - that's not normal. Maybe pfSense is not the problem here.
-
I'm not yet confident that it's the hardware.
What looks suspicious to me is that "pfpurge" is given as the active process (same as in my crash reported here: http://forum.pfsense.org/index.php/topic,64131.0.html). pfpurge spends most of its time sleeping, so statistically other processes should have a much higher chance of being the one that crashes if hardware glitches were the cause.
-
@Klaws: I've been running mprime for hours now without problems. I'm still going to do some more testing, especially hard disk testing. Wouldn't be the first time the board is faulty. I saw your other thread, too, and the panic is basically the same. Is there any way to reproduce this (i.e. you mentioned something about FTP)?
-
No, the FTP thread wasn't mine.
I've had absolutely no success in reproducing that crash on purpose yet. I checked, and I actually do have an FTP address open in Chrome. So, if it has to do with FTP, maybe restarting Chrome (which then reopens all windows and tabs and begins loading the content of all pages) would trigger the crash. Nope. I checked the FTP page, manually reloaded it several times just to be sure, and nope, no crash. The tunable debug.pfftpproxy is at its standard setting of 0; I have never changed it.
I hate unreliable bugs. ;-)
-
Yes, the bug is somewhat random. Sometimes the second passive FTP connection froze the whole system, sometimes it crashed and rebooted after 10 minutes of FTPing/downloading via HTTP. I haven't been able to crash the July 1st snapshot no matter how hard I tried. I'm running that now. Maybe it's worth a try for you, too.
-
OK, now I have been able to reproduce the panic almost reliably, even on a KVM-based Proxmox machine running on a different server.
pfSense always panics when using passive FTP from any client and actually uploading a file (i.e. writing to an FTP server). I could reproduce this multiple times now.
I guess my hardware is not the reason.
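For anyone who wants to repeat that test without clicking through an FTP client, here is a minimal sketch of the passive-mode upload using Python's standard ftplib. The host, credentials, remote file name and the function names are all placeholders of mine, not something taken from the reports above, and it needs an FTP server with write access on the far side of the pfSense box.

```python
import io
from ftplib import FTP


def upload_passive(ftp, remote_name, payload, repeats=1):
    """Upload `payload` `repeats` times in passive mode; return bytes sent.

    `ftp` is an already logged-in ftplib.FTP connection.
    """
    ftp.set_pasv(True)  # passive mode is what the reports above point at
    sent = 0
    for i in range(repeats):
        # Each STOR opens a fresh passive data connection through the firewall.
        ftp.storbinary(f"STOR {remote_name}.{i}", io.BytesIO(payload))
        sent += len(payload)
    return sent


def run_repro(host, user, password, repeats=5):
    """Connect, log in and upload 1 MiB test files (hypothetical helper)."""
    with FTP(host, timeout=30) as ftp:
        ftp.login(user, password)
        return upload_passive(ftp, "pfsense-panic-test.bin",
                              b"\0" * (1024 * 1024), repeats)
```

Calling something like run_repro("ftp.example.net", "user", "secret") from a client behind the firewall should exercise the same path; whether it triggers the panic on a given snapshot is of course not guaranteed.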
-
Did you try the July 1st snapshot, too?
-
No, but I'm about to go back to this one right now and test again.
-
@athurdent: Which build are you using?
-
This one seems to work fine:
http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/amd64/pfSense_RELENG_2_1/livecd_installer/pfSense-LiveCD-2.1-RC0-amd64-20130701-1521.iso.gz
-
Thanks, I downgraded using the integrated firmware update function with the full_version upgrade .tgz from that build. Let's see how that works.
-
Confirmed… The latest batch of FTP-related patches seems really bad; I managed to crash my Alix testbox repeatedly… It's as simple as browsing to, e.g., ftp-archive.freebsd.org/pub/FreeBSD-Archive/ports/i386/packages-8.1-release/Latest/; uploading or downloading anything isn't even required.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x70
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc04c96c1
stack pointer = 0x28:0xe31e6c20
frame pointer = 0x28:0xe31e6c2c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 7 (pfpurge)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 1d1h13m36s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort
Please revert those ASAP. :(
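To compare panics from different boxes (for instance, to check whether pfpurge is the current process every time), the few relevant fields can be pulled out of the trap message with a small script. This is only a rough sketch: the function name and the choice of fields are mine, and the regular expressions assume the message layout shown above.

```python
import re


def summarize_panic(text):
    """Extract fault address, instruction pointer and current process.

    Works on the trap message whether it is pasted as one line or many.
    Fields that cannot be found are returned as None.
    """
    patterns = {
        "fault_address": r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)",
        # Keep only the offset part after the segment selector (0x20:...).
        "instruction_pointer":
            r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)",
        "process": r"current process\s*=\s*\d+\s*\(([^)]+)\)",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        summary[name] = match.group(1) if match else None

    pid = re.search(r"current process\s*=\s*(\d+)", text)
    summary["pid"] = int(pid.group(1)) if pid else None
    return summary
```

Note that instruction pointers are only comparable between boxes running the same build and architecture; the process name is the more portable thing to compare.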
-
July 1st build - no crashes so far. Crashed almost every 15 minutes before.
-
July 1st build - no crashes so far. Crashed almost every 15 minutes before.
https://redmine.pfsense.org/issues/2650#note-19
-
I don't know. That could be coincidental.
Example: I have a machine here that I built 8 years ago. I overclocked it back then, and it worked flawlessly every single day, 24/7, with no issues and no power-offs. Later, I tried installing Linux and it would reliably crash after a few minutes, so I figured "Must be a compatibility issue with Linux Mint". Well, I removed the overclock settings and, as it turns out, Mint is now rock solid on that machine.
So it was stable under XP while overclocked but not with Mint - but it wasn't Mint's fault.
In nearly every case where an OS was unstable and I was certain it must be the OS, it turned out to be hardware, and in every single instance prime95 would expose that in short order. Most recently this was with a new mobo with new RAM and processor, and it turned out the capacitors on a RAM bank were weak.
-
I don't know. That could be coincidental.
I reproduced this on 3 different boxes (2 amd64, 1 x86) with post-July1 snapshots. Does not happen on any of them with previous snapshots that don't include the faulty FTP patches.
-
Then I guess you are right. That's pretty solid evidence that it's a problem with the patch. Makes sense.
-
I made fixes to prevent the panics.
Check the upcoming snapshots, preferably those from tomorrow onward.
-
Sounds good, thanks!