Crashing on new hardware with Fatal trap 12: page fault while in kernel mode



  • Hello,

    I'm running the 2.1-RC snapshot from Tue Jul  2 06:13:27 EDT 2013 and keep running into kernel panics. The system is:

    • ASRock B75 Pro3-M (B75 chipset)

    • Corsair CL9 1600 RAM 8GB

    • Intel® Core™ i3-3220

    • Seagate 1TB SATA6

    • Intel Pro 1000PT Quad Port Gigabit LAN 4x RJ45 PCI Express

    • 400W power supply

    I already replaced the faulty RAM; the HDD and other components are new.

    What I get:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 3; apic id = 03
    fault virtual address   = 0x68
    fault code      = supervisor read data, page not present
    instruction pointer = 0x20:0xffffffff80211084
    stack pointer           = 0x28:0xffffff80540ddac0
    frame pointer           = 0x28:0xffffff80540ddad8
    code segment        = base 0x0, limit 0xfffff, type 0x1b
                = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags    = interrupt enabled, resume, IOPL = 0
    current process     = 8 (pfpurge)
    FreeBSD 8.3-RELEASE-p8 #0: Tue Jul  2 06:13:27 EDT 2013
        root@snapshots-8_3-amd64.builders.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.8
    

    I know this has been posted on the forums here before, and I added these lines to boot.local.conf:

    kern.ipc.nmbclusters="131072"
    hw.em.num_queues=1

    Still, the box keeps panicking for no apparent reason, and not under heavy load (3 users, no high CPU or memory usage).

    Any ideas?



  • Have you considered loading a 64-bit Ubuntu release on it and running Prime95 in torture test mode to check whether some part of your hardware is just flaky, before you spend too much time investigating whether it's a pfSense issue?

    I would torture the heck out of the processor and RAM to be sure it's not just a piece of flaky hardware.

    If you load Prime95 and it crashes or the threads stop, you need look no further.

    If it has no problems, then you can be pretty sure it's a pfSense issue.



  • Thanks kejianshi, that's what I'm going to do first. Maybe you're right and it's more than the RAM I already replaced. The machine has now frozen several times for no reason during a Debian 7.0 setup, which is not normal. Maybe pfSense is not the problem here.



  • I'm not yet confident that it's the hardware.

    What looks suspicious to me is that "pfpurge" is given as the active process (same as in my crash reported here: http://forum.pfsense.org/index.php/topic,64131.0.html). pfpurge spends most of its time sleeping, so statistically other processes should have a much higher chance of crashing because of hardware glitches.



  • @Klaws: I've been running mprime for hours now without problems. I'm still going to do some more testing, especially hard disk testing; it wouldn't be the first time a board was faulty. I saw your other thread, too, and the panic is basically the same. Is there any way to reproduce this (i.e. you mentioned something about FTP)?



  • No, the FTP thread wasn't mine.

    I've had absolutely no success in reproducing that crash on purpose yet. I checked, and I actually do have an FTP address open in Chrome. So, if it has to do with FTP, maybe restarting Chrome (which reopens all windows and tabs and then begins loading the content of all pages) would trigger the crash. Nope. I checked the FTP page, manually reloaded it several times just to be sure, and nope, no crash. The tunable debug.pfftpproxy is at the default setting of 0; I never changed it.

    I hate unreliable bugs. ;-)



  • Yes, the bug is somewhat random. Sometimes the second passive FTP connection froze the whole system, sometimes it crashed and rebooted after 10 minutes of FTPing/downloading via HTTP. I haven't been able to crash the July 1st snapshot no matter how hard I tried. I'm running that now. Maybe it's worth a try for you, too.



  • OK, now I have been able to reproduce the panic almost reliably, even on a KVM-based Proxmox machine running on a different server.

    pfSense always panics when passive FTP is used from any client while actually uploading a file (i.e. writing to an FTP server). I have reproduced this multiple times now.

    I guess my hardware is not the reason.



  • Did you try the July 1st snapshot, too?



  • No, but I'm about to go back to this one right now and test again.



  • @athurdent: Which build are you using?





  • Thanks, I downgraded using the integrated firmware update function with the full_version upgrade .tgz from that build. Let's see how that works.


  • Confirmed… The latest batch of FTP-related patches seems really bad; I managed to crash my ALIX test box repeatedly… It's as simple as browsing to e.g. ftp-archive.freebsd.org/pub/FreeBSD-Archive/ports/i386/packages-8.1-release/Latest/; there's no need to upload or download anything.

    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address   = 0x70
    fault code              = supervisor read, page not present
    instruction pointer     = 0x20:0xc04c96c1
    stack pointer           = 0x28:0xe31e6c20
    frame pointer           = 0x28:0xe31e6c2c
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 7 (pfpurge)
    trap number             = 12
    panic: page fault
    cpuid = 0
    Uptime: 1d1h13m36s
    Cannot dump. Device not defined or unavailable.
    Automatic reboot in 15 seconds - press a key on the console to abort
    
    

    Please revert those ASAP. :(



  • July 1st build - no crashes so far. Crashed almost every 15 minutes before.


  • @taenzerme:

    July 1st build - no crashes so far. Crashed almost every 15 minutes before.

    https://redmine.pfsense.org/issues/2650#note-19



  • I don't know.  That could be coincidental.

    For example, I have a machine here that I built and overclocked 8 years ago, and it worked flawlessly every single day, 24/7, with no power-offs. Later, I tried installing Linux and it would reliably crash after a few minutes, so I figured "Must be a compatibility issue with Linux Mint". Well, I removed the overclock settings and, as it turns out, Mint is now rock solid on that machine.

    So it was stable overclocked under XP but not with Mint, yet it wasn't Mint's fault.

    In nearly every case where an OS was unstable and I was certain it must be the OS, it turned out to be hardware, and in every single instance Prime95 exposed that in short order. Most recently this was a new mobo with new RAM and processor, and it turned out the capacitors on a RAM bank were weak.


  • @kejianshi:

    I don't know.  That could be coincidental.

    I reproduced this on 3 different boxes (2 amd64, 1 x86) with post-July 1 snapshots. It does not happen on any of them with earlier snapshots that don't include the faulty FTP patches.



  • Then I guess you are right. That's pretty solid evidence that it's a problem with the patch. Makes sense.



  • I made fixes to prevent the panics.
    Please check the upcoming snapshots, preferably from tomorrow's builds onward.



  • Sounds good, thanks!


  • Thanks, will test tomorrow…



  • Hi, I have the same problem and reverted to the July 1st version. I don't get crashes anymore, but I also have problems with FTP traffic: the connections are very slow or I get a lot of disconnections from the FTP server.



  • Here, too. FTP connections are very unreliable with the July 1st version.



  • Has anyone tested the latest releases? Is it safe to upgrade again?



  • It seems to be stable again, but you can always downgrade with a backup.tgz on full installs or switch slices on NanoBSD.
    FTP still does not work very well with the sites I have tested, though.



  • Hi,
    we have these crashes too… Depending on the hardware, we get a Trap 12 (attached image with firmware from July 4th).
    Other hardware just gets stuck without any hint of what happened.
    Yesterday evening we updated to
    2.1-RC0 (amd64)
    built on Mon Jul 8 21:53:11 EDT 2013
    FreeBSD 8.3-RELEASE-p8
    because the oldest version still available for download was from July 2nd. But still the same problem…

    Does someone have a link to a stable version from July 1st, please (our older version is from April 22nd, so rolling back is not so nice)?
    Thanks.



  • No such problem with the Tue Jul 9 11:08:17 EDT 2013 FreeBSD 8.3-RELEASE-p8 build. However, the patches do nothing useful to improve FTP; in fact they break things badly here (not as in a kernel panic, but by breaking FTP functionality that was working perfectly fine before).

    https://redmine.pfsense.org/issues/3077#note-3

    I don't see how these regressions are good at this stage. There are a crapload more users with normal FTP usage than with the esoteric multi-WAN scenarios these were supposed to improve.



  • Here is the actual crash dump output from yesterday's firmware:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 4; apic id = 04
    fault virtual address   = 0x90
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff802122cf
    stack pointer           = 0x28:0xffffff8055b9cac0
    frame pointer           = 0x28:0xffffff8055b9cad8
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 8 (pfpurge)
    
    

    I also found an interesting snippet in the output from yesterday's install/update that could cause problems, too:
    <118>Configuring crash dumps…
    <118>Using /dev/ad6s1b for dump device.
    <118>Mounting filesystems...
    ZFS WARNING: Recommended minimum kmem_size is 512MB; expect unstable behavior.
                Consider tuning vm.kmem_size and vm.kmem_size_max
                in /boot/loader.conf.
    ZFS filesystem version 5
    ZFS storage pool version 28

    <118>Disabling APM on /dev/ad6

    Edit: these are the default parameters set up by the installer (June 4th):
    [2.1-RC0][root@gw1.zws8.local]/root(1): cat /boot/loader.conf
    autoboot_delay="3"
    vm.kmem_size="435544320"
    vm.kmem_size_max="535544320"

    kern.ipc.nmbclusters="0"
    hw.usb.no_pf="1"
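For anyone comparing the panics pasted in this thread, the fields Klaws pointed to can be pulled out mechanically. This Python sketch copies only the `fault virtual address` and `current process` lines from the three dumps quoted above (the labels are my own) and checks that all three crashed in pfpurge with a fault address inside the first 4 KiB page. Reading such a small address as a NULL pointer dereferenced at a structure offset is a common rule of thumb, not something the dumps prove on their own.

```python
import re

# Excerpts of the three "Fatal trap 12" dumps quoted in this thread;
# only the two fields compared here are kept. Labels are illustrative.
DUMPS = {
    "original post (amd64)": """
fault virtual address   = 0x68
current process     = 8 (pfpurge)
""",
    "ALIX test box (i386)": """
fault virtual address   = 0x70
current process         = 7 (pfpurge)
""",
    "Jul 8 build (amd64)": """
fault virtual address   = 0x90
current process         = 8 (pfpurge)
""",
}

PAGE_SIZE = 4096  # x86 base page size


def parse_dump(text):
    """Return (fault address as int, process name) from a trap 12 dump."""
    addr = re.search(r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", text)
    proc = re.search(r"current process\s*=\s*\d+\s*\((\w+)\)", text)
    return int(addr.group(1), 16), proc.group(1)


def summarize(dumps):
    """List (label, hex address, process, address-in-first-page) per dump."""
    rows = []
    for label, text in dumps.items():
        addr, proc = parse_dump(text)
        # An address in the first page usually hints at a NULL pointer
        # plus a small member offset (rule of thumb, not proof).
        rows.append((label, hex(addr), proc, addr < PAGE_SIZE))
    return rows


if __name__ == "__main__":
    for label, addr, proc, near_null in summarize(DUMPS):
        print(f"{label:24} {addr:>5} {proc:8} near-NULL={near_null}")
```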



  • @doktornotor:

    I don't see how these regressions are good at this stage. There are a crapload more users with normal FTP usage than with the esoteric multi-WAN scenarios these were supposed to improve.

    Well, FTP is broken on my second WAN link, too. I could swear it used to work about 2 months ago…



  • @Reiner030:

    I also found an interesting snippet in the output from yesterday's install/update that could cause problems, too:
    Edit: these are the default parameters set up by the installer (June 4th):
    [2.1-RC0][root@gw1.zws8.local]/root(1): cat /boot/loader.conf
    autoboot_delay="3"
    vm.kmem_size="435544320"
    vm.kmem_size_max="535544320"

    kern.ipc.nmbclusters="0"
    hw.usb.no_pf="1"

    mmh, I changed this to
    vm.kmem_size="536870912"
    vm.kmem_size_max="1073741824"

    It didn't help. It has crashed 3 times again since then…
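The kmem values quoted here are easier to judge in MiB. This small Python sketch converts the installer defaults and the changed values to MiB and compares them with the 512MB minimum named in the ZFS warning, assuming that warning means 512 × 2^20 bytes. Both installer defaults come out below it (about 415.4 and 510.7 MiB), while the changed values clear it, yet the box kept crashing. The profile names are illustrative labels.

```python
# Loader tunables quoted in this thread, in bytes.
SETTINGS = {
    "installer default": {"vm.kmem_size": 435544320, "vm.kmem_size_max": 535544320},
    "changed by poster": {"vm.kmem_size": 536870912, "vm.kmem_size_max": 1073741824},
}

MIB = 1024 * 1024
# Threshold from the "ZFS WARNING" boot message, interpreted here as
# 512 * 2**20 bytes (assumption).
ZFS_MIN_MIB = 512


def to_mib(value):
    """Convert a byte count to MiB."""
    return value / MIB


def check(settings):
    """Return {profile: {tunable: (MiB rounded to 0.1, meets ZFS minimum)}}."""
    report = {}
    for profile, tunables in settings.items():
        report[profile] = {
            name: (round(to_mib(v), 1), to_mib(v) >= ZFS_MIN_MIB)
            for name, v in tunables.items()
        }
    return report


if __name__ == "__main__":
    for profile, tunables in check(SETTINGS).items():
        for name, (mib, ok) in tunables.items():
            print(f"{profile:18} {name:17} {mib:7.1f} MiB  >=512MB: {ok}")
```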



  • Has anyone updated to the latest snapshots? Is it safe to upgrade again? Any feedback from you guys here?



  • Yes, I'm running the latest snapshot. For me, it has been rock-solid again for a few days.



  • Have you tried small partitions?
    I had a problem with all pfSense versions on a new computer with a partition size larger than ~32GB (I can't be 100% sure about the exact size at which it stops working).