11/6 Snapshot causing system to panic/reboot



  • Hello,

    I hope this was a bug and can be fixed… I have been updating my pfSense every week or so to the latest snapshot. Yesterday morning I saw one that was built very early. I put it on and after that started getting resets.

    Today I looked and the snapshot is still dated 11/6, but it has a much later build time, so I assume it had more changes. I put it on in hopes it would correct the problem, and still no joy.

    I get:

    Fatal trap 12: page fault while in kernel mode
    fault virtual address = 0x4
    fault code = supervisor write, page not present
    instruction pointer = 0x20:0xc074792d
    stack pointer = 0x28:0xd43efbbc
    frame pointer = 0x28:0xd43efbbc
    code segment = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, def32 1, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 11 (swi1: net)
    trap number = 12
    panic: page fault

    What's wrong? Is this a code error, or is something wrong with my PC all of a sudden? It has been running fine for months...

    Thanks in advance



  • Read and monitor this thread, please:
    http://forum.pfsense.org/index.php/topic,6729.0.html

    Might be the snapshot that's broken.



  • Well, the original poster in that thread was running 1.2RC2, so it is not the same snapshot, and the 2nd person has RC3.

    My system will boot and work for a bit, and then it crashes with that…



  • There aren't any issues with the snapshots.

    This is a completely different issue from the other thread.

    Which snapshot are you using cybercare?



  • I was using the 11/6 morning snapshot. It ran fine for about 12 hours, then the above started. So this morning I saw an 11/6 snapshot that was built around 1900 hours and tried that. It still happened right away.

    I waited longer and have now tried the latest one, dated today, with the same issue…

    If I use any of my older ones it is fine.

    What does this error mean? I am at a loss, and this happened at a bad time for me. :( I have to get it working, and also fix a 1:1 NAT issue that wouldn't work, by Friday AM. I had just been fighting the NAT thing when this jumped out of the blue on me... That's what I get for always wanting the latest snapshot, huh? I just don't know why it only seems to be me.



  • I should have been more specific - what exactly does it show on the front page? There were some 1.3 developer only snapshots that users were getting their hands on, and I wanted to make sure that wasn't the case with yours. As long as it shows 1.2 on the front page, you're fine.

    What is the last version that worked?  If you downgrade to that version, does it work again?

    What you're seeing is a kernel panic, which is caused by OS bugs or hardware problems. The changes made in the past 2 months have been minor PHP changes, nothing that would start causing kernel panics out of nowhere. The random kernel panics you're describing would be caused by a FreeBSD issue if it were a software bug, and FreeBSD 6.2 hasn't changed in months.

    My guess is you have an issue you'll continue to see on a previous version, because of the lack of any OS changes. Because it came up out of nowhere without any relevant changes to cause such a problem, I would suspect hardware issues first.



  • @cmb:

    There aren't any issues with the snapshots.
    This is a completely different issue from the other thread.

    Thanks, cmb, for looking into this so quickly!

    With your knowledge this was easy to sort out and divide into different causes and track them down.
    Snapshots are built automatically, hence there's a possibility for something going wrong silently.
    I believe it's better to raise a false alarm than to let users run into problems, right?

    Thanks for the really good work on the product and the support!



  • @cmb:

    I should have been more specific - what exactly does it show on the front page? There were some 1.3 developer only snapshots that users were getting their hands on, and I wanted to make sure that wasn't the case with yours. As long as it shows 1.2 on the front page, you're fine.

    What is the last version that worked?  If you downgrade to that version, does it work again?

    What you're seeing is a kernel panic, which is caused by OS bugs or hardware problems. The changes made in the past 2 months have been minor PHP changes, nothing that would start causing kernel panics out of nowhere. The random kernel panics you're describing would be caused by a FreeBSD issue if it were a software bug, and FreeBSD 6.2 hasn't changed in months.

    My guess is you have an issue you'll continue to see on a previous version, because of the lack of any OS changes. Because it came up out of nowhere without any relevant changes to cause such a problem, I would suspect hardware issues first.

    Well, yesterday it ran fine all day on an older Oct 26 snapshot. I turned it off, went home, came in today, powered it on, and after a few moments it did the same thing. So I am sorry, you are correct: it seems to be something in the hardware.

    The problem now is finding it. It's a very basic system: CD, floppy, hard drive, mobo, CPU, memory, and 3 NICs plus an onboard NIC. I am running memtest right now to make sure it's not the memory. We also bought a new hard drive that I am waiting for, but I am sure it's not that, since the system boots and runs, and at that point it is running from RAM. So if the memory passes, my next suspect is the PSU. I just hope it's something cheap, as we really can't get any other parts or boxes at the moment. :(



  • So I have a new hard drive and power supply in the unit and still have the same issue. I am using today's snapshot now.

    It seems that once I start to configure it and it gets any light traffic, such as just connecting a VPN, it starts to give this problem.

    Any suggestions on what to do to figure out the cause?

    I ran memtest and it passed, the CD-ROM has been swapped, the floppy removed, the HD is new, the PSU is new. The only things left are the CPU/mobo and the NICs. I would assume the NICs are fine, as I can pass data on them without trouble.

    The mobo is a CUSL2 w/ PIII 866 & 512 MB PC133.

    Anyone who can help I would be in your debt.

    Thanks



  • Memory and power supply would be my first two guesses, and are the most common panic causes. Even if it did pass a memtest, I would still try different memory if you have it readily available. Or if you can take some of the memory out, try that.

    Next most likely possibility is the motherboard or CPU.



  • Alright, I am at a bit of a loss….

    I have installed pfSense on a different computer altogether (different mobo, CPU, memory, PSU, HD, CD) and now have the same problem. When I say different, I mean even different brands.... I went from Intel to AMD... from PIII w/ PC133 to XP2600 w/ DDR....

    It seems to happen most when I start my IPsec tunnel... So this tells me it's a code issue, as two different PCs do the same thing within minutes of coming up: I am in the middle of setting up the configs and it just reboots on me. I normally set up my NICs, then set up my IPsec tunnel and the rule for it, then I am in the middle of setting up my PPTP stuff and boom, it reboots.

    The remote end has not had any problems yet, and it's running 1.2-RC3 built on Fri Oct 19 09:31:18 EDT 2007.

    I can't afford to update it to see if maybe the two are just incompatible, because if that one starts to crash/reboot then we will be in trouble beyond any words....

    Side note: I tried to use a Dell PE600SC, but it seems pfSense does not like me adding my Intel Pro 100 S Desktop cards. I have 3 of them, and when I put any one of the three in any one of the many slots, it gets device timeout errors after bootup... The system's onboard gigabit is fine. I tried each of the three cards, all with the same result... All from different batches.. So I guess pfSense does not like the combination of the Dell and these cards? So the AMD I am trying is my work PC, but it has the same issue as PC1...



  • Well, it seems I found the bug… Indeed it is not my hardware. Although I wish the Dell would work; I'm not sure why it has the device timeout, any suggestions?

    Anyway, the bug that causes the crash is as follows.

    Have an IPsec tunnel on a PC with multiple WANs. Instead of leaving the gateway on default, pick the one you actually want it on. Make your rule to allow the traffic to your network.

    Now have the other side of the tunnel ping you: BOOM, instant crash... I was having the problem because I had keep alive turned on at the remote end, and I guess because it pings, it killed me.

    I can reproduce this on the spot, and it crashes the minute I hit the ping button... I was testing it by remoting onto the other end of the tunnel, going to Diagnostics, then going to Ping and pinging my LAN..... Pick the interface LAN on the remote end, type in your other end's local IP, and hit ping, and it will die.

    I am not sure when this became an issue; it may have been there longer... I have other combos set up with the same builds and they are fine, but those have only one WAN, so the gateway is left on default...

    Please advise.

    Thanks!



  • Interesting. I opened a ticket on this, I'll look into it when I get a chance. I may need further info later to attempt to duplicate.



  • Hello Gentlemen,
    I have exactly the same problem! I'm running pfSense 1.2-RC3 on a Dell Dimension 4100 with two WAN connections. A DMZ interface for our WLAN lets mobile laptops enter the LAN via IPsec.
    This configuration works fine as long as you do NOT enable both the 2nd WAN connection and the "Enable IPsec" field in the Tunnel interface of the "VPN: IPsec" screen of the web-IF.

    In this Situation, the System instantly crashes with:

    Fatal trap 12: page fault while in kernel mode
    fault virtual address  = 0x4
    fault code              = supervisor write, page not present
    instruction pointer    = 0x20:0xc0747949
    stack pointer          = 0x28:0xd4454bbc
    frame pointer          = 0x28:0xd4454bcc
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process        = 11 (swi1: net)
    trap number            = 12
    panic: page fault
    Uptime: 1m13s
    Cannot dump. No dump device defined.
    Automatic reboot in 15 seconds - press a key on the console to abort
    Rebooting…

    Then the system continuously reboots until you deselect (and save) the IPsec tunnel.

    I also tried to change the RAM of the System but it seems not to be a hardware issue.

    Thanks in advance for your assistance
    Hubert



  • I can confirm this crash also.
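
For anyone comparing the two panic dumps posted in this thread, here is a rough Python sketch of how the "key = value" lines could be parsed and compared. The helper names (`parse_panic`, `instruction_offset`) are made up for illustration and are not part of pfSense or FreeBSD; only a few fields from each dump are copied in. Treat the interpretation as a hint rather than a diagnosis: matching fault addresses and nearby instruction pointers are consistent with both boxes hitting the same code path, but the two kernels may be different builds, so it proves nothing by itself.

```python
# Selected fields copied from the two panic dumps posted above.
PANIC_A = """\
fault virtual address = 0x4
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc074792d
current process = 11 (swi1: net)
"""

PANIC_B = """\
fault virtual address  = 0x4
fault code              = supervisor write, page not present
instruction pointer    = 0x20:0xc0747949
current process        = 11 (swi1: net)
"""


def parse_panic(text):
    """Turn 'key = value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            # Collapse the padding some consoles add between words.
            fields[" ".join(key.split())] = value.strip()
    return fields


def instruction_offset(fields):
    """Return the offset part of 'selector:offset' as an integer."""
    _selector, offset = fields["instruction pointer"].split(":")
    return int(offset, 16)


a = parse_panic(PANIC_A)
b = parse_panic(PANIC_B)

# Same faulting address, fault type, and process in both dumps?
same_fault = all(
    a[k] == b[k]
    for k in ("fault virtual address", "fault code", "current process")
)
delta = instruction_offset(b) - instruction_offset(a)

print("same fault signature:", same_fault)
print("instruction pointers differ by", delta, "bytes")
```

Both dumps show a write to address 0x4 from the swi1: net thread, and the instruction pointers are only 28 bytes apart. A write to such a low address often means the kernel followed a NULL pointer and touched a field a few bytes into a structure, though that is an assumption here and not something the dumps confirm.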

