Netgate Discussion Forum

Kernel Panic

2.0-RC Snapshot Feedback and Problems - RETIRED
  • J
    jimp Rebel Alliance Developer Netgate
    last edited by Jan 14, 2011, 4:56 PM

    It did happen. I manually restarted the builders after the patch went in. So apparently it still isn't quite right.

    Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

    Need help fast? Netgate Global Support!

    Do not Chat/PM for help!

    • J
      jimp Rebel Alliance Developer Netgate
      last edited by Jan 14, 2011, 9:16 PM

      Could someone who can readily reproduce this panic give this custom firmware build a try?

      http://cvs.pfsense.org/~jimp/pfSense-Full-Update-2.0-BETA5-i386-20110114-2041.tgz

      It was built without a patch that does the extra mbuf operations that may be triggering the panic.

      • L
        LostInIgnorance
        last edited by Jan 14, 2011, 10:07 PM

        Bad news JimP, still crashes.

        Kernel page fault with the following non-sleepable locks held:
        exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xc2f52580) locked @ /usr/pfSensesrc/src/sys/dev/e1000/if_lem.c:1350
        KDB: stack backtrace:
        X_db_sym_numargs(c0eb72fb,ccc3ca90,c0a41f25,546,0,...) at X_db_sym_numargs+0x146
        kdb_backtrace(546,0,ffffffff,c145d42c,ccc3cac8,...) at kdb_backtrace+0x29
        witness_display_spinlock(c0eb9813,ccc3cadc,4,1,0,...) at witness_display_spinlock+0x75
        witness_warn(5,0,c0ef7bc2,14,c131b3c0,...) at witness_warn+0x20d
        trap(ccc3cb68) at trap+0x19e
        alltraps(c2feeb00,dedeadc0,c2feeb00,c2feeb00,ccc3cbf0,...) at alltraps+0x1b
        m_tag_delete_chain(c2feeb00,0,c0e6e75d,0,c2ed9b50,...) at m_tag_delete_chain+0x3f
        reallocf(c2feeb00,100,0,c0a42978,df,...) at reallocf+0x8a5
        uma_zfree_arg(c1d7e380,c2feeb00,0,b5,ccc3cc84,...) at uma_zfree_arg+0x29
        m_freem(c2feeb00,4,c0e6e75d,b87,c2f4e000,...) at m_freem+0x43
        ed_probe_RTL80x9(c2f52580,0,c0e6e75d,546,c2f525bc,...) at 0xc06ec4d8
        ed_probe_RTL80x9(c2f4e000,1,c0eb8bcc,4f,c2edb918,...) at 0xc06efea0
        taskqueue_run(c2edb900,c2edb918,c0ea5f85,0,c0eb222b,...) at taskqueue_run+0x103
        taskqueue_thread_loop(c2f525ec,ccc3cd38,c0eaed9a,344,c131b3c0,...) at taskqueue_thread_loop+0x68
        fork_exit(c0a3b1a0,c2f525ec,ccc3cd38) at fork_exit+0xb8
        fork_trampoline() at fork_trampoline+0x8
        --- trap 0, eip = 0, esp = 0xccc3cd70, ebp = 0 ---
        
        Fatal trap 12: page fault while in kernel mode
        cpuid = 0; apic id = 00
        fault virtual address   = 0xdedeadc0
        fault code              = supervisor read, page not present
        instruction pointer     = 0x20:0xc0a611c8
        stack pointer           = 0x28:0xccc3cba8
        frame pointer           = 0x28:0xccc3cbb8
        code segment            = base 0x0, limit 0xfffff, type 0x1b
                                = DPL 0, pres 1, def32 1, gran 1
        processor eflags        = interrupt enabled, resume, IOPL = 0
        current process         = 0 (em0 taskq)
        [thread]
        Stopped at      m_tag_delete+0x48:      movl    0(%ecx),%eax
        db> [/thread]
        
        • F
          FisherKing
          last edited by Jan 15, 2011, 12:23 AM

          currently running 2.0-BETA5 (i386) built on Thu Jan 13 19:33:19 EST 201
          not sure how far back this happens.

          in a test network -
          2 machines, each w/ 4 intel nics (em0 - em3)
          WAN, LAN, Opt1, Opt2 (CARP interface)

          Running CARP on WAN, LAN, Opt1 interfaces
          Syncing on Opt2 interface.

          Recently started getting panics on box2 when changing settings on box1.

          Panic & BackTrace from box2 included below.

          
          Fatal trap 12: page fault while in kernel mode
          cpuid = 0; apic id = 00
          fault virtual address   = 0x1a4
          fault code              = supervisor read, page not present
          instruction pointer     = 0x20:0xc09ee51d
          stack pointer           = 0x28:0xd670aa54
          frame pointer           = 0x28:0xd670aa70
          code segment            = base 0x0, limit 0xfffff, type 0x1b
                                  = DPL 0, pres 1, def32 1, gran 1
          processor eflags        = interrupt enabled, resume, IOPL = 0
          current process         = 253 (devd)
          
          [thread]
          Stopped at      _mtx_lock_sleep+0x6d:   movl    0x1a4(%ecx),%eax
          
          db> bt
          Tracing pid 253 tid 64081 td 0xc4142000
          _mtx_lock_sleep(c40f16d0,c4142000,0,c0ecfc57,fd,...) at _mtx_lock_sleep+0x6d
          _mtx_lock_flags(c40f16d0,0,c0ecfc57,fd,0,...) at _mtx_lock_flags+0xf7
          carp6_input(c3ae5800,c0286938,c40f3a00,c0ea9fce,3,...) at carp6_input+0x9bd
          ifioctl(c46a3b44,c0286938,c40f3a00,c4142000,c40cf900,...) at ifioctl+0x141e
          soo_ioctl(c412ddc8,c0286938,c40f3a00,c39aa400,c4142000,...) at soo_ioctl+0x415
          kern_ioctl(c4142000,f,c0286938,c40f3a00,1a3b7d0,...) at kern_ioctl+0x1fd
          ioctl(c4142000,d670acf8,c0ef7af5,c0ecdaff,c41a77f8,...) at ioctl+0x134
          syscall(d670ad38) at syscall+0x220
          Xint0x80_syscall() at Xint0x80_syscall+0x20
          --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x8088357, esp = 0xbfbfe89c, ebp = 0xbfbfe908 ---
          db> reboot
          [/thread]
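
When comparing traces like the ones posted in this thread, it can help to split each frame into its function name and its `symbol+offset` location, and to flag frames where the debugger printed only a raw address. Below is a minimal Python sketch of that, assuming frames follow the `name(args) at location` layout seen in the traces above. `parse_frame` and its field names are hypothetical helpers for illustration, not part of DDB or pfSense.

```python
import re

# Matches lines such as:
#   carp6_input(c3ae5800,...) at carp6_input+0x9bd
#   ed_probe_RTL80x9(c2f52580,...) at 0xc06ec4d8
FRAME_RE = re.compile(r"^\s*(?P<func>\w+)\((?P<args>[^)]*)\) at (?P<where>\S+)\s*$")


def parse_frame(line):
    """Split one backtrace line into its parts, or return None if it is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    where = m.group("where")
    if "+" in where:
        # The debugger resolved the address to symbol+offset.
        symbol, offset = where.split("+", 1)
        return {"func": m.group("func"), "symbol": symbol,
                "offset": int(offset, 16), "resolved": True}
    # Only a raw address was printed; the function name may be unreliable.
    return {"func": m.group("func"), "address": int(where, 16), "resolved": False}


if __name__ == "__main__":
    for text in (
        "carp6_input(c3ae5800,c0286938,c40f3a00,c0ea9fce,3,...) at carp6_input+0x9bd",
        "ed_probe_RTL80x9(c2f52580,0,c0e6e75d,546,c2f525bc,...) at 0xc06ec4d8",
    ):
        print(parse_frame(text))
```

Frames that come back with `resolved` set to `False` are the ones whose printed function name deserves extra scepticism.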
          
          • J
            jimp Rebel Alliance Developer Netgate
            last edited by Jan 15, 2011, 3:37 AM

            Out of curiosity, what type of network cards do you have in that box? Is it rl and em both? Or just em? or just rl? Or something else?

            • L
              LostInIgnorance
              last edited by Jan 15, 2011, 2:15 PM Jan 15, 2011, 3:45 AM

              One em NIC (gigabit, embedded on the board of an old Dell P4). All network traffic is VLAN'd on that one interface.

              • J
                jimp Rebel Alliance Developer Netgate
                last edited by Jan 15, 2011, 3:48 AM

                OK, just checking… It looks odd to me that the backtrace references ed_probe_RTL80x9 which is a really old realtek chip, but it may just be something weird that I don't know at that level in the kernel/network stack.

                We have arranged serial console access with someone who has been able to reproduce the panic so hopefully we'll have a lead on a fix early next week.

                • W
                  wallabybob
                  last edited by Jan 15, 2011, 5:35 AM

                  @jimp:

                  OK, just checking… It looks odd to me that the backtrace references ed_probe_RTL80x9 which is a really old realtek chip,

                  Here's an extract from the stack trace:

                  m_freem(c2feeb00,4,c0e6e75d,b87,c2f4e000,...) at m_freem+0x43
                  ed_probe_RTL80x9(c2f52580,0,c0e6e75d,546,c2f525bc,...) at 0xc06ec4d8
                  ed_probe_RTL80x9(c2f4e000,1,c0eb8bcc,4f,c2edb918,...) at 0xc06efea0
                  taskqueue_run(c2edb900,c2edb918,c0ea5f85,0,c0eb222b,...) at taskqueue_run+0x103
                  

                  Note that the two ed_probe_RTL80x9 references are not accompanied by a symbol name and offset. I suspect ed_probe_RTL80x9 is merely the closest lower-valued global symbol, but it's too far away to warrant printing the PC as symbol+offset. If that is the case, you shouldn't take too much notice of the ed_probe_RTL80x9.
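
The nearest-lower-symbol behaviour suspected above can be sketched as follows. This is only an illustration of the idea, not the kernel debugger's actual code: the symbol start addresses and the `MAX_OFFSET` cutoff are invented for the example, and only the frame address `0xc06ec4d8` is taken from the trace.

```python
import bisect

# Hypothetical symbol table, sorted by start address. The addresses are
# invented for illustration and do not come from a real pfSense kernel.
SYMBOLS = [
    (0xC0600000, "ed_probe_RTL80x9"),
    (0xC0A40000, "taskqueue_run"),
]

# Assumed cutoff beyond which symbol+offset is not considered meaningful.
MAX_OFFSET = 0x1000


def resolve(pc, symbols=SYMBOLS, max_offset=MAX_OFFSET):
    """Return (name, offset) for pc; offset is None when the match is too far away."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, pc) - 1  # closest symbol at or below pc
    if i < 0:
        return (None, None)
    start, name = symbols[i]
    offset = pc - start
    if offset > max_offset:
        return (name, None)
    return (name, offset)


def describe(pc):
    """Format pc the way the frames in the trace are printed."""
    name, offset = resolve(pc)
    if name is None:
        return hex(pc)
    if offset is None:
        return f"{name}(...) at {hex(pc)}"
    return f"{name}(...) at {name}+{hex(offset)}"


if __name__ == "__main__":
    print(describe(0xC0A40103))  # close to a symbol: printed as symbol+offset
    print(describe(0xC06EC4D8))  # far from any symbol: raw address only
```

Under these assumptions, an address that lands in code without its own global symbol is still labelled with whatever symbol precedes it, which is why the name alone says little about the driver actually involved.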

                  • L
                    LostInIgnorance
                    last edited by Jan 15, 2011, 4:23 PM

                    @jimp:

                    We have arranged serial console access with someone who has been able to reproduce the panic so hopefully we'll have a lead on a fix early next week.

                    JimP, is there anything I can do to help out?

                    • J
                      jimp Rebel Alliance Developer Netgate
                      last edited by Jan 15, 2011, 4:26 PM

                      Not that I'm aware of. If the mbuf tag patch isn't the cause, it almost has to be the recent e1000 driver update (em, igb, etc).

                      • J
                        jimp Rebel Alliance Developer Netgate
                        last edited by Jan 17, 2011, 4:56 PM

                        Someone else had seen that once, but so far we've been unable to replicate it in order to track down the real cause.

                        It seemed to be something in the configuration, though.

                        • L
                          LostInIgnorance
                          last edited by Jan 19, 2011, 7:45 PM

                          I am afraid to update since I haven't heard anything back.  Is it still crashing or has it been fixed?

                          • J
                            jimp Rebel Alliance Developer Netgate
                            last edited by Jan 19, 2011, 7:55 PM

                            Nothing has changed with the drivers, but plenty of other things have been fixed, so it may be worth trying.

                            • S
                              Sabbasth
                              last edited by Jan 20, 2011, 9:13 AM

                              How can I get the logs (is system.log flushed on every boot?) so I can help pinpoint the problem?

                              I have 4 NICs (5 interfaces), all Intel em, a mix of PCI and motherboard-integrated NICs.
                              The computer just freezes; it does not reboot.

                              I have nmap and bandwidthd installed. I'm using outbound multi-WAN and the DHCP server, with no VLANs, no traffic shaper, and no VPN.

                              Everything ran well on an old snapshot. The freezes started after an upgrade a week ago. I currently have the latest snapshot installed.
                              The freezes are random: sometimes pfSense runs for a few minutes, sometimes for a few hours.

                              Any FTP transfer aborts with an error (there were problems with passive FTP a week or so ago, but those were connection problems; here the transfers are aborted).
                              I think this could be linked to the problem if a buffer in the driver is at fault.

                              • J
                                jimp Rebel Alliance Developer Netgate
                                last edited by Jan 20, 2011, 1:15 PM

                                @Sabbasth:

                                How can I get the logs (is system.log flushed on every boot?) so I can help pinpoint the problem?

                                I have 4 NICs (5 interfaces), all Intel em, a mix of PCI and motherboard-integrated NICs.
                                The computer just freezes; it does not reboot.

                                I have nmap and bandwidthd installed. I'm using outbound multi-WAN and the DHCP server, with no VLANs, no traffic shaper, and no VPN.

                                Everything ran well on an old snapshot. The freezes started after an upgrade a week ago. I currently have the latest snapshot installed.
                                The freezes are random: sometimes pfSense runs for a few minutes, sometimes for a few hours.

                                Any FTP transfer aborts with an error (there were problems with passive FTP a week or so ago, but those were connection problems; here the transfers are aborted).
                                I think this could be linked to the problem if a buffer in the driver is at fault.

                                If you are seeing a freeze and not a reset/panic, then this thread isn't related. Start a new thread for that. These drivers haven't changed for several weeks now.

                                • C
                                  clarknova
                                  last edited by Jan 21, 2011, 3:58 PM

                                  2.0-BETA5 (amd64)
                                  built on Wed Jan 12 18:01:47 EST 2011

                                  I just experienced my first kernel panic last night after more than 8 days uptime. I'm using a SM X7SPA-H board with only the onboard Intel GBE (Intel 82574L Gigabit Ethernet). I'm not using openvpn, but both NICs have multiple vlans on them and deal only in tagged traffic.

                                  Is there a reasonable chance that updating to the latest snap will resolve this? I don't know that I can reproduce this panic intentionally, as it hasn't happened before and I wasn't doing anything interesting when it happened. I do have clients, but the panic happened at my lowest traffic period of the day.

                                  db

                                  • J
                                    jimp Rebel Alliance Developer Netgate
                                    last edited by Jan 21, 2011, 4:05 PM

                                    I don't think we have done anything that would have fixed these panics, if it is driver-related.

                                    Are you able to switch to a developer kernel so you can obtain a backtrace?

                                    • L
                                      LostInIgnorance
                                      last edited by Jan 21, 2011, 4:07 PM

                                      JimP,
                                      I will be trying the old Dell with the GigE port today to find out whether any of the changes you mentioned above made a difference. I haven't had the downtime lately to put it back into the system. I took that computer out and used another one with two 100 Mbit NICs to get things running so I could remote back into the system.
                                      So it wouldn't be worthwhile to put that old dell back in?

                                      • C
                                        clarknova
                                        last edited by Jan 21, 2011, 4:08 PM

                                        I'll see about a backtrace. Shouldn't be a problem.

                                        db

                                        • J
                                          jimp Rebel Alliance Developer Netgate
                                          last edited by Jan 21, 2011, 4:26 PM

                                          It may be worth trying again on a current snap. At least on today's snapshot FTP no longer freezes my router :-)

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.