Netgate Discussion Forum

    Periodic Panic on CE 2.8.0 - DHCP6 Client (I Think)

    davefinster

      Hi All

      First time posting in the Netgate forums, so let me know if I've done this in the wrong place. For as long as I can remember I've been hitting a periodic panic on my pfSense firewall. It runs on a Lenovo M46KT27A (somewhat overkill, sure) into which I've installed a two-SFP-port Intel X520 with the following plugged in:

      E.C.I. NETWORKS PN: ENXGSFPPOMACV2 - SFP/SFP+/SFP28 10G Base-LR (SC)
      Ubiquiti Inc. PN: DAC-SFP10-0.5M SN: BA22093023861 DATE: 2022-09-26 - SFP/SFP+/SFP28 1X Copper Passive (No separable connector)

      The starting point appears to be the backtrace below, which looks DHCP6-related. Beyond that I'm not familiar with debugging these things. It's been happening reasonably regularly (at least once per week) for as long as I can remember, and I've only now decided to dig into it.

      db:0:kdb.enter.default>  run pfs
      db:1:pfs> bt
      Tracing pid 52781 tid 100414 td 0xfffff800126df740
      kdb_enter() at kdb_enter+0x33/frame 0xfffffe00d3de67f0
      panic() at panic+0x43/frame 0xfffffe00d3de6850
      trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00d3de68b0
      trap_pfault() at trap_pfault+0x46/frame 0xfffffe00d3de6900
      calltrap() at calltrap+0x8/frame 0xfffffe00d3de6900
      --- trap 0xc, rip = 0xffffffff80f5b213, rsp = 0xfffffe00d3de69d0, rbp = 0xfffffe00d3de6a20 ---
      in6_unlink_ifa() at in6_unlink_ifa+0x53/frame 0xfffffe00d3de6a20
      in6_purgeaddr() at in6_purgeaddr+0x366/frame 0xfffffe00d3de6b40
      in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00d3de6b60
      in6_control_ioctl() at in6_control_ioctl+0x5e1/frame 0xfffffe00d3de6bd0
      ifioctl() at ifioctl+0x8b0/frame 0xfffffe00d3de6cd0
      kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00d3de6d40
      sys_ioctl() at sys_ioctl+0x117/frame 0xfffffe00d3de6e00
      amd64_syscall() at amd64_syscall+0x115/frame 0xfffffe00d3de6f30
      fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d3de6f30
      --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x822a2bcca, rsp = 0x820280e58, rbp = 0x820280f50 ---
      db:1:pfs>  show registers
      cs                        0x20
      ds                        0x3b
      es                        0x3b
      fs                        0x13
      gs                        0x1b
      ss                        0x28
      rax                       0x12
      rcx         0xb671471b9956e201
      rdx         0xfffffe00d3de6310
      rbx                      0x100
      rsp         0xfffffe00d3de66c8
      rbp         0xfffffe00d3de67f0
      rsi         0xfffffe00d3de6580
      rdi         0xffffffff82740878  vt_conswindow+0x10
      r8                        0x3c
      r9                        0x3c
      r10                          0
      r11                          0
      r12                          0
      r13                          0
      r14         0xffffffff8145d99f
      r15         0xfffff800126df740
      rip         0xffffffff80d457b3  kdb_enter+0x33
      rflags                    0x82
      kdb_enter+0x33: movq    $0,0x1d76cd2(%rip)
      db:1:pfs>  show pcpu
      cpuid        = 6
      dynamic pcpu = 0xfffffe009b4325c0
      curthread    = 0xfffff800126df740: pid 52781 tid 100414 critnest 1 "dhcp6c"
      curpcb       = 0xfffff800126dfc60
      fpcurthread  = 0xfffff800126df740: pid 52781 "dhcp6c"
      idlethread   = 0xfffff800027e5740: tid 100009 "idle: cpu6"
      self         = 0xffffffff83a16000
      curpmap      = 0xfffff800126f0358
      tssp         = 0xffffffff83a16384
      rsp0         = 0xfffffe00d3de7000
      kcr3         = 0xffffffffffffffff
      ucr3         = 0xffffffffffffffff
      scr3         = 0x0
      gs32p        = 0xffffffff83a16404
      ldt          = 0xffffffff83a16444
      tss          = 0xffffffff83a16434
      curvnet      = 0xfffff80001288840
      db:1:pfs>  run lockinfo
      db:2:lockinfo> show locks
      No such command; use "help" to list available commands
      db:2:lockinfo>  show alllocks
      No such command; use "help" to list available commands
      db:2:lockinfo>  show lockedvnods
      Locked vnodes
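      For anyone reading the trace for the first time: ddb prints stack frames innermost-first, so the page fault (trap 0xc) fired inside in6_unlink_ifa while dhcp6c's ioctl(2) was tearing down an IPv6 address. A small Python sketch that reorders the frames into call order (the trace text is copied verbatim from the backtrace above):

```python
# Each ddb line of the form "func() at func+0xNN/frame 0x..." is one
# stack frame; the trace lists them innermost first.
import re

TRACE = """\
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00d3de67f0
panic() at panic+0x43/frame 0xfffffe00d3de6850
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00d3de68b0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00d3de6900
calltrap() at calltrap+0x8/frame 0xfffffe00d3de6900
in6_unlink_ifa() at in6_unlink_ifa+0x53/frame 0xfffffe00d3de6a20
in6_purgeaddr() at in6_purgeaddr+0x366/frame 0xfffffe00d3de6b40
in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00d3de6b60
in6_control_ioctl() at in6_control_ioctl+0x5e1/frame 0xfffffe00d3de6bd0
ifioctl() at ifioctl+0x8b0/frame 0xfffffe00d3de6cd0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00d3de6d40
sys_ioctl() at sys_ioctl+0x117/frame 0xfffffe00d3de6e00
"""

frames = re.findall(r"^(\w+)\(\) at", TRACE, re.MULTILINE)
# Reverse to get causal order: the ioctl syscall enters at the bottom,
# the fault happens at the top.
print(" -> ".join(reversed(frames)))
```

      Read in that order, the chain is: dhcp6c issues an ioctl, the kernel walks into the IPv6 address-purge path, and the fault happens while unlinking the address from the interface's address list.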
      

      info.0.txt
      info.1.txt
      textdump.0.tar
      textdump.1.tar

      davefinster @davefinster

        As a follow-up, I've noticed that the gateway monitor on my AT&T Fiber IPv6 trips fairly regularly, which is probably what kicks the DHCPv6 client into action and occasionally leads to this situation. I've found similar issues reported against older releases, where there was a race between interface reconfiguration and disablement.

        I've stopped the IPv6 gateway monitor from taking action (it still logs), so I'll see whether that eliminates the panics. The fact that it can happen at all is still concerning, though.

        stephenw10 Netgate Administrator

          @davefinster said in Periodic Panic on CE 2.8.0 - DHCP6 Client (I Think):

          in6_unlink_ifa

          Hmm, that looks like this: https://redmine.pfsense.org/issues/14164 But that should be resolved in 2.8.0.

           In both crashes the log is spammed by something trying to use a link-local IPv6 address for public routing, which is not allowed.

          I would guess it's an issue with the tailscale interface though since that's the only other thing showing much activity. That has been shown to cause the related bug: https://redmine.pfsense.org/issues/14431

          I was never able to replicate that locally but it could be a timing issue that only a fast WAN connection hits. I see you're using ixl NICs, what speed is your WAN that tailscale is using?
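           The "link-local address for public routing" symptom is easy to check for when sifting logs: fe80::/10 addresses are never valid as a public source, while addresses from a delegated prefix fall in the global unicast range 2000::/3. A small Python sketch (the addresses below are made-up illustrations, not taken from these crash reports):

```python
import ipaddress

def usable_for_public_routing(addr):
    """Rough check: global unicast IPv6 lives in 2000::/3;
    link-local (fe80::/10) is never valid as a public source."""
    a = ipaddress.ip_address(addr)
    return (not a.is_link_local) and a in ipaddress.ip_network("2000::/3")

# Hypothetical addresses for illustration:
print(usable_for_public_routing("fe80::1"))        # False: link-local
print(usable_for_public_routing("2600:1700::1"))   # True: global unicast
```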

           davefinster @stephenw10

             @stephenw10 said in Periodic Panic on CE 2.8.0 - DHCP6 Client (I Think):

             I see you're using ixl NICs, what speed is your WAN that tailscale is using?

             I've got 5Gbps/5Gbps through AT&T Fiber, using a WAS-110 in one of the SFP ports as the GPON endpoint. The SFP handles all of the network/GPON-specific bits, so pfSense just performs DHCP(v6) over the interface. That's the WAN side; on the LAN side it's just 10Gbps Twinax into an aggregation switch.

             To at least stop the issue from recurring, I did some more reading on the prefix-delegation expectations of the AT&T service, and I've now set the DHCPv6 client on the WAN interface to request only a prefix delegation, not an address for itself. The /128 AT&T provides when you do request an address is non-routable anyway, and requesting it also seemed to cause significant instability in IPv6 networking: gateway pinging and v6 routing in general would periodically break, which presumably opened the window for this race. Without the /128, the v6 gateway pinger uses purely the link-local address.

             The end result is that the WAN interface ends up with only its link-local address, and everything IPv6-related that originates from the router (e.g. Tailscale) now uses the router's IP from the delegated prefix on the LAN interface, which, unlike the /128 AT&T provides, is routable. Since making these changes I've had no issues for two days.
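             In pfSense this corresponds to the "Request only an IPv6 prefix" option in the WAN's DHCP6 client settings. Under the hood pfSense drives the WIDE dhcp6c client; a hand-written sketch of an equivalent dhcp6c.conf (the interface names ix0/ix1 and the sla values are assumptions for illustration, not taken from this setup):

```
interface ix0 {
    # Request only an IA_PD (prefix delegation). With no "send ia-na"
    # line, no address is requested for the WAN interface itself.
    send ia-pd 0;
    request domain-name-servers;
};

id-assoc pd 0 {
    prefix-interface ix1 {
        sla-id 0;     # which subnet of the delegated prefix to assign
        sla-len 4;    # e.g. a /60 delegation split into /64s
    };
};
```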

             stephenw10 Netgate Administrator

               Ah, interesting. Yup, AT&T expects to see its own router at the end of GPON/XPON, and pfSense could well be doing something that doesn't play well with that. Obviously it still shouldn't panic like that.

               The panic appears to be caused by a race condition during removal of an IPv6 address. If the WAN was repeatedly renewing a lease, that seems likely.
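               As a loose analogy (plain Python, not the kernel code): in6_unlink_ifa removes the address from the interface's address list, and if two teardown paths race, the second unlink operates on an entry that is already gone, the list analogue of the page fault in the trace. Guarding the removal with a lock and a membership check makes a double teardown harmless:

```python
import threading

class AddrList:
    """Toy stand-in for a per-interface address list."""
    def __init__(self):
        self._lock = threading.Lock()
        self._addrs = []

    def add(self, addr):
        with self._lock:
            self._addrs.append(addr)

    def unlink_unsafe(self, addr):
        # No membership check: a second teardown of the same address
        # blows up, loosely like dereferencing an already-freed entry.
        self._addrs.remove(addr)

    def unlink_safe(self, addr):
        with self._lock:
            if addr in self._addrs:
                self._addrs.remove(addr)

lst = AddrList()
lst.add("2001:db8::1")                 # documentation address, illustrative only
lst.unlink_unsafe("2001:db8::1")
try:
    lst.unlink_unsafe("2001:db8::1")   # second teardown of the same address
except ValueError as e:
    print("unsafe double unlink:", e)

lst.add("2001:db8::2")
lst.unlink_safe("2001:db8::2")
lst.unlink_safe("2001:db8::2")          # harmless no-op
print("safe double unlink: ok")
```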

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.