Netgate Discussion Forum

    Kernel Panic when enabling CODELQ on multiple Vlans and freeze on reboot setting up routes

      IT_Luke

      Hi all, I am encountering a reproducible kernel panic that propagates to the secondary pfSense (latest version, 2.6) in a CARP HA cluster (so double trouble) when I enable a CODELQ traffic limiter on a third VLAN interface. I have two CODELQ limiters enabled without issues: one on a physical interface (bge2) and one on a VLAN on another interface (bge0, VLAN 102). If I enable a third CODELQ limiter on a VLAN on the same interface as the first (bge2), the box crashes and burns, taking down the partner box because the config syncs immediately (the second box behaves exactly like the first, which confirms the reproducibility).

      The primary pfSense reboots but does not complete the boot; it freezes on "applying routes" (if I recall the exact wording correctly). The secondary (slave) reboots correctly because the config.xml is not found, so it reloads the previous one without issues and becomes CARP master. To recover from this mayhem I have to boot the first node manually in single-user mode, run an fsck, and copy the previously saved config.xml back over. It then reboots with the prior config, without the third limiter, and all is well.

      Here is an excerpt of the textdump.0. What catches the eye is that the current thread is "bge2 taskq", which makes me think the panic is related to, and caused by, the cascading limiter on the VLAN on the same interface:

      db:1:lockinfo> show lockedvnods
      Locked vnodes
      db:0:kdb.enter.default> show pcpu
      cpuid = 1
      dynamic pcpu = 0xfffffe0080d98200
      curthread = 0xfffff80004adf740: pid 0 tid 100065 "bge2 taskq"
      curpcb = 0xfffff80004adfce0
      fpcurthread = none
      idlethread = 0xfffff80004619740: tid 100004 "idle: cpu1"
      curpmap = 0xffffffff8368f6e8
      tssp = 0xffffffff83719808
      commontssp = 0xffffffff83719808
      rsp0 = 0xfffffe00004edcc0
      kcr3 = 0x8000000003d06002
      ucr3 = 0xffffffffffffffff
      scr3 = 0x368dfaeca
      gs32p = 0xffffffff83720020
      ldt = 0xffffffff83720060
      tss = 0xffffffff83720050
      tlb gen = 46299116
      curvnet = 0
      db:0:kdb.enter.default> bt
      Tracing pid 0 tid 100065 td 0xfffff80004adf740
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe00004ed6d0
      vpanic() at vpanic+0x197/frame 0xfffffe00004ed720
      panic() at panic+0x43/frame 0xfffffe00004ed780
      trap_fatal() at trap_fatal+0x391/frame 0xfffffe00004ed7e0
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004ed830
      trap() at trap+0x286/frame 0xfffffe00004ed940
      calltrap() at calltrap+0x8/frame 0xfffffe00004ed940
      --- trap 0xc, rip = 0xffffffff80e2105b, rsp = 0xfffffe00004eda10, rbp = 0xfffffe00004eda20 ---
      m_tag_delete_chain() at m_tag_delete_chain+0x5b/frame 0xfffffe00004eda20
      uma_zfree_arg() at uma_zfree_arg+0x3a/frame 0xfffffe00004eda80
      m_freem() at m_freem+0x9b/frame 0xfffffe00004edaa0
      bge_txeof() at bge_txeof+0x5d/frame 0xfffffe00004edad0
      bge_intr_task() at bge_intr_task+0x1e4/frame 0xfffffe00004edb20
      taskqueue_run_locked() at taskqueue_run_locked+0x144/frame 0xfffffe00004edb80
      taskqueue_thread_loop() at taskqueue_thread_loop+0xb6/frame 0xfffffe00004edbb0
      fork_exit() at fork_exit+0x7e/frame 0xfffffe00004edbf0
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004edbf0
      --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
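For anyone comparing several textdumps, the frames that matter are the ones below the first `--- trap ... ---` marker, since everything above it is the panic handling itself. The following is only a minimal sketch of pulling those frames out of a `bt` listing; the function name `frames_after_trap`, the regular expressions, and the shortened `SAMPLE` text are illustrative assumptions based on the line layout in the excerpt above, not an official tool or format specification.

```python
import re

# A ddb frame line, as seen in the excerpt:
#   "m_freem() at m_freem+0x9b/frame 0xfffffe00004edaa0"
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)")
# A trap marker line, as seen in the excerpt:
#   "--- trap 0xc, rip = 0xffffffff80e2105b, rsp = ..., rbp = ... ---"
TRAP_RE = re.compile(r"^\s*--- trap (0x[0-9a-f]+|\d+), rip = (0x[0-9a-f]+|\d+)")

def frames_after_trap(backtrace_text):
    """Return (trap_number, function_names) for the frames that follow the
    first trap marker, stopping at the next trap marker if there is one."""
    trap = None
    funcs = []
    for line in backtrace_text.splitlines():
        if trap is None:
            # Skip the panic-handling frames until the first trap marker.
            m = TRAP_RE.match(line)
            if m:
                trap = int(m.group(1), 0)
            continue
        if TRAP_RE.match(line):
            break  # second marker: end of this thread's stack
        m = FRAME_RE.match(line)
        if m:
            funcs.append(m.group(1))
    return trap, funcs

# Shortened copy of the backtrace from this post (assumption: same layout).
SAMPLE = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004ed830
trap() at trap+0x286/frame 0xfffffe00004ed940
calltrap() at calltrap+0x8/frame 0xfffffe00004ed940
--- trap 0xc, rip = 0xffffffff80e2105b, rsp = 0xfffffe00004eda10, rbp = 0xfffffe00004eda20 ---
m_tag_delete_chain() at m_tag_delete_chain+0x5b/frame 0xfffffe00004eda20
uma_zfree_arg() at uma_zfree_arg+0x3a/frame 0xfffffe00004eda80
m_freem() at m_freem+0x9b/frame 0xfffffe00004edaa0
bge_txeof() at bge_txeof+0x5d/frame 0xfffffe00004edad0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""

if __name__ == "__main__":
    trap, funcs = frames_after_trap(SAMPLE)
    print(f"trap {trap:#x}: " + " <- ".join(funcs))
```

Applied to the excerpt above, this puts `m_tag_delete_chain` at the top, reached from `m_freem` in `bge_txeof`, that is, while the bge driver was freeing a transmitted mbuf and its tags.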

      I can't really do any further testing, as this is at a customer site and, since both boxes crash, it causes connectivity problems until the second one has rebooted, which takes a good few minutes (DL380 G8s are not blazing fast at booting). I don't know whether there is already a Redmine issue for this, but I have seen this one:

      https://redmine.pfsense.org/issues/5383

      However, I don't think it is directly related, though it could be indirectly related. Hopefully someone can reproduce and confirm it.

      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.