Debug Kernel Build with Witness



  • I don't suppose there's a way to get a debug kernel for an SG-3100 with witness enabled? I'm trying to track down some random crashes on the 2.5.0-dev branch. I believe it's a lock somewhere, but these builds don't appear to have witness.

    At some point during the day (every 1-2 days-ish) all network traffic stops, but from the console I can still access the shell; however, as soon as I run any command that interacts with the network stack (e.g. netstat or ping), the console locks up as well and all I can do is power cycle. There are no system logs, no console output, nothing indicating a problem at all.

    Thanks in advance.


  • Rebel Alliance Developer Netgate

    There is a pfSense-kernel-debug-pfSense-SG-3100 package you can install, but I don't recall which options are in the kernel at the moment.

    I don't recall ever seeing anything like that on the 3100 on 2.5.0 (or 2.4.x) either.

    If you press ^T at the console when it's locked up, does it produce any output?

    Does anything happen if you send a serial break? (Maybe also set a tunable for debug.kdb.break_to_debugger=1)
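
    For reference, to make that stick across reboots it can go in /boot/loader.conf.local (pfSense regenerates loader.conf itself, so local overrides belong in the .local file), something like:

    # /boot/loader.conf.local
    debug.kdb.break_to_debugger="1"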



  • Next time it locks up I'll give the ^T a shot. I've also set that sysctl to 1.

    I'm already using the pfSense-kernel-debug-pfSense-SG-3100 package.

    pfSense-kernel-debug-pfSense-SG-3100-2.5.0.a.20200808.0050
    Name           : pfSense-kernel-debug-pfSense-SG-3100
    Version        : 2.5.0.a.20200808.0050
    Installed on   : Sat Aug  8 09:13:21 2020 CDT
    

    Since the builds don't include the config, a quick strings /boot/kernel/kernel | grep witness didn't return anything, so I assume it's not built with it.

    I'll let you know what happens next time it locks up. Thanks for the response @jimp



  • Well @jimp, it's been 17 days... 9 days ago it locked up, I got nothing out of CTRL+T and no panic/dump, so I just rebooted it and updated to the latest snapshot. It's been running fine for 9 Days 20 Hours 54 Minutes 26 Seconds, so... /shrug, maybe someone fixed something between pfSense-kernel-debug-pfSense-SG-3100-2.5.0.a.20200808.0050 and pfSense-kernel-debug-pfSense-SG-3100-2.5.0.a.20200817.1850 (the one I'm running now).

    At any rate, I figured you were quite anxious to hear what was happening with this, so I thought I'd fill you in. Thanks again for putting up with us (us: the ungrateful community)!

    EDIT: I swear, within 5 minutes of posting that, it locked up. I'm really wondering now if it has something to do with me logging in through ssh and/or the web portal. I was able to issue a few commands (tcpdump: no packets; ps, dmesg, checked some logs... nothing interesting), but as soon as I ran a netstat, the console locked up also.



  • Ah hah! @jimp

    Everything on the "pf rules(et?)" wait channel (the MWCHAN column) is waiting on a lock. That appears to be ...

    sys/netpfil/pf/pf_ioctl.c:
    struct rmlock pf_rules_lock;
    rm_init(&pf_rules_lock, "pf rulesets");

    sys/net/pfvar.h:
    extern struct rmlock pf_rules_lock;
    #define PF_RULES_RLOCK_TRACKER struct rm_priotracker _pf_rules_tracker
    #define PF_RULES_RLOCK() rm_rlock(&pf_rules_lock, &_pf_rules_tracker)
    #define PF_RULES_RUNLOCK() rm_runlock(&pf_rules_lock, &_pf_rules_tracker)

    ...which gets called many times throughout pf, in particular in the ioctl handler sys/netpfil/pf/pf_ioctl.c ... If I'm a betting man, it's in pf purge or ng_queue (the two kernel procs with pages locked in core). Buuttt... most likely within netgraph (I'm using the AT&T bypass).

    I've stripped down this ps -axlO flags,flags2 output to just the relevant (STAT L*) entries.

      UID   PID  PPID C PRI NI   VSZ   RSS MWCHAN   STAT TT      TIME COMMAND            PID        F       F2 
        0     0     0 1 -16  0     0   112 swapin   DLs   -   0:00.62 [kernel]             0 10000284 00000000 
        0     1     0 0  20  0  3280   768 wait     ILs   -   0:00.05 /sbin/init --        1 10004200 00000000 
        0     8     0 0 -16  0     0     8 pf rules LL    -   0:00.37 [pf purge]           8 10000204 00000000 
        0    12     0 1 -92  0     0    16 pf rules LL    -   0:39.99 [ng_queue]          12 10000284 00000000 
    65534  4229  3258 1  42 20  4416  2040 pf rules LNs   -   0:00.02 /usr/local/sbin/  4229 10004100 00000000   -- /usr/local/sbin/expiretable -v -t
        0  8994  8958 1  20  0 21652  7352 pf rules L     -   0:00.23 nginx: worker pr  8994 10000100 00000000 
      100 19467     1 0  20  0 71260 29648 pf rules L     -   1:09.85 (squid-1) --kid  19467 10004100 00000000 
        0 38115     1 1  72  0  4736  2420 pf rules L     -   0:00.01 /sbin/pfctl -e   38115 10004000 00000000 
        0 57945     1 1  20  0  9556  6004 pf rules Ls    -   0:00.09 /usr/local/sbin/ 57945 10000080 00000000 
        0 76017 76011 0 -92  0  4804  2452 pf rules L     -   0:00.01 /sbin/pfctl -vvP 76017 10004000 00000000 
        0 77566 77277 0  40 20  4736  2420 pf rules LN    -   0:00.00 /sbin/pfctl -vvs 77566 10004000 00000000 
        0 77617 12439 0  20  0  4460  2120 pf rules LC    -   0:00.00 /sbin/ping -t2 - 77617 10004000 00000000 
        0 78292 13859 0  20  0 12140  7224 pf rules Ls    -   0:00.13 sshd: root (sshd 78292 10004100 00000000 
        0 55147 55120 1  93 20  4736  2420 pf rules LN   u0   0:00.00 /sbin/pfctl -vvs 55147 10004002 00000000 
        0 77870 60115 1  20  0  5044  2588 -        R+   u0   0:00.01 ps -axlO flags,f 77870 10004002 00000000 
        0 80126 73786 1  20  0  4932  2592 wait     I    u0   0:00.02 /bin/sh /etc/rc. 80126 10004002 00000000 


  • Well, after much frustration, here are my final findings before I switch back to 2.4.5.

    There is kernel memory corruption that is overwriting the priority tracker for the pf_rules_lock, which causes the kernel to enter a deadlock since it can't release the rmlock.

    Here's some information if anyone else cares to try to figure out what is actually causing the corruption.

    https://github.com/pfsense/FreeBSD-src/blob/devel-12/sys/netpfil/pf/pf_ioctl.c
     rm_init(&pf_rules_lock, "pf rulesets");
      -> This initializes the statically allocated lock at 0xc1c77e78; as evidenced by the kgdb output below, it is a static symbol (always at that address)
    
    (kgdb) print &pf_rules_lock
    $1 = (struct rmlock *) 0xc1c77e78 <pf_rules_lock>
    (kgdb) ptype pf_rules_lock
    type = struct rmlock {
        struct lock_object lock_object;
        volatile cpuset_t rm_writecpus;
        struct {
            struct rm_priotracker *lh_first;
        } rm_activeReaders;
        union {
            struct lock_object _rm_wlock_object;
            struct mtx _rm_lock_mtx;
            struct sx _rm_lock_sx;
        } _rm_lock;
    }
    
    (kgdb) ptype struct lock_object
    type = struct lock_object {
        const char *lo_name;
        u_int lo_flags;
        u_int lo_data;
        witness *lo_witness;
    }
    

    [Image: Ghidra memory view of the pf_rules_lock struct in the vmcore]

    [Image: the "pf rulesets" lo_name string at physical address 0x00d2227a]

    [Image: memory at physical address 0x07f86908, where rm_activeReaders points, showing the corrupted data]

    Note: this is from a vmcore dump; kernel virtual addresses (base 0xc0000000) are mapped to physical addresses starting at 0x00400000.

    From the Ghidra image above, you can see where the lock object sits in memory and that it is populated with data (I didn't build out the full struct, only the parts I cared about). The lo_name correctly maps to the "pf rulesets" string, but following the rm_priotracker, you can see the memory is corrupted. Those values don't match what should be in the struct. Upon further investigation, that "spray" of data starts around 0x7f560000 and goes to 0x80000000 (0xAA0000 bytes of data); it's a repeating pattern of 4 bytes followed by 0x00000100 (LE), with the 4-byte values changing periodically in a pattern I couldn't figure out.
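
    In case anyone wants to hunt for the same spray in their own dump, here's a rough sketch of the scan I have in mind (find_spray is a hypothetical helper; it assumes file offsets track physical addresses, which the vmcore's LOAD segment suggests, and the window/threshold values are guesses):

    import struct

    def find_spray(path, start=0x7f000000, end=0x80000000, min_pairs=64):
        # Walk the dump looking for runs of <value>, 0x00000100 pairs,
        # the repeating pattern described above. Assumes the pairs are
        # 8-byte aligned.
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read(end - start)
        run_start, pairs = None, 0
        for off in range(0, len(data) - 8, 8):
            if struct.unpack_from("<I", data, off + 4)[0] == 0x00000100:
                if run_start is None:
                    run_start = off
                pairs += 1
            else:
                if pairs >= min_pairs:
                    print(f"spray at 0x{start + run_start:08x}, {pairs} pairs")
                run_start, pairs = None, 0

    find_spray("vmcore.0")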

    I did a factory reset and re-installed my packages to let it run for a while, just to see... but no luck, it still locks up (same section of memory gets overwritten with garbage).

    So, now I'm in the process of re-installing 2.4.5 on my SG-3100. Good luck to anyone else who attempts this.



  • Hi, it's hard to make any conclusions based on the debugger output, but if you are willing to try and reproduce the issue again, "procstat -kka" output would be most useful. It dumps the stacks of all blocked threads in the system. (I'm not 100% sure that it is reliable on that platform so it'd be worth checking that it gives sane-looking output before spending much time on it.)

    I don't really understand your assertion that the fields following rm_priotracker are corrupted. The following four bytes are a pointer representing the lo_name field of the lock object in the union following rm_activeReaders, and I can't see anything further than that in the posted memory dump.



    While testing, I tried procstat, but it locked up my console. I'm assuming it tried to crawl the process structures and came across one of the deadlocked processes?

    Let me explain the dump a bit more. In the top image, the rm_priotracker pointer (which is actually called rm_activeReaders) is at offset 0x02077e8c (which would be vmem address 0xc1c77e8c, or pf_rules_lock+0x14). Its value is 0xc7b86908 (virtual), which maps to a physical address of 0x07f86908, which I've shown in the last image.
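
    (Spelling that arithmetic out; the linear mapping comes from the readelf output further down, and the 0x14 breakdown is my inference, assuming cpuset_t is a single 32-bit word on this build:)

    PO, VO = 0x00400000, 0xc0000000             # kernel image PhysAddr/VirtAddr

    pf_rules_lock    = 0xc1c77e78
    rm_activeReaders = pf_rules_lock + 0x14     # 0x10 lock_object + 0x4 rm_writecpus
    assert rm_activeReaders - VO + PO == 0x02077e8c   # the offset seen in Ghidra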

    The first field of the rm_priotracker is rmp_cpuQueue, an embedded rm_queue whose first member is itself a pointer, so the first word at that address should hold a pointer.

    (kgdb) ptype struct rm_priotracker
    type = struct rm_priotracker {
        rm_queue rmp_cpuQueue;
        rmlock *rmp_rmlock;
        thread *rmp_thread;
        int rmp_flags;
        struct {
            rm_priotracker *le_next;
            rm_priotracker **le_prev;
        } rmp_qentry;
    }
    
    (kgdb) ptype struct rm_queue
    type = struct rm_queue {
        rm_queue * volatile rmq_next;
        rm_queue * volatile rmq_prev;
    }
    

    So the value in the dump at 0x07f86908 should be a pointer to an rm_queue struct (a linked-list entry); however, 0x00306f65 is not a valid memory location (not in kernel memory, anyway). When I checked this value on a live system (reading /dev/kmem), it mapped to a valid 0xc* address.
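
    (A trivial sanity filter for words pulled out of the dump; the bound is just the 0xc0000000 kernel VA base discussed here:)

    def plausible_kva(word):
        # True if a 32-bit word could be a kernel virtual address on this box.
        return word >= 0xc0000000

    plausible_kva(0x00306f65)  # False -- the corrupt value in the dump
    plausible_kva(0xc7b86908)  # True  -- the rm_activeReaders pointer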



  • It's possible that one of the deadlocked threads held a lock required by procstat. If you're able to run kgdb against a vmcore you should be able to get backtraces with "thread apply all bt".

    Why do you say that 0xc7b86908 maps linearly to physaddr 0x07f86908? ARM kernels do not use a direct map. Indeed, the base of the kernel map is 0xc0000000 but kernel virtual memory is allocated dynamically once the page table code is bootstrapped. Maybe I'm missing something - did you resolve the address using the kernel page tables?



  • Sorry, guess I should also have posted that!

    I got the physical offset using readelf on the vmcore. Considering the information below, and the fact that pf_rules_lock->lo_name actually pointed to the string "pf rulesets" (that middle image: physical address 0x00d2227a maps to virtual 0xc092227a), I figured it was accurate.

    readelf --segments vmcore.0
    
    Elf file type is CORE (Core file)
    Entry point 0x0
    There are 2 program headers, starting at offset 52
    
    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      LOAD           0x001000 0x00001000 0x00001000 0x7ffff000 0x7ffff000 R   0x1000
      LOOS+0xfb5d000 0x000000 0xc0000000 0x00400000 0x00000 0x00000 R   0x1000
    

    Edit: a quick little python snippet I've been using (it's not perfect, but should map the kernel-image offsets):

    # Bases taken from the vmcore's program headers (PhysAddr/VirtAddr
    # of the kernel image). Only valid for the statically mapped image,
    # not for dynamically allocated KVA such as kernel stacks.
    po = 0x00400000
    vo = 0xc0000000
    def p_to_v(paddr):
      return hex(paddr - po + vo)
    def v_to_p(vaddr):
      return hex(po + (vaddr - vo))
    
    >>> p_to_v(0x00d2227a)
    '0xc092227a'
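
    (The hardcoded bases above can also be pulled straight from the program headers; a sketch using pyelftools, which I haven't verified against a FreeBSD ARM vmcore:)

    from elftools.elf.elffile import ELFFile  # pip install pyelftools

    with open("vmcore.0", "rb") as f:
        for seg in ELFFile(f).iter_segments():
            h = seg.header
            print(f"{h.p_type}: virt=0x{h.p_vaddr:08x} phys=0x{h.p_paddr:08x}")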
    

    Edit2:
    Lastly, kgdb could not read the vmcore. It kept saying: Failed to open vmcore: cannot read l1pt

    My assumption there (since my expertise is not in ARM) is that the "l1pt" (table?) was corrupted.

    Edit3:
    Really lastly: while I'm not willing to share the full vmcore, if you (or someone else) really want to figure this out, shoot me a private message and I can carve out data from the vmcore if you have some offsets you want to look at.



  • What I mean is that in general you cannot translate kernel virtual addresses by subtracting 0xc0000000. Lock names are strings embedded into the kernel, and the kernel image itself is indeed mapped linearly. However, rm lock priotrackers are allocated from kernel stacks, which are allocated and mapped dynamically.

    Indeed, it sounds like the vmcore is corrupted or somehow unrecognized. l1pt is the root of the kernel page tables, needed to translate virtual addresses.

    Are you able to break into the in-kernel debugger from the console? That might be another way to get info about the offending code if you are able to reproduce the problem. I'm much more familiar with FreeBSD than pfSense, so I'm not sure whether a debugger is included in the kernel configuration.



    Right, which is why I used the PhysAddr/VirtAddr from the segment information in the vmcore to do the mapping (so <virtual address> - 0xc0000000 + 0x00400000 gets me the physical address).

    I see what you are saying, though, about the rm_priotrackers being allocated on the kernel stacks; however, every time I run (kgdb) print &pf_rules_lock against that kernel, it always returns 0xc1c77e78. On my 2.4.5p1 build, it always returns 0xc1a3e2d4.

    As far as a debugger goes, I only just noticed (as I went to check) /usr/lib/debug/boot/kernel/kernel.debug, which I was originally using for debug symbols, but now I realize I wasn't booting from that one, heh. That one of course has witness support (like jimp told me, but I was too thick-skulled to realize I wasn't booting the correct kernel, smh), and it also appears to have remote kgdb support.

    Alright, well, if I get some more time I'll try actually booting that kernel and see what I can find, but for now I need my SG-3100 to be stable.

