What is check_reload_status?



  • What exactly is the function of /usr/local/sbin/check_reload_status ?

    I've been having a problem where it sometimes spins out of control and maxes out one CPU core. This has been an occasional issue for months, and I am now on 2.4-master. A quick search here for "check_reload_status cpu" pulled up 11 threads, but many were from 2012-2013, and none of them had any answer other than "a reboot fixed it". Only this single post from JimP even hints at its purpose: "check_reload_status is a command dispatching daemon. If it's using CPU, it's because it's being given a lot of commands". But I have been watching my log files (they are idle), and every other process on my system is below 1% CPU. There is no traffic going through the firewall, etc. Meanwhile, redmine#2555 has been sitting there quietly at 0% for almost 5 years; the last comment is from Chris Buechler, declaring victory over the bug in 2.1. ???

    I can't find any detailed information as to what this process is, what it does, or where to look at its source code (is the source even available??)

    I have run it through truss, ktrace, and kdump, but all that revealed is the following, repeated over and over hundreds of times per second:

    truss (this repeats endlessly)

    kevent(4,{ },0,{ 14,EVFILT_READ,EV_EOF,0x0,0x0,0x30303 5,EVFILT_READ,EV_EOF,0x0,0x0,0x30303 7,EVFILT_READ,EV_EOF,0x0,0x0,0x30303 },64,0x0) = 3 (0x3)
    recvfrom(14,0x7fffffffd940,8,0x0,NULL,0x0) = 0 (0x0)
    recvfrom(5,0x7fffffffd940,8,0x0,NULL,0x0) = 0 (0x0)
    recvfrom(7,0x7fffffffd940,8,0x0,NULL,0x0) = 0 (0x0)
    

    ktrace/kdump (also repeats endlessly)

    328 check_reload_status GIO   fd 4 read 128 bytes
           0x0000 0e00 0000 0000 0000 ffff 0080 0000 0000 0000 0000  |....................|
           0x0014 0000 0000 0303 0300 0000 0000 0500 0000 0000 0000  |....................|
           0x0028 ffff 0080 0000 0000 0000 0000 0000 0000 0303 0300  |....................|
           0x003c 0000 0000 0700 0000 0000 0000 ffff 0080 0000 0000  |....................|
           0x0050 0000 0000 0000 0000 0303 0300 0000 0000 0800 0000  |....................|
           0x0064 0000 0000 ffff 0080 0000 0000 0000 0000 0000 0000  |....................|
           0x0078 0303 0300 0000 0000                                |........|
    
       328 check_reload_status RET   kevent 4
       328 check_reload_status CALL  recvfrom(0xe,0x7fffffffd940,0x8,0,0,0)
       328 check_reload_status GIO   fd 14 read 0 bytes
           ""
       328 check_reload_status RET   recvfrom 0
       328 check_reload_status CALL  recvfrom(0x5,0x7fffffffd940,0x8,0,0,0)
       328 check_reload_status GIO   fd 5 read 0 bytes
           ""
       328 check_reload_status RET   recvfrom 0
       328 check_reload_status CALL  recvfrom(0x7,0x7fffffffd940,0x8,0,0,0)
       328 check_reload_status GIO   fd 7 read 0 bytes
           ""
       328 check_reload_status RET   recvfrom 0
       328 check_reload_status CALL  recvfrom(0x8,0x7fffffffd940,0x8,0,0,0)
       328 check_reload_status GIO   fd 8 read 0 bytes
           ""
       328 check_reload_status RET   recvfrom 0
       328 check_reload_status CALL  kevent(0x4,0x801628000,0,0x801628800,0x40,0)
       328 check_reload_status GIO   fd 4 wrote 0 bytes
           ""
    


  • Rebel Alliance Developer Netgate

    My other post is correct. Something must have sent a ton of events or somehow it fell into that state.

    It handles things like filter reloads, interface events, etc. Events are triggered through it using pfSctl or send_event() and they are queued and handled by check_reload_status to avoid things stepping on each other too much.
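
    As a rough illustration of the queue-and-dispatch idea described above, here is a minimal Python sketch. It is not the actual check_reload_status code (which is written in C); the Dispatcher class, the command strings, and the handlers are all hypothetical, and send_event is borrowed only as a name. The point is just that callers enqueue a command and return, while a single loop runs the handlers one at a time so they cannot overlap.

```python
from collections import deque


class Dispatcher:
    """Toy command dispatcher: commands are queued, then run one at a time."""

    def __init__(self):
        self.handlers = {}    # command name -> callable taking one argument
        self.queue = deque()  # pending (command, argument) pairs, FIFO order
        self.log = []         # what each handled command produced, in order

    def register(self, command, handler):
        self.handlers[command] = handler

    def send_event(self, command, argument=""):
        # Only enqueue here; nothing runs in the caller's context.
        self.queue.append((command, argument))

    def run_pending(self):
        # Drain the queue serially so two handlers never run at once.
        while self.queue:
            command, argument = self.queue.popleft()
            handler = self.handlers.get(command)
            if handler is None:
                self.log.append(f"unknown command: {command}")
                continue
            self.log.append(handler(argument))
        return self.log


def demo_dispatch():
    d = Dispatcher()
    # Hypothetical command names, chosen only for illustration.
    d.register("filter reload", lambda arg: "reloaded filter")
    d.register("interface newip", lambda arg: f"new IP on {arg}")
    d.send_event("filter reload")
    d.send_event("interface newip", "wan")
    d.send_event("bogus")
    return d.run_pending()


if __name__ == "__main__":
    for line in demo_dispatch():
        print(line)
```

    In a design like this, a flood of queued commands keeps the loop busy, which matches the "it's being given a lot of commands" explanation quoted earlier.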

    We've seen it get caught in a loop on rare occasions, but we haven't been able to reproduce it reliably enough to debug it properly.
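
    The trace earlier in the thread shows kevent() reporting EV_EOF on the same descriptors again and again, with every recvfrom() returning 0 bytes. One plausible way to end up in that pattern (an assumption, not a confirmed diagnosis of check_reload_status) is an event loop that keeps watching a socket after the peer has hung up. The Python sketch below reproduces the effect with select() instead of kevent() so that it runs on any platform; count_wakeups and its parameters are hypothetical names.

```python
import select
import socket


def count_wakeups(close_on_eof, max_iterations=50):
    """Return how many times the loop woke up before going idle or hitting the cap."""
    ours, peer = socket.socketpair()
    peer.close()  # the peer hangs up, like a client that went away
    watched = [ours]
    wakeups = 0
    while watched and wakeups < max_iterations:
        readable, _, _ = select.select(watched, [], [], 0.1)
        if not readable:
            break  # timed out: nothing pending, the loop is idle
        for sock in readable:
            wakeups += 1
            data = sock.recv(8)
            if data == b"" and close_on_eof:
                # A zero-byte read means EOF: stop watching and close it.
                watched.remove(sock)
                sock.close()
    for sock in watched:
        sock.close()
    return wakeups


if __name__ == "__main__":
    print("wakeups when EOF is ignored:", count_wakeups(close_on_eof=False))
    print("wakeups when EOF closes the socket:", count_wakeups(close_on_eof=True))
```

    When the zero-byte read is ignored, the socket stays readable forever and the loop spins until it hits the iteration cap; closing the socket on EOF lets the loop go quiet after a single wakeup.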



  • I think I can reproduce it when a WAN connection is flapping. But how do I debug it? Can you answer my question about the source code?


  • Rebel Alliance Developer Netgate

    Source is in our FreeBSD ports tree

    : pkg which -o `which check_reload_status`
    /usr/local/sbin/check_reload_status was installed by package sysutils/check_reload_status
    
    

    https://github.com/pfsense/FreeBSD-ports/tree/devel/sysutils/check_reload_status/



  • Thanks. Will try to dig.



  • I made a small test patch (I have not submitted a PR yet because I wanted feedback first) that seems to solve the issue for me. At least in my case, it was caused by console option 16 (restart php-fpm): killing that made check_reload_status go into a nosedive. So I wrapped it with a start/stop. I've been testing that for a couple of days, and so far it has helped. Any thoughts?

    Side note: since it doesn't seem possible to build a "pfSense" platform from source, what is the recommended method for trying to make & test changes to check_reload_status.c in case it needs to be worked on?

    edit: didn't get any comments here so I submitted PR#3637