What is check_reload_status?
-
What exactly is the function of /usr/local/sbin/check_reload_status ?
I've been having a problem where it sometimes spins out of control and maxes out one CPU core. This has been an occasional issue for months, and I am now on 2.4-master. A quick search here for "check_reload_status cpu" pulled up 11 threads, but many were from 2012-2013, and none of them had any answer other than "a reboot fixed it". Only a single post from JimP even hints at its purpose: "check_reload_status is a command dispatching daemon. If it's using CPU, it's because it's being given a lot of commands". But I have been watching my log files (they are idle) and all other processes on my system are at <1%. There is no traffic going through the firewall, etc. There's redmine#2555, which has been sitting quietly at 0% for almost 5 years. The last comment is from Chris Buechler declaring victory over the bug in 2.1. ???
I can't find any detailed information as to what this process is, what it does, or where to look at its source code (is the source even available??)
I have run it through truss, ktrace, and kdump but all that revealed is the following repeated over and over 100s of times per second:
truss (this repeats endlessly)
kevent(4,{ },0,{ 14,EVFILT_READ,EV_EOF,0x0,0x0,0x30303 5,EVFILT_READ,EV_EOF,0x0,0x0,0x30303 7,EVFILT_READ,EV_EOF,0x0,0x0,0x30303 },64,0x0) = 3 (0x3)
recvfrom(14,0x7fffffffd940,8,0x0,NULL,0x0) = 0 (0x0)
recvfrom(5,0x7fffffffd940,8,0x0,NULL,0x0) = 0 (0x0)
recvfrom(7,0x7fffffffd940,8,0x0,NULL,0x0) = 0 (0x0)
ktrace/kdump (also repeats endlessly)
328 check_reload_status GIO fd 4 read 128 bytes
    0x0000 0e00 0000 0000 0000 ffff 0080 0000 0000 0000 0000 |....................|
    0x0014 0000 0000 0303 0300 0000 0000 0500 0000 0000 0000 |....................|
    0x0028 ffff 0080 0000 0000 0000 0000 0000 0000 0303 0300 |....................|
    0x003c 0000 0000 0700 0000 0000 0000 ffff 0080 0000 0000 |....................|
    0x0050 0000 0000 0000 0000 0303 0300 0000 0000 0800 0000 |....................|
    0x0064 0000 0000 ffff 0080 0000 0000 0000 0000 0000 0000 |....................|
    0x0078 0303 0300 0000 0000 |........|
328 check_reload_status RET kevent 4
328 check_reload_status CALL recvfrom(0xe,0x7fffffffd940,0x8,0,0,0)
328 check_reload_status GIO fd 14 read 0 bytes
    ""
328 check_reload_status RET recvfrom 0
328 check_reload_status CALL recvfrom(0x5,0x7fffffffd940,0x8,0,0,0)
328 check_reload_status GIO fd 5 read 0 bytes
    ""
328 check_reload_status RET recvfrom 0
328 check_reload_status CALL recvfrom(0x7,0x7fffffffd940,0x8,0,0,0)
328 check_reload_status GIO fd 7 read 0 bytes
    ""
328 check_reload_status RET recvfrom 0
328 check_reload_status CALL recvfrom(0x8,0x7fffffffd940,0x8,0,0,0)
328 check_reload_status GIO fd 8 read 0 bytes
    ""
328 check_reload_status RET recvfrom 0
328 check_reload_status CALL kevent(0x4,0x801628000,0,0x801628800,0x40,0)
328 check_reload_status GIO fd 4 wrote 0 bytes
    ""
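For what it's worth, the pattern in both traces (kevent reporting EV_EOF on the same descriptors, each recvfrom returning 0, then straight back to kevent) is what a level-triggered event loop looks like when a peer has hung up but the descriptor is never closed or removed. Here is a minimal Python sketch of that general failure mode, assuming that reading of the trace is right. It is not the check_reload_status code; count_eof_wakeups and its parameters are made up for illustration, and it uses the portable selectors module rather than kqueue directly.

```python
import selectors
import socket


def count_eof_wakeups(close_on_eof, max_iterations=50):
    """Count loop wakeups caused by a peer that has already disconnected.

    Hypothetical helper for illustration only.
    """
    sel = selectors.DefaultSelector()
    server_side, client_side = socket.socketpair()
    sel.register(server_side, selectors.EVENT_READ)
    client_side.close()  # peer hangs up: server_side is now permanently at EOF

    wakeups = 0
    for _ in range(max_iterations):
        if len(sel.get_map()) == 0:
            break  # nothing left to watch; a real daemon would block here
        for key, _mask in sel.select(timeout=0.05):
            data = key.fileobj.recv(8)
            if data == b"":  # zero-byte read means EOF, like recvfrom() = 0
                wakeups += 1
                if close_on_eof:
                    # The fix: stop watching and close the dead descriptor.
                    sel.unregister(key.fileobj)
                    key.fileobj.close()

    # Clean up if the socket was deliberately left registered.
    if not close_on_eof:
        sel.unregister(server_side)
        server_side.close()
    sel.close()
    return wakeups


if __name__ == "__main__":
    print("leaving EOF sockets registered:", count_eof_wakeups(False))
    print("closing on EOF:", count_eof_wakeups(True))
```

With close_on_eof=False the loop wakes up on every pass until it hits the iteration cap; with close_on_eof=True it sees the EOF once and then goes quiet.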
-
My other post is correct. Something must have sent a ton of events or somehow it fell into that state.
It handles things like filter reloads, interface events, etc. Events are triggered through it using pfSctl or send_event() and they are queued and handled by check_reload_status to avoid things stepping on each other too much.
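As a rough illustration of the queue-and-dispatch idea (not the actual check_reload_status implementation, which is written in C), here is a small Python sketch. The command names and handlers are hypothetical placeholders; the only point is that a single worker drains the queue in arrival order, so two actions never run at the same time.

```python
import queue
import threading


def run_dispatcher(commands, handlers):
    """Run queued (name, argument) commands one at a time, in arrival order.

    Hypothetical helper for illustration only.
    """
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more commands
                break
            name, argument = item
            # Only this thread runs handlers, so they cannot overlap.
            results.append(handlers[name](argument))

    thread = threading.Thread(target=worker)
    thread.start()
    for command in commands:
        pending.put(command)
    pending.put(None)
    thread.join()
    return results


if __name__ == "__main__":
    # Placeholder handlers; real actions would reload the filter, etc.
    handlers = {
        "filter_reload": lambda _arg: "filter reloaded",
        "interface_event": lambda iface: "handled event on " + iface,
    }
    print(run_dispatcher(
        [("filter_reload", None), ("interface_event", "wan")],
        handlers,
    ))
```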
We've seen it get caught in a loop on rare occasions but haven't been able to reproduce it reliably enough to debug it properly.
-
I think I can reproduce it when a WAN connection is flapping. But, how to debug it? Can you answer my question about the source code?
-
Source is in our FreeBSD ports tree
: pkg which -o `which check_reload_status`
/usr/local/sbin/check_reload_status was installed by package sysutils/check_reload_status
https://github.com/pfsense/FreeBSD-ports/tree/devel/sysutils/check_reload_status/
-
Thanks. Will try to dig.
-
I made a small test patch (I have not submitted a PR yet because I wanted feedback first) that seems to solve the issue for me. At least in my case it was caused by console menu option 16 (Restart PHP-FPM): killing php-fpm made check_reload_status go into a nosedive. So I wrapped it with a start/stop. I've been testing that for a couple of days and so far it has helped. Any thoughts?
Side note: since it doesn't seem possible to build a "pfSense" platform from source, what is the recommended method for trying to make & test changes to check_reload_status.c in case it needs to be worked on?
edit: didn't get any comments here so I submitted PR#3637
-