Higher load on one 4100 than on another 4100 with the same config

Proton

Hi!

I have two Netgate 4100s.
I used the same config on both.

On one I see:
last pid: 47044; load averages: 1.02, 1.14, 1.17 up 6+20:51:58 14:41:21
300 threads: 4 running, 263 sleeping, 33 waiting
CPU: 6.1% user, 9.3% nice, 16.0% system, 0.0% interrupt, 68.6% idle
Mem: 57M Active, 391M Inact, 124K Laundry, 878M Wired, 2483M Free
ARC: 349M Total, 60M MFU, 267M MRU, 280K Anon, 2632K Header, 19M Other
275M Compressed, 1122M Uncompressed, 4.08:1 Ratio

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
435 root 155 20 13M 2952K CPU0 0 77.5H 100.00% /usr/local/sbin/check_reload_status
11 root 187 ki31 0B 32K RUN 1 116.5H 62.89% [idle{idle: cpu1}]
11 root 187 ki31 0B 32K RUN 0 109.6H 49.17% [idle{idle: cpu0}]

On the other:
CPU Activity
last pid: 87925; load averages: 0.21, 0.29, 0.30 up 0+01:53:10 14:41:51
289 threads: 3 running, 256 sleeping, 30 waiting
CPU: 1.6% user, 0.4% nice, 2.1% system, 0.0% interrupt, 95.9% idle
Mem: 109M Active, 194M Inact, 542M Wired, 56K Buf, 2965M Free
ARC: 207M Total, 30M MFU, 171M MRU, 136K Anon, 1064K Header, 5491K Other
173M Compressed, 416M Uncompressed, 2.40:1 Ratio
Swap: 1024M Total, 1024M Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 187 ki31 0B 32K RUN 0 108:35 100.00% [idle{idle: cpu0}]
11 root 187 ki31 0B 32K CPU1 1 108:24 100.00% [idle{idle: cpu1}]

I see that the second one also shows swap information, while the first one shows a process at 100% CPU:
100.00% /usr/local/sbin/check_reload_status

So what I want to understand is: what is this process doing?
I know this process receives commands and executes them, but how can I pinpoint exactly which commands are running repeatedly?

THX!
Per

stephenw10

It's possible that check_reload_status has become stuck. If you kill it, does it come back?

Proton

@stephenw10
kill -9 <PID>
The process is NOT coming back, and the load went down to 0.3.

Not sure why this happened, though. I guess I will never know :(

THX!

stephenw10

Indeed, we have seen a few reports of that but have never seen it here. If it happens again and anything is logged at that point or, even better, if you can find a way to replicate it, please let us know.

There have been similar issues in the past but not for quite a while.
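
To make it easier to notice if it happens again, here is a minimal sketch of a filter that picks runaway processes out of top output like the listings posted above. The function name busy_procs and the file name top.txt are hypothetical and only for illustration. It assumes process rows start with a numeric PID and that the command follows directly after the WCPU column, as in the pasted output. It only flags the process; it does not show what the process is doing.

```shell
#!/bin/sh
# busy_procs (hypothetical helper): read top-style output on stdin and print
# "PID WCPU COMMAND" for every non-idle process at or above a CPU threshold.
# Usage sketch, assuming the top output was saved to a file named top.txt:
#   busy_procs 90 < top.txt
busy_procs() {
    threshold="${1:-90}"   # percent; defaults to 90 if not given
    awk -v t="$threshold" '
        # Process rows start with a numeric PID; header and summary lines are skipped.
        $1 ~ /^[0-9]+$/ {
            for (i = 2; i <= NF; i++) {
                # The WCPU column is the first field of the form "NN.NN%".
                if ($i ~ /^[0-9.]+%$/) {
                    cpu = $i
                    sub(/%/, "", cpu)
                    cmd = $(i + 1)
                    # Ignore the kernel idle threads, which normally sit near 100%.
                    if (cpu + 0 >= t && cmd !~ /^\[idle/)
                        print $1, $i, cmd
                    break
                }
            }
        }'
}
```

Fed the first listing from this thread, it would print the check_reload_status row and nothing else; fed the second listing, it prints nothing, since only the idle threads are busy there.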