CPU usage increase suddenly

michmoor

@stephenw10 Interestingly, when I try to start the DNSBL service it fails. Nothing in the log that's helpful.
pfBlocker's IP blocking is working without issue; it's only the DNS sinkholing that fails. My curiosity here is piqued.
Any commands you can recommend to gain some insight?

michmoor

@stephenw10 Restarted the entire pfBlocker package and now it's functioning. The increase in CPU usage has come back, and it's unbound related.
How can I diagnose this better?

Below is when I had pfBlocker enabled without DNSBL. Then I turned DNSBL back on.

michmoor

Just really weird that unbound tied with pfblocker is acting so strange.

Phizix

@michmoor,

Is it not expected for it to use some more CPU? So is it much more than expected?

michmoor

@Phizix Historically on my SG-6100, CPU utilization isn't an issue. I have a baseline, so that's how I know this is an issue. Right now, although DNSBL is the problem, it's not causing any system instability. I would like to know why it's acting this way if there is indeed an issue, which I suspect there is.

Phizix

@michmoor,

Makes sense if you had a baseline to compare to. So indeed more CPU usage than normal.

I am curious what you find when you solve it. Was there a recent package update?

michmoor

@Phizix I'll keep this thread as updated as I can. I started a Reddit post on it, so I hope the maintainer can respond there as well. @BBcan177

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
36790 unbound       1 118    0   294M   260M CPU3     3 551:03 100.96% unbound
 8387 www           1  20    0    44M    23M kqread   3  41:13   1.27% haproxy
59321 root          5  20    0   666M   578M nanslp   3 187:01   0.79% suricata
18304 root          1  20    0    12M  2488K kqread   3   7:11   0.75% dhcpleases
23975 root          5  20    0   859M   711M nanslp   3 172:29   0.70% suricata
98310 root          5  20    0   745M   667M nanslp   2 145:47   0.58% suricata
30622 root          1  20    0    13M  3152K select   0 105:28   0.44% syslogd
35711 root          1  20    0    13M  3584K bpf      3  66:04   0.22% filterlog
96223 root          1  20    0    14M  4000K CPU2     2   0:00   0.10% top
60163 nobody        1  20    0    16M  4996K select   3   0:32   0.08% softflowd
46104 zabbix        1   4    0    24M    11M select   1   5:45   0.08% zabbix_agentd
13937 root         17  68    0   107M    28M sigwai   1   7:27   0.08% charon
45925 zabbix        1  20    0    24M    11M select   1   5:39   0.07% zabbix_agentd
60519 dhcpd         1  20    0    25M    13M select   1   0:49   0.07% dhcpd
31822 root          1  20    0    18M  8012K select   0   9:39   0.05% openvpn
48305 root          3  20    0    69M    35M kqread   1   1:26   0.05% syslog-ng

michmoor

Problem solved. Re-installed pfBlockerNG. Made sure I had the 'Keep settings' option enabled.
It was really a last resort. I didn't know if a reinstall would fix it, but I knew there was something wrong with the configuration.
I had a custom DNS block list that was blocking example.com. I removed it a while ago, but I noticed the domain was still getting sinkholed. I triple-checked to make sure the domain wasn't listed, but pfBlocker was indeed blocking it.
After the re-install I'm back to baseline. Weird bug in the package, but without other tools to debug I can't say why the package freaked out the way it did. I also can't reproduce the problem anymore.

Phizix

@michmoor,

Thanks for the follow-up. Good to know.

michmoor

The problem has come back.
Restarting unbound or DNSBL doesn't solve the problem. The only fix is to disable DNSBL, after which CPU utilization goes back to normal.
I honestly have no idea and I'm at a loss.
I reinstalled the package from scratch, without keeping any settings.

   11 root        187 ki31     0B    64K RUN      2 862:26  87.99% [idle{idle: cpu2}]
   11 root        187 ki31     0B    64K CPU3     3 835:22  84.28% [idle{idle: cpu3}]
   11 root        187 ki31     0B    64K CPU1     1 851:44  77.69% [idle{idle: cpu1}]
18451 unbound      68    0   235M   202M kqread   1   7:55  46.19% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
18451 unbound      68    0   235M   202M kqread   1   0:00  46.00% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
18451 unbound      68    0   235M   202M kqread   0   0:00  46.00% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
18451 unbound      68    0   235M   202M kqread   2   0:00  46.00% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}

michmoor

Interesting... Checking the DNS logs, I see the same script being loaded over and over again.
I'm still poking around. No clue right now :)
What's also weird is that unbound keeps restarting.
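To put a number on "keeps restarting", here is a minimal sketch that counts unbound start lines in resolver log text and reports the gap between them. It assumes unbound writes a line containing "start of service" each time it starts and that each line carries a syslog-style HH:MM:SS timestamp. The sample log, hostname, PIDs and version string are made up for illustration; on pfSense the resolver log is likely /var/log/resolver.log, but treat that path as an assumption too.

```python
import re

# Assumption: unbound logs a line containing "start of service" on each start,
# and every line has a syslog-style HH:MM:SS timestamp.
START_MARKER = "start of service"
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def restart_times(log_text):
    """Return seconds-since-midnight for every unbound start line."""
    times = []
    for line in log_text.splitlines():
        if "unbound" not in line or START_MARKER not in line:
            continue
        match = TIME_RE.search(line)
        if match:
            hours, minutes, seconds = (int(part) for part in match.groups())
            times.append(hours * 3600 + minutes * 60 + seconds)
    return times


def restart_gaps(times):
    """Seconds between consecutive starts (midnight rollover is ignored)."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical sample data, not taken from the firewall in this thread.
SAMPLE_LOG = """\
Mar  3 10:15:01 fw unbound[18451]: [18451:0] info: start of service (unbound 1.17.1).
Mar  3 10:15:20 fw unbound[18451]: [18451:0] info: service stopped (unbound 1.17.1).
Mar  3 10:15:31 fw unbound[20112]: [20112:0] info: start of service (unbound 1.17.1).
Mar  3 10:16:02 fw unbound[21540]: [21540:0] info: start of service (unbound 1.17.1).
"""

if __name__ == "__main__":
    times = restart_times(SAMPLE_LOG)
    print(len(times), "starts, gaps in seconds:", restart_gaps(times))
```

Gaps that stay consistently short and regular would point toward something periodic triggering the restarts rather than a crash.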

stephenw10

Can you disable python mode in pfBlocker to test?

If you run ps -auxwwd can you see what script is actually running?

michmoor

@stephenw10 I think I solved it. I'm fairly confident it's solved now... I hope.

The clues are in the DNS logs. I noticed Unbound kept restarting, and I remember reading on the forums a while ago that DHCP registration causes Unbound to restart. I do have registration enabled for all VLANs, so I wasn't totally buying that as a reason. Regardless, I reviewed the DHCP configuration for each VLAN, and what do I find?

Ahhh, this aggressive lease timer. I stood up new DNS servers for a VLAN and needed clients to switch over quickly. I never changed it back until today. Switched back to defaults and CPU utilization dropped back down to normal baseline levels.

Someone correct me if I'm wrong, but I thought the issue of DHCP registration needing to restart Unbound was solved in the latest release?

stephenw10

Part of that issue was solved, but it still restarts Unbound to load the new values every time. Which is... suboptimal!

Yes, 60s is very short. Any reason it was set to that?
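A rough back-of-the-envelope sketch of why a 60s lease hurts: DHCP clients normally try to renew at T1, which defaults to half the lease time (RFC 2131), so a 60s lease means a renewal roughly every 30s per client. The client count of 20 and the 7200s comparison lease are assumptions for illustration, and not every renewal necessarily triggers an Unbound restart, so read the result as an upper bound on restart triggers.

```python
def renewals_per_hour(clients, lease_seconds, t1_fraction=0.5):
    """Estimate DHCP renewals per hour across all clients.

    Assumes every client renews at T1 = t1_fraction * lease time,
    which is the RFC 2131 default behaviour (t1_fraction = 0.5).
    """
    renew_interval = lease_seconds * t1_fraction
    return clients * 3600 / renew_interval


if __name__ == "__main__":
    clients = 20  # hypothetical number of clients on the VLAN
    print("60s lease:  ", renewals_per_hour(clients, 60), "renewals/hour")
    print("7200s lease:", renewals_per_hour(clients, 7200), "renewals/hour")
```

With these assumed numbers the short lease produces about 120 times as many renewal events per hour as the longer one.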

michmoor

@stephenw10 said in CPU usage increase suddenly:

Yes, 60s is very short. Any reason it was set to that?

I stood up new DNS servers and wanted devices to cut over right away, which worked but caused an issue for myself.

This entire issue smelled like a config problem, but I couldn't prove it at the time. I went against my rule of rebooting the firewall as I truly dislike doing that, especially if things were working before.