[solved] check_reload_status at 100% + apinger messages
-
Need some help, guys.
I've started to configure a new pfSense host (fresh install) and got it up and running with the default config: two interfaces (WAN/LAN) and SSH enabled. I haven't changed anything else.
I have noticed that if I unplug the WAN port from my local network (for testing; my laptop is plugged directly into LAN), the CPU usage of check_reload_status slowly drops from 100% to 0%. When I plug the WAN port back in, it slowly climbs back to 100% over about one minute.
pfSense Version:
2.1.4-RELEASE (amd64) built on Fri Jun 20 12:59:50 EDT 2014 FreeBSD 8.3-RELEASE-p16
I noticed some CPU usage when it should have been idle and found this:
last pid: 38330;  load averages:  1.14,  0.89,  0.70    up 0+00:20:07  20:40:09
51 processes:  2 running, 49 sleeping
CPU:  0.0% user,  2.6% nice, 11.7% system,  0.0% interrupt, 85.7% idle
Mem: 116M Active, 28M Inact, 197M Wired, 128K Cache, 23M Buf, 7514M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  246 root        1 139   20  6908K  1400K CPU6    6  16:07 100.00% check_reload_status
30950 root        1  76   20   157M 46804K nanslp  3   0:00   1.46% php
33
So I started looking through the logs and found that apinger is writing the following messages to the gateway log:
Jul 27 20:31:57 apinger: SIGHUP received, reloading configuration.
Jul 27 20:32:11 apinger: ALARM: WAN_DHCP(172.17.60.1) *** down ***
Jul 27 20:32:11 apinger: ALARM: WAN_DHCP6(fe80::20d:b9ff:fe13:4b90%igb0) *** down ***
Jul 27 20:35:12 apinger: alarm canceled: WAN_DHCP6(fe80::20d:b9ff:fe13:4b90%igb0) *** down ***
Jul 27 20:35:13 apinger: alarm canceled: WAN_DHCP(172.17.60.1) *** down ***
Jul 27 20:35:15 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:25 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:32 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:40 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:47 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:54 apinger: SIGHUP received, reloading configuration.
Jul 27 20:37:02 apinger: SIGHUP received, reloading configuration.
Jul 27 20:37:09 apinger: SIGHUP received, reloading configuration.
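For anyone who wants to see how tight a reload loop like this is, here is a rough Python sketch that measures the gap between consecutive SIGHUP lines in a gateway log excerpt. The function name `sighup_gaps` and the `year` parameter are made up for illustration; the log lines carry no year, so 2014 is only an assumption based on the build date above.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Jul 27 20:36:25 apinger: SIGHUP received, reloading configuration.
SIGHUP_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) apinger: SIGHUP received")


def sighup_gaps(log_text, year=2014):
    """Return the seconds between consecutive apinger SIGHUP reloads.

    The syslog timestamps have no year, so `year` is an assumption
    (hypothetical default taken from the pfSense build date).
    """
    times = []
    for line in log_text.splitlines():
        match = SIGHUP_RE.match(line.strip())
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    # Pair each reload with the next one and take the difference.
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


SAMPLE = """\
Jul 27 20:35:15 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:25 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:32 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:40 apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:47 apinger: SIGHUP received, reloading configuration.
"""

print(sighup_gaps(SAMPLE))  # [70, 7, 8, 7]
```

On the excerpt above, the reloads settle into a 7 to 8 second rhythm, which is what a reload loop would look like as opposed to a one-off reload after a link change.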
Here is my /var/etc/apinger.conf file:
[2.1.4-RELEASE][root@pfsense.localdomain]/root(14): cat /var/etc/apinger.conf

# pfSense apinger configuration file. Automatically Generated!

## User and group the pinger should run as
user "root"
group "wheel"

## Mailer to use (default: "/usr/lib/sendmail -t")
#mailer "/var/qmail/bin/qmail-inject"

## Location of the pid-file (default: "/var/run/apinger.pid")
pid_file "/var/run/apinger.pid"

## Format of timestamp (%s macro) (default: "%b %d %H:%M:%S")
#timestamp_format "%Y%m%d%H%M%S"

status {
    ## File where the status information should be written to
    file "/var/run/apinger.status"
    ## Interval between file updates
    ## when 0 or not set, file is written only when SIGUSR1 is received
    interval 5s
}

########################################
# RRDTool status gathering configuration
# Interval between RRD updates
rrd interval 60s;

## These parameters can be overridden in a specific alarm configuration
alarm default {
    command on "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
    command off "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
    combine 10s
}

## "Down" alarm definition.
## This alarm will be fired when target doesn't respond for 30 seconds.
alarm down "down" {
    time 10s
}

## "Delay" alarm definition.
## This alarm will be fired when responses are delayed more than 200ms
## it will be canceled, when the delay drops below 100ms
alarm delay "delay" {
    delay_low 200ms
    delay_high 500ms
}

## "Loss" alarm definition.
## This alarm will be fired when packet loss goes over 20%
## it will be canceled, when the loss drops below 10%
alarm loss "loss" {
    percent_low 10
    percent_high 20
}

target default {
    ## How often the probe should be sent
    interval 1s
    ## How many replies should be used to compute average delay
    ## for controlling "delay" alarms
    avg_delay_samples 10
    ## How many probes should be used to compute average loss
    avg_loss_samples 50
    ## The delay (in samples) after which loss is computed
    ## without this delays larger than interval would be treated as loss
    avg_loss_delay_samples 20
    ## Names of the alarms that may be generated for the target
    alarms "down","delay","loss"
    ## Location of the RRD
    #rrd file "/var/db/rrd/apinger-%t.rrd"
}

target "172.17.60.1" {
    description "WAN_DHCP"
    srcip "172.17.60.183"
    alarms override "loss","delay","down";
    rrd file "/var/db/rrd/WAN_DHCP-quality.rrd"
}

target "fe80::20d:b9ff:fe13:4b90%igb0" {
    description "WAN_DHCP6"
    srcip "fe80::225:90ff:fef4:7936%igb0"
    alarms override "loss","delay","down";
    rrd file "/var/db/rrd/WAN_DHCP6-quality.rrd"
}
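A quick way to see which gateways apinger is actually monitoring is to pull the `target` blocks out of that file. The sketch below is only an illustration: `list_targets` is a made-up helper, the regex is a rough match for the layout shown above rather than a real apinger.conf parser, and an address being link-local does not by itself mean it is unreachable. It only points at which target deserves a closer look.

```python
import ipaddress
import re

# Rough match for:  target "<address>" { ... description "<name>" ... }
# "target default {" has no quoted address, so it is skipped.
TARGET_RE = re.compile(r'target\s+"([^"]+)"\s*\{[^}]*?description\s+"([^"]+)"')


def list_targets(conf_text):
    """Return (name, address, ip_version, is_link_local) per monitored target."""
    results = []
    for address, name in TARGET_RE.findall(conf_text):
        host = address.split("%", 1)[0]  # drop the %igb0 interface scope
        ip = ipaddress.ip_address(host)
        results.append((name, address, ip.version, ip.is_link_local))
    return results


# Trimmed copy of the two target blocks from the config above.
SAMPLE_CONF = '''
target default {
    interval 1s
}
target "172.17.60.1" {
    description "WAN_DHCP"
    srcip "172.17.60.183"
}
target "fe80::20d:b9ff:fe13:4b90%igb0" {
    description "WAN_DHCP6"
    srcip "fe80::225:90ff:fef4:7936%igb0"
}
'''

for name, address, version, link_local in list_targets(SAMPLE_CONF):
    print(name, address, f"IPv{version}", "link-local" if link_local else "routable/private")
```

Here the WAN_DHCP6 target shows up as an auto-detected link-local IPv6 gateway, so that is the one to test by hand (for example with a ping from the firewall shell).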
Any help would be greatly appreciated.
Thanks!
-
To anyone who has read this and might have run into the same issue: the auto-detected IPv6 addresses assigned to the WAN interface didn't have a reachable gateway, so apinger was going bananas constantly reloading its configuration. When I disabled IPv6, utilization went back to normal. Re-enabling IPv6 and properly configuring the IPv6 addresses on the WAN fixed the issue.
Hope this is at least helpful to someone else, as the logs are not very clear about what's happening.