[solved] check_reload_status at 100% + apinger messages



  • Need some help, guys…

    I've started configuring a new pfSense host (fresh install) and got it up and running with the default config: two interfaces (WAN/LAN) and SSH enabled. I've not changed anything else.

    I have noticed that if I unplug the WAN port from my local network (for testing; my laptop is plugged directly into the LAN), check_reload_status slowly drops from 100% CPU to 0%. When I plug the WAN port back in, it climbs back to 100% over about a minute.

    pfSense Version:

    
    2.1.4-RELEASE (amd64) 
    built on Fri Jun 20 12:59:50 EDT 2014 
    FreeBSD 8.3-RELEASE-p16
    
    

    I noticed some CPU usage when it should have been idle and found this:

    
    last pid: 38330;  load averages:  1.14,  0.89,  0.70                                           up 0+00:20:07  20:40:09
    51 processes:  2 running, 49 sleeping
    CPU:  0.0% user,  2.6% nice, 11.7% system,  0.0% interrupt, 85.7% idle
    Mem: 116M Active, 28M Inact, 197M Wired, 128K Cache, 23M Buf, 7514M Free
    Swap:
    
      PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
    246 root        1 139   20  6908K  1400K CPU6    6  16:07 100.00% check_reload_status
    30950 root        1  76   20   157M 46804K nanslp  3   0:00  1.46% php
    
    

    So I started looking through the logs and found that apinger is writing the following messages to the gateway log:

    
    Jul 27 20:31:57	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:32:11	apinger: ALARM: WAN_DHCP(172.17.60.1) *** down ***
    Jul 27 20:32:11	apinger: ALARM: WAN_DHCP6(fe80::20d:b9ff:fe13:4b90%igb0) *** down ***
    Jul 27 20:35:12	apinger: alarm canceled: WAN_DHCP6(fe80::20d:b9ff:fe13:4b90%igb0) *** down ***
    Jul 27 20:35:13	apinger: alarm canceled: WAN_DHCP(172.17.60.1) *** down ***
    Jul 27 20:35:15	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:36:25	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:36:32	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:36:40	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:36:47	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:36:54	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:37:02	apinger: SIGHUP received, reloading configuration.
    Jul 27 20:37:09	apinger: SIGHUP received, reloading configuration.
    
    
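    The reload loop shows up clearly if you count the SIGHUP lines per minute. Here's a rough way to do it with awk, shown against a pasted sample of the log lines above; on the firewall itself you'd feed it the gateway log instead (pfSense keeps circular logs, so something like `clog /var/log/gateways.log` — that path is an assumption, check Status > System Logs > Gateways):

```shell
# Count apinger SIGHUP reloads per minute.  The sample lines below stand
# in for the real log output so this runs anywhere; on pfSense, pipe in
# `clog /var/log/gateways.log` instead (path is an assumption).
cat <<'EOF' > /tmp/apinger_sample.log
Jul 27 20:36:25	apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:32	apinger: SIGHUP received, reloading configuration.
Jul 27 20:36:40	apinger: SIGHUP received, reloading configuration.
Jul 27 20:37:02	apinger: SIGHUP received, reloading configuration.
EOF

# $3 is the HH:MM:SS timestamp; bucket by HH:MM and count.
awk '/SIGHUP received/ { split($3, t, ":"); n[t[1] ":" t[2]]++ }
     END { for (m in n) print m, n[m] }' /tmp/apinger_sample.log | sort
```

    A healthy gateway reloads rarely; several SIGHUPs in the same minute, as in the log above, points at something flapping.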

    Here is my /var/etc/apinger.conf file:

    
    [2.1.4-RELEASE][root@pfsense.localdomain]/root(14): cat /var/etc/apinger.conf
    
    # pfSense apinger configuration file. Automatically Generated!
    
    ## User and group the pinger should run as
    user "root"
    group "wheel"
    
    ## Mailer to use (default: "/usr/lib/sendmail -t")
    #mailer "/var/qmail/bin/qmail-inject"
    
    ## Location of the pid-file (default: "/var/run/apinger.pid")
    pid_file "/var/run/apinger.pid"
    
    ## Format of timestamp (%s macro) (default: "%b %d %H:%M:%S")
    #timestamp_format "%Y%m%d%H%M%S"
    
    status {
    	## File where the status information should be written to
    	file "/var/run/apinger.status"
    	## Interval between file updates
    	## when 0 or not set, file is written only when SIGUSR1 is received
    	interval 5s
    }
    
    ########################################
    # RRDTool status gathering configuration
    # Interval between RRD updates
    rrd interval 60s;
    
    ## These parameters can be overridden in a specific alarm configuration
    alarm default {
    	command on "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
    	command off "/usr/local/sbin/pfSctl -c 'service reload dyndns %T' -c 'service reload ipsecdns' -c 'service reload openvpn %T' -c 'filter reload' "
    	combine 10s
    }
    
    ## "Down" alarm definition.
    ## This alarm will be fired when target doesn't respond for 30 seconds.
    alarm down "down" {
    	time 10s
    }
    
    ## "Delay" alarm definition.
    ## This alarm will be fired when responses are delayed more than 200ms
    ## it will be canceled, when the delay drops below 100ms
    alarm delay "delay" {
    	delay_low 200ms
    	delay_high 500ms
    }
    
    ## "Loss" alarm definition.
    ## This alarm will be fired when packet loss goes over 20%
    ## it will be canceled, when the loss drops below 10%
    alarm loss "loss" {
    	percent_low 10
    	percent_high 20
    }
    
    target default {
    	## How often the probe should be sent
    	interval 1s
    
    	## How many replies should be used to compute average delay
    	## for controlling "delay" alarms
    	avg_delay_samples 10
    
    	## How many probes should be used to compute average loss
    	avg_loss_samples 50
    
    	## The delay (in samples) after which loss is computed
    	## without this delays larger than interval would be treated as loss
    	avg_loss_delay_samples 20
    
    	## Names of the alarms that may be generated for the target
    	alarms "down","delay","loss"
    
    	## Location of the RRD
    	#rrd file "/var/db/rrd/apinger-%t.rrd"
    }
    target "172.17.60.1" {
    	description "WAN_DHCP"
    	srcip "172.17.60.183"
    	alarms override "loss","delay","down";
    	rrd file "/var/db/rrd/WAN_DHCP-quality.rrd"
    }
    
    target "fe80::20d:b9ff:fe13:4b90%igb0" {
    	description "WAN_DHCP6"
    	srcip "fe80::225:90ff:fef4:7936%igb0"
    	alarms override "loss","delay","down";
    	rrd file "/var/db/rrd/WAN_DHCP6-quality.rrd"
    }
    
    
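    For what it's worth, the default thresholds in that file work out as follows: with `interval 1s`, the `down` alarm's `time 10s` means roughly ten consecutive unanswered probes, and the `loss` alarm's `percent_high 20` over `avg_loss_samples 50` means about ten lost probes out of the last fifty. A quick sanity check of that arithmetic (values copied from the config above):

```shell
# Loss alarm: percent_high 20 over a window of avg_loss_samples 50
# -> number of unanswered probes in the window that trips the alarm.
samples=50
percent_high=20
echo "$((samples * percent_high / 100)) lost probes trip the loss alarm"

# Down alarm: time 10s at interval 1s -> consecutive missed probes.
echo "$((10 / 1)) consecutive missed probes trip the down alarm"
```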

    Any help would be greatly appreciated.

    Thanks!



  • To anyone who has read this and might have run into the same issue: the auto-detected IPv6 addresses assigned to the interface didn't have a reachable gateway, so apinger was going bananas constantly reloading its configuration. When I disabled IPv6, utilization went back to normal; re-enabling it and properly configuring the IPv6 addresses on the WAN fixed the issue.

    Hope this is at least helpful to someone else, as the logs are not very clear about what's actually happening.
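    One quick way to catch this before apinger goes bananas is to check whether the box actually has an IPv6 default route at all. This is only a sketch against sample output; on the firewall you'd run `netstat -rn -f inet6` directly, and the gateway address shown is just the one from the logs above:

```shell
# Sketch: verify an IPv6 default route exists.  The heredoc stands in for
# real `netstat -rn -f inet6` output; the link-local gateway address is
# illustrative only (taken from the logs in this thread).
cat <<'EOF' > /tmp/route6_sample.txt
default            fe80::20d:b9ff:fe13:4b90%igb0  UGS  igb0
EOF

if grep -q '^default' /tmp/route6_sample.txt; then
    echo "IPv6 default route present"
else
    echo "no IPv6 default route -- apinger has nothing reachable to monitor"
fi
```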