Sudden high latency (check_reload_status?)



  • Suddenly I am noticing extremely high latency on my connection. I am not sure if it is related, but top shows something abnormal: the WCPU % for check_reload_status is higher than usual. If I kill -9 the PID, latency slowly comes back down, then shoots right back up once check_reload_status respawns. After a reboot, latency is good for about 15-20 minutes.

    Screenshots:

    http://i.imgur.com/GI1FFhA.png

    http://i.imgur.com/QLFTmK1.png

    What could be causing it?



  • Do you have anything that would be maxing out your upload speed, like a Dropbox sync? If you don't have traffic shaping enabled and you max out the upload on an asymmetric connection, you will kill your downstream.
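    A rough back-of-the-envelope illustration of why a saturated upstream starves the downstream on an asymmetric link. The figures (full-size 1500-byte downstream segments, ~64-byte delayed ACKs, a 30M/5M link like the one mentioned later in this thread) are illustrative assumptions, not measurements:

```python
# Sketch: upstream bandwidth a full downstream needs just for TCP ACKs.
# All numbers are illustrative assumptions, not measurements from this thread.

down_bps = 30_000_000          # 30 Mbit/s downstream (a 30M/5M link)
pkt_bytes = 1500               # full-size downstream segment
ack_bytes = 64                 # TCP ACK incl. framing (approximation)

pkts_per_sec = down_bps / (pkt_bytes * 8)      # downstream packets per second
acks_per_sec = pkts_per_sec / 2                # delayed ACK: one ACK per 2 segments
ack_bps = acks_per_sec * ack_bytes * 8         # upstream needed just for ACKs

print(f"{pkts_per_sec:.0f} pkts/s downstream -> {acks_per_sec:.0f} ACKs/s")
print(f"upstream needed for ACKs: {ack_bps / 1e6:.2f} Mbit/s")
# -> 2500 pkts/s downstream -> 1250 ACKs/s
# -> upstream needed for ACKs: 0.64 Mbit/s
```

    The point is not the exact 0.64 Mbit/s figure but the mechanism: if a bulk upload fills the 5 Mbit/s upstream queue, those ACKs sit behind it, latency explodes, and downstream throughput collapses even though the downstream link itself is idle.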



  • I started having the exact same issue. I have isolated it to the router itself, which is taking up all the upload bandwidth. I was worried it had been hacked and was possibly being used in a DDoS attack. It started about 3 days ago. I updated to the most recent version and saw no change. Any ideas?



  • I disabled the DNS forwarder and it fixed my issue. I noticed the traffic was on port 53.



  • I had a similar issue the other day that I traced to my Snort version. Upgrading Snort to the latest release solved it.



  • I've been having a weird latency reporting problem lately, too.

    This all seems to be in the pfSense metrics, though, as it doesn't happen from any of my hosts on the LAN, nor from the pfSense shell directly. Connectivity doesn't seem to be affected at all, and there is very little activity on the WAN link. The link is 30M/5M down/up.

    Please see the attached screenshots showing the metrics in pfSense as well as from the pfSense shell.

    Any ideas?

    EDIT: After reading this thread I checked top and saw that check_reload_status had accumulated a substantial amount of CPU time, far more than snort. I did a kill -9, and on the next spawn it took 24s of CPU time! This box has a dedicated Opteron core with little load (pfSense load avg 0.10). Respawning check_reload_status also didn't seem to help.

    EDIT2: restart of apinger didn't have an effect, either.

    EDIT3: Restarting the apinger service from the services page seems to have fixed the problem for now. We'll see how long until it comes back.








  • This also started happening to me in the past few days.



  • Restarting the apinger service worked for about 36 hours, and then the problem returned as usual.

    Another restart of the apinger service seems to have fixed it again … for the next 36 hours?  :o




  • Just for fun (and I use this term loosely), I reinstalled pfSense 2.1 from scratch, restored my backup config, and reinstalled the packages. It had no effect… the latency went up to > 4000 msec within a day.



  • To work around this apinger bug, I've resorted to using cron to restart the apinger service every day.

    I used the script found in this thread:
      https://forum.pfsense.org/index.php?topic=69533.0

    /root/apinger_restart.php:

    <?php
    require_once("/etc/inc/service-utils.inc");
    require_once("/etc/inc/globals.inc");
    require_once("/etc/inc/vslb.inc");
    require_once("/etc/inc/gwlb.inc");

    service_control_restart("apinger", array());
    ?>
    
    chmod 755 /root/apinger_restart.php
    

    Then add to /etc/crontab to restart at 0200 every day:

    0       2       *       *       *       root    /usr/local/bin/php /root/apinger_restart.php
    

    Be sure to read the notes in the crontab about ending with a newline.

    And if you reboot your pfSense, you'll need to add this entry back to crontab.



  • And if you reboot your pfsense, you'll need to add this config back to crontab.

    If you ever edit a real file somewhere in the FreeBSD file system, then 99% of the time you are not doing it the way pfSense is designed to work: pfSense configures (almost) everything from its own config.xml and regenerates things like crontab on each boot.
    Install the Cron package and add the cron job using that instead.
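    For reference, jobs added through the Cron package are stored in config.xml rather than written to /etc/crontab directly, which is why they survive a reboot. A sketch of what the entry for the job above would look like, based on pfSense's config.xml cron format:

```xml
<cron>
  <item>
    <minute>0</minute>
    <hour>2</hour>
    <mday>*</mday>
    <month>*</month>
    <wday>*</wday>
    <who>root</who>
    <command>/usr/local/bin/php /root/apinger_restart.php</command>
  </item>
</cron>
```

    pfSense regenerates /etc/crontab from these entries at boot, so the job comes back automatically.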



  • Does anyone have a theory as to the cause? I would restart apinger or reboot the firewall, and the problem would go away for a matter of hours and then come back, as others have mentioned. Mysteriously, it seems to have subsided now. Is there anything further I can do to troubleshoot or gather additional information? Is it worth submitting a bug report?

