CPU load spikes every day at 1pm!

  • I have deployed two other pfSense ALIX boxes from Netgate with no problems.

    For a more serious, higher-bandwidth deployment, I purchased the Netgate Hamakua – http://www.netgate.com/product_info.php?cPath=60_85&products_id=793

    Since going live 3 days ago, something very interesting has happened. EVERY DAY at 1pm, the firewall stops responding! I see lots of packet loss (40% when I ping a machine across an IPsec VPN tunnel), I can't log in via SSH or HTTP, and today even the DNS forwarder was not responding. Nothing is logged during this period, and the RRD graphs show a big bunch of nothing.

    The episodes last almost exactly 30 minutes. Then things return (almost) to normal. There is still some residual packet loss for several hours, but less than 1%.

    The RRD graphs do show that just before the "episodes", the CPU load goes from about 5-10% to 60-70%.

    I assume the box is under serious CPU load. I poked around in the console and can't see any cron jobs, etc.

    Any ideas what this is? How to diagnose? I'd suspect the hardware, but the "exactly at 1pm" nature of this makes me think something is executing on a schedule!
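
One way to confirm that a problem like this is schedule-driven is to log ping results with timestamps from a machine behind the firewall, then bucket the loss by hour of day. The sketch below is only an illustration: the function names, the sample format, and the 10% threshold are all made up for this example and are not part of pfSense.

```python
from collections import defaultdict
from datetime import datetime


def loss_by_hour(samples):
    """Return {hour_of_day: fraction of pings lost}.

    samples is an iterable of (timestamp, replied) pairs, where timestamp
    is a datetime and replied is True if the ping got an answer.
    This input format is an assumption made for this sketch.
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for when, replied in samples:
        sent[when.hour] += 1
        if not replied:
            lost[when.hour] += 1
    return {hour: lost[hour] / sent[hour] for hour in sorted(sent)}


def suspicious_hours(samples, threshold=0.10):
    """List the hours whose loss fraction is at or above threshold."""
    return [h for h, frac in loss_by_hour(samples).items() if frac >= threshold]


if __name__ == "__main__":
    # Hypothetical data: clean at noon, 4 of 10 pings lost during the 1pm hour.
    demo = [(datetime(2010, 1, 1, 12, m), True) for m in range(10)]
    demo += [(datetime(2010, 1, 1, 13, m), m >= 4) for m in range(10)]
    print(loss_by_hour(demo))
    print(suspicious_hours(demo))
```

If the same hour shows up on several consecutive days, a scheduled job somewhere on the network is a more likely culprit than failing hardware.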

  • Oh, I should mention the box runs pfSense 1.2.3-RELEASE.

  • The Netgate guys chimed in and helped me figure this out.

    There's a backup running at 1pm, and that generates a tremendous amount of traffic between the LAN and DMZ interfaces. It's all 100BaseT equipment, so it must be 100 Mb/s into one interface and 100 Mb/s out of the other.

    The biggest problem is DNS – pfSense is running the network's DNS server, and of course when the box is totally loaded, it stops responding.

    I changed the backup schedule (to late at night) and enabled device polling on the "advanced" tab... despite the admonitions. DNS is a pretty critical service!
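
To put rough numbers on the load described above, here is a back-of-the-envelope calculation of how many frames per second a saturated 100 Mb/s link delivers. It assumes standard Ethernet framing overhead and either full-size (1500-byte) or minimum-size (46-byte) payloads; the actual frame mix during the backup is unknown, and the function name is just for illustration.

```python
# Per-frame overhead on the wire, in bytes:
# preamble + start delimiter (8), Ethernet header (14),
# frame check sequence (4), inter-frame gap (12).
OVERHEAD_BYTES = 8 + 14 + 4 + 12


def frames_per_second(link_bps, payload_bytes):
    """Maximum frames per second at line rate for a given payload size."""
    bits_per_frame = (payload_bytes + OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    # Full-size frames, as a bulk backup transfer would mostly use.
    print(round(frames_per_second(100e6, 1500)))  # about 8127
    # Minimum-size frames, the worst case for per-packet overhead.
    print(round(frames_per_second(100e6, 46)))    # about 148810
```

Every forwarded frame has to be received on one interface and sent on the other, so the firewall does that work twice per frame. With interrupt-driven NICs that per-packet work can crowd out userland services such as the DNS forwarder, which is presumably why polling helped here.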

