Occasional Round-Trip Ping Time Spikes?



  • I recently pointed nagios at my office pfSense box, and once every few days it complains at me about round-trip time (on pings) jumping up to several hundred milliseconds.  Within 10 minutes it's telling me things are back down to tens of milliseconds.

    I'm pinging the LAN interface from a machine on a directly-connected switch.  I do have some traffic shaper rules which seem to work OK.

    These events aren't correlated with system load on the nagios machine nor do I see high RTA warnings from other monitored machines on the LAN, but they might be correlated with higher traffic (but still only on the order of 1Mbps on a 1.2GHz box) across the pfSense box.  Nothing out of the ordinary in the pfSense log.

    I'm not quite sure where to look next.



  • Check your RRD graphs to see whether your WAN is loaded during those times. The traffic shaper affects LAN-to-LAN traffic as well, because your WAN downstream is shaped as it goes out on the LAN interface. Also check the other RRD graphs for peaks.



  • OK, yes, thanks, I can see now that they are definitely correlated - I just got an ~1Mbit spike across the WAN (5 minute sustained, it looks like) and the LAN round-trip time went up to ~246ms for the duration.

    I'm mildly surprised that 1Mbit of traffic would cause a 1.2GHz machine to break a sweat.  Is my surprise naive?  Perhaps I've not optimized well, or ought to be doing some better tuning?

    Hmm, this is interesting - I do have several other peaks today, but they were all a tiny bit below this one.  I have 980Kbps assigned to the root queues, and this spike may have maxed that out, while the others were probably on the order of 920Kbps or so.  I don't really understand how the traffic shaper works, but it seems plausible to me that something different would happen in the shaping code when the queue max was reached.

    I'm still not sure if that just means there's inherent inefficiency here, or if I've got a crummy configuration.



  • I'm guessing you must be using the traffic shaper. Because of the way it works in 1.2, it queues traffic destined to the LAN IP the same way it does for Internet-bound traffic. It can't differentiate between those types of traffic; that's a limitation in the implementation.

    This is already addressed for 1.3 though, with a complete rewrite of the traffic shaper that gets rid of this and several other limitations in 1.2.
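
    As a rough sanity check on the numbers reported above (root queues at 980Kbps, LAN round-trip time of ~246ms during the spike), here is a back-of-the-envelope sketch in Python. It assumes the ping reply simply waits behind a backlog that drains at the root queue rate, and that 1Kbps means 1000 bits per second. This is a simplification for illustration, not a model of how the 1.2 shaper actually schedules packets, and the function names are made up for this example.

```python
def queue_delay_ms(backlog_bytes, rate_kbps):
    """Milliseconds needed to drain backlog_bytes at rate_kbps (1 kbps = 1000 bit/s)."""
    return backlog_bytes * 8 / (rate_kbps * 1000) * 1000


def backlog_bytes_for_delay(delay_ms, rate_kbps):
    """Bytes that must be queued ahead of a packet to delay it by delay_ms."""
    return delay_ms / 1000 * rate_kbps * 1000 / 8


# Figures taken from the posts above.
ROOT_QUEUE_KBPS = 980
OBSERVED_RTT_MS = 246

# Assumption: typical full-size Ethernet packets of 1500 bytes.
PACKET_BYTES = 1500

backlog = backlog_bytes_for_delay(OBSERVED_RTT_MS, ROOT_QUEUE_KBPS)
print(round(backlog))                    # 30135 bytes queued
print(round(backlog / PACKET_BYTES, 1))  # 20.1 full-size packets
```

    Under those assumptions, a backlog of only about 20 full-size packets in a saturated 980Kbps queue would be enough to account for the delay, which fits with this being a queueing effect rather than a CPU problem.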



  • @cmb:

    This is already addressed for 1.3 though, with a complete rewrite of the traffic shaper that gets rid of this and several other limitations in 1.2.

    Oh, sweet.  I plan to load 1.3 as soon as the beta lands, so I'm going to forget about this now. :)

    Thank you.

