Help troubleshooting random crazy high pings

grandrivers

I have been getting random, crazy high ping times in the 1000s of ms (I have been using Google DNS as the target). I don't think either connection should be maxed out, but maybe.
The cable connection is 30M down / 5M up; the cable modem does not seem to show anything in its logs during the problem pings.
The DSL is 12M down / 1.5M up; this is on a new remote that is fiber fed.
The firewall doesn't seem busy.

Not sure where to start; I doubt complaining to the ISPs will do any good.

dpinger log:
Jan 30 01:09:18 dpinger DSL_DHCP 8.8.4.4: Clear latency 295695us stddev 513466us loss 0%
Jan 30 01:09:18 dpinger DSLTUNNEL_TUNNELV6 2001:4860:4860::8844: Clear latency 313511us stddev 519605us loss 0%
Jan 30 01:00:14 dpinger DSL_DHCP 8.8.4.4: Alarm latency 751714us stddev 917701us loss 0%
Jan 30 01:00:13 dpinger DSLTUNNEL_TUNNELV6 2001:4860:4860::8844: Alarm latency 755787us stddev 928136us loss 0%
Jan 30 01:00:11 dpinger CABLEMODEM_DHCP 8.8.8.8: Clear latency 25801us stddev 1354us loss 0%
Jan 30 01:00:11 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Clear latency 42410us stddev 98591us loss 0%
Jan 30 00:59:43 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 3953332us stddev 2911633us loss 0%
Jan 30 00:59:43 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 4117914us stddev 2991625us loss 0%
Jan 30 00:59:22 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 4455902us stddev 2653892us loss 24%
Jan 30 00:59:19 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 4350145us stddev 2482418us loss 16%
Jan 30 00:59:09 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 4173967us stddev 2128680us loss 21%
Jan 30 00:59:07 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 4693392us stddev 1946939us loss 22%
Jan 30 00:59:05 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 4411678us stddev 2128222us loss 10%
Jan 30 00:58:54 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 4339307us stddev 2222211us loss 22%
Jan 30 00:58:52 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 3963105us stddev 2175802us loss 18%
Jan 30 00:58:38 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 4391051us stddev 2248435us loss 21%
Jan 30 00:58:35 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 4590721us stddev 2196014us loss 10%
Jan 30 00:58:33 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 4860984us stddev 2118777us loss 2%
Jan 30 00:58:22 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 6108897us stddev 2170993us loss 21%
Jan 30 00:58:16 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 6143825us stddev 2119903us loss 13%
Jan 30 00:57:48 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 7078675us stddev 3902331us loss 22%
Jan 30 00:57:48 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 7750165us stddev 3778713us loss 24%
Jan 30 00:57:45 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 8592792us stddev 4183348us loss 13%
Jan 30 00:57:44 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 8407503us stddev 4562891us loss 8%
Jan 30 00:57:30 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 9580424us stddev 9205855us loss 48%
Jan 30 00:57:29 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 9739973us stddev 9408171us loss 43%
Jan 30 00:57:15 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 25866us stddev 1581us loss 22%
Jan 30 00:57:14 dpinger HENETV6CM_TUNNELV6 2001:4860:4860::8888: Alarm latency 27255us stddev 2535us loss 21%

[Attached graphs: throughput, processor, memory, DSL traffic, DSL ping, DSL packets, cable modem traffic, cable modem ping]

DLFerguRD

I have seen some high pings as well, up to about 2 seconds.
It started about a month ago and usually happened when there was fairly high upload traffic.
My cable modem was set to 30 down and 5 up, or so I thought. After a speed check, I discovered I was only getting 0.5 up.
A call to the ISP revealed that the service had somehow been downgraded. I got it put back the way it was supposed to be, and now my ping highs are no more than about 20ms.
I am not pinging google, just my local isp gateway.
I have always wondered about the frequency of the pings (default 250ms). It seems rather fast to me. It just adds to the upload traffic. I changed mine to ping only once every 2 seconds.

dennypage

Each probe from dpinger is 28 bytes at the IP level (standard ping is 84). A dpinger probe every 250ms represents less than 1/50 of one percent of your upload. Not something that is likely to impact your usage.
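A quick Python check of that arithmetic, assuming the 28-byte IP-level probe size and 250ms default interval given above, and counting only the outgoing echo requests (link-layer framing overhead is ignored):

```python
# Sketch: what fraction of the upload does a dpinger probe stream use?
# Assumes 28 bytes per probe at the IP level (as stated above) and
# ignores link-layer framing overhead.

PROBE_BYTES = 28


def probe_bits_per_second(interval_ms: float, probe_bytes: int = PROBE_BYTES) -> float:
    """Outgoing probe bandwidth in bits per second."""
    probes_per_second = 1000.0 / interval_ms
    return probes_per_second * probe_bytes * 8


def upload_fraction(interval_ms: float, upload_mbps: float) -> float:
    """Probe bandwidth as a fraction of the upload link rate."""
    return probe_bits_per_second(interval_ms) / (upload_mbps * 1_000_000)


if __name__ == "__main__":
    for upload in (5.0, 1.5, 0.5):
        pct = upload_fraction(250, upload) * 100
        print(f"{upload} Mbit/s upload: {pct:.4f}% used by probes")
```

At 250ms the probes add up to 896 bit/s, about 0.018% of a 5M upload; even on the 0.5M upload DLFerguRD was actually getting, it stays below 0.2%.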

Btw, if you relax the send interval, you probably want to change the time period and loss interval correspondingly.

@DLFerguRD:

I have always wondered about the frequency of the pings (default 250ms). It seems rather fast to me. It just adds to the upload traffic. I changed mine to ping only once every 2 seconds.

dennypage

While you aren't likely to get too far complaining about latency, the amount of packet loss you are seeing is pretty substantial and something that you should be able to get traction on.

The first thing they are going to have you do is to power cycle the modem. Might as well do that before you call them… ;)

@grandrivers:

not sure where to start, i doubt complaining to isp's will do any good

dpinger log:
Jan 30 00:58:33 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 4860984us stddev 2118777us loss 2%
Jan 30 00:57:48 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 7078675us stddev 3902331us loss 22%
Jan 30 00:57:44 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 8407503us stddev 4562891us loss 8%
Jan 30 00:57:29 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 9739973us stddev 9408171us loss 43%
Jan 30 00:57:15 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 25866us stddev 1581us loss 22%

Harvy66

The packet loss is probably caused by the latency. TCP will treat very large latency as lost packets, creating many retransmitted packets that flood the target. BitTorrent does this to me all the time when seeders are seeing 1.5k+ ms pings. As much as 50% of my incoming data can be retransmitted TCP segments. Luckily, my ISP uses an AQM (active queue management), which limits my ping to around 30ms no matter what. I just get massive loss instead of high pings. This is a much saner approach: lost packets are much easier to handle than bufferbloat.
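To illustrate why a multi-second delay looks like loss to TCP, here is a minimal Python sketch of the standard retransmission-timeout estimator from RFC 6298 (SRTT/RTTVAR with alpha = 1/8, beta = 1/4, K = 4, and a 1 s floor). It is a simplification, not any particular operating system's TCP stack: real stacks differ in details such as the minimum RTO, and clock granularity, Karn's algorithm, and exponential backoff are left out.

```python
# Minimal sketch of the RFC 6298 retransmission timeout (RTO) estimator.
# Simplifications (assumptions): clock granularity is ignored, and Karn's
# algorithm and exponential backoff are left out.
from typing import Optional

ALPHA = 1 / 8   # SRTT gain (RFC 6298)
BETA = 1 / 4    # RTTVAR gain (RFC 6298)
K = 4           # RTTVAR multiplier (RFC 6298)
MIN_RTO = 1.0   # seconds; RFC 6298 says RTO SHOULD be rounded up to 1 s


class RtoEstimator:
    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def sample(self, r: float) -> None:
        """Feed one RTT measurement, in seconds."""
        if self.srtt is None or self.rttvar is None:
            # First measurement (RFC 6298 section 2.2).
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Later measurements (section 2.3); RTTVAR uses the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r

    def rto(self) -> float:
        if self.srtt is None or self.rttvar is None:
            return 1.0  # initial RTO before any measurement
        return max(MIN_RTO, self.srtt + K * self.rttvar)


if __name__ == "__main__":
    est = RtoEstimator()
    for _ in range(100):
        est.sample(0.025)  # a steady 25 ms path, as in the "Clear" lines
    print(f"RTO on a quiet link: {est.rto():.3f} s")
    # A reply delayed by ~4 s (as in the alarm lines) would arrive after
    # the retransmission timer has already fired.
    print("4 s reply beats the timer?", 4.0 < est.rto())
```

After a run of 25 ms samples the RTO sits at the 1 s floor, so a segment whose ACK is held up by the ~4 s delays in the alarm lines times out and gets sent again even though nothing was dropped.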

250ms is my ping from the Midwest USA to China, India, or New Zealand. That's halfway around the world. 8x that is crazy and breaks things like TCP.

I would first validate that dpinger is reporting correct values by checking whether another ping, MTR, or pathping sees the same results.
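One way to do that comparison is to put both sets of numbers in the same units. The Python sketch below parses dpinger alarm/clear lines in the format quoted in this thread (microseconds) into milliseconds, and computes the mean and standard deviation of RTT samples copied from another tool. The regex only covers the log lines shown here, the sample RTTs in the usage section are hypothetical, and using the population standard deviation is an assumption, since dpinger's exact formula isn't given in this thread.

```python
# Sketch: put dpinger's report and an independent tool's samples in the
# same units so they can be compared side by side.
# Assumptions: the regex is written against the log lines quoted in this
# thread only, and population standard deviation is used for the samples.
import re
import statistics
from typing import Dict, List, Optional, Tuple

DPINGER_RE = re.compile(
    r"dpinger (?P<gateway>\S+) (?P<target>\S+): (?P<state>Alarm|Clear) "
    r"latency (?P<latency_us>\d+)us stddev (?P<stddev_us>\d+)us loss (?P<loss>\d+)%"
)


def parse_dpinger_line(line: str) -> Optional[Dict[str, object]]:
    """Return gateway, target, state, latency/stddev in ms and loss in %."""
    m = DPINGER_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m["gateway"],
        "target": m["target"],
        "state": m["state"],
        "latency_ms": int(m["latency_us"]) / 1000,
        "stddev_ms": int(m["stddev_us"]) / 1000,
        "loss_pct": int(m["loss"]),
    }


def summarize_rtts(rtts_ms: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of RTT samples, in ms."""
    return statistics.mean(rtts_ms), statistics.pstdev(rtts_ms)


if __name__ == "__main__":
    line = ("Jan 30 00:58:33 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency "
            "4860984us stddev 2118777us loss 2%")
    print(parse_dpinger_line(line))
    # RTTs copied by hand from another tool (hypothetical sample values).
    print(summarize_rtts([25.1, 26.4, 24.8, 4100.0, 3900.0]))
```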

dennypage

Dpinger acts a bit differently with regard to missing packets. Packets are not permanently declared to be lost. When dpinger generates a report, it reports loss based on the number of packets that have not been received outside of the loss window at that point in time. If missing packets are subsequently received, they are not counted as lost in subsequent reports.
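The reporting behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not dpinger's actual code: it assumes the settings discussed in this thread (250 ms send interval, 1.125 s loss interval, 30 s time period) and a hypothetical burst where probes sent between t = 20 s and t = 22 s take 3 s to come back.

```python
# Simplified model of the loss accounting described above (not dpinger's
# source): at report time, a probe counts as lost only if it was sent within
# the time period, is older than the loss interval, and still has no reply.
# A late reply therefore stops counting as lost once it arrives.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    sent: float            # send time, seconds
    rtt: Optional[float]   # round-trip time in seconds; None = never answered


def reported_loss(probes: List[Probe], now: float,
                  time_period: float, loss_interval: float) -> float:
    """Loss percentage as it would appear in a report generated at `now`."""
    eligible = [p for p in probes
                if now - time_period <= p.sent <= now - loss_interval]
    if not eligible:
        return 0.0
    missing = sum(1 for p in eligible
                  if p.rtt is None or p.sent + p.rtt > now)
    return 100.0 * missing / len(eligible)


if __name__ == "__main__":
    # 250 ms probes; those sent between t=20 s and t=22 s take 3 s to return.
    probes = [Probe(i * 0.25, 3.0 if 20 <= i * 0.25 < 22 else 0.025)
              for i in range(240)]
    print(f"report at t=23.5 s: {reported_loss(probes, 23.5, 30, 1.125):.2f}% loss")
    print(f"report at t=30.0 s: {reported_loss(probes, 30.0, 30, 1.125):.2f}% loss")
```

The report at t = 23.5 s shows about 5.56% loss because five slow replies are still in flight; by t = 30 s they have all arrived, so the same probes no longer count as lost.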

If you want to stop reading here, the take home point is that grandrivers is almost certainly experiencing significant packet loss.

While the only thing we have is the alarm entries in the log, we can still deduce a lot about his situation. Assuming that he has stayed with the default dpinger parameters, the 2% and 8% examples could be explained by periods of high latency. However, the examples of higher loss can't be explained this way.

There are two examples of an alarm firing with 22% loss. If he is using the default dpinger parameters, there would be 115 probes that are outside the loss interval of 1.125 seconds during reporting. For a 22% loss rate, there have to be at least 25 probes with a missing response. To achieve this without actual loss, the minimum latency would have to be over 6.3 seconds.

So in the first example of 22% loss, where the average RTT is 7.1 seconds, it is theoretically possible that very high latency turned into a 22% reported loss for that period. But it's still pretty unlikely, given that the standard deviation indicates a fair bit of spread in the RTT samples. In the second example of 22% loss, it's really not possible: the average latency is 25 milliseconds, and the 1.6 millisecond standard deviation indicates that it is highly consistent.

For the 43% example, the minimum RTT would need to be above 12 seconds.
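The arithmetic in the last few paragraphs can be reproduced with a short Python sketch. It assumes the same default values used above (250 ms send interval, 1.125 s loss interval, 30 s time period) and follows the post in multiplying the reported loss fraction by the number of counted probes and the send interval.

```python
# Reproduces the back-of-the-envelope arithmetic above, assuming the default
# values used there: 250 ms send interval, 1.125 s loss interval, 30 s period.
import math


def probes_outside_loss_interval(time_period_s: float, loss_interval_s: float,
                                 interval_s: float) -> int:
    """Probes in the reporting window old enough to be counted as lost."""
    return math.floor((time_period_s - loss_interval_s) / interval_s)


def latency_needed_for_loss(loss_pct: float, probes: int, interval_s: float) -> float:
    """Seconds of send time spanned by the 'missing' probes: roughly the RTT
    that latency alone would need to reach to produce this loss figure."""
    return loss_pct / 100 * probes * interval_s


if __name__ == "__main__":
    n = probes_outside_loss_interval(30, 1.125, 0.25)
    print(n, "probes counted")
    for loss in (22, 43):
        needed = latency_needed_for_loss(loss, n, 0.25)
        print(f"{loss}% loss needs latency above ~{needed:.1f} s")
```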

Now of course if grandrivers is not using the default dpinger values, the above analysis is all wrong. :)

grandrivers

Denny,
I'm not quite at the defaults, but I am thinking about going back to them. I am at a probe interval of 750ms, and I set the time to treat a packet as lost to 2500ms.

I am not happy with either of my ISPs, and the guys at ESF don't have much experience with semi-rural ISPs; they talk about the gig connections at their homes, which is more than my ISP's entire backbone. That backbone is 540M-ish for more than 1100 cable modems (4M-50M down packages), a school with 1300 students, their business T-1 and T-3 customers, and their DSL customers, and it will stay this way for the next 2 years (the contract length). Their next problem is a mixed-mode cable modem upstream, with station maintenance on QPSK but data flow on QAM16, so as the system gets noisy, station maintenance rarely moves my modem by more than 1 dBmV. The modems are also all over the board: mine comes in at 35-36 dBmV while others run 44 or more (i.e., the whole system needs to be rebalanced).
I know the head cable tech (not always a good thing); he was a year behind me in school, so he does get the loss graphs via email, and sometimes by text depending on my mood. This is also an ISP that blocked all ICMP on their network for 5-7 years.

Now my ADSL2+: all I have to say is that it's a Windstream connection without enough competitors, with multiple remotes all running back to overloaded fiber ports and a way overloaded backbone. The equipment just got upgraded, and 2 weeks ago the area manager told me my capacity problems are all fixed now. One month before that, I was told there were no issues, or that it was my equipment, or even that it was because I had more than one Apple device on my network.

I do have 3 different fiber optic backbones running in front of our farm, and no amount of money can buy me access to any of them. All 3 did damage during installation, so I was always willing to trade access for damages, but no luck.

PS: sorry for the rant tone of this post.

dennypage

Not everyone has gig fiber at home. I am on cable… but I would get fiber if I could. :)

I'm sure most of the devs have good connectivity at home. This is to be expected; it's what they do for a living. That being said, I think they put a lot of effort into making pfSense work well in a wide variety of circumstances.

Regarding your dpinger settings, the packet loss value of 2500ms is fine for a higher-latency connection. However, for the send interval, I would either return it to 250ms or increase the time period to 90 seconds or so to improve the accuracy of loss reporting. Your current accuracy is about 3%.
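The "about 3%" figure can be checked by treating one missing probe as the smallest step the loss percentage can move. This Python sketch assumes the counted-probe formula from the earlier analysis, the 1.125 s default loss interval used there, and that grandrivers left the time period at 30 s, since only the probe interval and loss time were mentioned.

```python
# Sketch: how coarse is the reported loss percentage for given dpinger settings?
# One missing probe is the smallest step, so resolution = 100 / counted probes.
# Assumption: counted probes = (time period - loss interval) / send interval,
# following the analysis earlier in the thread.
import math


def counted_probes(time_period_ms: int, loss_interval_ms: int, interval_ms: int) -> int:
    return math.floor((time_period_ms - loss_interval_ms) / interval_ms)


def loss_resolution_pct(time_period_ms: int, loss_interval_ms: int, interval_ms: int) -> float:
    return 100.0 / counted_probes(time_period_ms, loss_interval_ms, interval_ms)


if __name__ == "__main__":
    settings = {
        "defaults (250 ms, 1125 ms, 30 s)": (30_000, 1_125, 250),
        "current (750 ms, 2500 ms, 30 s)": (30_000, 2_500, 750),
        "750 ms with a 90 s period": (90_000, 2_500, 750),
    }
    for name, (period, loss_iv, interval) in settings.items():
        print(f"{name}: {counted_probes(period, loss_iv, interval)} probes, "
              f"{loss_resolution_pct(period, loss_iv, interval):.2f}% per probe")
```

With the 750ms interval and a 30 s period only 36 probes are counted, so loss moves in steps of about 2.8%; a 90 s period brings that back to roughly the 0.87% resolution of the defaults.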

grandrivers

I returned it to the defaults and will give it a while to see.
I have also thought about ways I could monitor and graph what my cable modem is doing, as I know the ISP is not.

It would also be nice to have 2 targets per connection to monitor.

grandrivers

Here's what last night looked like at the defaults. Is there a benefit to leaving the probe interval at 250 and making the time period longer than 30, say 45 or 60?

Jan 31 22:49:05 dpinger CABLEMODEM_DHCP 8.8.8.8: Clear latency 25818us stddev 1181us loss 0%
Jan 31 22:48:41 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 3263762us stddev 3077790us loss 0%
Jan 31 22:48:15 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 758464us stddev 1477231us loss 25%
Jan 31 22:48:14 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 725947us stddev 1451881us loss 21%
Jan 31 22:46:51 dpinger CABLEMODEM_DHCP 8.8.8.8: Clear latency 55638us stddev 157651us loss 0%
Jan 31 22:46:26 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 3291780us stddev 3383516us loss 11%
Jan 31 22:46:12 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 863789us stddev 1647268us loss 21%
Jan 31 22:46:09 dpinger CABLEMODEM_DHCP 8.8.8.8: Alarm latency 765539us stddev 1570559us loss 11%

dennypage

There is both a benefit and a detriment. The benefit would be increased accuracy of loss reporting and smoothing of the latency and standard deviation. The detriment would be a delayed response to alarm thresholds. It's a balance. I would leave it at 30 seconds.

@grandrivers:

is there a benefit to leaving probe interval at 250 and moving time period longer than 30 say like 45 or 60 ?

cmb

Are you still using Hyper-V? It still has the same root issue that caused the same problem with apinger. I believe that either disabling time sync to the VM at the Hyper-V level, or disabling NTP sync within the VM, helps with that issue.

grandrivers

I have never used Hyper-V, always bare metal. I should add a signature.

Supermicro C2558

cmb

I must have confused you with someone else, nevermind.

grandrivers

No problem. It's got to be hard keeping everything you do as straight as you do. Thanks for all the help.