V6 Quality drop problem



  • Hi,

    I've been running 2.1-BETA0 for a while now, with a v6 tunnel from he.net up for a month or so.  When I first configured v6, it was extremely fast.  As of about Tuesday, August 14th, something changed that has affected either the actual RTT through the tunnel, or the RRD "Quality" reporting for v6.  I'm trying to determine which is the case.

    1. Has anyone else noticed a quality change with Tunnelbroker/he.net, with a significant uptick in RTT (ms) beginning around that date?

    2. As I've been updating pfSense sporadically (when I'm not traveling), I also have to ask whether there was an update to the RRD graphing that changed during that time.

    I am in the Dallas, TX area and connect to the Equinix tunnel broker endpoint here in town.  I use Time Warner cable for my local Internet service and haven't seen a change for non-v6 traffic/quality during that time.

    I'm looking for ideas on what might have happened here, so any info/discussion is welcome.  I've also sent an email to HE.net to see if anything has changed at the Equinix endpoint or with routing.

    The graphs posted below show the v6 and v4 figures respectively.


    Thanks!
    Treffin



  • The values in the RRD are what they are, nothing has changed there. HE.net's service can be a little hit and miss at times, so it may just be greater load on that endpoint. It's also possible they moved that endpoint IP somewhere else; it's possible your ISP had an issue with a peering and had to drop it temporarily; it's possible that for business reasons they or your ISP took down a peering and traffic now has to take a much longer path. There are numerous potential causes. Try pinging the v4 IP that's the endpoint of the tunnel and see how it responds. From what I've seen, it's always right in line with the ping times on v6 within the tunnel.



  • Thanks cmb!  I did some research and found the following via ping and traceroute:

    [2.1-BETA0][admin@storm.xxx.xxx]/root(1): ping 216.218.224.42
    PING 216.218.224.42 (216.218.224.42): 56 data bytes
    64 bytes from 216.218.224.42: icmp_seq=0 ttl=55 time=50.247 ms
    64 bytes from 216.218.224.42: icmp_seq=1 ttl=55 time=50.202 ms
    64 bytes from 216.218.224.42: icmp_seq=2 ttl=55 time=50.382 ms
    64 bytes from 216.218.224.42: icmp_seq=3 ttl=55 time=47.988 ms
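    Averaging those four samples (a quick sketch in Python, with the times hard-coded from the ping output above):

```python
# Mean RTT of the four ping samples above (ms, copied from the output).
samples = [50.247, 50.202, 50.382, 47.988]
avg = sum(samples) / len(samples)
print(f"avg rtt = {avg:.2f} ms")  # prints: avg rtt = 49.70 ms
```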

    So that looks fairly close to what I'm seeing recently in the RRD graphs.  It certainly wasn't that slow in the beginning, so it appears something has changed with the routing/peering between TW and HE.  Here is the traceroute, which looks ugly, especially towards the bottom.  It appears the traffic may be forwarded to LAX, then PHX, and back to DFW, or at least it's being shoved through those core router networks.

    dtuc-mac-e:~ dt$ traceroute 216.218.224.42
    traceroute to 216.218.224.42 (216.218.224.42), 64 hops max, 52 byte packets
     1  storm (10.0.1.1)  1.567 ms  0.642 ms  0.562 ms
     2  cpe-xx-xx-xx-x.tx.res.rr.com (x.x.x.x)  10.844 ms  21.627 ms  13.379 ms
     3  24.164.210.253 (24.164.210.253)  9.240 ms  8.634 ms  9.440 ms
     4  tge1-4.dllatx40-tr02.texas.rr.com (24.175.38.32)  25.671 ms  20.352 ms  23.774 ms
     5  be26.dllatx10-cr02.texas.rr.com (24.175.36.216)  20.657 ms  28.513 ms  24.039 ms
     6  24.175.49.8 (24.175.49.8)  20.945 ms  23.499 ms  24.511 ms
     7  ae-2-0.cr0.hou30.tbone.rr.com (66.109.6.108)  21.874 ms  23.194 ms  23.427 ms
     8  ae-0-0.pr0.dfw10.tbone.rr.com (66.109.6.181)  21.586 ms
        107.14.17.141 (107.14.17.141)  20.466 ms
        ae-0-0.pr0.dfw10.tbone.rr.com (66.109.6.181)  21.870 ms
     9  tengigabitethernet2-1.ar4.dal2.gblx.net (64.211.60.81)  19.162 ms
        66.109.9.214 (66.109.9.214)  15.513 ms  17.511 ms
    10  64.209.105.42 (64.209.105.42)  57.022 ms  61.205 ms  56.050 ms
    11  10gigabitethernet1-3.core1.lax2.he.net (72.52.92.122)  58.119 ms  58.108 ms  64.109 ms
    12  10gigabitethernet2-3.core1.phx2.he.net (184.105.222.85)  62.391 ms  61.794 ms  73.904 ms
    13  10gigabitethernet5-3.core1.dal1.he.net (184.105.222.78)  56.680 ms  63.593 ms  57.502 ms
    14  tserv1.dal1.he.net (216.218.224.42)  52.135 ms  52.867 ms  50.681 ms
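    Incidentally, the detour can be read straight out of the router hostnames. Here's a small sketch that pulls the PoP city codes out of the labels (the hop names are copied from the trace above; the `CITY` map is my own guess at what the codes mean, not anything official):

```python
import re

# Resolved router hostnames copied from the traceroute above.
hops = [
    "tge1-4.dllatx40-tr02.texas.rr.com",
    "be26.dllatx10-cr02.texas.rr.com",
    "ae-2-0.cr0.hou30.tbone.rr.com",
    "ae-0-0.pr0.dfw10.tbone.rr.com",
    "tengigabitethernet2-1.ar4.dal2.gblx.net",
    "10gigabitethernet1-3.core1.lax2.he.net",
    "10gigabitethernet2-3.core1.phx2.he.net",
    "10gigabitethernet5-3.core1.dal1.he.net",
    "tserv1.dal1.he.net",
]

# Guessed meanings of the PoP codes embedded in the labels.
CITY = {"dllatx": "Dallas", "hou": "Houston", "dfw": "Dallas",
        "dal": "Dallas", "lax": "Los Angeles", "phx": "Phoenix"}

def pop_code(hostname):
    """Return the first dotted label that, with everything from the
    first digit onward stripped, matches a known PoP code."""
    for label in hostname.split("."):
        base = re.sub(r"\d.*$", "", label)
        if base in CITY:
            return base
    return None

path = [CITY[pop_code(h)] for h in hops if pop_code(h)]
print(" > ".join(path))
```

    The printed sequence makes the zig-zag explicit: Dallas out to Los Angeles and Phoenix before landing back in Dallas at the tunnel server.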

    In any case, it does seem that the problem is HE<=>TW related, now that I'm into it a bit further.  Thanks for the input!

    Treffin



  • Wow yeah, that's one heck of a path. Judging by the latency, I suspect that is accurate. I'm in Austin at the moment and I have basically the exact same connectivity and latency to that .42 host as you have, about 50 ms, +/- 5 ms. I'm going ATX > DAL > LA > PHX > DAL. Terrible path…

    One of our developers is about 40 miles outside of Chicago, to get to the Chicago HE.net endpoint, his traffic goes to NYC and back. At home I'm about 300 miles away from Chicago using the same one, and my latency is about the same if not a little better than his. Not always the best routing in the world on those, unfortunately...

