The strange story of accessing certain websites.
-
I have a multi-WAN connection: WAN1 is PPPoE directly over Ethernet with no modem, and WAN2 comes from a 4G+ OpenWrt-based router.
I have problems accessing certain websites via WAN1, for example micron.com.
pfSense diagnostic
PING micron.com (13.107.213.70) from ISP.52.ISP.52: 56 data bytes
92 bytes from 13.104.182.179: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 f96d 0 0000 01 01 6a22 ISP.52.ISP.52 13.107.213.70
92 bytes from 147.243.141.111: Redirect Host(New addr: 147.243.141.97)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 4aaa 0 0000 3a 01 dfe5 ISP.52.ISP.52 13.107.213.70
92 bytes from 13.104.182.179: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 4aaa 0 0000 01 01 18e6 ISP.52.ISP.52 13.107.213.70
92 bytes from 147.243.141.111: Redirect Host(New addr: 147.243.141.97)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 a50d 0 0000 3a 01 8582 ISP.52.ISP.52 13.107.213.70
92 bytes from 13.104.182.179: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 0054 a50d 0 0000 01 01 be82 ISP.52.ISP.52 13.107.213.70
--- micron.com ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
Also, it is not possible to access this site through any browser on Android clients connected to the LAN. However, on the LAN PC, it only opens in Firefox or K-Meleon, but not in Edge, for example. But if I start a VM with Windows 7 on that PC, which implies that it is behind a double NAT, then I can access micron.com through any browser installed in this VM.
Traceroute directly on PC:
4 6 ms 10 ms 8 ms netnod-ix-ge-a-sth-1500.microsoft.com [194.68.123.181]
5 9 ms 7 ms 8 ms 13.104.182.179
6 * * * Request timed out.
7 * * * Request timed out.
8 7 ms 6 ms 7 ms 13.104.182.179
9 * * * Request timed out.
10 * * * Request timed out.
11 7 ms 7 ms 7 ms 13.104.182.179
12 * * * Request timed out.
13 * * * Request timed out.
14 7 ms 8 ms 7 ms 13.104.182.179
15 * * * Request timed out.
16 * * * Request timed out.
17 8 ms 7 ms 7 ms 13.104.182.179
18 * * * Request timed out.
19 * * * Request timed out.
20 7 ms 8 ms 8 ms 13.104.182.179
21 * * * Request timed out.
22 * * * Request timed out.
23 7 ms 7 ms 7 ms 13.104.182.179
24 * * 15 ms 13.107.246.70
Trace complete.
Traceroute in VM:
5 7 ms 6 ms 7 ms netnod-ix-ge-a-sth-1500.microsoft.com [194.68.123.181]
6 7 ms 7 ms 7 ms 13.104.182.178
7 * * * Request timed out.
8 * * * Request timed out.
9 * * * Request timed out.
10 * * * Request timed out.
11 * * * Request timed out.
12 10 ms 7 ms 7 ms 13.104.182.179
13 * * * Request timed out.
14 * * * Request timed out.
15 9 ms 8 ms 8 ms 13.104.182.179
16 * * * Request timed out.
17 * * * Request timed out.
18 10 ms 10 ms 8 ms 13.104.182.179
19 * * * Request timed out.
20 * * * Request timed out.
21 8 ms 9 ms 9 ms 13.104.182.179
22 * * * Request timed out.
23 * * * Request timed out.
24 10 ms 9 ms 8 ms 13.104.182.179
25 * * * Request timed out.
26 * * * Request timed out.
27 12 ms 12 ms 9 ms 13.104.182.179
28 * * * Request timed out.
29 * * * Request timed out.
30 11 ms 9 ms 9 ms 13.104.182.179
The first hops, which are just the single/double NAT and ISP addresses, have been removed.
The ping on WAN2 also looks totally different:
PING micron.com (13.107.213.70) from 10.0.100.101: 56 data bytes
64 bytes from 13.107.213.70: icmp_seq=0 ttl=56 time=32.917 ms
64 bytes from 13.107.213.70: icmp_seq=1 ttl=56 time=38.977 ms
64 bytes from 13.107.213.70: icmp_seq=2 ttl=56 time=43.901 ms
--- micron.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 32.917/38.598/43.901/4.492 ms
Has the ISP on WAN1 failed? What could be the issue? When I connect to the other ISP, everything works as expected. Or could there be a misconfiguration on my end?
-
@w0w try to set MTU on WAN1 to 1492 (PPPoE)
-
@mcury
As far as I remember, it was always 1492, but I'll check this.
EDIT:
WAN1 shows
-
@w0w Does this issue only happen with that website?
If so, I don't think it is a configuration issue at your end.
If you sniff that connection, what do you see?
-
Which WAN is the default gateway? I'd guess it's WAN2.
Do you have traffic shaping enabled on either WAN? For buffer bloat mitigation for example.
Is there any reason you posted this in off-topic? Seems very much on-topic.
-
@w0w said in The strange story of accessing certain websites.:
13.104.182.179
This looks like a routing loop at that remote IP when coming from WAN1. Probably not much you can do about it.
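As a rough illustration of what "routing loop" means for a trace like the ones above, the sketch below flags any address that answers at more than one hop number. The function name `repeated_hops` and the sample list are made up for this example; the sample lines are copied from the PC traceroute in the first post.

```python
import re
from collections import defaultdict

# Matches a dotted-quad IPv4 address anywhere in a traceroute line.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def repeated_hops(lines):
    """Return {address: [hop numbers]} for addresses seen at 2+ hops."""
    seen = defaultdict(list)
    for line in lines:
        parts = line.split()
        # Skip anything that does not start with a hop number.
        if not parts or not parts[0].isdigit():
            continue
        addrs = IPV4.findall(line)
        if addrs:  # timed-out hops carry no address and are ignored
            seen[addrs[-1]].append(int(parts[0]))
    return {ip: hops for ip, hops in seen.items() if len(hops) > 1}


# Hops 4 to 11 of the PC traceroute from the first post.
sample = [
    "4 6 ms 10 ms 8 ms netnod-ix-ge-a-sth-1500.microsoft.com [194.68.123.181]",
    "5 9 ms 7 ms 8 ms 13.104.182.179",
    "6 * * * Request timed out.",
    "7 * * * Request timed out.",
    "8 7 ms 6 ms 7 ms 13.104.182.179",
    "9 * * * Request timed out.",
    "10 * * * Request timed out.",
    "11 7 ms 7 ms 7 ms 13.104.182.179",
]

print(repeated_hops(sample))
```

A repeated address is only a hint, not proof of a loop: the hops in between are silent, so the trace alone cannot show what the packet passes through there.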
-
@stephenw10 said in The strange story of accessing certain websites.:
This looks like a routing loop at that remote IP when coming from WAN1. Probably not much you can do about it.
hmm, looking at it now, I think that is it..
12 10 ms 7 ms 7 ms 13.104.182.179
13 * * * Request timed out.
14 * * * Request timed out.
and all over again. Thanks, stephenw10.
-
@stephenw10
Yep, maybe, but this traceroute was performed in the VM, where I can access this site at least in IE, Chrome, and Firefox. Those two PCAPs are from this PC, with Firefox accessing this site and Edge not.
nonworking_edge.pcap
working_firefox.pcap
So what is the magic?
-
Run a pcap, look for differences. I'd check the TTL first. Though it looks like the loop is exhausting the TTL.
Edit: I see you did!
-
@stephenw10 said in The strange story of accessing certain websites.:
Edit: I see you did!
I don't think it's a loop anymore; the server is answering with a fin-ack directly.
-
Were those pcaps filtered?
The non-working one does have the initial syn / syn-ack / ack handshake.
It appears to initially connect to something via http and gets redirected to https. You can see the browser sending the request.
In the failed connection we are not seeing all the traffic in that pcap. Could there be some route asymmetry somehow? Load balancing between the WANs? You do still occasionally see sites that can't handle that.
-
@mcury said in The strange story of accessing certain websites.:
server is answering with a fin ack directly
The client is sending the fin-ack. But the fin is not shown. The client must have seen it though.
-
@stephenw10 said in The strange story of accessing certain websites.:
@mcury said in The strange story of accessing certain websites.:
server is answering with a fin ack directly
The client is sending the fin-ack. But the fin is not shown. The client must have seen it though.
So if it's not MTU or a loop, and it works with some browsers and not others, it must be something with website/browser compatibility.
This wouldn't be the first time that this site has had problems with browsers, but the weird part is that in the past it was Firefox that had problems with it. At that time, a banner would show up saying "outdated browser detected".
-
@stephenw10 said in The strange story of accessing certain websites.:
Were those pcap filtered?
Yes, but both filtered the same way. ip.addr == 13.0.0.0/8
It was captured on pfSense. I'll try it again on the PC side.
@stephenw10 said in The strange story of accessing certain websites.:
Could there be some route asymmetry somehow? Load balancing between the WANs? You do still occasionally see sites that can't handle that.
No load-balancing currently. I have tried to disable WAN2 interface completely, same behavior exists.
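For reference, here is a small sketch of what the ip.addr == 13.0.0.0/8 display filter mentioned above keeps and drops. It uses Python's standard ipaddress module; the helper name `matches_filter` and the client address 192.0.2.10 are placeholders for this example (the real WAN1 address is masked in the thread).

```python
import ipaddress

# The network used in the display filter quoted in this post.
CAPTURE_NET = ipaddress.ip_network("13.0.0.0/8")


def matches_filter(src, dst, net=CAPTURE_NET):
    """Mimic ip.addr == net: true if source OR destination is in net."""
    return (ipaddress.ip_address(src) in net
            or ipaddress.ip_address(dst) in net)


CLIENT = "192.0.2.10"  # placeholder for the masked WAN1 address

packets = [
    (CLIENT, "13.107.213.70"),      # request to micron.com
    ("13.104.182.179", CLIENT),     # "Time to live exceeded" sender
    ("147.243.141.111", CLIENT),    # "Redirect Host" sender
]

for src, dst in packets:
    print(f"{src} -> {dst}: kept={matches_filter(src, dst)}")
```

If the filter behaves like this sketch, anything sent by 147.243.141.111, such as the ICMP redirects seen in the first ping, would not appear in the filtered pcaps, so a less restrictive capture may be worth a look.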
-
I've found just setting MTU can be insufficient. Try also setting MSS to the same value.
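For anyone following along, a quick sketch of the arithmetic behind these numbers, assuming plain IPv4 and TCP headers without options. The function names are made up for this example. As far as I know, pfSense subtracts the header size itself from whatever is entered in the MSS field, which would be why the same number goes into both fields, but check the interface help text to be sure.

```python
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20     # assumption: no IP options
TCP_HEADER = 20      # assumption: no TCP options


def pppoe_mtu(ethernet_mtu=1500):
    """Largest IP packet that fits in one PPPoE frame."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP payload that fits in one IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (pppoe_mtu(), 1200):
    print(f"MTU {mtu} -> MSS {tcp_mss(mtu)}")
```

The second value, 1200, is the MTU tried later in this thread.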
-
@stephenw10
edge_less_filtered_upload.pcap
Direct capture from the PC, with only unnecessary LAN traffic filtered out.
-
@dem
Can't set it right now, maybe tomorrow morning...
-
@mcury said in The strange story of accessing certain websites.:
Does this issue only happen with that website ?
Sorry for the late response. I have similar problems with some Microsoft pages, like https://answers.microsoft.com/en-us. Again, Firefox is OK, while Edge fails to display the page correctly: it shows up partially in a text-only mode or corrupted. I see the same on an Android phone.
-
Ok. Looks like the problem is solved!
I just went to WAN1 and removed MTU and MSS completely!
No, it was just some cached result, whatever... it's not working again.
Setting MSS also has no effect.
-
The problem is isolated, and it seems that some network setting or equipment of my network is to blame because when connecting the laptop or this PC directly with a cable to the ISP and establishing a PPPoE connection, all websites are accessible and everything works fine.
Edit1:
I have CARP, and after switching to the secondary firewall the problem seemed to be gone, which would narrow it down to the primary firewall being active. After a while this was no longer true, because Edge refuses to access the site again.
Tried playing with TSO/LRO settings, compared routes, don't see much difference. Yes, the hardware on the firewalls is different, but by adding LAGG interfaces, they are brought to a "compatible" state for synchronization. The rules and settings are almost identical.
I got lost... what's next?
Edit2: I just went a bit crazy and set the MTU to 1200 on the primary firewall. The result is that I can now access and ping micron.com.
Edit3: After 20 minutes I can still ping micron.com, but the site is no longer accessible in Edge, while Firefox works.
WTF, magic?
Edit4: Same story on the secondary firewall after a while. Will re-test on the second ISP (WAN2) and will re-test on direct connection to the WAN1 ISP...