Gmail/Google services unresponsive
-
Problem appeared this morning and I cleared the pfsense state table but Gmail remained unresponsive.
So I toggled the interface WAN1 off/on and that fixed the problem.
When I turn off WAN1, failover kicks in and traffic is routed thru WAN2 with a different IP. A few seconds later, I turn WAN1 back on, traffic is again routed thru WAN1 and Gmail works fine.
Bear in mind that WAN1 is passing traffic to a router with a static IP.
The only time the IP changes is those few seconds when I turn off WAN1 and failover kicks in passing traffic to WAN2.
So toggling WAN1 off/on does not change the public IP of WAN1. However, turning it off and turning it back on again seems to "reset" something else, which allows traffic to flow to Gmail.
Reloading the filter or clearing the state table does not change anything.
UPDATE:
I did the following experiment: I set up a single workstation to send its traffic directly to the ISP's router (the one used by WAN1), however, pfsense was still doing DNS Forwarding for that workstation. Guess what happened? Gmail became unresponsive, just like on the workstations which route everything thru pfsense's WAN1.
So am I dealing with some sort of DNS problem?
-
Potentially. Though I would expect to see some sort of resolving error if so.
Try setting a different DNS server on the failing client directly and see if that corrects the issue.
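One way to compare DNS servers without changing client settings is to send the same query to each server by hand. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it reads only the reply header (response code and answer count), not the answers themselves.

```python
import socket
import struct

QTYPE_A = 1      # IPv4 host record
QTYPE_AAAA = 28  # IPv6 host record


def build_query(hostname, qtype=QTYPE_A, txid=0x1234):
    """Build a minimal DNS query packet (one question, recursion desired)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def parse_header(response):
    """Return (txid, rcode, answer_count) from the 12-byte DNS header."""
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return txid, flags & 0x000F, ancount


def query_server(server_ip, hostname, qtype=QTYPE_A, timeout=3.0):
    """Send one UDP query to server_ip; raises socket.timeout if no reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname, qtype), (server_ip, 53))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

For example, `query_server("192.168.1.1", "mail.google.com")` against the pfSense forwarder (the address is a placeholder) can be compared with `query_server("8.8.8.8", "mail.google.com")`. A timeout or a non-zero response code from only one of them would point at that resolver; identical healthy replies from both would suggest the problem is elsewhere.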
Steve
-
Yeah, I'd expect the web browser to give me a resolving error, but that's not the case. The browser hangs or times out. It never gives me a "unable to resolve name" error. Weird.
But the workstation experiment is pretty indicative of a DNS problem, since all non-DNS traffic is sent directly to the ISP's router, and only DNS requests go thru pfsense. Hmm...
When I set a DNS server on the failing client directly thru the ISP's router, Gmail works fine. It's like plugging a laptop directly into the ISP's router, which I already did before.
On Monday I will pass all DNS traffic directly to the ISP's router, and all non-DNS traffic thru pfsense. I'm curious what will happen.
Funny thing is, I have a second pfsense box in another part of town and no problems. Latest pfsense, the same ISP (different static IP), same DNS servers and no issues with Google/Gmail...
-
Which WAN is Unbound using to access the DNS servers? It will use the system default gateway unless you've set it to use something else.
It's possible that a different outgoing IP gets different DNS results, causing a failure, for example if any of the WANs are over a VPN.
Steve
-
I haven't touched the default settings of the DNS Resolver service. For Network Interfaces it says "all", and for Outgoing Network Interfaces it also says "all".
Since this morning all clients are getting their DNS queries directly thru the ISP's router and all is well. All non-DNS traffic is still going thru pfsense. Let's see how long this lasts.
UPDATE: "happy monday" did not last long. Gmail just became unresponsive with all DNS queries going directly to the ISP's router.
So I started sending all DNS to a different ISP - did not help.
I also changed the DNS server to Google's 8.8.8.8 - did not help.
OK, I'm thinking now this is NOT a DNS issue.
-
I agree. It seems unlikely to be a DNS issue from that evidence.
Previously you speculated it might be an encryption negotiation issue. Do you see this problem on all clients? All browsers?
If it was something like that I would expect it to affect only something outdated.
Steve
-
I think the encryption negotiation idea leads nowhere. All my faulty clients are running latest Firefox or Chrome, so TLS cipher negotiation shouldn't be a problem. I was just thinking out loud...
I did another test which may indicate the ISP (or Google?) is at fault somehow after all.
I split up 40 faulty clients into 2 groups of 20, group A and B.
Both groups have DNS vetted, that is they are going directly to the ISP's router and hitting Google's 8.8.8.8.
As for non-DNS traffic, both groups are going thru pfsense, but:
Group A is going thru the "faulty" 300mbps cable internet, while Group B is going thru my backup ISP's 40mbps ADSL pipe.
And what happened was that Group A suffered from unresponsive Gmail, while Group B was completely fine.
I checked Group A for any lag, but Ping and tracert to Gmail came back fine, even though Gmail itself was unresponsive thru web browser.
I know Ping and tracert are UDP/ICMP, while Gmail is TCP with TLS overhead, but that obviously shouldn't cause Gmail to be unresponsive.
So both groups are going thru pfsense, hitting different ISPs, and one group ends up being faulty, while the other is fine.
ISP at fault after all?
I've contacted the ISP a few times, but they told me everything is fine on their side, and they said to "check my firewall for port blocking"...
Yeah right, port blocking, when everything that is HTTPS/443 works fine except for Google services...
So tomorrow I will reprogram Group A to go directly to the ISP's router and I'll wait for Gmail to crap out again (or maybe it won't).
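Since Ping and tracert do not exercise the same path through the stack as a browser, a test closer to real Gmail traffic is to time the TCP connect and the TLS handshake on port 443. This is a rough sketch using only the Python standard library; the function names are made up for illustration and it measures connection setup only, not page loading.

```python
import socket
import ssl
import time


def time_tls_handshake(host, port=443, timeout=10.0):
    """Return (tcp_seconds, tls_seconds) for one connection attempt.

    Raises OSError (which covers timeouts and ssl.SSLError) on failure.
    """
    context = ssl.create_default_context()
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        tcp_done = time.monotonic()
        # wrap_socket performs the TLS handshake on an already connected socket.
        with context.wrap_socket(raw, server_hostname=host):
            tls_done = time.monotonic()
    return tcp_done - start, tls_done - tcp_done


def probe(host, attempts=10, timer=time_tls_handshake):
    """Run several attempts; collect successful timings and error descriptions."""
    timings, errors = [], []
    for _ in range(attempts):
        try:
            timings.append(timer(host))
        except OSError as exc:
            errors.append(repr(exc))
    return timings, errors
```

Running `probe("mail.google.com")` from a Group A client and from a Group B client at the same time would show whether the hangs already happen during connection setup on the cable line, or only later once data is flowing.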
-
UPDATE: I sent traffic from Group A directly to the "faulty" ISP's router and user feedback was that Gmail/Drive was very slow or unresponsive...
So when I plug a single laptop into the ISP's router no problems, but a group of 20 clients "crashes" Google services? Very odd...
-
Hmm, that is strange. Triggering some blocking somewhere upstream? Something in the ISP's router maybe?
An anti-DoS service or similar perhaps.
Steve
-
I've done some more packet tracing with software called WinMTR.
I've traced packets going to mail.google.com and I have as much as 10% packet loss at the last hop or second to last hop. Apparently the issue is on Google's end. They're dropping about 10% of our traffic, no wonder Gmail/Drive doesn't work properly. Ugh.
I've sent screenshots of the traces to Google. I wonder if and when they will find the reason for the 10% drops.
It seems pfsense is not the culprit. And WinMTR is your best friend when there's packet loss. :)
-
There is an MTR package for pfSense in case you weren't aware.
The webgui page for it runs for a limited number of packets but you can run it at the console with the usual options:
    [2.4.5-RC][admin@1100-3.stevew.lan]/root: mtr --help

    Usage:
     mtr [options] hostname

     -F, --filename FILE        read hostname(s) from a file
     -4                         use IPv4 only
     -6                         use IPv6 only
     -u, --udp                  use UDP instead of ICMP echo
     -T, --tcp                  use TCP instead of ICMP echo
     -a, --address ADDRESS      bind the outgoing socket to ADDRESS
     -f, --first-ttl NUMBER     set what TTL to start
     -m, --max-ttl NUMBER       maximum number of hops
     -U, --max-unknown NUMBER   maximum unknown host
     -P, --port PORT            target port number for TCP, SCTP, or UDP
     -L, --localport LOCALPORT  source port number for UDP
     -s, --psize PACKETSIZE     set the packet size used for probing
     -B, --bitpattern NUMBER    set bit pattern to use in payload
     -i, --interval SECONDS     ICMP echo request interval
     -G, --gracetime SECONDS    number of seconds to wait for responses
     -Q, --tos NUMBER           type of service field in IP header
     -e, --mpls                 display information from ICMP extensions
     -Z, --timeout SECONDS      seconds to keep probe sockets open
     -r, --report               output using report mode
     -w, --report-wide          output wide report
     -c, --report-cycles COUNT  set the number of pings sent
     -j, --json                 output json
     -x, --xml                  output xml
     -C, --csv                  output comma separated values
     -l, --raw                  output raw format
     -p, --split                split output
     -t, --curses               use curses terminal interface
         --displaymode MODE     select initial display mode
     -n, --no-dns               do not resove host names
     -b, --show-ips             show IP numbers and host names
     -o, --order FIELDS         select output fields
     -y, --ipinfo NUMBER        select IP information in output
     -z, --aslookup             display AS number
     -h, --help                 display this help and exit
     -v, --version              output version information and exit

    See the 'man 8 mtr' for details.
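Because Gmail is TCP on port 443, a trace that probes with TCP may be more representative than the default ICMP one. As a small sketch, the options from the help text above can be combined like this; the Python wrapper and its function names are only an illustration, and it assumes `mtr` is installed on the machine running it.

```python
import subprocess


def build_mtr_command(host, port=443, cycles=100):
    """Build an mtr report-mode command that probes with TCP instead of ICMP."""
    return [
        "mtr",
        "-T",               # use TCP instead of ICMP echo
        "-P", str(port),    # target port number
        "-r",               # report mode
        "-c", str(cycles),  # number of pings sent
        "-n",               # do not resolve host names
        host,
    ]


def run_mtr(host, port=443, cycles=100):
    """Run mtr and return its text report; requires mtr on the PATH."""
    result = subprocess.run(
        build_mtr_command(host, port, cycles),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

With the defaults this is the same as typing `mtr -T -P 443 -r -c 100 -n mail.google.com` at the console.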
Steve
-
Just an idea ...
I saw a YouTube presentation by a well-known guy who uses the free Gmail/Google cloud disk space as a huge free remote NAS backup space.
They even combined several Google accounts together, to make it even bigger.
They discovered that Google will limit you when you (that is, a user on an IP) upload more than xxxxxxxxxxx bytes a day.
I'll post that video here when my memory comes back.
Edit: https://www.youtube.com/watch?v=y2F0wjoKEhg
Could this explain what you are seeing?
For example: is your cable WAN IP shared with others? Then it may not even be you doing the uploading.
-
UPDATE: Google engineer has replied saying that after analyzing my WinMTR results and "extensive research" he believes the problem is on the ISP's side...
How can it be that WinMTR is showing packet loss on the very last hop of the trace, and Google says it's the ISP?
Besides, I also have "TCP Retransmission errors" in my packet captures which point to Google's IP.
No packet loss according to WinMTR from any of the in-between hops. Google's conclusion makes no sense to me.
If it were an ISP problem there should be packet loss at the in-between hops, right or wrong?
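A common rule of thumb for reading MTR output: loss shown at an in-between hop that does not carry on to the following hops is usually just that router deprioritizing its own ICMP replies, while loss that starts at some hop and persists all the way to the destination is the kind that matters. The sketch below applies that rule to a list of per-hop loss percentages; the function name and the 1% threshold are assumptions for illustration, and the sample numbers in the usage note are invented, not taken from the traces in this thread.

```python
def first_persistent_loss_hop(loss_by_hop, threshold=1.0):
    """Return the 1-based hop where loss starts and persists to the last hop.

    loss_by_hop: loss percentages per hop, in path order, final hop last.
    Returns None if the final hop shows no loss above the threshold.
    """
    if not loss_by_hop or loss_by_hop[-1] <= threshold:
        return None
    hop = len(loss_by_hop)
    # Walk backwards while the preceding hop also shows loss.
    while hop > 1 and loss_by_hop[hop - 2] > threshold:
        hop -= 1
    return hop
```

For a trace like `[0, 0, 0, 0, 0, 10]` this returns the last hop, which is consistent with a problem at the destination or on the link just before it. The forward and return paths can differ, so on its own this does not settle whether the ISP or Google is responsible.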
Man, this problem is going to drive me mad...
I've also checked the DNS logs over at my DNS provider cleanbrowsing.org
I don't know why, but about 16% of all DNS queries were for AAAA records. AAAA records are IPv6 host records, but the ISP told me IPv6 is turned off. Besides, my pfsense is not allowing IPv6.
I have no idea why these AAAA queries are popping up in the DNS logs... And quite a few are for AAAA records belonging to Google. Hmm...
-
@namotore said in Gmail/Google services unresponsive:
very last hop of the trace, and Google says it's the ISP?
ISPs and major info providers like Google have an ongoing discussion about who pays for the POP between them.
Or that link is too small right now.
AAAA: most if not all Google apps and services you use on your devices prefer IPv6, and thus resolve AAAA first.
Only later, when they find that an IPv6 link can't be opened, do they switch back to A (IPv4).
-
If you have any IPv6 connectivity at all but not full connectivity that can really bork stuff.
I have seen sites appear to fail because clients think they can connect over v6 but cannot. Triple check that!
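A quick way to check this from a client is to resolve the host, split the results into IPv4 and IPv6 addresses, and try a TCP connection to each. This is a minimal sketch using only the Python standard library; the function names are made up for illustration.

```python
import socket


def addresses_by_family(host, port=443):
    """Resolve host and group the returned addresses into IPv4 and IPv6 lists."""
    result = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if family == socket.AF_INET:
            result["ipv4"].append(sockaddr[0])
        elif family == socket.AF_INET6:
            result["ipv6"].append(sockaddr[0])
    return result


def can_connect(address, port=443, timeout=5.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `addresses_by_family("mail.google.com")` returns IPv6 addresses and `can_connect` fails quickly for all of them while the IPv4 ones succeed, the client is falling back cleanly. If the IPv6 attempts hang until the timeout, that would match the partial connectivity case described above.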
Steve