Netgate 2100 apparently causing sporadic (and specific) network outages
-
@SteveITS The traceroute is identical during the times the network allows me access to the websites and when I'm having the timeouts.
-
@sjstein So the traceroute succeeds?
Interesting, I’m troubleshooting an issue between (theory) Comcast in Chicagoland and Squarespace, but that site is on Hostgator. In my case Port 443 is blocked somewhere but not 80 or ICMP, but only Squarespace sites.
-
The traceroute always succeeds, yes. It is so odd.
Here is a trace I just ran - this is when I am unable to connect to the website at grassmonkeysimulations.com:
$ tracert grassmonkeysimulations.com
Tracing route to grassmonkeysimulations.com [192.254.235.38]
over a maximum of 30 hops:
  1    <1 ms    <1 ms    <1 ms  pfSense.home.arpa [192.168.1.1]
  2    19 ms    17 ms    18 ms  10.112.162.3
  3    11 ms    16 ms    16 ms  po-326-346-rur302.elmhurst.il.chicago.comcast.net [96.216.28.101]
  4    24 ms    18 ms    18 ms  po-2-rur301.elmhurst.il.chicago.comcast.net [68.87.235.37]
  5    23 ms    19 ms    20 ms  po-300-xar01.elmhurst.il.chicago.comcast.net [68.86.197.57]
  6    13 ms    17 ms    19 ms  be-33-ar01.area4.il.chicago.comcast.net [68.85.177.85]
  7     *        *        *     Request timed out.
  8    71 ms    50 ms    58 ms  ae1.37.bar4.SaltLakeCity1.level3.net [4.69.219.58]
  9    75 ms    88 ms    73 ms  4.53.7.174
 10    68 ms    72 ms    68 ms  69-195-64-113.unifiedlayer.com [69.195.64.113]
 11    74 ms    68 ms    77 ms  po99.prv-leaf1b.net.unifiedlayer.com [162.144.240.135]
 12    75 ms    71 ms    76 ms  192-254-235-38.unifiedlayer.com [192.254.235.38]
-
@SteveITS And now a few hours later, I am able to establish a connection to grassmonkeysimulations.com - here is the trace now:
$ tracert grassmonkeysimulations.com
Tracing route to grassmonkeysimulations.com [192.254.235.38]
over a maximum of 30 hops:
  1     1 ms    <1 ms    <1 ms  pfSense.home.arpa [192.168.1.1]
  2    19 ms    22 ms    21 ms  10.112.162.3
  3    19 ms    13 ms    15 ms  po-326-346-rur302.elmhurst.il.chicago.comcast.net [96.216.28.101]
  4    29 ms    16 ms    17 ms  po-2-rur301.elmhurst.il.chicago.comcast.net [68.87.235.37]
  5    16 ms    16 ms    18 ms  po-300-xar01.elmhurst.il.chicago.comcast.net [68.86.197.57]
  6    12 ms    11 ms    20 ms  be-33-ar01.area4.il.chicago.comcast.net [68.85.177.85]
  7    18 ms    24 ms    17 ms  4.68.110.122
  8    50 ms    53 ms    47 ms  ae1.37.bar4.SaltLakeCity1.level3.net [4.69.219.58]
  9    67 ms    66 ms    67 ms  4.53.7.174
 10    76 ms    70 ms    71 ms  69-195-64-113.unifiedlayer.com [69.195.64.113]
 11    76 ms    70 ms    72 ms  po99.prv-leaf1b.net.unifiedlayer.com [162.144.240.135]
 12    72 ms    88 ms    74 ms  192-254-235-38.unifiedlayer.com [192.254.235.38]
Trace complete.
The only difference I see between the trace I ran when I was unable to establish a session and this latest one is that hop 7 came back as "Request timed out." in the earlier trace.
To be honest, I'm not sure what that means/implies.
A whois on that IP (hop 7) shows:
whois 4.68.110.122
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/resources/registry/whois/tou/
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/resources/registry/whois/inaccuracy_reporting/
#
# Copyright 1997-2023, American Registry for Internet Numbers, Ltd.
#
NetRange: 4.0.0.0 - 4.127.255.255
CIDR: 4.0.0.0/9
NetName: LVLT-ORG-4-8
NetHandle: NET-4-0-0-0-1
Parent: NET4 (NET-4-0-0-0-0)
NetType: Direct Allocation
OriginAS:
Organization: Level 3 Parent, LLC (LPL-141)
RegDate: 1992-12-01
Updated: 2019-07-17
Ref: https://rdap.arin.net/registry/ip/4.0.0.0
OrgName: Level 3 Parent, LLC
OrgId: LPL-141
Address: 100 CenturyLink Drive
City: Monroe
StateProv: LA
PostalCode: 71203
Country: US
RegDate: 2018-02-06
Updated: 2023-08-10
Comment: USAGE OF IP SPACE MUST COMPLY WITH OUR ACCEPTABLE USE POLICY:
Comment: https://www.lumen.com/en-us/about/legal/acceptable-use-policy.html
-
@sjstein Intermediate hops timing out isn't necessarily a problem; backbone routers often deprioritize or ignore pings if they are busy.
I do see you're near me though so maybe that puts this back on Comcast? My two test IPs were in Naperville, though one in West Chicago did connect.
I'm curious are you able to get to Squarespace sites? Two are:
https://www.carbonsys.com/
https://www.harrisfootball.com/
-
Hi Steve,
I am able to point my browser to both of those websites and they load fine - this is during a time when I am unable to get to the grassmonkeysimulations.com site.
For grins, I ran another traceroute - and that same hop 7 is showing up as a bit odd again:
$ tracert grassmonkeysimulations.com
Tracing route to grassmonkeysimulations.com [192.254.235.38]
over a maximum of 30 hops:
  1     1 ms    <1 ms    <1 ms  pfSense.home.arpa [192.168.1.1]
  2    15 ms    14 ms    11 ms  10.112.162.3
  3    13 ms    12 ms    11 ms  po-326-346-rur302.elmhurst.il.chicago.comcast.net [96.216.28.101]
  4    10 ms    13 ms    14 ms  po-2-rur301.elmhurst.il.chicago.comcast.net [68.87.235.37]
  5    14 ms    18 ms    10 ms  po-300-xar01.elmhurst.il.chicago.comcast.net [68.86.197.57]
  6    17 ms    13 ms    14 ms  be-33-ar01.area4.il.chicago.comcast.net [68.85.177.85]
  7     *       16 ms     *     4.68.110.122
  8    50 ms    50 ms    52 ms  ae1.37.bar4.SaltLakeCity1.level3.net [4.69.219.58]
  9    73 ms    67 ms    70 ms  4.53.7.174
 10    66 ms    67 ms    71 ms  69-195-64-113.unifiedlayer.com [69.195.64.113]
 11    67 ms    64 ms    65 ms  po99.prv-leaf1b.net.unifiedlayer.com [162.144.240.135]
 12    67 ms    66 ms    67 ms  192-254-235-38.unifiedlayer.com [192.254.235.38]
Trace complete.
fwiw - this is run from a bash shell on my main Windows machine. I also see similar results on my Ubuntu (bare metal) machine.
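For comparing several traces like the ones in this thread, a small parser can help separate "an intermediate hop dropped some probes" from "the destination itself did not answer". The following is only a sketch, assuming the Windows `tracert` layout shown in the posts (three probe columns per hop); the names `parse_tracert` and `summarize` are made up for illustration.

```python
import re

# One hop line of Windows `tracert` output, for example:
#   "  7     *       16 ms     *     4.68.110.122"
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.*)$")


def parse_tracert(text):
    """Return {hop_number: (replies_received, host_text)} for each hop line."""
    hops = {}
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header, blank line, or "Trace complete."
        probes = re.findall(r"<?\d+ ms|\*", m.group(2))
        replies = sum(1 for p in probes if p != "*")
        hops[int(m.group(1))] = (replies, m.group(3).strip())
    return hops


def summarize(text):
    """Report whether the final hop answered and which earlier hops lost probes."""
    hops = parse_tracert(text)
    last = max(hops)
    lossy = [n for n, (replies, _) in hops.items() if replies < 3 and n != last]
    return {
        "destination_replied": hops[last][0] > 0,
        "lossy_intermediate_hops": lossy,
    }
```

Pasting any of the traces above into `summarize(...)` should report hop 7 as the only lossy intermediate hop (where it dropped probes) and the destination as replying in every case, which matches the observation that the traceroute always succeeds.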
-
@sjstein If your 2100 is in fact factory default (no installed extra packages) when this happens, I think it’s safe to say it’s not the hardware/pfSense install that is causing this issue. Since this only concerns certain websites (I assume others work fine during the issue?), and you can also traceroute to the destination while this issue is present, then it must be something either along the way, or perhaps more likely with the destination site/load balancer or similar.
-
I did indeed do a factory reset via the web interface yesterday - so unless that doesn't really "do it all", the 2100 is back to factory fresh (albeit with the most recent firmware).
I want to believe the Netgate isn't the issue, however the test I did yesterday still seems to point in that direction:
- Unable to browse to grassmonkeysimulations.com through laptop connected to LAN port on 2100 with WAN port connected to cable modem
- Disconnect 2100 from cable modem, connect laptop direct to cable modem, power-cycle cable modem
- Successful connection to above website
- Disconnect laptop from cable modem, connect 2100 WAN port (to cable modem), power cycle (cable modem)
- Connect laptop to LAN port on 2100
- Unable to browse to above website. (Verified I was able to browse to google.com, nasa.gov, etc)
I think I may end up going to the local Microcenter and buying a standalone wired router (to substitute for the 2100) and see if that helps(?)
Just to show the error pages that Chrome and Firefox present:
Chrome: [screenshot]
Firefox: [screenshot]
-
@sjstein Generally, the only ways this could be caused by your pfSense software specifically (assuming DNS is not the issue here) are:
1: It has packages that dynamically inspect and filter sessions in real time.
2: You have IP address conflicts in your setup (between clients and pfSense).
About nr. 1: You do not, since you are at factory default.
About nr. 2: I assume you have covered this. Traceroute/ping can behave quite differently from TCP/UDP when an IP conflict is present, mostly by sometimes working intermittently, because it runs a full ICMP/ARP for every packet, and if the conflicting IP then responds moments later, the ICMP packet might have gone through - but any sessions on TCP/UDP will die.
The other possible ways for your 2100/pfSense to be involved are:
1: The hardware is defective.
2: The WAN side link has actual link issues and traffic speeds/latency are all over the place.
But these should be ruled out in this case as you can (please confirm):
1: Browse the internet and have a running download session/streaming video working while these sites are unreachable - the test should be done in parallel on the same machine that fails to reach the sites.
2: Do the same on other machines on your network while your test machine fails to reach these sites.
This will put your pfSense/WAN in the clear for hardware issues, and brings you back to the top, where really only an IP conflict could be the reason.
-
@keyser Thanks for the ideas.
Regarding your nr2 (I have IP address conflicts in my setup (between clients and pfSense)) - I want to make sure I understand what that means before saying anything definitive (my apologies as this is all pretty new to me). Are you saying that I might see problems like this if I have multiple devices with the same IP# on my inside network?
If that is the case, I'm assuming the only way this would happen would be the case that I had devices which are configured as static IP along with the DHCP server on pfSense giving out dynamic IPs. In the past I have had static bound IP#s, but I believe those were abandoned a while ago in favor of binding an IP# to a MAC address within the pfSense config. Of course that mapping no longer exists after the factory reset.
But in general - is my interpretation of your nr2 correct? If so, I will definitely look into assuring all my devices are on unique IPs.
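As a rough illustration of the scenario described above (statically configured devices colliding with addresses the DHCP server hands out), the check below flags static addresses that sit inside the DHCP pool, and addresses claimed by more than one device. It is a sketch only: the pool bounds, device names, and both function names are placeholders, and the real pool range has to be read from the pfSense DHCP server settings.

```python
import ipaddress
from collections import defaultdict


def statics_inside_pool(pool_start, pool_end, static_ips):
    """Return the static addresses that fall inside the DHCP pool range."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [ip for ip in static_ips if start <= ipaddress.ip_address(ip) <= end]


def shared_addresses(assignments):
    """Return {ip: [device, ...]} for every address claimed by more than one device.

    `assignments` is an iterable of (device_name, ip_string) pairs that you
    collect by hand; nothing here queries the network.
    """
    by_ip = defaultdict(list)
    for device, ip in assignments:
        by_ip[str(ipaddress.ip_address(ip))].append(device)
    return {ip: devices for ip, devices in by_ip.items() if len(devices) > 1}
```

For example, `statics_inside_pool("192.168.1.100", "192.168.1.199", ["192.168.1.50", "192.168.1.150"])` returns `["192.168.1.150"]`, the one static address that the DHCP server could also hand to another client.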
As an aside - all of my work on my home network has been using IPv4, despite the fact that I believe the pfSense DHCP server is also doling out IPv6 addresses...
As to your two tests at the end of your last post:
- (Browse / stream on the same machine I'm having specific website issues with) - that works fine. I just concluded a google meets videoconf without a hiccup, yet that same site is timing out.
- I am able to stream video without issue on other machines on the same network (both wired and wireless) during these outages. Note that those machines (and all machines) on my network are unable to browse to that website during these "outages".
-
@sjstein Generally an IP conflict would cause recurring connectivity issues with both devices that are sharing the IP. It's not going to affect one web site.
I don't think I specifically asked but can you try http://grassmonkeysimulations.com from a different browser or private window? I'm wondering if it does connect and redirects to https (which then fails to connect), which would be what I am seeing with Squarespace.
Edit:
or "telnet grassmonkeysimulations.com 80" and type "/" then Enter to trigger a response, and CTRL+C to quit.
-
http://grassmonkeysimulations.com from an incognito window times out in a similar fashion to the other failures
telnet to port 80 of the same URL never connects
$ telnet grassmonkeysimulations.com 80
Trying 192.254.235.38...
addendum: traceroute shows the same timeout on hop 7
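The telnet test above checks whether a TCP connection can be established at all, independent of any browser. The same check can be scripted so that it distinguishes a silent timeout (packets dropped somewhere) from an active refusal. This is a minimal sketch using Python's standard `socket` module; the function name `tcp_probe` and the 5 second default timeout are arbitrary choices, not anything from the thread.

```python
import socket


def tcp_probe(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # the remote end (or a firewall) sent a reset
    except socket.timeout:
        return "timeout"   # no answer at all within `timeout` seconds
    except OSError as exc:
        return f"error: {exc}"  # DNS failure, unreachable network, etc.
```

Calling `tcp_probe("grassmonkeysimulations.com", 80)` and `tcp_probe("grassmonkeysimulations.com", 443)` during an outage would be expected to return `"timeout"` if the behaviour matches the hanging telnet session, while a working site such as one of those tested earlier should return `"open"`.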
-
@sjstein Hmm, then I guess it's a different issue. Kinda bummed actually. :-/
When you swapped your 2100 for a laptop did the WAN IP change?
-
@SteveITS I don't believe it did - but I will verify later today when I am able to take down my network and do some more sleuthing.
I went out over lunch today and purchased a small Ubiquiti (wired only) router/firewall. Not sure when I'll have the time to install it, but it may also help pinpoint where the failure is occurring.
-
@sjstein Cool, then we can definitely conclude that the issue itself is not the pfSense/SG2100 box. Your problem comes from something else.
That something else could be an IP conflict, and you understood it correctly, but that is not your issue. I can conclude this because your Google meeting and browsing work flawlessly from the same machine experiencing the problem.
My only remaining suggestion apart from the problem being at the remote end is IPv6. My ISP does not offer IPv6, so I’m not yet “used” to including that properly in my faultfinding.
If your ISP offers IPv6 and it works to some extent with a default pfSense setup (not many do), then it could very well be the problem. All modern operating systems prefer IPv6 over IPv4 if available, so your HTTP/HTTPS attempts might be done with IPv6 from your browser, while all your faultfinding traceroutes and such are done with IPv4.
Try disabling/removing IPv6 on one of the clients you are testing from. That will force it to use IPv4, which should avoid any IPv6 issue that might be causing the problem.
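One way to check whether IPv6 could even be in play for a given site is to see which address families its name resolves to: if no IPv6 address comes back, the browser has to use IPv4 for that site regardless of OS preference. The sketch below groups `socket.getaddrinfo` results by family; `group_by_family` and `resolve_by_family` are illustrative names, and the result depends on the resolver the client is using.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def group_by_family(addrinfo):
    """Group getaddrinfo() result tuples into {'IPv4': [...], 'IPv6': [...]}."""
    grouped = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        name = FAMILY_NAMES.get(family)
        address = sockaddr[0]  # first element is the address for both families
        if name and address not in grouped[name]:
            grouped[name].append(address)
    return grouped


def resolve_by_family(host, port=443):
    """Resolve `host` and report its addresses per IP family."""
    return group_by_family(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM))
```

If `resolve_by_family("grassmonkeysimulations.com")` shows an empty `"IPv6"` list on the affected client, an IPv6 preference is unlikely to explain the failures for that particular site.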
-
@keyser I disabled IPv6 on a few clients, rebooted, and am still seeing this issue.
Curiously, I also just got a notice from Comcast that I'm approaching my data cap. I don't believe that can be true, but maybe yet-another-clue(?)
Does anyone have a recommendation on a good tool to monitor client data use from the pfSense?
-
@sjstein There are a few, I've used https://docs.netgate.com/pfsense/en/latest/monitoring/graphs/bandwidth-usage.html#bandwidthd. If memory serves it only works on one interface.