Windows Clients cannot access the internet, very strange unexpected DNS problem.
-
@bmeeks good insight.. It depends for sure - you could get different paths taken, or the path could change - it would all come down to the actual setup.. And whether there are even multiple paths that could be taken..
But yeah you could be on to something with the routing changing as to why the issue is seen sometimes and not others.
-
@johnpoz Actually I am not doing anything special here. I just added some Cisco switches behind a pfSense box and did everything according to pfSense and Cisco regulations, so it should work. Yes, I have multiple paths in the form of EtherChannels, but the client is only two hops away from pfSense. Even one hop away from pfSense will probably give the same issue; directly connected, I know it will probably work.
It is probably not the ISP, because when I connect the PC directly to the VDSL modem, there is no issue. It has to be the Cisco hardware.
To my knowledge there is nothing further you can do in Cisco IOS to troubleshoot the problem with the current network condition. Now, frankly, has this firewall ever been tested with Cisco hardware or OSPF in general?
If I have to throw everything into mothballs, that would be very lame if you ask me...
-
@bmeeks
Don't bother with the text, the text is wrong; only the network model applies.
-
@IrixOS said in Windows Clients cannot access the internet, very strange unexpected DNS problem.:
Don't bother with text, text is wrong, just the network model applies.
But do all the link aggregations still apply? If so, that's a lot of places where something can go weird with a slight misconfiguration.
I also found a long thread from 2020 from a user that was having LACP issues with pfSense that were apparently never resolved. Have a look here: https://forum.netgate.com/topic/158534/lacp-not-working.
-
Yes they still apply, but I have nothing configured on the left side of the Catalyst in the middle yet. The right side of the Catalyst in the middle is configured.
Just consider the spot with 'I am here'. From there zooooom over PO1 to the ASBR (the switch that is directly connected to pfSense) and zooooom over PO2 to pfSense.
That's about it. OSPF is configured in between, all routes are advertised, and on the ASBR a default route (0.0.0.0 0.0.0.0) is configured with the pfSense IP as its next-hop address. There is a static route in pfSense pointing back to the internal network in the form of a summary route, so all connections are there.
It should work according to regulations.
-
@IrixOS said in Windows Clients cannot access the internet, very strange unexpected DNS problem.:
Yes they still apply, but I have nothing configured on the leftside of the CATALYST in the middle yet. The right side of the catalyst in the middle is configured.
Just consider the spot with 'I am here'. From there zooooom over PO1 to the ASBR (the switch that is directly connected to pfsense) and zooooom over PO2 to pfsense).
That's about it. OSPF is configured between, all routes are advertised and on the ASBR a Null route 0.0.0.0 0.0.0.0 is configured with the pfsense IP as its Next hop address. Static route in pfsense pointing back to the internal network in the form of a summary route, so all connections are there.
It should work according to regulations.
As I mentioned previously, this part of networking is not my strong point. I understand the basic concepts, but in my old job I never had to fully design something like this. At my company we had the equivalent of @johnpoz engineers who designed the links. My job was primarily cybersecurity and firewalls, client/server software installation, configuration and administration, and various types of system programming. I interacted very frequently with the link-layer stuff and even did most of the firmware updates on equipment at my sites, but I was not heavily involved in the design phase.
-
@bmeeks
You mentioned LACP from a past thread. I know the LACP aggregate to pfSense is working, at least from watching the pings.
johnpoz advised me last year to do a Wireshark capture to see what is going on. From the Wireshark output I have the feeling, though I'm not sure, that there is some point where DNS doesn't come through.
And that is the IP address of the LACP aggregate (10.216.64.17) as shown in the network diagram.
-
@IrixOS said in Windows Clients cannot access the internet, very strange unexpected DNS problem.:
there seems to be some point where dns doesn't come through.
That may be true, but DNS coming through or not coming through has absolutely nothing to do with not getting a reply when running ping 8.8.8.8. As I've said a few times, forget DNS until you have zero problems pinging an outside IP address. When you ping an IP directly (without using a domain or host name), then DNS is not relevant. DNS is UDP and/or TCP. Ping is ICMP. When you can't ping an IP address directly, then ICMP or routing (or both) is broken.
When the basic Layer 2/3 connectivity is broken, then of course DNS is not going to work. Also be aware that asymmetric routing, if present, is going to drive a stateful firewall like pfSense bonkers. It's going to block certain traffic because it may not have seen the SYN and so did not create an open state for any stateful replies.
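To illustrate the asymmetric routing point, here is a deliberately simplified Python sketch of a stateful filter. It is only a toy model under stated assumptions, not how pf actually tracks state: the `Packet` and `StatefulFirewall` names, the client address, and the ports are all hypothetical. The idea is just that a reply is passed only if the firewall itself saw the opening SYN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    flags: str  # "SYN", "SYN-ACK" or "ACK" in this toy model


class StatefulFirewall:
    """Toy state table: a SYN creates state, anything else needs existing state."""

    def __init__(self):
        self.states = set()

    @staticmethod
    def _key(p):
        # Direction-independent key so the reply matches the original flow.
        return frozenset({(p.src, p.sport), (p.dst, p.dport)})

    def handle(self, p):
        key = self._key(p)
        if p.flags == "SYN":
            self.states.add(key)
            return "pass"
        return "pass" if key in self.states else "block"


# Hypothetical client and server addresses, for illustration only.
syn = Packet("10.214.48.10", "8.8.8.8", 50000, 443, "SYN")
syn_ack = Packet("8.8.8.8", "10.214.48.10", 443, 50000, "SYN-ACK")

symmetric = StatefulFirewall()
print(symmetric.handle(syn), symmetric.handle(syn_ack))

# Asymmetric case: the SYN left via some other path, so no state exists here.
asymmetric = StatefulFirewall()
print(asymmetric.handle(syn_ack))
```

In the symmetric case both packets pass. In the asymmetric case the firewall never saw the SYN, so the SYN-ACK has no matching state and is blocked, which from the client side looks like an intermittent or unexplained failure.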
-
@bmeeks Yes I agree.
-
Is pfsense actually ever tested with cisco hardware?
-
@IrixOS said in Windows Clients cannot access the internet, very strange unexpected DNS problem.:
Is pfsense actually ever tested with cisco hardware?
pfSense is not tested for connectivity with any type of switch. An Ethernet port is an Ethernet port. It either correctly auto-negotiates and connects at the physical layer, or it does not.
In regards to something like LACP, that is always going to be a question mark regardless of who the vendors are. It seems that no manufacturer can resist the urge to "improve" upon some agreed upon standard. That's why incompatibilities exist.
If you think the Cisco connection to your pfSense box is the source of the problem, then simplify the connection to a single GigE link for a test. You only have 1 Gig to the Internet according to the diagram (but I know you said the text was not always accurate). Collapsing down to a regular single GigE link will eliminate any possibility of protocol incompatibilities because there will be no LAGG and no LACP.
If your diagram is accurate in regards to all those aggregated links, then I think your issue lies there. I think something is happening in OSPF in your internal networks. Maybe a routing loop as @johnpoz hypothesized.
-
@IrixOS said in Windows Clients cannot access the internet, very strange unexpected DNS problem.:
Static route in pfsense pointing back to the internal network in the form of a summary route, so all connections are there.
You may want to check this article to see if maybe one of the two possible cons with summary routes could bite you with your current setup: https://networklessons.com/rip/introduction-route-summarization. Summary routes are in general good, but there are scenarios where they can cause issues.
Here is another good article explaining the possible pitfalls that can happen with summary routes: https://bradhedlund.com/2006/02/19/route-summarization-with-alternate-paths/.
Not saying either of these articles represents your problem, but it is something you might want to thoroughly investigate.
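To make one summary-route pitfall concrete, here is a minimal Python sketch of longest-prefix matching. The prefixes and next-hop labels are hypothetical and not taken from the actual configuration; `best_route` is just an illustrative helper, not anything pfSense exposes.

```python
import ipaddress


def best_route(routes, dest):
    """Return the (network, next_hop) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical table on the firewall: one broad summary pointing at the
# ASBR, plus a directly connected transit network inside that summary.
ROUTES = [
    (ipaddress.ip_network("10.214.0.0/16"), "asbr"),
    (ipaddress.ip_network("10.214.64.16/30"), "connected"),
]

for dest in ("10.214.64.17", "10.214.200.1", "8.8.8.8"):
    print(dest, "->", best_route(ROUTES, dest))
```

The transit address matches both entries and the more specific connected /30 wins, which is fine. The address in an unused part of the summary matches only the summary, so it is sent to the ASBR. If the ASBR has no more specific route for it and only a default route pointing back at the firewall, the packet bounces between the two until its TTL expires.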
-
@bmeeks @IrixOS no offense dude but that is a mess! You might want to draw up your L2, and with another layer or different drawing do your L3.. I can not tell what device this is on even..
Why are you listing 3 different IPs all in your 10.214.0.0/20 ?
You got multiple port channels/laggs/lacp all over the place.. How many clients, where exactly is the routing happening for these different vlans.. Why do you have /20s, those are huge networks.. Do you really have that many clients that justify such large networks.. A /22 is prob the largest I would go or you start running into broadcast/multicast issues - especially if you're a Windows shop, and have not turned down all the noise they send out, and if you leave IPv6 enabled - even if not doing anything with it, that is just way more noise.
What is the point of so many members in your port channels.. What is the point of 4 1ge connections into pfsense if it only has 1 ge out? Is it routing these other vlans? If so a summary route that covers all the vlans you're directly attached to can be problematic yes.. If you are pointing that downstream to some other router.. If pfsense is directly attached and routing some of these vlans... Why is the dhcp only like 20 IPs?? If you have so many clients you need a /20
I just have no desire to dive into that and try to make any sense out of it - sorry!! Now if you were a paying customer, and we were to take over this network... my gut reaction, if this was the drawing the current IT gave me, would be ok, we are going to do a wipe and new design.. Starting with a green field.. vs wasting cycles and money trying to make any sense of that current drawing..
What is the point of 8 1ge connections from 1 switch to another switch? Your switches have that many spare ports? Why would you just not run a couple of 10ge interfaces for your uplinks?
What networks/address are these?
You have 1 too many octets in there ;)
-
@johnpoz Haha I will explain, if it makes sense to you. The reason I have so many port-channels is that I like to play with/influence the path for OSPF and the vlans.
Don't bother with the text, it is wrong. As for the range, I once made the mistake of having no more address space for additional devices without reconfiguring the whole network, so I made it large, and also to have the feeling of being on an enterprise network. The noise of the devices and the smell of the plastic and the silicon are like music to the ears, does that make sense to you?
-
@johnpoz Yes it might be a mess, should I be ashamed? The left side is a three-tier model and multiple links are common, correct me if I'm wrong. As for the port-channels, it technically works.
-
@johnpoz The 1Gigabit internet speed might change some time in the future right?
-
@IrixOS said in Windows Clients cannot access the internet, very strange unexpected DNS problem.:
to have the feeling being on an enterprise network,
I can tell you right now, enterprise networks don't use /20s.. Not any of the ones I have been on, or companies I have worked for or supported in 30 some years of doing this ;) I have seen customers use /16 for their printer vlan - they had maybe 100 tops.. But that was just because the guy didn't have a clue....
Sure you plan out an IP scheme for a location, sure you leave room for growth, so you can add more devices to a specific vlan/network if you need to.. But unless you have, or know for sure that sometime real soon you're going to be, over a /24, use a /24 to keep it simple!! Leave room before the next network so you can expand that network if you want before bumping into the next one.. Sure leave room so it could grow to a /20 if you want..
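For a sense of scale, a quick sketch with Python's standard `ipaddress` module shows how many usable host addresses each prefix length gives, and how a /24 can be deployed inside a block that is only reserved on paper as a /20. The 10.214.48.0/20 block is used purely as an example and `usable_hosts` is an illustrative helper.

```python
import ipaddress


def usable_hosts(prefix_len):
    """Usable IPv4 host addresses in a subnet (network and broadcast excluded)."""
    return 2 ** (32 - prefix_len) - 2


for prefix_len in (20, 22, 24):
    print(f"/{prefix_len}: {usable_hosts(prefix_len)} usable hosts")

# Reserve a /20 in the addressing plan, but configure only the first /24.
reserved = ipaddress.ip_network("10.214.48.0/20")
deployed = next(reserved.subnets(new_prefix=24))
print("reserved:", reserved, "deployed:", deployed)
```

That works out to 4094 hosts for a /20, 1022 for a /22 and 254 for a /24. Deploying the first /24 and leaving the rest of the reserved block unused keeps the broadcast domain small today while still allowing the mask to be widened later without renumbering.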
Go try and work in a DC for a couple of hours.. It won't be music to your ears ;) hehehehe
If you want to lab with routing protocols - great, more power to you.. But I wouldn't be playing around with ospf, nobody uses that any more.. I haven't seen that used anywhere in years and years and years.. BGP is what is used in any actual enterprise network. Got to be at least 20 some years since I have seen that used anywhere.. eigrp sure.. Still see that hanging around in some shops.. I have a change tmrw actually to remove some old eigrp config that was sitting on some routers at a site that was acquired years ago and just needs to be cleaned up.. Since it's no longer needed, bgp is doing all the routing.
Here is what I suggest you do - take your "production" and make it as simple as possible!! If you want to lab/learn, then do it on your lab.. Not your network.. Be it actual users going to yell when something is not working, or just the wife and kids going to yell when they can't watch netflix.. You shouldn't lab in what is "production"
Also if you want to play with routing - play with it being multiple sites, not all the same network.. You're going to notice companies collapsing their networks, reducing equipment, routing is done at the distribution layer or the core for a large campus.. But unless it is a huge campus that is really spread out and really is more like different sites, you are seeing the distribution layer collapse and route just at what is being called the core.. But the typical core, distribution, access layer is still quite common.. Looking at that drawing I don't see the 3 layers of your traditional network.. No "enterprise" would be set up like that..
You want your network to be simple, simple is easier and faster to find where the problem is.. You want it to be redundant so failure can happen and the network still functions.. I don't see simple in that setup, I see a pain in the ass to troubleshoot, I see multiple places for problems to happen and just break everything..
And don't pay attention to the text? Documentation is key in any network.. If you're going to do a drawing it should be easy to read, it should be well documented and correct! Or you might as well just throw it out.. Because it's useless, and if anything it's going to lead you down the wrong path trying to fix something.
-
I have 2x 3750G series, 1x 3750E, 4x 4948 series and some X2 and SFP modules, what do you recommend I should do then?
According to the Cisco pages, the left side represents a three-tier model: the access switches are connected to both HSRP routers, each switch for a given vlan has a 10Gbit port monitored, and the vlan traffic is routed to the middle switch/router. Okay, the links in between might be overkill, what do you think I should do then?
As for the switch connected to the servers, it has some redundancy. Are these servers supposed to be in a private vlan? I read that servers are interconnected in a ToR topology.
What do you want me to do next? How would you design it?
-
@johnpoz I am sorry mate.
-
@IrixOS said in Windows Clients cannot access the internet, very strange unexpected DNS problem.:
How would you design it?
For the couple of boxes you have shown... I would throw it all out and just get 1 switch, and connect that to pfsense and let it route.. I see nothing on that drawing that suggests you need any of that... I don't even see an arrow that says 200 users here, or 50 users, or anything that would justify anything near that complexity..
It looks like you threw together some stuff to try and lab something.. But not sure what you wanted to lab.. And had a bunch of cables laying around and figured what the hell lets plug them all in ;)
For the single devices you show - you don't even show what vlans they are in?? From what I can tell they are all in the single /20
You have 2 networks this 10.214.48 and then some 10.214.64/s that look like transits? Is your 10.214.48 your management vlan?
But can't tell what is actually doing routing? And for what networks? How much data flow is actually needed?
If you got some gear and you want to play/learn - great do that.. But I wouldn't run your actual whatever network on it.. If you want to hang your lab off of some transit network on pfsense or even multiple vlans off pfsense for your "lab" then do that... But your PC to get to the internet or other devices you use like your nas/filer or DC, etc. shouldn't sit on what you're labbing on.