Can't figure out why a few workstations are dropping packets
-
I posted a similar question on Reddit today, but I can use all the help I can get.
I recently upgraded a client's network but now a few workstations are either dropping packets like crazy or unable to load certain websites.
The network is set up as follows:
The pfSense box is connected to a Dell PowerConnect 5448. I have two UniFi APs providing WiFi for the building. There are 8 tenants/small businesses in the building, each on a separate VLAN. The UniFi APs are each broadcasting 4 SSIDs, one for each VLAN.
We have about 20-30 fully functioning devices. They're receiving IP addresses just fine, with no connectivity issues, and they're running very quickly. But a few workstations are just not working. They're all Windows 7 machines. If I run an indefinite ping to 8.8.8.8, I get timeouts every 2-3 attempts, sometimes for long stretches of time. I can ping other devices on the respective subnets with no packet loss at all. One of the workstations is connected to the same 5-port switch as my workstation, and mine is working flawlessly. I've tried plugging the malfunctioning workstation directly into the wall port, and into different ports on our room's 5-port switch, with no success. All of these devices work perfectly when I connect them to a separate router I've set up for testing.
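To put a number on "timeouts every 2-3 attempts" (roughly 33-50% loss) and compare it between a good and a bad workstation, you can save the output of `ping -t` to a file and count replies versus timeouts. This is a minimal sketch that assumes the English-locale Windows ping line formats shown in the comments; the function name `ping_loss` is hypothetical.

```python
import re

# Assumption: lines look like English-locale Windows `ping -t` output, e.g.
#   "Reply from 8.8.8.8: bytes=32 time=14ms TTL=117"
#   "Request timed out."
REPLY = re.compile(r"^Reply from \S+: bytes=\d+")
TIMEOUT = "Request timed out."


def ping_loss(output: str) -> tuple[int, int, float]:
    """Return (received, sent, loss_percent) counted from raw ping output."""
    received = sent = 0
    for line in output.splitlines():
        line = line.strip()
        if REPLY.match(line):
            received += 1
            sent += 1
        elif line == TIMEOUT:
            sent += 1
        # Everything else (header, "Destination host unreachable",
        # the statistics summary) is ignored by this sketch.
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    return received, sent, loss
```

Running the same count against a long capture from a working PC on the same switch gives you a like-for-like comparison instead of an impression.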
If I connect these devices to other VLANs, same deal. If I connect them to just WiFi, same problem. I've also put the devices on the default VLAN, with the same unfortunate results. I can RDP into my workstation, but I can't remote into the malfunctioning device on the same dumb switch, even though the port forwarding rules are identical except for the specific port.
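One way to separate "the port forward is broken" from "the workstation isn't answering" is to test the RDP port with a plain TCP handshake, first from inside the same VLAN and then from outside through the forward. A minimal sketch, assuming RDP is on its default port 3389; the helper name `tcp_port_open` is hypothetical.

```python
import socket

RDP_PORT = 3389  # default RDP port; adjust if the forward targets a different one


def tcp_port_open(host: str, port: int = RDP_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and unreachable hosts.
        return False
```

If the handshake succeeds from a neighbour on the same switch but fails through the forward, the problem is on the path through pfSense; if it fails even locally, look at the workstation itself.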
I'm at a loss. I've verified with our ISP that my public IP, gateway, subnet mask, etc. are all correct.
One of the PCs can connect to some websites, like facebook.com, while failing to connect to google.com. When it does connect to Facebook, it takes an extremely long time.
The PCs are all no more than 3-4 years old, and I've made sure their NIC drivers are up to date.
I'm at a loss.
Here are the specs of the pfSense box:
-Intel(R) Atom(TM) CPU D2500 @ 1.86GHz
-Dual Intel 82574L Gigabit Ethernet Controllers
-4GB RAM
-60GB SSD
-
If it's just those Win7 boxes and they all show the same symptoms on different ports/VLANs, then I would think it's the Win7 boxes. If it were pfSense, the issue would be widespread and affect all machines the same.
-
> If I ping 8.8.8.8 on an indefinite ping, I'll receive Timed Out errors every 2-3 attempts, sometimes for long stretches of time. I can ping other devices on the respective subnets with no packet loss at all.
What about the pfSense VLAN interface? What about your ISP next-hop?
I would concentrate on one of the most consistently-failing devices and check everything (everything) again.
Is all your switching gear VLAN-capable or at least on untagged edge ports? No attempt at putting tagged traffic through unmanaged devices?
Everything configured to auto-negotiate? No hard-set 100-full in the switch ports or edge devices anywhere?
-
> If I ping 8.8.8.8 on an indefinite ping, I'll receive Timed Out errors every 2-3 attempts, sometimes for long stretches of time. I can ping other devices on the respective subnets with no packet loss at all.
> What about the pfSense VLAN interface? What about your ISP next-hop?
> I would concentrate on one of the most consistently-failing devices and check everything (everything) again.
> Is all your switching gear VLAN-capable or at least on untagged edge ports? No attempt at putting tagged traffic through unmanaged devices?
> Everything configured to auto-negotiate? No hard-set 100-full in the switch ports or edge devices anywhere?
I can ping the pfSense VLAN interface with no packet loss. I can ping other devices on the VLAN subnet with no packet loss. I can RDP into my functioning workstation from the malfunctioning workstation, but I can't RDP into the malfunctioning workstation from the functioning workstation.
The managed switch that's between the pfSense router and the workstations is VLAN-capable. All of the ports are untagged with the exception of the ports that connect to the router and APs. I have no tagged traffic going through unmanaged devices.
-
What about your ISP next-hop?
-
> What about your ISP next-hop?
Sorry, I can ping ISP next-hop with no packet loss as well. I hope this is what you're asking for.
-
I would say the next call is to your ISP then.
-
> I would say the next call is to your ISP then.
Thanks for all of your help so far! I've verified with the ISP that my settings are correct. I've also moved the pfSense router behind an unmanaged switch and set up a separate router on the same unmanaged switch. I've connected the problematic PCs to this router and they're working fine. This would rule out an ISP issue, correct?
-
If you can ping the ISP next hop with no problem but get packet loss to 8.8.8.8, I don't see how you can conclude that it's not an ISP/internet problem.
Or there's a piece missing in your description somewhere. Like I said before, take a known-problem host and check everything top to bottom in the path from it out to the internet.
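The top-to-bottom check amounts to measuring loss to each stop on the path, in order from the workstation outward, and finding the first stop where loss appears; the fault sits between that stop and the last clean one. A minimal sketch of that logic, assuming you have already collected a loss percentage per target (for example from long ping runs); the target names, numbers, and the 1% threshold in the usage note are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def locate_loss(
    path: Sequence[Tuple[str, float]], threshold: float = 1.0
) -> Optional[Tuple[Optional[str], str]]:
    """Walk targets ordered from the workstation outward.

    Returns (last_clean_target, first_lossy_target), or None if every target
    is at or below `threshold` percent loss. last_clean_target is None when
    the very first target is already lossy.
    """
    last_clean: Optional[str] = None
    for name, loss_pct in path:
        if loss_pct > threshold:
            return last_clean, name
        last_clean = name
    return None
```

With illustrative numbers matching the symptoms in this thread, `locate_loss([("LAN peer", 0.0), ("pfSense VLAN interface", 0.0), ("ISP next hop", 0.0), ("8.8.8.8", 40.0)])` returns `("ISP next hop", "8.8.8.8")`, which is exactly why the segment beyond the next hop gets the attention.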
-
These issues only started happening when we moved to the new pfSense setup. The reason I don't believe it's an ISP issue is that when the same devices are on a separate router on the same modem, we're not having issues. I will certainly do as you suggest and recheck everything.
There probably is something missing in my description, I just don't know what.
-
If you're using Win7, use pathping to collect a set of samples for a traceroute. See where the loss starts.
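Reading pathping output by eye is easy to get wrong, because a single router that rate-limits ICMP replies to itself can show loss that does not carry through to later hops. This sketch parses the hop rows and reports the first hop of the loss that persists all the way to the end of the path. It assumes the standard pathping statistics layout (hop, RTT, "Source to Here" Lost/Sent = Pct, "This Node/Link" Lost/Sent = Pct, address); the regex, the helper names, and the 5% threshold are assumptions, not part of pathping itself.

```python
import re
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    hop: int
    address: str
    lost: int  # "Source to Here" lost count
    sent: int  # "Source to Here" sent count
    pct: int   # "Source to Here" loss percentage


# Matches hop rows such as
#   "  2   10ms     0/ 100 =  0%     0/ 100 =  0%  10.0.0.1"
# The first Lost/Sent pair is "Source to Here"; the second is "This Node/Link".
HOP_ROW = re.compile(
    r"^\s*(\d+)\s+\S+\s+(\d+)/\s*(\d+)\s*=\s*(\d+)%"
    r"\s+\d+/\s*\d+\s*=\s*\d+%\s+(.+?)\s*$"
)


def parse_pathping(text: str) -> List[Hop]:
    """Extract hop rows from the statistics section of pathping output."""
    hops = []
    for line in text.splitlines():
        m = HOP_ROW.match(line)
        if m:
            hop, lost, sent, pct, addr = m.groups()
            hops.append(Hop(int(hop), addr, int(lost), int(sent), int(pct)))
    return hops


def loss_starts_at(hops: List[Hop], threshold: int = 5) -> Optional[Hop]:
    """Return the first hop of the trailing run of hops whose source-to-here
    loss is >= threshold. Loss that disappears at a later hop resets the
    search, since it is more likely ICMP rate limiting than real loss."""
    start = None
    for h in hops:
        if h.pct >= threshold:
            if start is None:
                start = h
        else:
            start = None
    return start
```

Save the output with `pathping 8.8.8.8 > pathping.txt` on a failing workstation and feed the file's text to `parse_pathping`; doing the same on a working PC shows whether the lossy hop is on your side of the ISP next hop or beyond it.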
-
Sounds like a gateway or DNS problem. If you connect to another router, everything runs fine, as you described. Is that right? Then I really think you should look at DNS first.
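A quick way to test the DNS theory is to time lookups for the sites that do and don't load (google.com versus facebook.com in this thread) on a failing workstation and on a working one. A minimal sketch using the operating system's resolver through Python's standard `socket.getaddrinfo`; the helper name `timed_lookup` is hypothetical.

```python
import socket
import time
from typing import List, Tuple


def timed_lookup(hostname: str) -> Tuple[List[str], float]:
    """Resolve `hostname` with the OS resolver.

    Returns (sorted unique addresses, seconds taken). An empty address list
    means the lookup failed.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
        # info[4] is the sockaddr tuple; its first element is the IP string
        # for both IPv4 and IPv6 results.
        addrs = sorted({info[4][0] for info in infos})
    except OSError:  # socket.gaierror is an OSError subclass
        addrs = []
    return addrs, time.monotonic() - start
```

If both names resolve quickly on the failing PC but the pages still hang, DNS is less likely to be the culprit and the pathping results deserve more weight; slow or failed lookups only on the bad PCs point back at DNS.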