Windows can't connect to the internet

Fons

Hello,

I have an odd problem. On one of the interfaces of our pfSense box, Windows machines aren't able to connect to the outside world,
but all others (Mac, Linux, specialized boxes) connect smoothly through the firewall. The problem came up last Monday; before that we had seen nothing of this kind.
Apart from some floating rules for general ports and specific rules for the interface, we use a block rule at the end of the rule list to block all traffic that has not been allowed. In general: everything is blocked unless specifically allowed.
We suspect a computer, most probably a PC, is causing this trouble. Maybe someone got some malware. But neither in the logs, nor in the packet sniffers at the interface, nor in a scanner we set up can we find a source for this. All hardware is getting a valid IP from the DHCP server on the interface, and DNS forwarding is handled by the interface. But I do not see the Windows machines coming up in the ARP table or in the state table.
I can see a lot of UDP traffic trying to get to the outside world. Most of it is related to multicast.
Today I will try to take a deeper look at all hardware behind the interface.
My question: does anyone have a clue that could help me solve this or narrow the problem down?

regards, Fons

johnpoz

So pfSense is your DHCP server? And your Windows machines get an IP from the DHCP server fine, but you're saying you don't see any of them in the pfSense ARP table?

When you say other boxes are connecting fine, are they connecting through the same pfSense LAN interface? Or a different one?

So from a Windows machine, can you ping the pfSense LAN IP? What is in the Windows ARP table for the pfSense IP? Is it the correct MAC?

Are your Windows machines and these other machines all on the same network, or are there VLANs?

First thing I would do is verify you can ping pfSense from your Windows machines. If you cannot, then you need to troubleshoot that before you worry about internet access.

Fons

Hi Johnpoz,
Thanks for your time and attention.
Indeed, all hardware is on the same /24 network and all of it gets DHCP from the pfSense interface. There are no underlying VLANs on that network.
After troubleshooting the network components (the pfSense interface, DHCP settings, managed switches, access points, cabling to the client outlets), I went on to check the Windows machines.
First my own notebook, a Win7 machine. After plugging in the network cable I got an IP and the correct network settings from the pfSense interface. No problems there. But I could not connect to the pfSense box. The box did not answer to ping either.
I looked at pfSense from another interface, one without troubles like this. I could see my notebook in the DHCP leases, but not in the ARP and state tables, and no traffic from my machine in the syslog.
I disconnected the network and connected my notebook directly to the pfSense interface, just to check of course. Luckily, but not surprisingly, I got a valid IP address and access to the web. Then I reconnected the network and connected my notebook via a switch again, and the same trouble came back.

I think the answer is quite simple: someone has a device on the network causing this. It's certainly not the pfSense box or any of the other basic network components.
Unluckily for me, it is a network used by people who rent a desk for a shorter or longer period, so I have no admin rights on any of their hardware. Time-consuming troubleshooting on all of the client devices will be my part. I am running a tcpdump to get some traffic to analyse, but so far I have not seen any anomalies or abnormal traffic. Of course I can only see the traffic on the pfSense interface. Maybe a hub could help me out here to sniff all the network traffic.
Thanks again, I will be struggling on, Fons

johnpoz

And what sort of device do you think could do this?

So you say you get DHCP from pfSense on the Windows machine. But then you cannot ping pfSense on this IP address? And you don't even see the traffic hitting the interface on pfSense via sniff?

So DHCP is UDP and is going to be broadcast traffic. So these other devices that work are connected to the same switch? And these are managed switches? So I guess it's possible something could be configured there that prevents directed TCP traffic but allows broadcast UDP?

And it works fine when you take the switch out of the mix. Have you tried connecting a Windows box to the port and wire that a good working machine is using?

I cannot think of an issue that some other device on the network could cause that would only mess with Windows machines. Now it's possible you have a box that has a duplicate of your pfSense IP. So when you ARP for the MAC of the pfSense IP you get this other box, etc. But that should affect ALL machines on the network, not just Windows machines.

I would verify on the Windows machine what it shows for the MAC of the pfSense IP, and check that other Linux machines show the same MAC, and that it is in fact the MAC of your pfSense interface.

I would then sniff on the Windows machine with a simple ping, so you see it go out on the wire! Then sniff on pfSense at the same time: do you see this packet get there? Do you have access to this managed switch, and what switch is it? Depending on the model, you could verify it has the MACs on the correct ports for your devices. With a span/mirror port you should be able to verify packets enter and leave ports on the switch, etc.
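
The MAC comparison suggested above can be scripted so the same check runs on a Windows and a Linux client. What follows is only a sketch: it assumes you paste in the text printed by `arp -a` (Windows) or `arp -an` (Linux), and the function names, sample addresses and sample MACs are all hypothetical.

```python
import re

# An IPv4 address followed later on the same line by a MAC address written
# with ':' or '-' separators (Linux and Windows style respectively).
ARP_LINE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})\D.*?((?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2})"
)

def normalize_mac(mac):
    """Lower-case a MAC and use ':' separators so both OS styles compare equal."""
    return mac.lower().replace("-", ":")

def parse_arp_table(text):
    """Return {ip: mac} for every line of arp output that carries both."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group(1)] = normalize_mac(match.group(2))
    return table

def gateway_mac_ok(arp_text, gateway_ip, expected_mac):
    """True only if the client's ARP cache maps the gateway IP to the expected MAC."""
    seen = parse_arp_table(arp_text).get(gateway_ip)
    return seen is not None and seen == normalize_mac(expected_mac)
```

Run it with the output from one affected and one unaffected machine; if the two disagree about the gateway MAC, the machine reporting the unexpected MAC is talking to something other than pfSense.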

Fons

Hi Johnpoz,

Thanks for your advice.

In the middle of the afternoon, all of a sudden the problem faded and all the Windows machines were able to reach the outside world. It was around the time I wrote my reply to your post.
Meanwhile I have set up a mirror/monitor port on both the switches where clients are connected. If the problem comes up again tomorrow, I will try to track it by digging in from one of the affected machines and sniffing from the switches at the same time. I will certainly check which MAC the clients see for the gateway.
The device could be anything, even a smartphone of some sort, maybe something acting as an access point. Maybe someone forgot to turn off that functionality on his/her device. Or maybe advanced malware.

The fact that this disturbance comes and goes unpredictably makes me think of malware or a misconfigured device. I asked the people on the floor to keep track of persons coming in or out, logging in or out, and what happens on the network. It doesn't seem nice to do so, but if the problems persist they can't do their jobs anymore. Most of them are involved in app building and are professional internet users.
Thanks again and wish me luck, regards, Fons

Fons

Hi Guys,

I've got some new information about what happens when Windows users are refused a connection: if you do an ipconfig /release and ipconfig /renew on a Windows machine, it sometimes can't find the DHCP server.
But the machines do show up in the DHCP leases, so I presume they don't get answers back, or rather, the DHCP ACK does not reach the client. Does anyone know of extra UDP ports that need to be opened to let Windows machines accept the DHCP ACK?

hope to hear from you, Fons

johnpoz

If you have lost connection, you would not be able to release the lease on the server even if you released it locally, which is why the lease would still be listed on the server.

When this happens, what do you show for the MAC of the pfSense IP? Is it there, is it correct? If you have some rogue machine with a dupe of the pfSense IP and that is what answers ARP for the IP, then that could explain your issue.

Problem is, all devices should be seeing this problem, not just Windows machines.

stephenw10

This is an interesting thread. From my point of view at least, not being the person receiving complaints. ;)

If there was a dupe IP on the subnet pretending to be the gateway I would expect to see some 'duplicate IP' errors in the pfSense logs.

Windows machines have been infected with some malware that's rewriting the gateway information? Similar to some of those DNS viruses I read about but have never seen.

Your switch is somehow misconfigured? What sort of switch is it? Layer 3 capabilities?
Malfunctioning switches can cause all manner of odd behaviour.

IPv6 on something causing some alternative route/gateway? That might explain why only windows machines.

Steve

johnpoz

Hmmm I like your IPv6 idea – yup that could be some weird stuff going on there for sure.

But he states he cannot ping his pfSense when the problem happens: "the box did not answer to ping either." I would have to assume he is pinging the IPv4 address. So no matter what his Windows machines thought they should be doing with IPv6 as DNS or gateway, etc., that should not affect him pinging an IPv4 address on the local segment.

He says there's no issue when connected directly to his pfSense interface, so that really points to something odd with the switch. But what makes no sense is only Windows devices being affected. Unless they are just on specific ports on one blade in the switch?

No details of the switch to work with, other than that it's a managed one.

Fons

Hi Guys,

I found the wrongdoer. It turned out to be a badly configured iPhone. The owner had it configured as an access point with the same gateway IP address as the pfSense interface. Because the problem came and went, I thought it had to be that sort of device.
The last time the problem appeared, I started a Wireshark session from the monitoring port on both switches (plain layer 2, no VLANs or other complications by the way, spanning tree enabled on the ports). On the pfSense interface I can only capture traffic on the incoming port, which would not show me traffic between the affected clients and the bad iPhone.
After that I looked up the ARP table on one of the affected machines and found the bugger right away: the gateway IP had a different MAC address than it should have.
A little analysis of the Wireshark files and the ARP table on pfSense gave me the IP address and a beautiful hostname containing his first name. From there it all went easy.
I'd like to say I shot him, but surely he did not know what he was doing.
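
The tell-tale sign in the capture, one IP address answered for by more than one MAC, is easy to look for mechanically. Here is a minimal sketch, assuming the ARP replies have already been exported from the capture as (IP, MAC) pairs; the function name and the sample values in any test data are hypothetical.

```python
from collections import defaultdict

def find_ip_conflicts(observations):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    observations: iterable of (ip, mac) pairs, e.g. the sender IP and sender
    MAC of each ARP reply seen on the mirror port.
    """
    claims = defaultdict(set)
    for ip, mac in observations:
        claims[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}
```

An empty result means every IP in the capture was claimed by a single MAC; the gateway IP showing up with two MACs is the situation described above.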

Even so, a question remains open: how do I protect an open network from such behaviour? Would it help to use another IP range, away from the standard 192.168.2.0? Or can I run something on pfSense that recognises such behaviour and protects the network from it? Given the way the clients work, the network has to be as open as possible. I'd like to hear your thoughts on this.
Thanks anyway for all your advice and thoughts. In the end it has been an interesting few days with loads of stress, but also good learning moments. Regards, Fons

stephenw10

@Fons:

I found the wrongdoer. it happened to be a badly configured iphone.

Wow. I did not expect that. ::)
I take it your wifi is on the same subnet as the affected clients then? That's another good reason to isolate wireless clients.
There were no 'duplicate IP' errors in the pfSense logs? Interesting.
Using a slightly unusual IP range would certainly have at least helped you trace the problem. Clients would have been given a completely different address by the iPhone if it was configured to do so.
If you have your wifi bridged to your wired LAN through two pfSense interfaces you can filter DHCP requests from wired to wifi. This would stop a rogue DHCP server on wifi but perhaps not a dupe gateway. It is much more likely to either not be a problem or to show log errors, since all traffic is processed by pfSense. That may present a network bottleneck though.

Good to know anyway. Thanks for coming back with the info. :)

Steve

johnpoz

What doesn't make sense is why none of the other devices were affected.

If you had a device saying "hey, my IP address is the same as the pfSense LAN IP (your gateway)", then yes, it's possible that devices would get that MAC when they ARP for the gateway IP and try to use it as their gateway.

But what doesn't make sense in that scenario is that only your Windows machines were affected, since you stated all devices were on the same segment. All devices ARPing should have intermittently gotten either the correct MAC for the IP or the bad MAC of the iPhone. Maybe they did, and only Windows users reported the issue?

See my comment from quite a few posts back:

"Now it's possible you have a box that has a duplicate of your pfSense IP. So when you ARP for the MAC of the pfSense IP you get this other box, etc. But that should affect ALL machines on the network, not just Windows machines."

stephenw10

Possibly Windows machines refresh their ARP table faster.
Maybe machines running other OSes are not rebooted every five minutes! ;)

Still seems very odd.

Steve

johnpoz

Very true!! I believe Windows would use some random time between 15 and 45 seconds unless modified:

http://support.microsoft.com/kb/949589

On Linux, for example Ubuntu, I show this:
net.ipv4.neigh.eth1.gc_stale_time = 60

So that should be 60 seconds?

But doesn't this come into play as well?
net.ipv4.neigh.eth1.locktime = 100

One way to prevent this from happening again would be to create a static arp entry on each machine for the pfsense IP.
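
The 15 to 45 second range quoted above is what you get from a 30 second base time multiplied by a random factor between 0.5 and 1.5, which are the neighbor discovery defaults in RFC 4861. That Windows applies the same scheme to its ARP cache is an assumption here, based on the linked KB article, so treat the sketch below as arithmetic rather than a statement about any particular Windows build.

```python
import random

# RFC 4861 protocol constants (assumed to apply to the Windows ARP cache too).
BASE_REACHABLE_MS = 30_000
MIN_RANDOM_FACTOR = 0.5
MAX_RANDOM_FACTOR = 1.5

def reachable_time_bounds_ms(base_ms=BASE_REACHABLE_MS):
    """Smallest and largest reachable time an entry can be given."""
    return base_ms * MIN_RANDOM_FACTOR, base_ms * MAX_RANDOM_FACTOR

def sample_reachable_time_ms(base_ms=BASE_REACHABLE_MS, rng=random):
    """One randomised reachable time, drawn uniformly between the bounds."""
    return base_ms * rng.uniform(MIN_RANDOM_FACTOR, MAX_RANDOM_FACTOR)
```

A shorter reachable time means the client re-resolves the gateway more often, so it gets more chances to pick up the wrong MAC while a duplicate is on the segment.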

Metu69salemi

@johnpoz:

One way to prevent this from happening again would be to create a static arp entry on each machine for the pfsense IP.

And/or enable DHCP snooping protection on the switches
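
For anyone unfamiliar with DHCP snooping, the core idea is that the switch forwards DHCP server messages only when they arrive on ports marked as trusted. The toy model below shows just that one decision; it is not any vendor's implementation, real switches add more (binding tables, rate limits), and the port names are hypothetical.

```python
# DHCP message types that only a server should ever send.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}

def snooping_allows(port, message_type, trusted_ports):
    """Return True if a switch doing DHCP snooping would forward the message.

    port: the switch port the DHCP message arrived on.
    trusted_ports: ports behind which a legitimate DHCP server sits,
                   e.g. the uplink towards pfSense.
    """
    if message_type.upper() in SERVER_MESSAGES:
        return port in trusted_ports
    # Client messages (DISCOVER, REQUEST, RELEASE, ...) may come from any port.
    return True
```

With only the pfSense uplink trusted, a hotspot handing out leases from a desk port would have its offers dropped at the switch.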

Fons

Hi guys,

Thanks for all your thoughts on this, and thanks, Johnpoz, for the interesting lecture about ARP and Windows. I really don't understand why Microsoft always does things differently from what the standard RFCs say, and manages to get its own rules packed into slightly different RFCs. Most of the time they create vulnerabilities, if not a reboot every 5 minutes ;-)
Anyway, it seems I have some studying to do in the coming days to get some working measures in place on the network segment. I'll let you know what I get working.

The guy with the iPhone had his hotspot enabled with indeed the same gateway address, along with everything else a hotspot does, like handing out IP addresses and so on. He stated he wasn't aware, but I think his battery must have been running empty every few hours.

Anyway, maybe a little bit early, but I'm starting the weekend after last week's stress. I hope to see you all again soon. Bye for now, Fons

Fons

Hi Guys,

It happened again, last Friday and this morning. But this time a different MAC address acted as, or was reached as, the DHCP server; not the same MAC address as last Thursday.

It seems this always starts at about 9:30, arrival time for most of the workers, and it stops after 30 to 60 minutes.

This morning I was too late to start Wireshark and intercept all UDP traffic. I'll give it a go tomorrow.

What I think now is that this isn't a badly configured smartphone or computer, but some malware capable of acting like a DHCP server to attract others on the same network. A network virus maybe.
One problem in finding it is that I don't have access to all the hardware on the network. All I provide on this network segment is connectivity and bandwidth; the only hardware I'm responsible for is the firewall, two switches and a printer. As far as I can see nothing is misconfigured in any of those.
So there's no chance for me to alter ARP tables or add static ARP entries. I can only advise them.

But a question came up: on pfSense I have a floating rule on all internal segments for UDP on ports 67 & 68, which is allowed to and from any.
I had an equivalent rule on my former Shorewall setup and it always worked fine. Should I narrow this rule down?
As far as I can see that would not help against an extra DHCP server on the same network segment, especially not when it is spoofing MAC addresses or answering on the gateway address.
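
Because the extra DHCP server may answer with the gateway's own IP, comparing IP addresses in a capture is not enough; the source MAC of each DHCP reply is the more telling field. A rough sketch, assuming the OFFER/ACK frames have been exported from the capture as (source MAC, server IP) pairs and that the MAC of the real pfSense interface is known; all names and sample values are hypothetical.

```python
def find_rogue_dhcp_servers(replies, known_server_macs):
    """Return {mac: set of server IPs} for DHCP replies from unknown MACs.

    replies: iterable of (source_mac, server_ip) taken from DHCP OFFER/ACK frames.
    known_server_macs: MACs of the legitimate DHCP server interface(s).
    """
    known = {mac.lower() for mac in known_server_macs}
    rogues = {}
    for mac, server_ip in replies:
        mac = mac.lower()
        if mac not in known:
            rogues.setdefault(mac, set()).add(server_ip)
    return rogues
```

Any MAC in the result can then be looked up in the switch MAC table to find the port it sits behind.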

any clues?

regards, Fons

johnpoz

Why do you think you even need that rule?

So is this dhcp server also having the same IP as your pfsense box?

Fons

Hi Johnpoz,

This DHCP server is indeed using the same IP address as the pfSense box.

The rule is necessary because we leave all ports closed unless needed.

fons

johnpoz

I do not believe it is, since it's one of those rules in the default set:

http://doc.pfsense.org/index.php/How_can_I_see_the_full_PF_ruleset

So on mine, if I do a pfctl -sa, I see these rules, which I did not create!

pass in quick on em0 inet proto udp from any port = bootpc to 255.255.255.255 port = bootps keep state label "allow access to DHCP server"
pass in quick on em0 inet proto udp from any port = bootpc to 192.168.1.253 port = bootps keep state label "allow access to DHCP server"
pass out quick on em0 inet proto udp from 192.168.1.253 port = bootps to any port = bootpc keep state label "allow access to DHCP server"

So your specific rules become pointless, since DHCP is part of the default rules.

Kind of wish the interface showed all the rules, and just locked the default rules like the above against deletion. You kind of need them if you're running a DHCP server on pfSense ;) Sadly, some users would not understand that and would not create the rules if it weren't done for them, and then wonder why their DHCP server didn't work.

BTW, just for clarity, I picked .253 as my pfSense LAN IP because many devices default to .1 or .254, so you run into issues like what you're seeing when you use a common IP. 192.168.2 is very common as well for many routers and such. I personally would change your pfSense LAN IP to not be at the ends, and/or even change your segment to a less common one. 192.168.3 is not used by any devices that I recall, for example.
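
The advice above can be turned into a quick sanity check before renumbering a segment. This is only a sketch: the list of "common" networks is an illustrative assumption (the thread itself only names 192.168.2 as common, and .1 and .254 as common host addresses), and the last-octet check is only meaningful for a /24.

```python
import ipaddress

# Illustrative assumption: ranges many consumer devices ship with by default.
COMMON_NETWORKS = [ipaddress.ip_network(n) for n in (
    "192.168.0.0/24", "192.168.1.0/24", "192.168.2.0/24")]
# Host addresses at the ends of a /24 that many devices pick for themselves.
COMMON_LAST_OCTETS = {1, 254}

def review_lan_address(interface):
    """Return a list of warnings for a LAN address such as '192.168.2.1/24'."""
    iface = ipaddress.ip_interface(interface)
    warnings = []
    if any(iface.network.overlaps(net) for net in COMMON_NETWORKS):
        warnings.append("network %s overlaps a common default range" % iface.network)
    if (int(iface.ip) & 0xFF) in COMMON_LAST_OCTETS:
        warnings.append("host address %s ends in a common default octet" % iface.ip)
    return warnings
```

An address like 192.168.3.253/24 returns no warnings under these assumptions, while 192.168.2.1/24 trips both checks.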