[2.1 & 2.0.3] Can access SOME but not ALL LAN hosts
Good morning. I hope someone can shed some light on a problem that I've wrestled with for several days now.
I updated to 2.1 3 days ago and the problems are the same as they were with 2.0.3.
Office pfSense server with ovpn running on port 1195, UDP (server is quad core AMD with 8GB RAM, dual gigabit NICs, running only WAN and LAN interfaces)
Office WAN (ISP) line is symmetrical 5mbit provided by a well known point-to-point (line of sight) wireless provider.
pfSense LAN NIC plugged into a 24port managed Netgear switch (switch is running pretty much default configuration)
From the 24-port Netgear, LAN drops go to office department areas where there is an 8-port workgroup switch for each department
Each user's PC is connected to the Workgroup switches in their respective area
Office LAN is flat 192.168.1.0/24, no VLANs, no additional subnets
pfSense is also running as an ovpn client to a cloud server to connect the office to their development server(s)
Home laptop, Windows 7 Ultimate 64bit running latest ovpn GUI client
Home ISP is 30/7mbit Xfinity (comcast) running through a 2.1 pfSense box as my router
Additional office setup info:
In addition to the departmental office LAN areas (the small 8 port switches), there is a "server area"
The server area also has an 8port workgroup switch backhauled to the 24port Netgear
On this server switch, there are 2 physical VMware ESXi servers, each running 5 VM hosts - VM hosts include mostly Linux, some Windows 7 Pro, and OSX
In addition to the 2 VM servers, the server switch also has connected 1 Windows physical host, 1 Linux physical host, and 1 iMac host (running OSX)
Finally, there is a NAS server attached to the server switch
From my home laptop ovpn connection, I can FULLY access any machine on the "server switch" - Linux, Windows, Mac, VM or Physical, with its firewall turned on, with its firewall turned off, ping, RDP, SSH - everything works, but there is 10-20% packet loss even at the least busy times of day (very late at night).
During busy times at the office, packet loss to these servers on the server switch is 50% or more
I can only access the machines connected to the "server switch".
Running wireshark on my home client, and a target machine at the office, I can conclude that when I ping from my home machine, the ping packets do arrive at the target machine, but they are never returned - the ping REQUEST arrives at the office workstation, but the ping REPLY is never sent.
Adding a static route to the office workstation has no effect (route add 10.0.225.0 netmask 255.255.255.0 gw 192.168.1.1)
server conf: (this is copy/pasted from the pfSense shell /var/etc/openvpn - this server conf was built using the pfSense OpenVPN wizard)
dev ovpns2
dev-type tun
tun-ipv6
dev-node /dev/tun2
writepid /var/run/openvpn_server2.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher AES-128-CBC
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
client-connect /usr/local/sbin/openvpn.attributes.sh
client-disconnect /usr/local/sbin/openvpn.attributes.sh
local 18.104.22.168 #office WAN public IP
tls-server
server 10.0.225.0 255.255.255.0
client-config-dir /var/etc/openvpn-csc
username-as-common-name
auth-user-pass-verify /var/etc/openvpn/server2.php via-env
tls-verify /var/etc/openvpn/server2.tls-verify.php
lport 1195
management /var/etc/openvpn/server2.sock unix
max-clients 30
push "route 192.168.1.0 255.255.255.0"
push "dhcp-option DOMAIN office.domainname"
push "dhcp-option DNS 192.168.1.30"
ca /var/etc/openvpn/server2.ca
cert /var/etc/openvpn/server2.cert
key /var/etc/openvpn/server2.key
dh /etc/dh-parameters.2048
tls-auth /var/etc/openvpn/server2.tls-auth 0
comp-lzo
persist-remote-ip
float
client conf:
dev tun
persist-tun
persist-key
cipher AES-128-CBC
tls-client
client
resolv-retry infinite
remote 22.214.171.124 1195 udp
tls-remote <name of pfsense ovpn server>
auth-user-pass
pkcs12 pfsense-udp-1195-ovpnusername.p12
tls-auth pfsense-udp-1195-ovpnusername-tls.key 1
ns-cert-type server
comp-lzo
I've debugged all I know to do and I think I'm unable to see the forest for the trees.
Any advice on where else I should look for troubleshooting steps? What other information can I provide for review?
Why can I fully access some machines (on the "server" switch) but not access other machines at ALL?
Why does adding a static route on the problem machine not allow the machine to send ping REPLY back to my ovpn client?
Thanks in advance.
Float = none
WAN = 1 default "block bogons" and 1 IPV4 for OVPN created by the pfSense OVPN wizard
LAN = 1 default anti-lockout rule and 1 default "allow LAN to any" rule
OPENVPN = 1 "wide open" allow rule created by pfsense OVPN wizard
This will sound silly after all your troubleshooting, but are you sure that the office PC's firewall isn't blocking the ping reply? The Windows firewall on Windows 7 for example is configured by default to NOT respond to ping requests coming from outside the subnet.
BTW, kudos for your detailed information on the post! :D
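To illustrate the scope point above, here is a minimal Python sketch of the subnet test only. It is not how Windows implements its firewall; it just shows that an OpenVPN client address falls outside the office LAN, so a rule scoped to the local subnet would not match it. The LAN range is from the original post; the two host addresses and the function name are placeholders I made up.

```python
import ipaddress

# Office LAN as described in the original post.
LAN = ipaddress.ip_network("192.168.1.0/24")

def in_local_subnet(source_ip: str, local_net=LAN) -> bool:
    """Return True if source_ip belongs to the host's own subnet."""
    return ipaddress.ip_address(source_ip) in local_net

# Placeholder addresses, for illustration only.
print(in_local_subnet("192.168.1.11"))  # a LAN peer
print(in_local_subnet("10.0.225.6"))    # a hypothetical OpenVPN tunnel client
```

A ping from the tunnel arrives with a 10.0.225.x source, so any rule restricted to 192.168.1.0/24 treats it as outside traffic.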
I have indeed tested with the Windows 7 target-machine firewall both on and off. I'm lucky that I have a user in-office who helps me test and is willing and able to change settings on his workstation as I ask.
Changing the Win7 firewall (disabling it on all interfaces, fully) had no change in the behavior.
What really blows my mind though is that many of those machines on the "server switch" (some physical, some virtual) have their respective firewalls enabled, yet I can still ping, ssh, and rdp to each of them as expected. It's just the machines not on the "server switch" that behave badly.
One really weird thing - first thing this morning I asked my internal testing guy if he could ping me (using my ovpn tunnel IP) and he could. I saw it. He saw it. I captured it in Wireshark, and I captured it on my home pfsense box. All 4 pings (default Windows ping) came and went with a request and a reply.
I was shocked because I knew that would not work - but it did. Then I asked him to do it again and it timed out on his end.
Why the intermittent behavior?
Also, his is the machine that we have manually added a static route to, and his is the machine we have used for full end-to-end packet capture tests. His is the machine that can receive my pings, but never replies. The office pfsense box packet capture sees the ping come in over the ovpn interface. His local PC wireshark sees the ping come TO his machine. Nothing ever leaves his machine in the form of a return/reply ping packet.
The static route on his machine points traffic for the ovpn tunnel subnet at the default pfsense gateway.
I'm really confused here. I thought that since his PC was getting the pings but not replying, that a route was the issue, but it seems I can't sort it out with just that.
EDIT: To be clear, let me re-frame what I consider my most descriptive test scenario.
Me at home on ovpn tunnel 10.0.225.x (i'm coming through my home pfsense router)
Him at work on his office pc on 192.168.1.x LAN
My laptop - wireshark - sees all 4 windows pings leave my machine
My pfsense server, packet capture on WAN sees all 4 pings leave my home pfsense network
office pfsense server packet capture on OVPN interface sees all 4 pings from my home laptop
tcpdump on another internal office server sees all 4 pings from pfsense to his machine IP
wireshark on his office machine sees all 4 pings arrive at his workstation
wireshark on his office machine sees ZERO pings leave his workstation (this is with windows firewall turned off completely)
obviously, on the return path, neither the tcpdump server (mirror port off the main office switch), nor the office pfsense server, nor my pfsense server, nor my laptop wireshark sees any ping reply packets.
They literally never leave his machine, and I think we've proven that with so many layers of packet capture at various points along the entire round-trip path.
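The reasoning in that capture walk-through can be written down as a small Python sketch: list the capture points in path order, record whether each saw the request and the reply, and find where the exchange breaks. The observations below are the ones reported above; the function name and the tuple layout are my own, not from any tool.

```python
# Capture points in path order, home -> office target.
# Each entry: (capture point, request seen, reply seen), as reported above.
observations = [
    ("home laptop (wireshark)", True, False),
    ("home pfsense WAN capture", True, False),
    ("office pfsense OVPN capture", True, False),
    ("office mirror-port tcpdump", True, False),
    ("office workstation (wireshark)", True, False),
]

def locate_drop(obs):
    """Return (verdict, capture point) for the first break in the round trip."""
    # Forward direction: the request must be seen at every point.
    for name, request_seen, _ in obs:
        if not request_seen:
            return ("request lost before", name)
    # Return direction: start at the target and walk back toward the sender.
    for name, _, reply_seen in reversed(obs):
        if not reply_seen:
            return ("reply missing at", name)
    return ("round trip complete", None)

print(locate_drop(observations))
```

With these observations the break lands on the workstation itself, which matches the conclusion that the reply is never generated rather than lost in transit.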
I'm stumped. I really am.
I guess to sum it up, I have so many "eyes" into the entire path along the way that I'm 100% sure I know where packets get to, and I know where packets don't get to... I'm just unsure as to why no machine replies other than those on this specific switch. As I said before, my first thought was surely a route issue - if the LAN PCs don't have a route from LAN to the OVPN subnet then there's no way a packet can return, but adding a static route to a LAN PC has not helped at all. And then there's the weird issue of all those "server switch" machines that reply round-trip just fine...
There's not much more to check. If you can access one device on a LAN, there's no reason for you not to be able to access the rest (assuming you have the right rules and general config)
You say the ping request arrives at the office PC but the reply never shows up. That still sounds like a firewall issue, or some sort of issue with the PC itself. Static routes are not needed for this.
I would completely disable the firewall myself on one PC and repeat the testing…
Fully understood…and I will 110% double/triple check tomorrow.
For now, I can tell you that i can:
a) ovpn connect to the office [my home lan is 10.38.x.x, my ovpn tunnel is 10.0.225.x, my office lan is 192.168.1.x]
b) vsphere console to a vmserver on the lan [the vsphere server is 192.168.1.x on the office lan]
c) use the vsphere console on the lan to control a windows VM [the windows vm is 192.168.1.x also, on the office lan]
d) ping from the windows vm back to my home IP (ovpn tunnel) IP address, and vice versa, with the firewall on the windows vm turned on
so a ping from 192.168.1.x to my tunnel IP arrives at my home laptop (full round trip successful)
a ping from my home laptop arrives at the vsphere vmware windows vm at ip 192.168.1.x (full round trip successful)
with this same setup, a ping from my home laptop (10.38.x.x) across the ovpn tunnel (10.0.225.x) to another office PC (192.168.1.x on the LAN) gets to the internal machine, but never returns.
Let me be very clear: I am not arguing or disputing anything you say - I much appreciate the comments. I am merely providing test results. I want to make sure my reply doesn't sound like an ass-hat of "duh I already tried that". Please don't take it as such, I'm literally triple checking everything as I type and as you and any others provide any possible insight.
Thank you very much.
(EDIT: corrected my home IP range for 100% clarity, sorry)
No worries! Based on the facts you provided, there's obviously something wrong with the PC you are using as a test subject. Maybe it's not the firewall, but some service/process/NIC/whatever that is making it discard the ICMP packets and not reply to them. Can this particular PC be pinged from within the LAN? How about the wireshark capture? Does the incoming ICMP packet look "in shape"? I would test the same on another PC
Something else that just came to my mind, is the default gateway on the Windows PC pointing to the pfSense machine? Because if you have multiple gateways on the LAN, and/or it is incorrectly set, this will never work without proper routes on the Windows PC
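A rough Python sketch of why the default gateway matters here, using a simplified longest-prefix route lookup. The LAN range, tunnel subnet, and 192.168.1.1 come from the original post; the second gateway 192.168.1.254, the tunnel client 10.0.225.6, and the function name are hypothetical, purely for illustration.

```python
import ipaddress

def next_hop(dest_ip, routes):
    """Return the gateway of the most specific route matching dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [(net, gw) for net, gw in routes if dest in net]
    if not matches:
        return None
    _, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return gateway

net = ipaddress.ip_network
lan = (net("192.168.1.0/24"), "on-link")

# Case 1: default gateway is pfSense, no static route.
via_pfsense = [lan, (net("0.0.0.0/0"), "192.168.1.1")]
# Case 2: default gateway is some other router (hypothetical address).
via_other = [lan, (net("0.0.0.0/0"), "192.168.1.254")]
# Case 3: same as case 2, plus a static route for the tunnel subnet.
via_other_fixed = via_other + [(net("10.0.225.0/24"), "192.168.1.1")]

client = "10.0.225.6"  # hypothetical tunnel client address
for label, table in [("pfsense default", via_pfsense),
                     ("other default", via_other),
                     ("other default + static route", via_other_fixed)]:
    print(label, "->", next_hop(client, table))
```

In case 1 the reply already goes to pfSense through the default route, so a static route adds nothing. Only in case 2 does the reply head to the wrong router, and case 3 shows the static route correcting that.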
a) yes this pc can be pinged from within the lan (i ovpn, then ssh to a server on the "server switch" and every ping works)
b) wireshark on this pc while pinging from a true LAN host shows ping request and reply on both ends
c) df gw on this PC is the pfsense LAN interface, 192.168.1.1, same as the other "server switch" hosts that work properly
I connected to ovpn.
I then connected to a vmware vsphere application to use the console of one of the VM's in the office.
I compared pinging from the VM to pinging from my home laptop.
Format: internal IP (last octet) / ping from home Y/N / ping from VM Y/N / description
For example, "1 Y/Y - office pfsense" means the IP of the machine is 192.168.1.1 (note the "1"), and the Y/Y means I could ping it from both my home laptop and the internal VM.
To me, one anomaly is the .10 server, which is attached to the same "server switch" that otherwise seems to provide the most reliable results. I assume this is a client firewall issue on the .10 machine itself.
Also the difference between the .60 and .68 - why would one machine on a random switch respond to ping, but the other would not? Again, do I assume a client firewall issue?
Finally, the .22 machine is the internal test machine we've been working with. Firewall on or off, no difference. Static route on the machine or not, no difference.
It still seems that most of the machines on the "server switch" want to act right, while most of the machines on other 8port workgroup switches in the office want to not work at all. I would chalk it up to "firewall" on all machines if it wasn't for the fact that I know we have 100% disabled the Windows firewall on the .22 Ip with the same (failed) results.
EDIT: I should note that from an internal machine on the LAN, all pings to target machines worked as expected. (The "Y/Y" column has all Y in the 2nd part)
EDIT2: for IPs 57-84 I cannot comment on the OS or state of the firewall at this point - I assume some are on, some are off. I also assume that most (if not all) are Windows 7 workstations.
1 Y/Y - office pfsense
5 Y/Y - office 24port switch directly attached to pfsense
7 Y/Y - wifi access point daisy-chained to an 8port workgroup switch
9 N/Y - wifi access point daisy-chained to an 8port workgroup switch
10 N/Y - physical server (windows) attached to 8port "server switch"
11 Y/Y - physical server (linux) attached to 8port "server switch"
12 Y/Y - physical server (windows) attached to 8port "server switch"
13 Y/Y - physical server (imac running OSX) attached to 8port "server switch"
14 Y/Y - physical server (vmware esxi) attached to 8port "server switch"
15 Y/Y - 2nd physical server (vmware esxi) attached to 8port "server switch"
16 Y/Y - vm guest OS (linux) on vm1 (ip .14)
17 Y/Y - vm guest OS (linux) on vm1 (ip .14)
20 Y/Y - physical NAS server (synology) attached to 8port "server switch"
21 Y/Y - vm guest OS (windows 7) on vm2 (ip .15)
22 N/Y - physical workstation (windows) - my primary internal test machine
30 Y/Y - vm guest os (linux) on vm1 (ip .14)
33 N/Y - vm guest os (linux) on vm1 (ip .14)
35 Y/Y - vm guest os - linux - 2nd NIC of physical vm server - this is the mirror port of the 24 port switch attached to pfsense
37 Y/Y - vm guest os (linux) on vm1 (ip .14)
57 Y/Y - physical workstation attached to some misc 8port workgroup switch, attached to the 24 port switch that is attached directly to pfsense
60 Y/Y - physical workstation attached to some misc 8port workgroup switch, attached to the 24 port switch that is attached directly to pfsense
68 N/Y - physical workstation attached to some misc 8port workgroup switch, attached to the 24 port switch that is attached directly to pfsense
70 N/Y - physical workstation attached to some misc 8port workgroup switch, attached to the 24 port switch that is attached directly to pfsense
79 N/Y - physical workstation attached to some misc 8port workgroup switch, attached to the 24 port switch that is attached directly to pfsense
84 Y/Y - physical workstation attached to some misc 8port workgroup switch, attached to the 24 port switch that is attached directly to pfsense
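For anyone who wants to slice the results above, here is a small Python sketch that parses the "octet home/VM" pairs and lists the hosts that fail from each side. The rows are copied from the list above with descriptions dropped; the variable and function names are my own.

```python
# "last octet  home/VM" pairs copied from the results list above.
ROWS = """
1 Y/Y
5 Y/Y
7 Y/Y
9 N/Y
10 N/Y
11 Y/Y
12 Y/Y
13 Y/Y
14 Y/Y
15 Y/Y
16 Y/Y
17 Y/Y
20 Y/Y
21 Y/Y
22 N/Y
30 Y/Y
33 N/Y
35 Y/Y
37 Y/Y
57 Y/Y
60 Y/Y
68 N/Y
70 N/Y
79 N/Y
84 Y/Y
"""

def parse(rows):
    """Map last octet -> (reachable from home, reachable from VM)."""
    parsed = {}
    for line in rows.strip().splitlines():
        octet, flags = line.split()
        home, vm = flags.split("/")
        parsed[int(octet)] = (home == "Y", vm == "Y")
    return parsed

results = parse(ROWS)
fail_home = sorted(o for o, (home, _) in results.items() if not home)
fail_vm = sorted(o for o, (_, vm) in results.items() if not vm)

print("unreachable from home:", fail_home)
print("unreachable from VM:  ", fail_vm)
print(len(fail_home), "of", len(results), "hosts fail from home")
```

Note that the failing set includes .10 and .33, which both sit behind the "server switch", so the split does not follow switch boundaries as cleanly as it first appears.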