Port Forwards Only Work For Some People!?

RobEmery

Hi,

We have a PF installation serving as our primary firewall, with a collection of port forwards and 1:1 NAT rules on it, all of which work for 99% of people. So far we have had reports of 5 people who are unable to access our sites.

Today I've been attempting to assist a customer of ours who is unable to access some of our services (his browser times out). From his machine, a service that is port-forwarded gets no throughput at all; however, he can access one that uses 1:1 NAT.

The Port Forwarding rules work absolutely fine for everybody else. I've taken a PCAP on the PF Box as well as on the customer's computer, which clearly shows his machine sending TCP SYN to the firewall, which the PF box receives. The firewall capture then shows pf sending a TCP SYN ACK, which never arrives on the customer's computer.

I've attached a screenshot of the pcap trace from the firewall.

The first session (up to 16:25:07) is accessing a 1:1 NAT; the second attempted session (16:26:21) is accessing a port forward.

The only real difference I can see is the WIN=65535? The client machine is a Windows 7 Professional workstation accessing our service over 802.11g (Asus WL530g), if that's at all relevant.

Does anyone have any ideas as to what on earth is going on here?

Many Thanks!
-Rob

[Attachment: DoctoredScreenie.png]

cmb

Is that capture from the perspective of the WAN on your firewall?

In the traffic that appears not to be working, you're repeatedly sending SYN ACKs back in response to the SYN, and the remote host keeps re-sending the SYN. That indicates the SYN ACK is being lost somewhere in between: you're sending it out and the source isn't getting it. Why is hard to say. When you see the traffic leaving your network and it disappears somewhere in the middle, troubleshooting isn't easy. Allow pings on your WAN and see if the affected hosts can ping. Try to find a common denominator among the hosts that don't work, such as a particular ISP.

That's assuming the capture is from your WAN and there is nothing upstream of there other than your ISP router. If you have unusual routing going on, you could be blocking SYN ACKs with a stateful firewall somewhere if that firewall isn't seeing the SYN.
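The pattern described above (repeated SYNs, SYN ACKs going out, no final ACK ever arriving) can be picked out of a capture mechanically. Below is a minimal Python sketch, assuming the packets have already been exported from the pcap into simple records. The `Pkt` record, the `stalled_handshakes` helper, and the example addresses are all hypothetical and only illustrate the idea; this is not a pfSense or tcpdump feature.

```python
from collections import defaultdict
from typing import NamedTuple


class Pkt(NamedTuple):
    """One captured TCP packet (hypothetical record, not a real pcap API)."""
    ts: float    # capture timestamp in seconds
    src: str     # "ip:port" of the sender
    dst: str     # "ip:port" of the receiver
    flags: str   # "S" = SYN, "SA" = SYN ACK, "A" = ACK, ...


def stalled_handshakes(packets, server_ips):
    """Return (client, server) pairs where the client re-sent its SYN,
    the server answered with at least one SYN ACK, and no ACK from the
    client was ever seen."""
    stats = defaultdict(lambda: {"syn": 0, "synack": 0, "ack": 0})
    for p in packets:
        src_ip = p.src.rsplit(":", 1)[0]
        if src_ip in server_ips:
            # Server-to-client direction: only SYN ACKs matter here.
            if p.flags == "SA":
                stats[(p.dst, p.src)]["synack"] += 1
        else:
            key = (p.src, p.dst)
            if p.flags == "S":
                stats[key]["syn"] += 1
            elif "A" in p.flags:
                stats[key]["ack"] += 1
    return sorted(
        key for key, s in stats.items()
        if s["syn"] > 1 and s["synack"] >= 1 and s["ack"] == 0
    )


if __name__ == "__main__":
    # Made-up example data using documentation address ranges.
    capture = [
        Pkt(0.00, "203.0.113.10:50000", "198.51.100.5:80", "S"),
        Pkt(0.01, "198.51.100.5:80", "203.0.113.10:50000", "SA"),
        Pkt(0.02, "203.0.113.10:50000", "198.51.100.5:80", "A"),
        Pkt(10.00, "203.0.113.20:50001", "198.51.100.5:443", "S"),
        Pkt(10.01, "198.51.100.5:443", "203.0.113.20:50001", "SA"),
        Pkt(13.00, "203.0.113.20:50001", "198.51.100.5:443", "S"),
        Pkt(13.01, "198.51.100.5:443", "203.0.113.20:50001", "SA"),
    ]
    for client, server in stalled_handshakes(capture, {"198.51.100.5"}):
        print(f"{client} never completed the handshake with {server}")
```

Listing the stalled clients this way makes it easier to look for the common denominator (ISP, region, router model) among them.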

RobEmery

Hi cmb,

That's exactly what I figured. The capture is from our WAN, and the WAN goes directly into a switch which is connected to our ISP's network. So far we've found that all the affected individuals appear to be on Virgin Media (a UK ISP); however, they are scattered all over the country, and other VM users (our office connection, my home, other members of staff's houses, etc.) appear to work fine.

Pings and UDP traffic appear to work fine. I put in a port forward for UDP:53 to our internal DNS server and the customer was able to resolve against it without issue, so it appears that the only problem is the TCP handshake.
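To compare the port-forwarded and 1:1 NAT services from an affected client without relying on a browser, a small TCP connect probe is enough. This is only a sketch: the `probe_tcp` helper and the host names in the example are hypothetical placeholders, not our real services. A result of "timeout" is what you'd expect when the SYN ACK never makes it back, whereas "refused" would mean a RST did arrive.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP handshake and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # three-way handshake completed
    except socket.timeout:
        return "timeout"           # no SYN ACK (or RST) came back in time
    except ConnectionRefusedError:
        return "refused"           # a RST came back, so the path works
    except OSError as exc:
        return f"error: {exc}"     # DNS failure, unreachable network, ...


if __name__ == "__main__":
    # Hypothetical targets: replace with one port-forwarded service
    # and one service published via 1:1 NAT.
    targets = [
        ("portforward.example.com", 443),
        ("onetoone.example.com", 443),
    ]
    for host, port in targets:
        print(f"{host}:{port} -> {probe_tcp(host, port)}")
```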

I'm going to ask our datacenter whether they can do some packet logging for us on the routers inside their network, to prove that the SYN ACKs leave their network. Another possibility I was considering yesterday is that the customers' routers are doing something odd, so I may ask one of them to log me in to their router to look for more information.

I presume that with 1:1 NAT the TCP handshake is performed by the internal machine, with only IP rewriting done by pf, whereas with port forwarding the pfSense boxes have to rewrite the TCP packets (in order to change the port number), which would explain why 1:1 NAT works but port forwards do not. I was tempted to look for options to bring the window size and scale factor down from 65535 & 8 to 8192 & 4 (the values negotiated by Windows with 1:1 NAT) to see if that helps, but I've been unable to locate any tunables for that.

Thanks for your input!
-Rob

cmb

The window size and scale factor are set by the source host; you can't change that. That's unlikely to be relevant.

Port forwards don't rewrite source ports. Network-wise, there isn't a difference between allowing traffic in from the Internet via 1:1 NAT and via rdr (port forward). Either way you're strictly rewriting the destination IP. With rdr you can also rewrite the destination port, but only if it differs outside vs. inside (e.g. from the Internet you have a web server listening on port 8000 but it's on 80 internally); that doesn't sound like the case here, and it wouldn't be relevant either way. Outbound traffic is handled differently, but that's not what you're looking at here.
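To make the point concrete, here is a toy Python model of the two inbound translations. It is an illustration only, not how pf is implemented; the `Packet` class, the function names, and the addresses are made up for the example. In both cases only the destination fields change and the source IP and port pass through untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def inbound_one_to_one(pkt, external_ip, internal_ip):
    """1:1 NAT, inbound: swap the destination IP, keep everything else."""
    if pkt.dst_ip != external_ip:
        return pkt
    return replace(pkt, dst_ip=internal_ip)


def inbound_rdr(pkt, external_ip, external_port, internal_ip, internal_port):
    """rdr (port forward), inbound: swap the destination IP, and the
    destination port only if it differs between outside and inside."""
    if (pkt.dst_ip, pkt.dst_port) != (external_ip, external_port):
        return pkt
    return replace(pkt, dst_ip=internal_ip, dst_port=internal_port)


if __name__ == "__main__":
    syn = Packet("203.0.113.20", 50001, "198.51.100.5", 80)
    a = inbound_one_to_one(syn, "198.51.100.5", "10.0.0.5")
    b = inbound_rdr(syn, "198.51.100.5", 80, "10.0.0.5", 80)
    # With the same port inside and outside, both results are identical.
    print(a == b)
    # A forward from 8000 outside to 80 inside changes only the
    # destination port in addition to the destination IP.
    alt = Packet("203.0.113.20", 50001, "198.51.100.5", 8000)
    print(inbound_rdr(alt, "198.51.100.5", 8000, "10.0.0.5", 80))
```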