Cyclical failure routing to specific clients

WRI

We have 3 offices, with a pfSense box in Site A and SOHO routers in Site B and Site C. Sites B and C had AT&T MPLS connections installed a year ago, with AT&T providing the routers. These two sites communicate successfully with each other through AT&T, but communicate with Site A through VPN tunnels between their SOHO routers and the pfSense box in Site A. This month we installed an AT&T MPLS connection, with an AT&T router, in Site A, expecting that all interoffice traffic would move over the MPLS network and all other traffic would go out through the local routers to the internet.

Once everything was set up, the tunnels from Site A were torn down, the Site A AT&T router was added as another gateway, and a static route was set to push traffic for Sites B and C through the AT&T gateway. At first everything seemed fine, as we could tracert successfully from both ends across the MPLS network to nodes at the other ends. After a little time, however, we realized that the PCs in Site A were having a cyclical failure when accessing resources in Sites B and C. If, for example, you were copying a file from a server in Site B or C to a PC in Site A, the copy would start but run for only a few seconds, stall for 10 - 20 seconds, progress for a few more seconds, and then stall again, on and on. If you were on a PC in Site A and used Remote Desktop to connect to a resource in Site B or C, it would connect, but after a few seconds it would freeze, then refresh and work for a few seconds, and freeze again - just like the copy command. We eventually narrowed this down to an issue with the pfSense box, because if we made the AT&T router the default gateway of the same PC in Site A, everything behaved normally.
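To make the intended routing concrete, here is a minimal Python sketch of how a router picks a gateway when a static route and a default route both match. The subnets and gateway names are hypothetical, since the real addressing is not given in this thread, and this is not how pfSense itself is implemented.

```python
import ipaddress

# Hypothetical addressing: the thread does not state the real subnets or gateways.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "WAN_GW"),        # default route: local internet
    (ipaddress.ip_network("192.168.20.0/24"), "ATT_GW"),  # static route to Site B
    (ipaddress.ip_network("192.168.30.0/24"), "ATT_GW"),  # static route to Site C
]

def next_hop(destination: str) -> str:
    """Return the gateway of the most specific route that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in ROUTES if addr in net]
    # Longest prefix wins, so a /24 static route beats the /0 default route.
    _best_net, best_gw = max(matches, key=lambda m: m[0].prefixlen)
    return best_gw
```

With these assumed routes, traffic for Site B or C resolves to the AT&T gateway and everything else falls through to the default gateway, which is the split described above.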

A few more odd things we observed:
1. This does not appear to happen with the servers in Site A even though the servers and PCs are on the same subnet. To be honest, I inherited the pfSense box and am not familiar with the software at all so it's possible this might be due to something configured in pfSense.
2. From Sites B and C we can ping anything in Site A except pfSense. We can tracert to it successfully but can't ping it - a firewall rule maybe?
3. From Site A we can successfully browse to a server in Sites B or C but in both B and C if we try to browse back to that PC in Site A it fails, even though we can ping it. This also goes away if we make AT&T the Site A PC's gateway.
4. We have rebooted pfSense more than once during the troubleshooting process but this doesn't appear to make any difference.

I know this sounds pretty obscure but any help would be greatly appreciated.

podilarius

1. I am guessing that you have a route set up with MPLS as the gateway in pfSense.
2. This is a firewall issue. The default LAN rule only allows LAN subnet access. You will need to add a rule or change the existing rule to incorporate the remote subnets.
3. There is a setting in advanced firewalling/NAT to bypass FW rules for traffic terminating on the same interface as it started.


Static route filtering: Bypass firewall rules for traffic on the same interface
This option only applies if you have defined one or more static routes. If it is enabled, traffic that enters and leaves through the same interface will not be checked by the firewall. This may be desirable in some situations where multiple subnets are connected to the same interface.
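As a rough illustration of what that option text describes, here is a toy Python model of the decision. It only restates the quoted wording; it is not pfSense's actual code, and the function and interface names are made up for this sketch.

```python
def is_checked_by_firewall(in_iface: str, out_iface: str,
                           bypass_enabled: bool, has_static_routes: bool) -> bool:
    """Toy model of the quoted option text, not pfSense's real logic."""
    same_interface = in_iface == out_iface
    # Per the description: the bypass only applies when static routes exist
    # and the packet enters and leaves through the same interface.
    if bypass_enabled and has_static_routes and same_interface:
        return False
    return True
```

In the setup described here, a Site A PC sends to pfSense on the LAN and pfSense forwards back out the LAN to the AT&T router, so `is_checked_by_firewall("lan", "lan", True, True)` would be the case the option is meant for. Traffic going from LAN to WAN is still filtered either way.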

I would also check the logs on pfSense to see if you are dropping any packets.
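When reading those logs, it may help to keep point 2 in mind. A minimal Python sketch of a source-based rule check is below; the subnets are hypothetical and the function is only an illustration of "allows LAN subnet access", not a pfSense API.

```python
import ipaddress

# Hypothetical Site A LAN; the real subnet is not given in this thread.
LAN_NET = ipaddress.ip_network("192.168.10.0/24")

def default_lan_rule_matches(source: str, allowed_sources=(LAN_NET,)) -> bool:
    """Return True if source falls inside one of the rule's allowed source networks."""
    addr = ipaddress.ip_address(source)
    return any(addr in net for net in allowed_sources)
```

Under these assumptions a packet sourced from a remote site (say 192.168.20.5) does not match a rule whose source is only the LAN subnet, but it does once the remote subnet is added to `allowed_sources`, which is the rule change suggested in point 2.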

WRI

Thanks for the reply Podilarius.
1. Yes, we set up the AT&T router as a gateway in pfSense and set static routes in pfSense for both the Site B and Site C networks. I'm confident this is working correctly because from any site I can tracert to nodes in the other sites with no issue.
2. I'm a little confused by your statement here. If the issue were the default rule blocking traffic, then why would the traffic come and go like this? I would have expected it to fail consistently. Also, a continuous ping between the source and destination nodes never fails, even when the other traffic stalls. Further, we don't see this odd behavior when communicating with the servers in Site A, only the workstations. The odd thing here is that they are all on one 24-bit subnet, so I don't see why there would be a difference, unless my predecessor had something in place for the servers? Unfortunately, I don't know pfSense well enough to know where to look for this.
3. You hit the nail on the head here! I flipped that switch and now I don't see the stalling I saw before. I'm not really sure I understand why it was cyclical; in my experience with firewalls you're either in or you're out, but I suppose there may be something more going on here than just filtering.

At any rate, thanks very much for your help, I don't think I could have found it without you!

podilarius

Do any of your firewall rules have a blue "A" on the left-hand side? This indicates advanced options. It could be that you have advanced options set to limit the number of states or how fast they are opened. I am guessing, though. Bypassing the firewall for traffic on the same interface will remove that problem.

Glad it is working for you.