All TCP connections drop after 30 seconds - route-based rules
I am using pfSense 1.2.2 as a virtual machine firewall to create a "DMZ in a box" that protects my database server from my LAN. My WAN interface is set up as 10.0.0.125 and my LAN interface, which is the DMZ, is set up as 10.0.6.1. I have the firewall configured as a route-based firewall and I am not using NAT.
Everything works properly except that all TCP connections from the WAN to the LAN drop after about 30 seconds. Ping, however, continues to work, which rules out an issue with the VMware vSwitches (according to VMware). When I look at the state table in the firewall, it shows the connection from the WAN device to the LAN device, and from the LAN device back to the WAN device, as established. I found the post below, which seems to describe a very similar problem, but I do not have the skills to debug the firewall programming. Also, in the post below they indicate the state is dropping, whereas my state shows as established.
After further testing, it appears the cause is the physical Juniper firewall, which routes the 10.0.0.0 subnet traffic to the pfSense interface at 10.0.0.125, which in turn routes the traffic to the 10.0.6.0 subnet. The physical Juniper firewall is logging the route-based traffic but does not see any bytes being received, so it kills the session. I will update this again with my lessons learned from the physical firewall.
According to my physical firewall vendor, Juniper, the issue is that acknowledgement (ACK) packets are not being sent back from the virtual server located on the pfSense LAN. VMware refers to this setup as a "DMZ in a box". If anyone has any thoughts on why ACK packets are not being sent back, that would be great!
Here is a rundown of my setup.
Physical firewall: Juniper NS25
  trust / LAN subnet = 10.0.0.0
  physical LAN interface = 10.0.0.1
  static route = 10.0.6.0 via gateway 10.0.0.125
Physical laptop = 10.0.0.75
Virtual firewall: pfSense 1.2.2
  WAN interface = 10.0.0.125 (vswitch1-lan)
  LAN interface = 10.0.6.1 (vswitch2-dmz)
Virtual DB server = 10.0.6.25 (vswitch2-dmz)
10.0.0.75 can continuously ping 10.0.6.25 and vice versa.
Any TCP connection from 10.0.0.75 to 10.0.6.25 gets cut off after about 20 seconds. For example, you can telnet from 10.0.0.75 to 10.0.6.25, log in to the server and run commands successfully until the physical firewall kills the session after not receiving an ACK.
Juniper confirmed this was the issue by turning off the SEQ and SYN checks. With these turned off everything works correctly, but Juniper strongly warned me that this is not safe to leave as is.
I must be misunderstanding something. If acks are not being sent back, how can TCP possibly work?
A lot of this is new to me, so I am figuring it out as I go. It seems I have asymmetric routing on the NetScreen: a session is created on the NetScreen firewall when the traffic is routed to the 10.0.6.0 subnet, but for some reason the NetScreen does not recognize the traffic coming back from 10.0.6.25 as belonging to the session initiated from 10.0.0.75. Incoming packets that do not match a current session must have the SYN bit set, but I don't think the SYN bit is checked? So basically, after 20 seconds the NetScreen decides the traffic is a potential threat and kills the session / drops the packets.
I found this link, which is similar to my issue but not identical.
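To make the behaviour described above a bit more concrete, here is a toy model of a stateful firewall that only ever sees one direction of a TCP flow. It is a sketch, not ScreenOS's actual logic: the `Firewall` class, the `syn_check` switch and the 20-second `half_open_timeout` are all hypothetical names and values chosen to mirror what is reported in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    """Toy stateful firewall. Hypothetical model, not real ScreenOS behaviour."""
    syn_check: bool = True
    half_open_timeout: float = 20.0  # assumed value, matching the ~20 s seen here
    sessions: dict = field(default_factory=dict)

    def _expire(self, now):
        # Drop sessions that never saw traffic in the return direction.
        for key, s in list(self.sessions.items()):
            if not s["established"] and now - s["created"] > self.half_open_timeout:
                del self.sessions[key]

    def handle(self, now, src, dst, flags):
        """Return True if the packet is forwarded, False if it is dropped."""
        self._expire(now)
        key = frozenset((src, dst))
        s = self.sessions.get(key)
        if s is None:
            # With the SYN check on, only a SYN may open a new session.
            if self.syn_check and flags != "SYN":
                return False
            self.sessions[key] = {"created": now, "initiator": src,
                                  "established": False}
        elif src != s["initiator"]:
            # Return traffic observed: the session is confirmed.
            s["established"] = True
        return True


def one_sided(fw, client="10.0.0.75:40000", server="10.0.6.25:23"):
    """Replay only the client-to-server packets, as if replies bypass fw."""
    return [fw.handle(0, client, server, "SYN"),
            fw.handle(5, client, server, "ACK"),
            fw.handle(25, client, server, "ACK")]


print(one_sided(Firewall(syn_check=True)))
print(one_sided(Firewall(syn_check=False)))
```

In this model, with the SYN check on, the packet at 25 seconds is dropped because the half-open session has aged out and a bare ACK is not allowed to create a new one. With the check off, the same packet simply opens a fresh session, which is consistent with everything working once Juniper disabled the checks.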
Well, this is not good. Asymmetric routing like this is really a bad idea. Why does it need to be that way?
The more I read about asymmetric routing, which is very common in larger networks, the less I think it applies to my scenario in the textbook sense. I don't really have two paths for traffic to travel, as in most examples of asymmetric routing (for example, balancing two ISP connections with two routers). The traffic only flows from the initiating device 10.0.0.75 to 10.0.0.1, to the route via 10.0.0.125, to 10.0.6.1, to the server 10.0.6.25. For some reason the traffic coming back from 10.0.6.25 to 10.0.6.1 to 10.0.0.125 to 10.0.0.1 to the initial device does not appear to the Juniper as the same session when it flows through.
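One way to sanity-check the hop-by-hop path is to work it out from each device's routing table. The sketch below does that with longest-prefix matching. It rests on assumptions that are not stated in this thread: /24 masks on both subnets, the laptop using 10.0.0.1 as its default gateway, and the DB server using 10.0.6.1 as its default gateway. The `HOSTS` table and the `trace` helper are illustrative names only.

```python
import ipaddress

# Assumed routing tables (masks and default gateways are NOT given in the thread).
# Each route is (destination network, next hop); None means directly connected.
HOSTS = {
    "laptop": {"addrs": ["10.0.0.75"],
               "routes": [("10.0.0.0/24", None), ("0.0.0.0/0", "10.0.0.1")]},
    "juniper": {"addrs": ["10.0.0.1"],
                "routes": [("10.0.0.0/24", None), ("10.0.6.0/24", "10.0.0.125")]},
    "pfsense": {"addrs": ["10.0.0.125", "10.0.6.1"],
                "routes": [("10.0.0.0/24", None), ("10.0.6.0/24", None)]},
    "dbserver": {"addrs": ["10.0.6.25"],
                 "routes": [("10.0.6.0/24", None), ("0.0.0.0/0", "10.0.6.1")]},
}


def owner(ip):
    """Name of the device that holds this address."""
    for name, host in HOSTS.items():
        if ip in host["addrs"]:
            return name
    raise KeyError(ip)


def next_hop(host, dst):
    """Longest-prefix match; returns dst itself when it is on-link."""
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for net, gateway in HOSTS[host]["routes"]:
        network = ipaddress.ip_network(net)
        if dst_ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, gateway)
    if best is None:
        raise LookupError(f"{host} has no route to {dst}")
    return best[1] or dst


def trace(src, dst):
    """List of devices a packet visits on its way from src to dst."""
    path = [owner(src)]
    while dst not in HOSTS[path[-1]]["addrs"]:
        path.append(owner(next_hop(path[-1], dst)))
        if len(path) > 8:
            raise RuntimeError("routing loop")
    return path


print("forward:", trace("10.0.0.75", "10.0.6.25"))
print("return: ", trace("10.0.6.25", "10.0.0.75"))
```

If those assumptions hold, the forward path passes through the Juniper but the return path does not: the pfSense WAN interface sits on the same subnet as the laptop, so it can deliver replies directly without handing them to 10.0.0.1. That would explain why the Juniper never sees bytes coming back, even though there is only one physical route between the two subnets.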
Asymmetric routing (ASR) is almost never used when there is some kind of state-tracking firewall between the two hosts. I am not sure of the packet flow, but I am guessing that incoming TCP packets are entering from a different interface than the one the routing table on the receiving host says to use for the return packets. Fixing this (somehow) would be a good idea.