Routing issue - Static routes needed?
- 
 Well, I should've mentioned that I planned the whole network topology thoroughly, so I already knew during planning that I'd need static routes on all the DMZ hosts for possibly all the networks behind the "left hand side firewall" (there are very few DMZ hosts, so that is absolutely acceptable). On the other hand, I also considered a connection of the form LAN Host --> LeftHandSideFirewall --> RightHandSideFirewall --> MGMT to be perfectly OK, as long as the "Bypass Firewall Rules" option was set. And it is, and it works for 99% of my connections (never had any problems so far). But that single SCP connection that started this thread still does not make sense to me. I would understand if the firewall dropped packets because of a state mismatch, but that should not be the case, as the option is active and should allow the connections. So why does the connection close? I don't think it can be the operating system either, because then it would not allow SSH traffic at all, and the OS firewall has been disabled for now. So what else could it be? The application? I don't think the application looks at the IP level, so no... so what is it?
 I did some more testing and what I experienced is really, really strange:
- Transfer a Debian ISO image from the LAN host to the MGMT host via SCP, no static route: Works perfectly and fast!
- Transfer text files and excel sheets and other stuff from the LAN host to the MGMT host via SCP, no static route: Works perfectly and fast!
- Transfer an elasticsearch.deb file I downloaded from the LAN host to the MGMT host via SCP, no static route: Crashes!
- Transfer the same elasticsearch.deb file from the LAN host to the MGMT host via SCP, static route set: Works perfectly and fast.
 So I guess it's not only the asymmetric route that is causing the problem here. It's also somehow related to that file... Any idea how to get to the bottom of what's going on here?
 Regarding the setup: apart from the SCP problem, I will go for the following 'solution' in the long term. I see that a layer 3 switch would solve the problem, as the "asymmetric part" of the routing would vanish. In fact, I would have the layer 3 switch replace my LAN switch. That way, everything from the DMZ would continue to work with static routes, but from the LAN, the layer 3 switch would do the routing for the secure networks...
 [Diagram, left to right: FW, DMZ Hosts (Dual Homed), FW1, Lay3 Switch (with the LAN Hosts attached), FW2, Secure Nets]
 p.s. Thanks for the book link, I can only recommend that book as well, along with the "TCP/IP Illustrated" series!
- 
 In your initial drawing, you appear to be running firewalls in high-availability mode using CARP in pairs A-C and B-D.
 Have you ruled out the following:
- Does the problem occur if you disable CARP temporarily? This is to rule out the firewalls switching roles during the transfer. Heavy load could lead to packet loss, which could lead to a state change.
- Are the default gateways / static routes all using the VIP address of the firewall pair and not a single firewall in particular?
- Does the size of the file have some influence on the issue? I should think not; that elasticsearch.deb is only around 27MB.
 Some traffic capture is going to be required to find out where the problem lies… 
 Run tcpdump on the host while you run the SCP, open the capture in Wireshark and look at what the last communication was; presumably a data packet that got no ACK, followed by retries that got no ACKs.
 Run tcpdump on the left hand firewall's LAN interface while running SCP. You should see the first packet, the firewall should respond with an ICMP redirect, and you shouldn't see any more traffic hitting it until the ICMP redirect times out. Perhaps the duration of the transfer has something to do with it? Once the ICMP redirect times out, the next packet is going to hit the left hand firewall. This will be a data packet, and it will be out of state. Technically, with the "Bypass firewall rules for traffic on the same interface" option checked, this should not cause an issue, and you should see an ICMP redirect.
 Run tcpdump on the WAN interface of the right hand side firewall while running SCP. I'd be looking for how the ICMP redirect packet is handled, and whether those last ACKs are actually being sent out.
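 If it helps when reading the host capture, the "data packet that got no ACK, followed by retries" pattern can be checked mechanically. This is only a rough sketch: the Segment type, the summarize function and the sample numbers are all made up for illustration, and you would have to export the relevant fields from Wireshark yourself.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    """One TCP segment, reduced to the fields this sketch needs (hypothetical type)."""
    time: float         # seconds since the start of the capture
    from_client: bool   # True for client -> server, False for server -> client
    seq: int            # sequence number of the first payload byte
    length: int         # payload length in bytes
    ack: int            # acknowledgment number carried by the segment


def summarize(segments):
    """Return (highest_ack, retransmitted, unacked) for client -> server data.

    Sequence number wraparound is ignored to keep the sketch short.
    """
    sent = Counter()     # seq -> number of times the client sent it
    end_of = {}          # seq -> sequence number just past the payload
    highest_ack = 0
    for s in segments:
        if s.from_client and s.length > 0:
            sent[s.seq] += 1
            end_of[s.seq] = s.seq + s.length
        elif not s.from_client:
            highest_ack = max(highest_ack, s.ack)
    retransmitted = {seq: n for seq, n in sorted(sent.items()) if n > 1}
    unacked = sorted(seq for seq, end in end_of.items() if end > highest_ack)
    return highest_ack, retransmitted, unacked


# Made-up capture: the second data segment is sent three times and never ACKed.
example = [
    Segment(0.0, True, 1000, 100, 0),
    Segment(0.1, False, 0, 0, 1100),
    Segment(0.2, True, 1100, 100, 0),
    Segment(0.5, True, 1100, 100, 0),
    Segment(1.1, True, 1100, 100, 0),
]
print(summarize(example))
```

 Segments that show up in both the retransmitted and the unacked result are the ones worth looking up in the firewall captures.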
- 
- 
 Regarding your three questions:
- CARP: I have not disabled it temporarily yet, but I could try that. I did check the logs for any CARP failover entry and there was nothing.
- VIPs: Yes, the default gateways and static routes are all using the VIPs.
- File size: That was my initial thought too. But then I transferred the 240MB Debian ISO image, which did not cause any problems.
 I did another test with the ".deb" file today, without the static route. It always and consistently crashes when transferred via SCP. When I 7z it first, it transfers... funny. Regarding "disabling the CARP cluster": is it sufficient to disable it "temporarily" on the primary node only? Or on both nodes, or...? I guess I will not have network downtime?
 To do thorough packet capturing I will have to wait until I find the time, I'm afraid... :-/ It's not high priority, as everything seems to be working except that one stupid file.
- 
- 
 Ok, my curiosity won in the end. I did a packet capture on both ends. I see ICMP redirects, but the client (Windows Server) ignores them and sends all packets along to the original left hand side gateway. 
 After a while, there are lots of TCP retransmissions on the client side. On the server side I see some ACKs sent to the client that the client never received. So I did another packet capture:
- On the client
- On the left hand side firewall
- On the right hand side firewall
- On the server
 http://ianfe.dyndns.org/netdiag/scp_client.pcapng 
 http://ianfe.dyndns.org/netdiag/fwall_left.cap
 http://ianfe.dyndns.org/netdiag/fwall_right.cap
 http://ianfe.dyndns.org/netdiag/scp_server.pcap
 The left hand firewall trace looks pretty strange, but I'm not sure if this is just because Wireshark is confused about the asymmetric route... ;-)
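 With captures from several points, one way to narrow things down is to ask, for every packet seen at the sender, which capture point first failed to see it. A minimal sketch of that comparison follows, assuming the packet identifiers (here ACK numbers) have already been extracted from each capture; the function name and the numbers are invented, only the capture point names follow the files above.

```python
def first_missing_point(path, seen):
    """Map each packet id seen at path[0] to the first capture point that missed it.

    path: capture point names in the order the packets travel.
    seen: capture point name -> set of packet ids observed there.
    Packets that reach every point are left out of the result.
    """
    missing = {}
    for packet_id in sorted(seen[path[0]]):
        for point in path[1:]:
            if packet_id not in seen[point]:
                missing[packet_id] = point
                break
    return missing


# Made-up ACK numbers on the return path server -> right firewall -> client.
return_path = ["scp_server", "fwall_right", "scp_client"]
acks_seen = {
    "scp_server": {1100, 1200, 1300, 1400},
    "fwall_right": {1100, 1200, 1300, 1400},
    "scp_client": {1100, 1200},
}
print(first_missing_point(return_path, acks_seen))
```

 In this invented data the ACKs leave the right hand firewall but never show up on the client, which would point at the segment between those two capture points.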
- 
 The LHS trace is what you'd expect if ICMP redirects aren't taking place. The LAN interface is bouncing all the traffic. 
 Check System -> Advanced -> System Tunables and make sure net.inet.ip.redirect = 1 (default).
 A couple of things caught my eye:
- In the LHS capture, at packet 129 the LAN interface stops reflecting the traffic. This is why the copy is stopping. Set logging on the firewall rule that is handling this (there should be a LAN->LAN rule in place) and see if you're getting a drop. What's causing the traffic to suddenly drop on the LAN interface is another matter.
- In the client capture, the traffic pattern is very strange: there is a 10 second delay between packets 21 and 22, and another 5 second delay between packets 23 and 24; after that it starts running...
 Go into Status -> System Logs -> Settings and check all 4 boxes in the Log Firewall Default Blocks section (default block, default pass, block bogon, block private). Re-run the copy and see if something pops up in the log.
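 As an aside, pauses like the ones in the client capture are easy to spot programmatically once the packet timestamps are exported. The following is a small sketch under that assumption; find_gaps, the 2 second threshold and the timestamps are all hypothetical.

```python
def find_gaps(timestamps, threshold=2.0):
    """Return (packet_number, gap_seconds) for every packet that arrives more
    than `threshold` seconds after its predecessor.

    Packet numbers are 1-based, matching the numbering Wireshark displays.
    """
    gaps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta > threshold:
            gaps.append((i + 1, round(delta, 3)))
    return gaps


# Made-up timestamps in seconds: a 10 s pause before packet 3, 5 s before packet 5.
sample = [0.0, 0.1, 10.1, 10.2, 15.2]
print(find_gaps(sample))
```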
- 
- 
 Ok, I will try that tomorrow, have to leave now. 
 Regarding the strange delay:
- First 5 seconds might be until the password prompt appears (I don't really know what happens there in the backend)
- Next 10 seconds delay is me entering the password
 At least I guess that's it...
- 
 Actually, it just occurred to me that even though the LAN interface is set to forward all local traffic, the word "local" is open to interpretation: the traffic is actually destined to a different subnet, so technically it isn't local.
 That being said, the firewall IS inspecting the traffic and IS making a state entry for it.
 The responses from the RHS firewall aren't going back to the LHS firewall, but instead being sent directly to the client.
 Because of this, the LHS firewall eventually times out the state and stops forwarding the traffic.
 As to why it works with certain files but not others... that requires more investigation.
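 To make the asymmetry concrete, here is a toy model of the routing decisions, not of pfSense itself. All addresses, node names and routing tables are assumptions chosen for illustration; the only things taken from this thread are that the client's default gateway is the LHS firewall, that the RHS firewall sits on the same LAN, and that a static route on the client makes the problem go away.

```python
import ipaddress


def next_hop(routes, dst):
    """Longest-prefix match over a {cidr: next_hop} routing table."""
    dst_ip = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(cidr), hop)
        for cidr, hop in routes.items()
        if dst_ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def trace(tables, addresses, src, dst, max_hops=8):
    """Follow next hops from node `src` until node `dst` is reached."""
    path = [src]
    node = src
    while node != dst:
        if len(path) > max_hops:
            raise RuntimeError("routing loop")
        hop = next_hop(tables[node], addresses[dst])
        # "direct" means the destination is on-link for this node.
        node = dst if hop == "direct" else hop
        path.append(node)
    return path


# Hypothetical addressing: LAN 192.168.1.0/24, MGMT 10.0.50.0/24.
addresses = {"client": "192.168.1.10", "server": "10.0.50.10"}
tables = {
    "client": {"192.168.1.0/24": "direct", "0.0.0.0/0": "lhs_fw"},
    "lhs_fw": {"192.168.1.0/24": "direct", "10.0.50.0/24": "rhs_fw"},
    "rhs_fw": {"192.168.1.0/24": "direct", "10.0.50.0/24": "direct"},
    "server": {"10.0.50.0/24": "direct", "0.0.0.0/0": "rhs_fw"},
}
print(trace(tables, addresses, "client", "server"))  # forward path
print(trace(tables, addresses, "server", "client"))  # return path

# Same network, but with a static route for MGMT on the client.
with_static = dict(tables, client={**tables["client"], "10.0.50.0/24": "rhs_fw"})
print(trace(with_static, addresses, "client", "server"))
```

 In this model the LHS firewall is on the forward path but not on the return path, so it only ever sees half of the conversation. With the static route the client talks to the RHS firewall directly and both directions take the same path.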
- 
 "That being said, the firewall IS inspecting the traffic and IS making a state entry for it."
 Well, but as there's a rule on both the LHS and the RHS stating that this traffic is allowed, state entries should not matter for the direction client --> server, as the firewall rule explicitly allows that traffic anyway.
 Otherwise a protocol like RDP would not work either, would it? And I would see blocked traffic, but I don't see blocked traffic at all, only allowed traffic...
- 
 Your observations make sense, but the behavior seen in the capture suggests that some sort of stateful inspection is taking place.
 Did you have a chance to look at the logs to see if anything popped up?
- 
 Well, yes. I first checked that IP redirects are sent; both firewalls do that.
 I made sure that the logging options are all active, and that looks OK. I still see no blocked traffic though, very strange. I will activate the layer 3 switch on the LAN in about 1.4 hours, and hopefully this will resolve all of that. Although it would've been interesting to know what the issue is...
