Yet another ping problem with Virtual IPs

rebi

D'oh, I guess I'm not skilled enough to work out exactly what's going on, but I'll summarize what I've experienced so far:

Environment summary:
I have a virtualized environment in a data center: a single server with a single network cable carrying a bunch of IPs. A pfSense VM serves as firewall, router, IDS (monitoring only) and OpenVPN server. All other VMs sit behind this VM in local networks (virtual vmbr devices). Apart from the one IP that is set as WAN, the rest are set up as Virtual IPs of type "IP alias". Since one of the VMs is a reverse proxy for various web services on the internal network, its local IP is the 1:1 NAT target for the virtual IPs in use. An ICMP rule on WAN allows ICMP echo requests.

The problem:
Let's assume IP1 and IP2 are virtual IPs with 1:1 NAT set up to the reverse proxy VM's local IP.
The following behaviour is experienced:

ping IP1 - works
ping IP2 - between 1 and 4 pings fail, then it starts replying
ping IP2 - works
ping IP1 - between 1 and 4 pings fail, then it starts replying
The latter happens every time I alternate between the IPs.

Findings:
The above behaviour has been confirmed from 3 different locations whose only common factor is a TP-Link router (official firmware: one is OpenWRT-based, the others use TP-Link's own firmware). Strangely enough, it doesn't happen when pinging from the router's diagnostics page. Everything works as expected from lots of other networks, including from behind a mobile TP-Link 4G LTE MiFi router.

Packet capture observations so far:

ping IP1 - works
I can see requests from my IP and replies from IP1 in the capture.
ping IP2 - between 1 and 4 pings fail, then it starts replying
WAN packet capture - For every ping that doesn't go through, the latest Wireshark marks the request packet with "No response seen to ICMP request"
Firewall logs - nothing
LAN packet capture - I only see the successful ICMP requests and responses, and none of them are marked with "No response seen to ICMP request"
ping IP2 - works
Again, I can see requests from my IP and replies from IP2 in the capture.

stephenw10

Try adding port forwards for ICMP to the same VM. They will override the 1:1 NAT.

If the pings arrive on WAN but never leave on LAN, something must be preventing that. Possibly it's unable to create a state on LAN because one already exists from the previous ping.
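To make the state-collision idea concrete, here is a minimal Python sketch of a state table. It is a simplified model, not pf's actual implementation: the key (protocol, source, destination, port or ICMP identifier), the 20-second `timeout` and the client address 198.51.100.7 are all assumptions for illustration, and real pf also refreshes a state's expiry on every matching packet, which this sketch does not model.

```python
from dataclasses import dataclass, field

# Simplified (hypothetical) state key: protocol, source IP, destination IP,
# and a TCP/UDP port or ICMP identifier.
StateKey = tuple[str, str, str, int]


@dataclass
class StateTable:
    timeout: float = 20.0  # hypothetical state lifetime in seconds
    expiry: dict[StateKey, float] = field(default_factory=dict)

    def try_create(self, key: StateKey, now: float) -> bool:
        """Create a state unless an unexpired state with the same key exists."""
        expires_at = self.expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # collision: the packet cannot get a new state yet
        self.expiry[key] = now + self.timeout
        return True


table = StateTable()
key = ("icmp", "198.51.100.7", "192.168.101.2", 1)
print(table.try_create(key, now=0.0))   # True: first ping opens a state
print(table.try_create(key, now=5.0))   # False: same key, old state still alive
print(table.try_create(key, now=25.0))  # True: previous state has expired
```

In this model nothing with the same key gets through until the old entry expires, which matches the "fails for a few pings, then starts replying" pattern described above.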

Steve

rebi

Port forwards didn't help.

Here's what I've found with states:
I filtered states by ICMP and whenever I ping from my office network, I got states immediately created whenever I execute the pings. However, when alternating the pings from my home network I don't see the second state immediately, it gets created after a while.

So I did the following test:
I opened 2 command prompts and executed simultaneous continuous pings. I did this from both networks. While it worked ok from my office network, I can only see one of the pings working from my home network - the other one gives "Request timed out" until I cancel the other one and then in a few seconds it starts working. The second pair of states was never created for the second ping from my home network while both pairs were always created for the pings from my office network.

Derelict

@rebi said in Yet another ping problem with Virtual IPs:

Yes, whenever I alternate the pings first time I have between 1 and 4 failed pings and these pings are captured as requests on WAN but not LAN ...

Show me.

stephenw10

The only way I can see it varying by source address is if you have some other rule(s) in place that use those addresses.

Steve

rebi

@Derelict

What would be the proper way to share it?

Thanks!

rebi

@stephenw10

Nope. Apart from the default rules, all I have are 4 rules that allow OpenVPN (UDP to "This Firewall"), ICMP, and HTTP/HTTPS (TCP to the reverse proxy's internal IP), plus 2 1:1 NAT rules for the Virtual IPs.

Thanks!

stephenw10

If it were some sort of ARP issue, I'd expect to see pfSense ARPing for the target in the LAN-side pcap. But I can't see how that could happen, since the internal VM is already in the ARP table as the target of the previous forward.

You can PM the pcaps to us if you need to.

Steve

rebi

@stephenw10 Thank you!
I'm not sure how to send a PM on this forum software (NodeBB); it seems there are chats instead of regular PMs, and those are restricted.

Derelict

How, exactly, are the 1:1 NATs configured?

rebi

@Derelict

Interface | External IP | Internal IP | Destination IP
WAN | VIP1 | 192.168.101.2 | *
WAN | VIP2 | 192.168.101.2 | *

Derelict

That is not 1:1 NAT. That is 2:1 NAT.
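A mapping only behaves as 1:1 NAT if no internal address is the target of more than one external address. The small Python check below is a hypothetical helper written for illustration; it flags the configuration posted above, with `VIP1` and `VIP2` being the placeholders from that post.

```python
from collections import defaultdict


def shared_targets(rules: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return each internal IP that more than one external IP is mapped to."""
    externals_by_internal: dict[str, list[str]] = defaultdict(list)
    for external, internal in rules:
        externals_by_internal[internal].append(external)
    return {ip: exts for ip, exts in externals_by_internal.items() if len(exts) > 1}


posted = [("VIP1", "192.168.101.2"), ("VIP2", "192.168.101.2")]
print(shared_targets(posted))  # {'192.168.101.2': ['VIP1', 'VIP2']}
```

An empty result would mean every external address has its own internal target, which is what a true 1:1 mapping requires.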

stephenw10

Yes. The port forwards should override that if it's causing a problem, but only for inbound traffic. There might still be conflicting outbound NAT causing an issue.

Try disabling the 1:1 NAT rules and using only port forwards there.

Steve

Derelict

Or put another address on the target server and 1:1 NAT the VIPs to their own addresses.

rebi

Yes, you're right :/ I'll try either using port forwards or setting up another internal IP for the second 1:1 NAT (I'll report the result tomorrow, as I have to do it overnight).

rebi

Actually ... isn't 1:1 NAT simply a more convenient way of port forwarding everything to the specified destination?
(BTW outbound NAT is set to "Automatic outbound NAT rule generation")

Anyway, I've disabled the 1:1 NATs and created separate HTTP/HTTPS/ICMP port forwards for each Virtual IP (without associated filter rules, as these already exist). Unfortunately, I see exactly the same behaviour: it still works fine from my office network, but alternating pings from my home network fail as described above.

rebi

I've just added a temporary local IP address on the reverse proxy VM (ip a add ...) and changed one of the ICMP port forwards to point to the new local IP address. I've also run iptraf on the VM to make sure that it's the VM handling the ICMP replies, not pfSense.

With this configuration, as expected, everything works normally (even alternate pings from my home network).

Am I missing something from a conceptual point of view?
I should be able to port forward to the same internal IP; moreover, it already works with HTTP/HTTPS traffic...

stephenw10

I suspect this is a state issue. pf is trying to open a state on the LAN interface with source and destination IPs identical to those of an existing state. With a TCP connection it also uses the source and destination ports, and since the source port is random, the new state will differ from any existing one.
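The difference between TCP and ICMP here can be sketched in a few lines of Python. This is a simplified, hypothetical model: after the forward rewrites the destination to the internal host, the LAN state is keyed on protocol, source, destination and either the TCP source port or the ICMP identifier (the destination port is left out because it is the same for both connections). The client address and source ports are made-up example values.

```python
from typing import NamedTuple

# Both VIPs are forwarded to the same internal host, as in this thread.
FORWARDS = {"VIP1": "192.168.101.2", "VIP2": "192.168.101.2"}


class Packet(NamedTuple):
    proto: str
    src: str
    dst: str
    ident: int  # TCP source port, or ICMP echo identifier


def lan_state_key(pkt: Packet) -> tuple[str, str, str, int]:
    """Key of the LAN-side state after the destination has been rewritten."""
    return (pkt.proto, pkt.src, FORWARDS[pkt.dst], pkt.ident)


client = "198.51.100.7"
tcp_to_vip1 = Packet("tcp", client, "VIP1", 51514)
tcp_to_vip2 = Packet("tcp", client, "VIP2", 51515)
icmp_to_vip1 = Packet("icmp", client, "VIP1", 1)
icmp_to_vip2 = Packet("icmp", client, "VIP2", 1)

print(lan_state_key(tcp_to_vip1) == lan_state_key(tcp_to_vip2))    # False: source ports differ
print(lan_state_key(icmp_to_vip1) == lan_state_key(icmp_to_vip2))  # True: keys collide
```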

1:1 NAT rules forward all ports, just as a 1-65535 port forward would, but they also NAT outbound traffic from the internal target to the external IP. Obviously that can't map to two IPs, so one rule will fail there. I believe the first 1:1 rule will win.
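The outbound half of the problem can be illustrated with a hypothetical Python sketch. Inbound, both rules resolve to the same internal host; outbound, only one external address can be chosen. The first-match behaviour below is an assumption that follows "I believe the first 1:1 rule will win", not confirmed pfSense internals.

```python
from typing import Optional

# (external, internal) pairs in rule order, as posted in this thread.
ONE_TO_ONE = [("VIP1", "192.168.101.2"), ("VIP2", "192.168.101.2")]


def inbound_target(external: str) -> Optional[str]:
    """Internal address that traffic to `external` is forwarded to."""
    for ext, internal in ONE_TO_ONE:
        if ext == external:
            return internal
    return None


def outbound_source(internal: str) -> Optional[str]:
    """External address used for traffic leaving `internal` (first match wins; assumed)."""
    for ext, int_ip in ONE_TO_ONE:
        if int_ip == internal:
            return ext
    return None


print(inbound_target("VIP1"), inbound_target("VIP2"))  # 192.168.101.2 192.168.101.2
print(outbound_source("192.168.101.2"))                # VIP1: VIP2 is never used outbound
```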

If you are doing this just to provide a ping target for external users, it's probably easier to have pfSense respond to the pings on the VIPs directly and forward only TCP traffic to the proxy.

Steve

rebi

I don't believe that's the cause... ICMP packets should be routable just like TCP and UDP, which work fine (technically, I think ICMP is just an application protocol on top of IP, which is routable). Also, if that were the case, I would expect it to work only intermittently, but it works flawlessly from my office network.

As far as I understand, 1:1 NAT covers incoming traffic plus the outgoing traffic that should leave via WAN using the associated VIP as its source. Could it be that I need to change "Automatic outbound NAT rule generation" to, say, "Hybrid Outbound NAT rule generation" and create a manual rule for ICMP? Anyway, I still don't get how it works from one place and not from another... does it work OK when you ping these IPs from your network?

As for having pfSense answer the pings, that simply defeats the purpose of the ping, as it would only tell users whether pfSense itself is working, not the services behind it.

stephenw10

Having looked into this further I'm certain it's a state issue.

If you look at the failed packet capture, the incoming ping requests all use ICMP identifier '1'. pf uses that in the state it opens in the same way it uses ports for TCP/UDP. It can't open a state on LAN with the same source, destination and identifier as one that already exists, so nothing passes until the previous state times out.
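For reference, the identifier is a 16-bit field in the ICMP echo header (type, code, checksum, identifier, sequence number), which is why pf can treat it much like a port. The Python sketch below builds an echo request with the standard RFC 1071 checksum and reads the identifier back; it only constructs bytes and sends nothing. The function names are hypothetical.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum (ones'-complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


def echo_identifier(packet: bytes) -> int:
    """Return the identifier field (bytes 4-5 of the ICMP header)."""
    return struct.unpack_from("!H", packet, 4)[0]


# A client that always uses identifier 1 sends the same ICMP header to either VIP
# (the destination address lives in the IP header, not in the ICMP header).
to_ip1 = echo_request(identifier=1, sequence=1, payload=b"abcdefgh")
to_ip2 = echo_request(identifier=1, sequence=1, payload=b"abcdefgh")
print(echo_identifier(to_ip1), echo_identifier(to_ip2))  # 1 1
```

Running `inet_checksum` over a complete, valid packet returns 0, which is a quick way to check the builder.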

If you test remotely from behind a different pfSense, it will randomise the ICMP identifier when it outbound NATs the traffic. That means when the pings arrive at the port forward, they have different identifiers and two states can be created. No problem.
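Here is a hypothetical Python sketch of that outbound behaviour: each outgoing echo flow gets its own randomly chosen identifier, so two pings that left the client with identifier 1 arrive with different identifiers. The class, its collision-avoidance loop, the flow key and the client address 192.168.0.10 are illustrative assumptions, not pf's actual code, and the reverse translation of replies is not modelled.

```python
import random


class IcmpIdTranslator:
    """Rewrites ICMP echo identifiers per flow on the way out (illustrative model)."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.by_flow: dict[tuple[str, str, int], int] = {}
        self.in_use: set[int] = set()

    def translate(self, src: str, dst: str, ident: int) -> int:
        flow = (src, dst, ident)
        if flow not in self.by_flow:
            new_id = self.rng.randrange(1, 65536)
            while new_id in self.in_use:  # avoid reusing an identifier already mapped
                new_id = self.rng.randrange(1, 65536)
            self.in_use.add(new_id)
            self.by_flow[flow] = new_id
        return self.by_flow[flow]


nat = IcmpIdTranslator(random.Random(0))
id_to_ip1 = nat.translate("192.168.0.10", "IP1", 1)
id_to_ip2 = nat.translate("192.168.0.10", "IP2", 1)
print(id_to_ip1 != id_to_ip2)  # True: the forwarded pings no longer share an identifier
```

In this model, a router that leaves the identifier unchanged simply skips `translate`, and both forwarded pings keep identifier 1.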

It appears that when you are testing and seeing failures, the client you are testing from is using the same identifier for all pings, and the router you are testing behind is not randomising the identifier on the way out.

There's nothing we can do about that in the pfSense forwarding those pings.

Interesting. Never hit that before.

Steve