Yet another ping problem with Virtual IPs

rebi

Hi,

I have a strange problem when pinging my Virtual IPs. If I ping one of them, it works fine, but if I then ping another one, the first few pings fail before it starts working. The same happens in reverse: if I then go back to the first one, the first few pings fail and then it starts working.

My setup is as follows:

virtualized server - pfSense (latest version) is a VM
one network connection with a bunch of real IPs

pfSense has that connection with the main IP set up as WAN
The rest of the IPs are set up as Virtual IPs of type "IP alias"
pfSense LAN is an internal network (192.168.xxx.xxx) between the VMs.
One of the VMs serves as a reverse proxy to some of the VMs on the internal network. Since each such service is assigned a different real IP, all of the IPs in use are configured as 1:1 NAT to the reverse proxy VM.

Everything works as expected and has been working for all of the services we have!
However, when I create an ICMP rule on the WAN to allow ICMP echo requests ("out of courtesy"), I observe the behaviour above.

Do you have any clue about this strange intermittent ping behaviour?

Thank you in advance!

Derelict

Need to see a packet capture illustrating this behavior. Probably something weird with the virtual infrastructure. Based on the information given there's no way to know if it's a WAN-side or LAN-side problem.

rebi

I will try to capture something more useful ... but from what I've tried:

on the reverse proxy side, iptraf-ng does not say anything while pings fail
on the pfSense side though, if I record the traffic (packet capture), I simply see "No response seen to ICMP request" packets when I look at it in Wireshark

Edit: While searching for the problem, I found another person with a similar problem in the French forum: https://forum.netgate.com/topic/63030/probl%C3%A8me-ping-simultan%C3%A9-sur-2-virtual-ip

Derelict

That isn't telling us much. We need to see packet captures on the inside and the outside interfaces and we need to know exactly what you are testing and what interface we're looking at. Start at the WAN. Do you see the pings arrive? Is there a response? Who is ARPing for whom? Be sure you do exactly what it takes to duplicate the issue.

Then do the same thing but capture on the inside. Is that NAT happening? Are the pings being sent? Is there a response? Who is ARPing for whom?

Does it work if you eliminate the 1:1 NAT and pass the pings to the pre-NAT address on the WAN?

stephenw10

I would ignore that other thread. That's a very old pfSense version running in a very old ESXi version. Any problems they might have been hitting back then are unlikely to apply now.

If you pcap on the pfSense LAN interfaces and see the ping requests being forwarded to the correct internal IP and MAC then check on that VM to see if they are actually arriving there.

Steve

rebi

Blimey! Case got quite weird ... but seems like nothing related to pfSense :)

Of course, I had tried this first from two machines, but they were my laptop and another desktop machine at home (HPs, if that matters). After spending quite some time looking at packets without any clues, while trying to capture yet another one, I ran the ping from a server console (data center) that was handily open, and it worked ok. That's how I got suspicious. Then a ping from the router itself (my home) was ok, a ping from a Windows server (office network) was ok, and a ping from a colleague's laptop (his home) was NOT ok. Today I asked the same colleague to do the same test from his laptop in the office network ... it was ok :) After talking with him, I figured out that we use different router models from the same vendor at home (mine is using the vendor's firmware, his is OpenWrt-based, if that matters), whereas in the places where I tested and it worked, the routers were pfSense and Mikrotik (and probably Cisco in the data center).

So, I'll assume it has something to do with that vendor's routers and won't bother researching it any more ... unless someone thinks it's too weird to be passed over :)

Edit: I've just tested with an OpenVPN connection to the office (using redirect-gateway def1) and it's working ok, so it seems the problem is somewhere in these routers' NAT implementation or whatever it is (it's not their firewall, as I've already tested with it disabled).

stephenw10

So those initial ping requests that fail never arrive at the pfSense WAN?

Steve

rebi

bummer ... they all arrive on WAN but I can only see the successful ones on LAN.
I'm not sure I understand what's going on ... :/

stephenw10

Ok so to be clear you ping a VIP from somewhere external and, for example, you see 4 failed pings then a successful one at the client end.
In packet captures on pfSense you see all 5 ping requests arrive on the WAN but only one leaves the LAN?

And nothing is blocked in the firewall?

It's hard to imagine what could cause that. All of those pings on all the VIPs are forwarded to the same internal IP/MAC right?

Steve

rebi

Yes, whenever I alternate the pings, I first get between 1 and 4 failed pings, and these pings are captured as requests on WAN but not on LAN ...
And to confirm what my theory is ... I can't reproduce it today as I'm in the office

I promise to give it a second thorough look when I get back home. Indeed, it's hard to imagine what would cause that ...

stephenw10

You would see that if, for some reason, the first 4 ping requests were blocked by the firewall. You should see that in the log though.
Otherwise I'd try to see if they are being misrouted somehow. Though since everything is going to one internal IP it can't be something like a missing ARP record.

Steve

rebi

d'oh, I guess I'm not skilful enough to find exactly what's going on ... but will summarize whatever I've experienced so far:

Environment summary:
I have a virtualized environment in a data center: a single server with a single network cable which carries a bunch of IPs. A pfSense VM serves as firewall, router, IDS (monitoring-only) and OpenVPN server. All other VMs are behind this VM in local networks (virtual vmbr devices). Except for one of the IPs, which is set as WAN, the rest are set up as Virtual IPs of type "IP alias". Since one of the VMs is a reverse proxy to various web services in the internal network, its local IP is set up as 1:1 NAT to the virtual IPs in use. An ICMP rule on the WAN allows ICMP echo requests.

The problem:
Let's assume IP1 and IP2 are virtual IPs, each with 1:1 NAT set up to the reverse proxy VM's local IP.
The following behaviour is experienced:

ping IP1 - works
ping IP2 - fails for between 1 and 4 pings, then starts replying
ping IP2 - works
ping IP1 - fails for between 1 and 4 pings, then starts replying
The latter happens every time I alternate the IPs when pinging

Findings:
The above behaviour is confirmed to happen from 3 different places where the only thing in common is the use of a TP-Link router (official firmware: one is OpenWrt-based, the rest use TP-Link firmware). Strangely enough, it doesn't happen when pinging from the router's diagnostic page. I see everything working as expected in lots of different networks, incl. behind a mobile TP-Link 4G LTE MiFi router.

Current packet captures observations:

ping IP1 - works
Here I can see requests from my IP and replies from IP1 in the packets
ping IP2 - fails for between 1 and 4 pings, then starts replying
WAN packet capture - For all pings that do not go through, I see "No response seen to ICMP request" for the request packet (in the latest Wireshark)
Firewall logs - nothing
LAN packet capture - I only see the successful ICMP requests and responses; the requests marked with "No response seen to ICMP request" on WAN do not appear at all
ping IP2 - works
Again, I can see requests from my IP and replies from IP2 in the packets
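
For anyone who wants to make the WAN/LAN comparison above less manual, here is a minimal sketch in Python. It assumes the echo requests from each capture have already been exported as (ICMP identifier, sequence number) pairs by whatever means; the function name and the sample values are hypothetical and are not taken from the actual captures.

```python
def missing_on_lan(wan_requests, lan_requests):
    """Return the echo requests seen on WAN that never show up on LAN.

    Each request is an (icmp_id, sequence) pair. The WAN order is kept
    so the result reads like the original capture.
    """
    seen_on_lan = set(lan_requests)
    return [req for req in wan_requests if req not in seen_on_lan]


# Hypothetical sample: five requests reach WAN, only the last leaves LAN.
wan = [(1, 10), (1, 11), (1, 12), (1, 13), (1, 14)]
lan = [(1, 14)]

print(missing_on_lan(wan, lan))
```

With this sample the first four requests are reported as missing, which matches the "1 to 4 failed pings" pattern described in the post.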

stephenw10

Try adding portforwards for ICMP to the same VM. They will override the 1:1 NAT.

If the pings arrive on WAN but never leave LAN something must be preventing that. Possibly it's unable to create a state on LAN as one exists from the previous ping.
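
To illustrate the state conflict suggested above, here is a toy model in Python. It is only a sketch of the hypothesis, not how pf actually stores states. It assumes an ICMP state is keyed on (protocol, source, destination, ICMP identifier) and that the client's router leaves the ICMP identifier unchanged, so both pings carry the same one. All addresses are documentation or example values.

```python
def forward_ping(lan_states, client, vip, internal, icmp_id):
    """Model an echo request after 1:1 NAT has rewritten VIP -> internal IP.

    lan_states maps a LAN-side state key to the VIP that created it.
    Returns True if the request can use (or create) a state, False if
    the key is already held by a flow that came in via another VIP.
    """
    # After NAT the destination is the internal IP, so the VIP is no
    # longer part of the key and cannot tell the two flows apart.
    key = ("icmp", client, internal, icmp_id)
    owner = lan_states.setdefault(key, vip)
    return owner == vip


CLIENT = "203.0.113.7"      # hypothetical external client
IP1 = "198.51.100.1"        # hypothetical VIP 1
IP2 = "198.51.100.2"        # hypothetical VIP 2
INTERNAL = "192.168.0.10"   # hypothetical reverse proxy address

lan_states = {}
first = forward_ping(lan_states, CLIENT, IP1, INTERNAL, icmp_id=1)
second = forward_ping(lan_states, CLIENT, IP2, INTERNAL, icmp_id=1)
other_id = forward_ping(lan_states, CLIENT, IP2, INTERNAL, icmp_id=2)

print(first, second, other_id)  # True False True
```

In this model the ping to the second VIP only gets through once the first state is gone or once it arrives with a different identifier. If the hypothesis holds, that would fit a failure that depends on what the client-side router does to the ICMP identifier.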

Steve

rebi

Port forwards didn't help.

Here's what I've found with states:
I filtered states by ICMP. Whenever I ping from my office network, the states are created immediately when I execute the pings. However, when alternating the pings from my home network, I don't see the second state immediately; it gets created after a while.

So I did the following test:
I opened 2 command prompts and executed simultaneous continuous pings. I did this from both networks. While it worked ok from my office network, I could only see one of the pings working from my home network. The other one gave "Request timed out" until I cancelled the first one, and then it started working within a few seconds. The second pair of states was never created for the second ping from my home network, while both pairs were always created for the pings from my office network.
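
The two-prompt test above could be scripted roughly as follows. This is a sketch only: it assumes a system `ping` that takes `-n` for the count on Windows and `-c` elsewhere, it sends a fixed number of pings instead of a continuous one, and the function names are made up for illustration.

```python
import platform
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ping_command(target, count=10):
    """Build the ping command line for the current operating system."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), target]


def ping_both(targets, count=10):
    """Ping all targets at the same time and return {target: raw output}."""
    def run(target):
        result = subprocess.run(
            ping_command(target, count), capture_output=True, text=True
        )
        return result.stdout

    # One worker per target so the pings really overlap in time.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        return dict(zip(targets, pool.map(run, targets)))
```

Calling `ping_both([ip1, ip2])` with the two virtual IPs and comparing the two outputs side by side should show whether one of them times out while the other is running.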

Derelict

@rebi said in Yet another ping problem with Virtual IPs:

Yes, whenever I alternate the pings, I first get between 1 and 4 failed pings, and these pings are captured as requests on WAN but not on LAN ...

Show me.

stephenw10

I can only see it varying by source address if you have some other rule(s) in place using those.

Steve

rebi

@Derelict

What would be the ethical way of doing it?

Thanks!

rebi

@stephenw10

Nope. Apart from the default rules, all I have is 4 rules which allow OpenVPN (UDP to "This Firewall"), ICMP and HTTP/HTTPS (TCP to Reverse Proxy Internal IP), plus 2 1:1 NAT rules for the Virtual IPs.

Thanks!

stephenw10

If it was some sort of ARP issue I'd expect to see pfSense ARPing for the target in the LAN side pcap. But I can't see how that could happen since the internal VM is already in the table as the target for the previous forward.

You can PM the pcaps to us if you need to.

Steve

rebi

@stephenw10 Thank you!
I'm not sure how to send one on this particular forum software (NodeBB) ... it seems there are chats instead of regular PMs, and they are restricted