Yet another ping problem with Virtual IPs



  • Hi,

    I have a strange problem when pinging my Virtual IPs ... basically, if I ping one of them, it works ok, but then if I ping another one it doesn't work for the first few pings and then starts working ... and vice versa: if I then ping the first one again, it doesn't work for the first few pings and then starts working.

    My setup is as follows:

    virtualized server - pfSense (latest version) is a VM
    one network connection with a bunch of real IPs

    pfSense has that connection with the main IP set up as WAN
    The rest of the IPs are set up as Virtual IPs of type "IP alias"
    pfSense LAN is an internal network (192.168.xxx.xxx) between the VMs.
    One of the VMs serves as reverse proxy to some of the VMs on the internal network. Since each such service is assigned a different real IP, all of the IPs in use are configured as 1:1 NAT to the reverse proxy VM.

    Everything works as expected and has been working for all of the services we have!
    However, when I create an ICMP rule on the WAN to allow ICMP echo requests ("out of courtesy"), I observe the behaviour above.

    Do you have any clue about this strange intermittent ping behaviour?

    Thank you in advance!


  • LAYER 8 Netgate

    Need to see a packet capture illustrating this behavior. Probably something weird with the virtual infrastructure. Based on the information given there's no way to know if it's a WAN-side or LAN-side problem.



  • I will try to capture something more useful ... but from what I've tried:

    • on the reverse proxy side, iptraf-ng does not say anything while pings fail
    • on the pfSense side though, if I record the traffic (packet capture), I simply have "No response seen to ICMP request" packets when having a look with Wireshark

    Edit: While searching for the problem, I've found another person with a similar problem in the French forum: https://forum.netgate.com/topic/63030/problème-ping-simultané-sur-2-virtual-ip


  • LAYER 8 Netgate

    That isn't telling us much. We need to see packet captures on the inside and the outside interfaces and we need to know exactly what you are testing and what interface we're looking at. Start at the WAN. Do you see the pings arrive? Is there a response? Who is ARPing for whom? Be sure you do exactly what it takes to duplicate the issue.

    Then do the same thing but capture on the inside. Is that NAT happening? Are the pings being sent? Is there a response? Who is ARPing for whom?

    Does it work if you eliminate the 1:1 NAT and pass the pings to the pre-NAT address on the WAN?


  • Netgate Administrator

    I would ignore that other thread. That's a very old pfSense version running in a very old ESXi version. Any problems they might have been hitting back then are unlikely to apply now.

    If you pcap on the pfSense LAN interfaces and see the ping requests being forwarded to the correct internal IP and MAC then check on that VM to see if they are actually arriving there.

    Steve



  • Blimey! Case got quite weird ... but seems like nothing related to pfSense :)

    Of course, I had tried this first from two machines, but they were my laptop and another desktop machine at home (HPs, if that matters) ... so, after I spent quite some time looking at packets without any clues, while trying to capture yet another one, I ran the ping from a server console (data center) that was handily open and it worked ok. That's how I got suspicious ... then a ping from the router itself (my home) was ok, a ping from a Windows server (office network) was ok, a ping from a colleague's laptop (his home) was NOT ok. Today I asked the same colleague to do the same test from his laptop in the office network ... it was ok :) So, after talking with him, I figured out that we use different models of routers from the same vendor at home (mine is using the vendor's firmware, his is OpenWrt-based, if that matters), whereas in the places where I tested and it worked, the routers were pfSense and Mikrotik (and probably Cisco in the data center).

    So, I'll assume it's something to do with that vendor's routers and won't bother researching it any more ... unless someone thinks it's too weird to be passed over :)

    Edit: I've just tested with an OpenVPN connection to the office (using redirect-gateway def1) and it's working ok, so it seems the problem is somewhere in these routers' NAT implementation or whatever it is (it's not the firewall, as I've already tested with it disabled).


  • Netgate Administrator

    So those initial ping requests that fail never arrive at the pfSense WAN?

    Steve



  • bummer ... they all arrive on WAN but I can only see the successful ones on LAN.
    I'm not sure I understand what's going on ... :/


  • Netgate Administrator

    Ok so to be clear you ping a VIP from somewhere external and, for example, you see 4 failed pings then a successful one at the client end.
    In packet captures on pfSense you see all 5 ping requests arrive on the WAN but only one leaves the LAN?

    And nothing is blocked in the firewall?

    It's hard to imagine what could cause that. All of those pings on all the VIPs are forwarded to the same internal IP/MAC right?

    Steve



  • Yes, whenever I alternate the pings first time I have between 1 and 4 failed pings and these pings are captured as requests on WAN but not LAN ...
    And to confirm what my theory is ... I can't reproduce it today as I'm in the office

    I promise to give it a second thorough look when I get back home. Indeed, it's hard to imagine what would cause that ...


  • Netgate Administrator

    You would see that if for some reason the first 4 ping requests were blocked by the firewall. You should see that in the log though.
    Otherwise I'd try to see if they are being misrouted somehow. Though since everything is going to one internal IP it can't be something like a missing ARP record.

    Steve



  • d'oh, I guess I'm not skilful enough to find exactly what's going on ... but will summarize whatever I've experienced so far:

    Environment summary:
    I have a virtualized environment in a data center - a single server with a single network cable which carries a bunch of IPs. A pfSense VM serves as firewall, router, IDS (monitoring-only) and OpenVPN server. All other VMs are behind this VM in local networks (virtual vmbr devices). Except for one of the IPs, which is set as WAN, the rest are set up as Virtual IPs of type "IP alias". Since one of the VMs is a reverse proxy to various web services in the internal network, its local IP is set up as 1:1 NAT to the virtual IPs in use. An ICMP rule on the WAN allows ICMP echo requests.

    The problem:
    Let's assume IP1 and IP2 are virtual IPs with 1:1 NAT set up to the reverse proxy VM's local IP.
    The following behaviour is experienced:

    • ping IP1 - works
    • ping IP2 - fails for between 1 and 4 pings, then starts replying
    • ping IP2 - works
    • ping IP1 - fails for between 1 and 4 pings, then starts replying
    • the latter happens every time I alternate the IPs when pinging

    Findings:
    The above behaviour is confirmed to happen from 3 different places where the only thing in common is the use of a TP-Link router (official firmwares - one is OpenWrt-based, the rest use TP-Link firmware). Strangely enough, it doesn't happen when pinging from the router's diagnostic page. I see everything working as expected in lots of different networks, incl. behind a mobile TP-Link 4G LTE MiFi router.

    Current packet captures observations:

    • ping IP1 - works
      Here I can see requests from my IP and replies from IP1 in the packets
    • ping IP2 - fails for between 1 and 4 pings, then starts replying
      WAN packet capture - For all pings that do not go through I see "No response seen to ICMP request" for the request packet (in latest Wireshark)
      Firewall logs - nothing
      LAN packet capture - I only see the successful ICMP requests and responses, and I do not see any requests marked with "No response seen to ICMP request"
    • ping IP2 - works
      Again, I can see requests from my IP and replies from IP2 in the packets

  • Netgate Administrator

    Try adding portforwards for ICMP to the same VM. They will override the 1:1 NAT.

    If the pings arrive on WAN but never leave LAN something must be preventing that. Possibly it's unable to create a state on LAN as one exists from the previous ping.

    Steve



  • Port forwards didn't help.

    Here's what I've found with states:
    I filtered states by ICMP and whenever I ping from my office network, I got states immediately created whenever I execute the pings. However, when alternating the pings from my home network I don't see the second state immediately, it gets created after a while.

    So I did the following test:
    I opened 2 command prompts and executed simultaneous continuous pings. I did this from both networks. While it worked ok from my office network, I can only see one of the pings working from my home network - the other one gives "Request timed out" until I cancel the other one and then in a few seconds it starts working. The second pair of states was never created for the second ping from my home network while both pairs were always created for the pings from my office network.


  • LAYER 8 Netgate

    @rebi said in Yet another ping problem with Virtual IPs:

    Yes, whenever I alternate the pings first time I have between 1 and 4 failed pings and these pings are captured as requests on WAN but not LAN ...

    Show me.


  • Netgate Administrator

    I can only see it varying by source address if you have some other rule(s) in place using those.

    Steve



  • @Derelict

    What would be the ethical way of doing it?

    Thanks!



  • @stephenw10

    Nope, except the default rules all I have is 4 rules which allow OpenVPN (UDP to "This Firewall"), ICMP and HTTP/HTTPS (TCP to Reverse Proxy Internal IP) + 2 1:1 rules for the Virtual IPs.

    Thanks!


  • Netgate Administrator

    If it was some sort of ARP issue I'd expect to see pfSense ARPing for the target in the LAN side pcap. But I can't see how that could happen since the internal VM is already in the table as the target for the previous forward.

    You can PM the pcaps to us if you need to.

    Steve



  • @stephenw10 Thank you!
    I'm not sure how to send one on this particular forum software (nodebb) ... seems like there are chats instead of regular PMs which are restricted


  • LAYER 8 Netgate

    How, exactly, are the 1:1 NATs configured?



  • @Derelict

    Interface   External IP   Internal IP     Destination IP
    WAN         VIP1          192.168.101.2   *
    WAN         VIP2          192.168.101.2   *


  • LAYER 8 Netgate

    That is not 1:1 NAT. That is 2:1 NAT.
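
The point can be illustrated with a minimal Python sketch. It is only a toy model of the two rules listed above, not pfSense's actual rule evaluation: `VIP1` and `VIP2` are placeholders for the real addresses and the helper names are made up. Inbound translation is unambiguous, but the reverse direction has two candidates for the same internal address, so the mapping is not one-to-one.

```python
# Toy model of the two rules from the thread; VIP1/VIP2 are placeholders.
one_to_one_rules = [
    ("VIP1", "192.168.101.2"),
    ("VIP2", "192.168.101.2"),
]


def inbound(external_ip):
    """Translate an external destination to its internal target (hypothetical helper)."""
    for ext, internal in one_to_one_rules:
        if ext == external_ip:
            return internal
    return None


def outbound_candidates(internal_ip):
    """List every external address that claims this internal source (hypothetical helper)."""
    return [ext for ext, internal in one_to_one_rules if internal == internal_ip]


print(inbound("VIP1"))                       # 192.168.101.2
print(inbound("VIP2"))                       # 192.168.101.2
print(outbound_candidates("192.168.101.2"))  # ['VIP1', 'VIP2']
```

With a true 1:1 mapping, `outbound_candidates` would return exactly one address per internal IP.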


  • Netgate Administrator

    Yes. The port forwards should override that if there is some problem there but only inbound. There might still be conflicting outbound NAT causing an issue.

    Try disabling the 1:1 NAT rules and using port forwards only there.

    Steve


  • LAYER 8 Netgate

    Or put another address on the target server and 1:1 NAT the VIPs to their own addresses.



  • Yes, you're right :/ ... I'll try either using port forwards or setting up another internal IP for the second 1:1 NAT (will report the result tomorrow as I have to do it overnight)



  • Actually ... isn't 1:1 NAT simply a more convenient way of port forwarding everything to the specified destination?
    (BTW outbound NAT is set to "Automatic outbound NAT rule generation")

    Anyway, I've disabled 1:1 NATs and created separate HTTP/HTTPS/ICMP rules for each Virtual IP (without associated rules as these already exist). Unfortunately, I experience the very same behaviour, i.e. it still works just fine from my office network but alternating pings fail as per the description above from my home network.



  • I've just added a temporary local IP address on the reverse proxy VM (ip a add ...) and I've changed one of the ICMP port forwards to go to the new local IP address. I've also run iptraf on the VM to be sure that it's the VM which handles the ICMP replies, not pfSense.

    With this configuration, as expected, everything works normally (even alternate pings from my home network).

    Am I missing something from a conceptual point of view?
    I should be able to port forward to the same internal IP, moreover it already works with HTTP/HTTPS traffic ...


  • Netgate Administrator

    I suspect this is a state issue. pf is trying to open a state on the LAN interface with a source and destination IP that are identical to those of an existing state. With a TCP connection it also uses the source and destination port, and since the source port is random that will be different to any existing state.

    1:1 NAT rules forward all ports the same as a 1-65535 port forward would, but they also NAT traffic outbound from the internal target to the external IP. Obviously that can't happen to two IPs, so one rule will fail there. I believe the first 1:1 rule will win.

    If you are doing this just to provide a ping target for external users it's probably easier to have pfSense respond to the pings on the VIPs directly and forward only TCP traffic to the proxy.

    Steve



  • I don't believe this would be the cause ... ICMP packets should be routable just like TCP and UDP which work ok (well, technically I think ICMP is just an application protocol on top of IP which is routable). Also, if this was the case I would expect it to work intermittently but it works flawlessly from my office network.

    As far as I understand, 1:1 NAT is all about incoming traffic and the part of the outgoing traffic that should go out from WAN originating from the associated VIP. Could it be that I need to change the "Automatic outbound NAT rule generation" to, say, "Hybrid Outbound NAT rule generation" and create a manual rule for ICMP? Anyway, I still don't get how it works from one place and not another ... does it work ok when you ping these IPs from your network?

    As for the pfSense answering the pings, that simply defeats the purpose of having the ping as it will only let users know whether pfSense itself is working not the services behind it.


  • Netgate Administrator

    Having looked into this further I'm certain it's a state issue.

    If you look at the failed packet capture the incoming ping requests are all using ICMP identifier number '1'. pf uses that in the state it opens in the same way it does ports for TCP/UDP. It can't open a state on LAN with the same source, destination and identifier as one that exists so nothing passes until the previous state times out.

    If you test remotely from behind a different pfSense it will randomise the ICMP identifier when it outbound NATs the traffic. That means when it arrives at the port forward the two pings have different identifiers and two states can be created. No problem.

    It appears that when you are testing and seeing failures the client you are testing from is using the same identifier for all pings. And the router you are testing behind is not randomising the identifier on the way out.

    Nothing we can do about that in the pfSense forwarding those pings.

    Interesting. Never hit that before.

    Steve
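
The explanation above can be sketched as a toy state table in Python. This is a simplified model under stated assumptions, not pf's real state code: the class, the key layout and the client address `203.0.113.10` are all hypothetical, and the "randomised" identifiers are fixed made-up values so the run is repeatable. It only shows why two pings that arrive through different VIPs but end up with the same source, destination and identifier clash on the LAN side, while TCP flows with different source ports do not.

```python
PROXY = "192.168.101.2"   # internal target from the thread
CLIENT = "203.0.113.10"   # hypothetical tester address (documentation range)


class LanStateTable:
    """Toy state table: at most one state per key, owned by the VIP that created it."""

    def __init__(self):
        self.states = {}

    def try_forward(self, key, via_vip):
        """Return True if the packet creates or matches a state, False on a key clash."""
        owner = self.states.setdefault(key, via_vip)
        return owner == via_vip

    def expire(self, key):
        """Drop a state, as happens once the earlier ping stops and the state times out."""
        self.states.pop(key, None)


def icmp_key(src, ident):
    # After the forward both VIPs share one destination, so only the identifier can differ.
    return ("icmp", src, PROXY, ident)


def tcp_key(src, sport, dport):
    return ("tcp", src, sport, PROXY, dport)


table = LanStateTable()

# Client and home router both leave the identifier at 1 (the failing case).
same_id = icmp_key(CLIENT, 1)
print(table.try_forward(same_id, "VIP1"))  # True
print(table.try_forward(same_id, "VIP2"))  # False: key already in use
table.expire(same_id)
print(table.try_forward(same_id, "VIP2"))  # True once the old state is gone

# A router that rewrites the identifier per flow (made-up values).
print(table.try_forward(icmp_key(CLIENT, 40001), "VIP1"))  # True
print(table.try_forward(icmp_key(CLIENT, 40002), "VIP2"))  # True

# TCP: the source port differs per connection, so the keys never clash.
print(table.try_forward(tcp_key(CLIENT, 51514, 443), "VIP1"))  # True
print(table.try_forward(tcp_key(CLIENT, 51522, 443), "VIP2"))  # True
```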


  • Netgate Administrator

    Hmm, you could maybe have it outbound NAT the traffic to a different internal IP for ICMP only. That would randomise the identifier and allow it. 🤔
    Ugly but...might work.



  • @stephenw10
    Aaah, that makes sense now! Spot on!!! :)

    Could it be because of a double NAT? ... When I think about it, I have a GPON that handles fibre optic, and an IPTV box plus my router are connected to that GPON. My router does have an internal IP on the WAN interface. I don't have access to that GPON ... at least legally (I know the account though), as it's controlled by my TV & Net provider. Probably the guys whom I asked to test do have a similar setup.

    Anyway, I'm extremely grateful that you, guys, put a lot of effort into dealing with my problem!

    Thanks and have a great week!


  • Netgate Administrator

    No problem. Interesting case.
    I'd be interested to see if that outbound NAT workaround allows the pings. That would confirm it.

    You would see the identifier numbers changed in the state table.

    Steve



  • I didn't quite get the workaround idea (BTW using "Automatic outbound NAT rule generation" at the moment). What rule do I have to create and will it take precedence, i.e. do I need to switch the Outbound NAT Mode?

    As for the pings, I've just confirmed it's definitely the ICMP identifier as I downloaded something called hrping (googled it) which allows setting the ICMP Id. Whenever I set one and the same ICMP Id I have the problem and once they're different (hrping does this by default) there's no problem ... bloody routers! ... though one could blame Windows as well! Here's an excerpt from Wikipedia I've just read: "The Identifier and Sequence Number can be used by the client to match the reply with the request that caused the reply. In practice, most Linux systems use a unique identifier for every ping process, and sequence number is an increasing number within that process. Windows uses a fixed identifier, which varies between Windows versions, and a sequence number that is only reset at boot time."
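
For readers who want to see where that identifier sits in the packet, here is a minimal Python sketch that builds an ICMP echo request with a chosen Id and sequence number. The function names are made up for illustration, and this is not how hrping is implemented. Actually sending the packet needs a raw socket and elevated privileges, which is left out here.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with the given identifier."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


def parse_identifier(packet: bytes) -> int:
    """Read the identifier back from bytes 4-5 of the ICMP header."""
    return struct.unpack("!H", packet[4:6])[0]


# Two pings to different VIPs that both carry identifier 1, as in the failing case.
ping_a = build_echo_request(identifier=1, sequence=10, payload=b"abcd")
ping_b = build_echo_request(identifier=1, sequence=11, payload=b"abcd")
print(parse_identifier(ping_a) == parse_identifier(ping_b))  # True
```

The identifier is the only per-flow field in the header, which is why pf has nothing else to tell the two pings apart once source and destination match.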


  • Netgate Administrator

    Yes, blame all around there. 😉

    So pf can't create the state because everything is the same. If we add an outbound NAT rule on the LAN, that then changes the states to include the NAT, but more importantly pf randomises the identifier, making it unique.

    So first switch outbound NAT to hybrid mode so you can add manual rules. Then add a rule like:

    Selection_600.png

    It would be better to specify the source or use !this_firewall but that's not available. As it is if you try to ping the proxy VM from the firewall itself it will NAT that too. But that may not be an issue, certainly not as a test.

    Steve



  • @stephenw10
    Just tested this and it's the same. I've changed the NAT mode and added this rule:
    a4b0fe2a-04b3-492e-8517-7a8b11473a14-image.png



  • @rebi said in Yet another ping problem with Virtual IPs:

    @stephenw10
    Just tested this and it's the same. I've changed the NAT mode and added this rule:
    a4b0fe2a-04b3-492e-8517-7a8b11473a14-image.png

    @stephenw10 said in Yet another ping problem with Virtual IPs:

    If we add an outbound NAT rule on the LAN, that then changes the states to include the NAT, but more importantly pf randomises the identifier, making it unique.

    He said on LAN and you created it on WAN.



  • @Grimson ☹ my bad ... here it is but still doesn't work:
    6bad0fe6-697a-410b-8377-defb4d7b0829-image.png


  • Netgate Administrator

    What happens when you try to ping? Any change at all?

    Check the states in Diag > States, filter by icmp.

    Steve

