Pfsense in Azure - Cannot reach host on IPsec tunnel



  • @stephenw10

    Oh, I didn't know that box existed for showing child SAs. Yes, it does appear that P1 is up and P2 is down.

    I've double-checked the config and it all matches up.
    The IPsec log does show information on the child SA. I've pasted part of the log below:
    Dec 16 17:32:49 charon 09[IKE] <con1000|46> establishing CHILD_SA con1000{1773} reqid 38
    Dec 16 17:32:49 charon 09[ENC] <con1000|46> generating CREATE_CHILD_SA request 19 [ N(ESP_TFC_PAD_N) SA No KE TSi TSr ]
    Dec 16 17:32:49 charon 09[NET] <con1000|46> sending packet: from <local>[4500] to <remote>[4500] (348 bytes)
    Dec 16 17:32:49 charon 09[NET] <con1000|46> received packet: from <remote>[4500] to <local>[4500] (76 bytes)
    Dec 16 17:32:49 charon 09[ENC] <con1000|46> parsed CREATE_CHILD_SA response 19 [ N(NO_PROP) ]
    Dec 16 17:32:49 charon 09[IKE] <con1000|46> received NO_PROPOSAL_CHOSEN notify, no CHILD_SA built
    Dec 16 17:32:49 charon 09[CFG] <con1000|46> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/MODP_1024/NO_EXT_SEQ
    Dec 16 17:32:49 charon 09[IKE] <con1000|46> failed to establish CHILD_SA, keeping IKE_SA
    Dec 16 17:32:49 charon 09[CHD] <con1000|46> CHILD_SA con1000{1773} state change: CREATED => DESTROYING
    Dec 16 17:32:49 charon 09[IKE] <con1000|46> activating new tasks
    Dec 16 17:32:49 charon 09[IKE] <con1000|46> nothing to initiate


  • Netgate Administrator

    @bhodges said in Pfsense in Azure - Cannot reach host on IPsec tunnel:

    NO_PROPOSAL_CHOSEN

    Yup so some mismatch with the other side, not the actual subnets:
    https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/ipsec-troubleshooting.html#phase-2-encryption-algorithm-mismatch

    It's connected as AES-128/SHA1 at phase1. It's common to use the same at phase2 but not required. Do you know what the other side is set to?

    Steve



  • @stephenw10

    For phase 2 it is set to AES-256 and SHA-256, which is what pfSense is set to, so I'm not sure it is a mismatch:

    caabb19f-f1ce-4b86-bd09-bed3be9f1b40-image.png


  • Netgate Administrator

    Different PFS key set at the other side?

    The other side is sending that response so logs from that would probably show exactly what's not matching.

    Steve
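
    As a rough sketch of how to compare the two ends field by field: the proposal pfSense logged above is ESP:AES_CBC_256/HMAC_SHA2_256_128/MODP_1024/NO_EXT_SEQ, where MODP_1024 is the PFS group (DH group 2). The helper names below are hypothetical, and the "remote" values are an assumption used only to illustrate a PFS mismatch; the real remote settings are not shown in this thread.

```python
def parse_proposal(proposal):
    """Split a strongSwan-style proposal string into its parts."""
    protocol, _, algos = proposal.partition(":")
    parsed = {"protocol": protocol, "encryption": None,
              "integrity": None, "pfs_group": None}
    for token in algos.split("/"):
        if token.startswith("HMAC_"):
            parsed["integrity"] = token
        elif token.startswith(("MODP_", "ECP_")):
            parsed["pfs_group"] = token
        elif token.endswith("EXT_SEQ"):
            continue  # extended sequence number flag, not an algorithm
        else:
            parsed["encryption"] = token
    return parsed


def mismatches(local, remote):
    """Return the fields whose values differ as {field: (local, remote)}."""
    return {k: (v, remote.get(k)) for k, v in local.items() if v != remote.get(k)}


# Proposal taken from the charon log earlier in the thread.
local = parse_proposal("ESP:AES_CBC_256/HMAC_SHA2_256_128/MODP_1024/NO_EXT_SEQ")

# ASSUMPTION: a remote peer with matching ciphers but PFS disabled.
remote = dict(local, pfs_group=None)

print(mismatches(local, remote))
```

    With matching encryption and hash settings, the PFS group is the remaining field that can still produce NO_PROPOSAL_CHOSEN.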



  • Hey Stephen,

    OK, phase 2 is up; the issue was with NAT. I added a NAT rule in phase 2 for the whole network that pfSense sits on.

    I can now ping from within pfSense to a server down the tunnel and vice versa.

    My next issue is that I can't ping the server on the other side of the tunnel from another VM in the same network or in another network.

    I've set up IP forwarding on the pfSense interfaces in Azure and created a route table that sends the address space on the other side of the tunnel through the pfSense LAN IP address.

    This is the response I get if I tracert from the other VM:

    Tracing route to <Server> [IP]
    over a maximum of 30 hops:

    1 1 ms <1 ms 1 ms <LANIP>
    2 * * * Request timed out.
    3 <LANIP> reports: Destination host unreachable.

    It's being picked up in the firewall logs and is being accepted:
    eec17d44-5f15-449b-aebb-c0273afb3ee6-image.png

    Do I need to set up a NAT forward so it knows what to do with a packet when it receives one destined for a host down the tunnel?

    thanks
    Ben


  • Netgate Administrator

    If you can ping from pfSense to something across the VPN when selecting the appropriate source but not from another host in that subnet it sounds like a routing issue at the host.
    You might have policy routing in place that re-routes traffic from the host but would not affect traffic from pfSense itself.

    Steve



  • It sounds as though I have the same issue as mbogoev explained above.

    It seems that the host where the traffic originates is routing traffic correctly to the pfSense LAN interface, but it doesn't go beyond that.

    In pfSense I have a static route for the network on the other side of the tunnel. If I disable it, tracert gets 'request timed out' errors for 30 hops. If I enable it, I get one 'request timed out' and then 'destination host unreachable', the same error I sent in my previous message.

    This is what I see in the packet capture:

    12:18:51.842993 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 500, length 72
    12:18:51.844700 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 501, length 72
    12:18:51.846129 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 502, length 72
    12:18:52.862071 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 503, length 72
    12:18:52.862157 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28
    12:18:56.853012 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 504, length 72
    12:18:56.853074 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28
    12:19:00.860894 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 505, length 72
    12:19:00.860954 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28
    12:19:04.864868 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 506, length 72
    12:19:04.864932 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28
    12:19:08.857729 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 507, length 72
    12:19:08.857809 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28
    12:19:12.861699 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 508, length 72
    12:19:12.861766 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28

    It doesn't appear to be responding to the ARP requests.
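
    A minimal sketch for checking this on a longer capture, assuming the text output has the same tcpdump-style format as above. The function name is hypothetical; it counts ARP who-has requests that never get an is-at reply.

```python
import re
from collections import Counter

ARP_REQUEST = re.compile(r"ARP, Request who-has ([\d.]+) tell ([\d.]+)")
ARP_REPLY = re.compile(r"ARP, Reply ([\d.]+) is-at")


def unanswered_arp(lines):
    """Return {(target, asker): count} for ARP requests with no reply seen."""
    requests = Counter()
    answered = set()
    for line in lines:
        req = ARP_REQUEST.search(line)
        if req:
            requests[(req.group(1), req.group(2))] += 1
            continue
        rep = ARP_REPLY.search(line)
        if rep:
            answered.add(rep.group(1))
    return {pair: n for pair, n in requests.items() if pair[0] not in answered}


# A few lines copied from the capture above.
capture = [
    "12:18:52.862071 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 503, length 72",
    "12:18:52.862157 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28",
    "12:18:56.853074 ARP, Request who-has 10.128.2.181 tell 10.239.3.5, length 28",
]

print(unanswered_arp(capture))
```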


  • Netgate Administrator

    Where did you capture that? What hosts are those IPs shown there?

    What exactly is the static route you have in place?



  • @stephenw10

    I clicked on Packet Capture from the Diagnostics menu within pfSense.

    10.233.2.4 is just a test VM in another network I've created
    10.128.2.181 is the IP on the other end of the IPsec tunnel
    10.239.3.5 is the LAN IP of pfSense


  • Netgate Administrator

    I assume you're capturing on the LAN interface?

    It looks like you have a subnet conflict though. pfSense is ARPing for 10.128.2.181 from 10.239.3.5, and it will only do that if they are inside the same subnet. It would have to be a huge subnet configured there though.

    Steve



  • @stephenw10

    Yes, that's right, I am capturing on the LAN interface.

    It has to come from 10.239.3.5 though? The test VM needs a route table telling it to go to the LAN address of pfSense so it can contact 10.128.2.181, because 10.239.3.5 is the only interface that knows of this network. So I've created a route table telling the VM to send all traffic for 10.128.2.0/24 through 10.239.3.5, and this is picked up in tracert, the firewall logs on pfSense and the packet trace.

    How else could VMs use the IPsec tunnel?

    thanks
    Ben


  • Netgate Administrator

    The static route is in Azure then? That's correct if so.

    Still left with pfSense trying to ARP for a device in a subnet on the other end of the VPN. That should never happen.

    It implies you have the LAN subnet configured as something huge that contains it like /9.

    Steve
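
    To see where the /9 figure comes from, here is a small sketch using Python's standard ipaddress module. It finds the smallest network that contains both the pfSense LAN IP and the remote host, which is how large an interface subnet would have to be for pfSense to treat 10.128.2.181 as on-link. The function name is hypothetical.

```python
import ipaddress


def smallest_common_network(a, b):
    """Return the smallest IPv4 network containing both addresses."""
    ia = int(ipaddress.IPv4Address(a))
    ib = int(ipaddress.IPv4Address(b))
    # The number of leading bits the two addresses share is the prefix length.
    prefix = 32 - (ia ^ ib).bit_length()
    host_bits = 32 - prefix
    base = (ia >> host_bits) << host_bits
    return ipaddress.IPv4Network((base, prefix))


print(smallest_common_network("10.239.3.5", "10.128.2.181"))  # 10.128.0.0/9

# With a /24 on LAN the remote host is not on-link, so no ARP is expected.
lan = ipaddress.ip_interface("10.239.3.5/24").network
print(ipaddress.ip_address("10.128.2.181") in lan)  # False
```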



  • Yes, there is a static route in Azure, which is:

    24f4b9d7-cced-4194-8f2e-bf488253c90a-image.png

    Then one in pfSense, which is:
    ea7e97e3-1b24-48e2-9c15-0f473acc1802-image.png

    And then I've set up a firewall outbound NAT rule:

    fceb7e2b-fd86-40b7-b215-92d6709ecfbd-image.png

    "It implies you have the LAN subnet configured as something huge that contains it like /9." The LAN subnet is a /24. I have set up NAT on phase 2 for the whole /16 network, and this is the only setting which works.

    thanks
    Ben


  • Netgate Administrator

    Ok you should not need that static route in pfSense unless you need to send traffic from the firewall itself in which case it can be applied as a workaround:
    https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/accessing-firewall-services-over-ipsec-vpns.html

    You should never apply NAT using pf on the IPSec interface, that's what is breaking the traffic here. The IPSec interface does not have an address in tunnel mode.
    Any NAT you need across the IPSec tunnel must be configured in the P2 settings.

    Steve



  • OK, now that I have removed the outbound NAT rule and the gateway, the test VM isn't routing to the pfSense LAN at all and just gets timeouts. I've set everything back up again but it still isn't routing to the LAN. I have rebooted the box and things seem to be back to normal again.

    The link you sent describes what I already have in place: a LAN gateway set up with a static route. Or are you saying I shouldn't have that in place for my scenario?

    I have disabled the outbound NAT rule. I only put it there because things weren't routing and I assumed the pfSense LAN didn't know what to do with the packet when it received it.

    I essentially just need pfSense to act like the routing role in Windows Server: when a VM sends a packet for another VM, it routes to pfSense, which then forwards that packet on to the destination, since it is the only one that knows where to send it. I've set the Azure side up the same way in this scenario but just don't know what I'm missing in pfSense.

    thanks
    Ben


  • Netgate Administrator

    You need the static route in Azure otherwise VMs there have no way to reach the subnet over the VPN.

    You only need a static route in pfSense for services on the firewall itself. It is not required for connectivity from other VMs.

    You cannot have a NAT rule on the IPSec interface; it will break the traffic. If you need NAT between those subnets though it should be in the IPSec P2 config. You probably don't need it though, since those subnets do not overlap, unless the other side is configured for some other subnet.

    Other than that you just need to have firewall rules to allow the traffic into the firewall on LAN.

    Steve
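
    The role of the Azure static route can be sketched as a longest-prefix-match lookup. The only entry taken from this thread is 10.128.2.0/24 via 10.239.3.5; the other routes, the "VnetLocal" and "Internet" labels, and the function name are assumptions for illustration, not the real Azure route table.

```python
import ipaddress

# ASSUMPTION: an illustrative route table for the test VM's subnet.
ROUTES = [
    ("0.0.0.0/0", "Internet"),
    ("10.233.2.0/24", "VnetLocal"),
    ("10.128.2.0/24", "10.239.3.5"),  # user-defined route to the pfSense LAN IP
]


def next_hop(destination, routes):
    """Pick the next hop of the most specific route matching the destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


print(next_hop("10.128.2.181", ROUTES))      # 10.239.3.5
print(next_hop("10.128.2.181", ROUTES[:2]))  # Internet (no route to the tunnel)
```

    Without the user-defined route, traffic for the remote subnet follows the default route and never reaches pfSense.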



  • Hey Stephen,

    I'm a little lost here. The traffic is being captured on the firewall and is being allowed on both sides, so it doesn't seem to be a firewall blocking issue. I have also turned off Windows Firewall on the test VM for testing purposes.

    I've now changed outbound NAT to automatic mode but it hasn't made any difference.

    I still get the response "Destination host unreachable." when I try to tracert to the server on the other side of the tunnel from the test VM.

    I am not sure what else I can change.

    thanks
    Ben


  • Netgate Administrator

    Ok so start a continuous ping to the VM in Azure from the other side of the tunnel. Now run pcaps on the IPSec and then LAN interfaces filtered by the source IP you're pinging from.

    Do you see the pings in both pcaps?

    If it's not leaving the LAN check the WAN. We have seen Azure do some weird things with traffic like that.

    Steve



  • So in the IPsec pcaps, if I ping the pfSense LAN address it comes up, but if I ping the test VM it doesn't.

    If I do the same for the LAN pcaps, nothing comes up for the LAN address or the test VM.

    If I change to WAN, neither comes up, but the public IP address of the other side of the tunnel comes up even when I don't run any ping commands.

    thanks
    Ben


  • Netgate Administrator

    So when you ping the VM IP from the remote end of the tunnel that traffic never arrives over IPSec at the pfSense instance in Azure?

    Then something is blocking it at the remote end or the VPN is not configured correctly so it doesn't carry that traffic.

    Steve



  • Hi Steve,

    "the VPN is not configured correctly so it doesn't carry that traffic." - the pfSense VPN or the VPN we have on the other side? I have followed all of the instructions from numerous videos (which all seem to be different), numerous online guides and what you've said above, so what else could it be? My only guess is routing under 'System' or NAT under 'Firewall'; I have tried numerous different settings there, but they have made no difference other than breaking it further.

    "Then something is blocking it at the remote end" - I have the firewall switched off on the test VM and on the VM on the other side of the tunnel, so it shouldn't be blocked.

    Do you have any other suggestions as to what could be causing this?

    thanks
    Ben


  • Netgate Administrator

    The configured phase 2 policy has to carry the traffic you are sending. If it doesn't match it won't see the traffic as interesting and it won't grab it and send it over the tunnel.

    The defined P2 has to match at both ends or it won't establish. We know it is established since you could ping over it from pfSense itself but what exactly has it established as? pfSense will allow a P2 to establish that is a smaller subnet defined within whatever it has configured. So for example if pfSense has 10.239.0.0/16 defined as the local subnet and the other end has 10.239.3.0/24 pfSense will allow the /24 to establish but of course that will not carry traffic from anything else in the /16.

    Looking back your LAN IP is 10.239.3.5 but the VM you are trying to connect from/to is in a different subnet so the established P2 might not carry that.

    Show us the established P2.

    You also mentioned adding NAT in the P2 but I don't see that other than the actual NAT rule which was breaking things.

    Additionally it's not clear why pfSense was ARPing for 10.128.2.181. It must have an interface in that subnet for that to happen.

    Steve
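
    A minimal sketch of the "interesting traffic" check described above, using Python's ipaddress module. The established selectors used here (local 10.239.3.0/24, remote 10.128.2.0/24) are an assumption for illustration, since the real established P2 has not been posted yet; the function name is hypothetical.

```python
import ipaddress


def carried_by_p2(src, dst, local_ts, remote_ts):
    """True if a packet src -> dst falls inside the P2 traffic selectors."""
    return (
        ipaddress.ip_address(src) in ipaddress.ip_network(local_ts)
        and ipaddress.ip_address(dst) in ipaddress.ip_network(remote_ts)
    )


# ASSUMPTION: selectors the tunnel might have established with.
LOCAL_TS = "10.239.3.0/24"
REMOTE_TS = "10.128.2.0/24"

# pfSense LAN IP to the remote host: inside the policy.
print(carried_by_p2("10.239.3.5", "10.128.2.181", LOCAL_TS, REMOTE_TS))  # True

# Test VM in another network to the remote host: outside the policy.
print(carried_by_p2("10.233.2.4", "10.128.2.181", LOCAL_TS, REMOTE_TS))  # False

# The narrowing case: a /24 proposed by the peer sits inside a configured /16.
configured = ipaddress.ip_network("10.239.0.0/16")
print(ipaddress.ip_network("10.239.3.0/24").subnet_of(configured))  # True
```

    If the established selectors look like this, traffic sourced from 10.233.2.4 would need either a P2 that covers it or NAT configured in the P2 to an address inside the local selector.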



  • OK, here is the established P2:

    699b29fc-f659-4a90-be87-fbe46e9d76fd-image.png

    The local subnet doesn't look right, as I would expect to see the range there, but even if I change this it doesn't make much difference:

    150ece96-7029-48fa-b5d8-251ffc001c1f-image.png

    What do you mean by "Looking back your LAN IP is 10.239.3.5 but the VM you are trying to connect from/to is in a different subnet so the established P2 might not carry that."? I am trying to test the connection to the VM on the other side of the tunnel from a VM in a different network in Azure.

    Doing a packet capture now, I can't see it ARPing anymore. It just shows this:

    17:17:00.607947 IP 10.233.2.4 > 10.239.3.5: ICMP echo request, id 1, seq 710, length 40
    17:17:01.615256 IP 10.233.2.4 > 10.239.3.5: ICMP echo request, id 1, seq 711, length 40
    17:17:01.772138 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 712, length 72
    17:17:05.778564 IP 10.233.2.4 > 10.128.2.181: ICMP echo request, id 1, seq 713, length 72

