Can only communicate in one direction. (A bit complicated.)



  • Well... I'm basically stuck on an issue I've run into. I've set up site-to-site OpenVPN tunnels with pfSense in the past (we have three running currently), but for some reason I'm just stuck on this one. lol The scenario I currently have is this:
    I have an HA cluster at an offsite location with two pfSense 2.3.4 boxes. XMLRPC sync is enabled between them, state sync is off (it conflicts with limiters, which we'll be using in the future), and there are CARP IPs for both the LAN and the WAN. Our main office has a single pfSense machine (2.3-RELEASE). Currently there are no multi-WAN VPN setups. We do have two WAN connections at Site A, but right now I'm just trying to get a single, stable VPN connection up from interface "WAN2" to Site B. The problem is that I can only communicate in one direction through the VPN tunnel. For the most part, I can't communicate from Site A to Site B: I can ping Site B's firewall from machines on Site A, but not any other machines, and I also can't traceroute to Site B's firewall. :/ Everything works fine when the connection is initiated by a machine on Site B contacting machines on Site A; I can ping/trace/etc. from Site B to Site A.

    Site A = Main Office ( OpenVPN Server )
    Site B = Colocation ( OpenVPN Client )

    (192.168.0.0/21) <-Local subnet… (And yes, it's very weird... it's from the last guy who worked here.)
                |SITE A| <-Running OpenVPN server, tied to WAN address and port 1197
                (1.2.3.1)
                    |
                INTERNET (OpenVPN tunnel is 172.16.50.0/30)
                    |
                (2.2.3.3) <-CARP WAN IP; the OpenVPN client connects through this address.
                /       \
        (2.2.3.1)    (2.2.3.2)
    [SITE B Primary] [SITE B Secondary]
    (172.16.172.1)  (172.16.172.2)
                \      /
              (172.16.172.3) <-CARP LAN IP
                    |
                [SWITCH] <-Layer 2
                    |
            (172.16.172.101) <-Static IP, with the CARP LAN IP as gateway
                [SERVER 1]
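
The subnet arithmetic in the diagram can be checked with a short sketch. It uses Python's standard `ipaddress` module (Python is an editorial choice; nothing in the thread depends on it) to show the mask and range of Site A's /21, the two usable endpoints of the /30 tunnel, and which LAN each host mentioned in the thread belongs to. The helper names `describe` and `side_of` are hypothetical.

```python
import ipaddress

# Networks taken from the diagram.
site_a_lan = ipaddress.ip_network("192.168.0.0/21")
site_b_lan = ipaddress.ip_network("172.16.172.0/24")
tunnel = ipaddress.ip_network("172.16.50.0/30")

def describe(net):
    """Return (netmask, first address, last address) for a network."""
    return str(net.netmask), str(net[0]), str(net[-1])

def side_of(host):
    """Say which LAN from the diagram a host address belongs to."""
    addr = ipaddress.ip_address(host)
    if addr in site_a_lan:
        return "Site A"
    if addr in site_b_lan:
        return "Site B"
    return "neither"

# The /21 spans eight /24s (192.168.0.x through 192.168.7.x).
print("Site A LAN:", describe(site_a_lan))
# A /30 has exactly two usable addresses, one per tunnel endpoint.
print("Tunnel endpoints:", [str(h) for h in tunnel.hosts()])
for host in ("192.168.1.30", "192.168.5.254", "172.16.172.101"):
    print(host, "->", side_of(host))
```

The /21 works out to netmask 255.255.248.0, which is the mask used in the `route 192.168.0.0 255.255.248.0` line of Site B's client config.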

    FROM SITE A TO SITE B. (Notice I can ping the firewall but not host 172.16.172.101.)

    
    brom@brom-D415MT-BM2DK ~ $ ping 172.16.172.1
    PING 172.16.172.1 (172.16.172.1) 56(84) bytes of data.
    64 bytes from 172.16.172.1: icmp_seq=1 ttl=63 time=35.7 ms
    64 bytes from 172.16.172.1: icmp_seq=2 ttl=63 time=35.8 ms
    64 bytes from 172.16.172.1: icmp_seq=3 ttl=63 time=35.5 ms
    ^C
    --- 172.16.172.1 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2002ms
    rtt min/avg/max/mdev = 35.548/35.733/35.898/0.210 ms
    brom@brom-D415MT-BM2DK ~ $ traceroute 172.16.172.1
    traceroute to 172.16.172.1 (172.16.172.1), 30 hops max, 60 byte packets
     1  fw01.pride1.local (192.168.5.254)  0.175 ms  0.163 ms  0.151 ms
     2  * * *
     3  * * *
     4  * * *
     5  * * *
     6  * * *
     7  * * *
     8  * * *
     9  * * *
    10  * * *
    11  * * *
    12  * *^C
    brom@brom-D415MT-BM2DK ~ $ ping 172.16.172.101
    PING 172.16.172.101 (172.16.172.101) 56(84) bytes of data.
    ^C
    --- 172.16.172.101 ping statistics ---
    10 packets transmitted, 0 received, 100% packet loss, time 8999ms
    
    

    I CAN ping host 172.16.172.101 from the firewall at site B. I've disabled the local firewall on host 172.16.172.101.

    FROM SITE B TO SITE A

    
    C:\Users\Admin>ping 192.168.1.30
    
    Pinging 192.168.1.30 with 32 bytes of data:
    Reply from 192.168.1.30: bytes=32 time=37ms TTL=62
    Reply from 192.168.1.30: bytes=32 time=35ms TTL=62
    Reply from 192.168.1.30: bytes=32 time=35ms TTL=62
    
    Ping statistics for 192.168.1.30:
        Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 35ms, Maximum = 37ms, Average = 35ms
    Control-C
    ^C
    C:\Users\Admin>tracert 192.168.1.30
    
    Tracing route to BROM-D415MT-BM2 [192.168.1.30]
    over a maximum of 30 hops:
    
      1    <1 ms    <1 ms    <1 ms  172.16.172.1
      2    36 ms    35 ms    35 ms  172.16.50.1
      3    35 ms    35 ms    35 ms  BROM-D415MT-BM2 [192.168.1.30]
    
    Trace complete.
    
    C:\Users\Admin>
    
    

    Here is the OpenVPN config from Site A's firewall:

    
    dev ovpns5
    verb 1
    dev-type tun
    tun-ipv6
    dev-node /dev/tun5
    writepid /var/run/openvpn_server5.pid
    #user nobody
    #group nobody
    script-security 3
    daemon
    keepalive 10 60
    ping-timer-rem
    persist-tun
    persist-key
    proto udp
    cipher AES-256-CBC
    auth SHA1
    up /usr/local/sbin/ovpn-linkup
    down /usr/local/sbin/ovpn-linkdown
    local 1.2.3.1
    ifconfig 172.16.50.1 172.16.50.2
    lport 1197
    management /var/etc/openvpn/server5.sock unix
    route 172.16.172.0 255.255.255.0
    secret /var/etc/openvpn/server5.secret 
    
    

    And here is the config from the OpenVPN client on Site B's firewall:

    
    dev ovpnc1
    verb 1
    dev-type tun
    tun-ipv6
    dev-node /dev/tun1
    writepid /var/run/openvpn_client1.pid
    #user nobody
    #group nobody
    script-security 3
    daemon
    keepalive 10 60
    ping-timer-rem
    persist-tun
    persist-key
    proto udp
    cipher AES-256-CBC
    auth SHA1
    up /usr/local/sbin/ovpn-linkup
    down /usr/local/sbin/ovpn-linkdown
    local 2.2.3.3
    lport 0
    management /var/etc/openvpn/client1.sock unix
    remote 1.2.3.1 1197
    ifconfig 172.16.50.2 172.16.50.1
    route 192.168.0.0 255.255.248.0
    secret /var/etc/openvpn/client1.secret 
    resolv-retry infinite
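
As a cross-check on the two configs, this minimal sketch parses only the `route <network> <netmask>` lines (copied from the configs above) and asks whether each side has a route covering a host on the far LAN. The helper names `routes` and `reaches` are hypothetical, and the sketch does not model OpenVPN itself, only the routes its `route` directives ask it to install.

```python
import ipaddress

# Only the relevant lines from each config are copied here.
SERVER_CONF = """\
ifconfig 172.16.50.1 172.16.50.2
route 172.16.172.0 255.255.255.0
"""
CLIENT_CONF = """\
ifconfig 172.16.50.2 172.16.50.1
route 192.168.0.0 255.255.248.0
"""

def routes(conf_text):
    """Collect networks from OpenVPN 'route <network> <netmask>' lines."""
    nets = []
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "route":
            # ipaddress accepts "address/netmask" as well as "address/prefix".
            nets.append(ipaddress.ip_network(f"{parts[1]}/{parts[2]}"))
    return nets

def reaches(conf_text, host):
    """True if some route in the config covers the given host."""
    addr = ipaddress.ip_address(host)
    return any(addr in net for net in routes(conf_text))

# Site A (server) must route Site B's LAN into the tunnel, and vice versa.
print(reaches(SERVER_CONF, "172.16.172.101"))
print(reaches(CLIENT_CONF, "192.168.1.30"))
```

Both checks come out true, which agrees with the later observation in the thread that the routing itself was correct.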
    
    

    From the logs I can see that the VPN service is stable and not flapping/going up and down. Below is the activity since yesterday.

    
    Jul 13 16:41:10 	openvpn 	69474 	Peer Connection Initiated with [AF_INET]2.2.3.3:45950
    Jul 13 16:40:46 	openvpn 	69474 	Peer Connection Initiated with [AF_INET]2.2.3.3:25270
    Jul 13 15:40:01 	openvpn 	69474 	Initialization Sequence Completed
    Jul 13 15:40:00 	openvpn 	69474 	Peer Connection Initiated with [AF_INET]2.2.3.3:43229
    Jul 13 15:40:00 	openvpn 	69474 	UDPv4 link remote: [undef]
    Jul 13 15:40:00 	openvpn 	69474 	UDPv4 link local (bound): [AF_INET]1.2.3.1:1197
    Jul 13 15:40:00 	openvpn 	69474 	/usr/local/sbin/ovpn-linkup ovpns5 1500 1560 172.16.50.1 172.16.50.2 init
    Jul 13 15:40:00 	openvpn 	69474 	/sbin/ifconfig ovpns5 172.16.50.1 172.16.50.2 mtu 1500 netmask 255.255.255.255 up
    Jul 13 15:40:00 	openvpn 	69474 	do_ifconfig, tt->ipv6=1, tt->did_ifconfig_ipv6_setup=0
    Jul 13 15:40:00 	openvpn 	69474 	TUN/TAP device /dev/tun5 opened
    Jul 13 15:40:00 	openvpn 	69474 	TUN/TAP device ovpns5 exists previously, keep at program end
    Jul 13 15:40:00 	openvpn 	69474 	NOTE: the current --script-security setting may allow this configuration to call user-defined scripts 
    

    PS: I replaced the IPs in the logs with the fake ones used in the diagram.

    The OpenVPN interfaces on both Site A's and Site B's firewalls are set to allow anything through the VPN tunnel. Additionally, the WAN2 interface on Site A's firewall has a rule allowing traffic from 2.2.3.3 to port 1197. The LAN and OpenVPN interfaces at both Site A and Site B allow anything as well. These rules are at the top of each list.

    SITE A WAN2 FIREWALL RULE (Rule at Top)

    
    Protocol         Source      Port  Destination  Port     Gateway   Queue   Schedule
    IPv4 TCP/UDP    2.2.3.3       *         *     1197-1198     *        *       none
    
    

    SITE A OPENVPN FIREWALL RULE (Rule at Top)

    
    Protocol         Source      Port  Destination    Port     Gateway   Queue   Schedule
    IPv4 *             *          *         *           *         *        *      none
    
    

    SITE A LAN1 FIREWALL RULE (Rule at Top)

    
    Protocol         Source      Port  Destination      Port     Gateway   Queue   Schedule
    IPv4 TCP/UDP     LAN net       *        *             *         *        *      none
    
    

    SITE B OPENVPN FIREWALL RULE (Rule at Top)

    
    Protocol         Source      Port  Destination    Port     Gateway   Queue   Schedule
    IPv4 *             *          *         *           *         *        *      none
    
    

    SITE B LAN1 FIREWALL RULE (Rule at Top)

    
    Protocol         Source      Port  Destination    Port     Gateway   Queue   Schedule
    IPv4 *          LAN net        *       *            *         *        *      none
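
One detail in the tables above: the Site A LAN1 rule as posted matches TCP/UDP only, while the Site B LAN1 rule matches any protocol. pfSense evaluates interface rules top-down and acts on the first match, so an ICMP echo from Site A's LAN is not decided by that top rule at all; since pings to Site B's firewall do get through, some rule further down evidently passes ICMP. The sketch below models only the rules shown in the thread (anything below them, including the default deny, is out of scope), and the `Rule` class and `first_match` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A pass rule reduced to the two fields this sketch looks at."""
    protocols: frozenset  # e.g. {"tcp", "udp"}; {"any"} matches every protocol
    source: str           # an alias such as "LAN net", or "any"

def first_match(rules, protocol, source):
    """Walk the rules top-down and return the first one that matches, else None."""
    for rule in rules:
        proto_ok = "any" in rule.protocols or protocol in rule.protocols
        src_ok = rule.source in ("any", source)
        if proto_ok and src_ok:
            return rule
    return None  # falls through to rules not shown in the thread

# The top LAN1 rule at each site, as posted.
site_a_lan1 = [Rule(frozenset({"tcp", "udp"}), "LAN net")]
site_b_lan1 = [Rule(frozenset({"any"}), "LAN net")]

for name, rules in (("Site A", site_a_lan1), ("Site B", site_b_lan1)):
    for proto in ("tcp", "icmp"):
        hit = first_match(rules, proto, "LAN net")
        print(name, proto, "->", "top rule" if hit is not None else "not matched by top rule")
```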
    
    

    Outgoing NAT on Site A is set to Manual. I have tried it both with no additional outgoing NAT rules (which I believe is correct) and with them, but it doesn't seem to make a difference in the behaviour. Below is the outgoing NAT rule I currently have in place on Site A.

    SITE A OUTGOING NAT (The Rule below is at the top of the list.)

    
    Interface             Source         Source Port        Destination       NAT Address      NAT Port      Static Port
    WAN2             192.168.0.0/21           *           172.16.172.0/24     WAN2 address         *        Randomized Port
    --- Various Other rules Below --
    
    

    And these are the current Outgoing NAT rules on Site B's firewall. It is set to Hybrid, with 2.2.3.3 as the outgoing address.
    SITE B OUTGOING NAT

    
    Interface                   Source                         Source Port        Destination       NAT Address      NAT Port      Static Port 
    WAN                     172.16.172.0/24                         *                  *               2.2.3.3          *        Randomized Port
    ---Automatic Rules:---
    WAN          127.0.0.0/8 172.16.172.0/24 172.16.50.0/24         *                  *               WAN Address      *    "Keep Source Port Static"       
    WAN          127.0.0.0/8 172.16.172.0/24 172.16.50.0/24         *                  *               WAN Address      *        Randomized Port
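
On pfSense each outbound NAT rule is bound to an interface and only applies to traffic leaving through that interface. Because Site A's `route 172.16.172.0 255.255.255.0` sends Site B-bound traffic out the tunnel interface (`ovpns5`) rather than WAN2, the WAN2 rule on Site A should not touch it, which fits the observation that adding or removing it made no difference. A minimal sketch of that matching, with hypothetical `NatRule` and `outbound_nat` names:

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class NatRule:
    interface: str     # rule only applies to packets leaving this interface
    source: str
    destination: str
    translate_to: str

def outbound_nat(rules, egress_if, src, dst):
    """Return the translation of the first matching rule, or None if none match."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in rules:
        if (rule.interface == egress_if
                and s in ipaddress.ip_network(rule.source)
                and d in ipaddress.ip_network(rule.destination)):
            return rule.translate_to
    return None  # no rule matched: the source address is left unchanged

# Site A's manual rule, as posted.
site_a = [NatRule("WAN2", "192.168.0.0/21", "172.16.172.0/24", "WAN2 address")]

# Traffic for 172.16.172.0/24 is routed into the tunnel, so it leaves via ovpns5.
print(outbound_nat(site_a, "ovpns5", "192.168.1.30", "172.16.172.101"))  # None
# Only a packet actually leaving WAN2 would be translated.
print(outbound_nat(site_a, "WAN2", "192.168.1.30", "172.16.172.101"))    # WAN2 address
```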
    
    

    Packet captures show the pings making their way from Site A to Site B, reaching the destination host at Site B, getting a response from that host, and then timing out on the way back to Site A. I'm not seeing the response come back to Site A. The same behaviour shows up when I don't use the CARP WAN IP as the OpenVPN address and remove the hybrid NAT rules.

    Well… I've dumped every bit of relevant info I can think of into this post, but let me know if you'd like more or would like some packet captures. Thank you again for any help you can give me! My brain is so shot from the various unrelated issues I've run into that I'm just not seeing what's wrong here.


  • LAYER 8 Netgate

    Packet Captures are showing that the pings are making their way from Site A to site B, reaching the destination host at site B, getting a response from that host, and then timing out trying to get back to site A. I'm not seeing the response coming back to site A.

    Please post a quick packet capture showing this. Please take the capture on the 172.16.172.X interface on whichever node is currently CARP MASTER.

    Set detail to advanced and hit view so we get the MAC addresses, etc.

    Thanks.



  • No problem, I'll get right on it!

    Not going to lie… I was hoping you were going to come back and say, "Stupid person! You forgot this super obvious thing! Just check box a and you'll be good." lol



    OK, here is a ping from a workstation on Site B (172.16.172.52) to a machine at Site A (captured on interface LAN). It behaves as expected. (I filtered the output using '192.168.1.30,172.16.172.52' in the host address field.)

    06:16:13.846635 e8:40:f2:73:28:a9 > 00:00:5e:00:01:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 128, id 17628, offset 0, flags [none], proto ICMP (1), length 60)
        172.16.172.51 > 192.168.1.30: ICMP echo request, id 3, seq 52082, length 40
    06:16:13.880564 0c:c4:7a:7f:97:62 > e8:40:f2:73:28:a9, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 62, id 20767, offset 0, flags [none], proto ICMP (1), length 60)
        192.168.1.30 > 172.16.172.51: ICMP echo reply, id 3, seq 52082, length 40
    06:16:14.851254 e8:40:f2:73:28:a9 > 00:00:5e:00:01:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 128, id 17630, offset 0, flags [none], proto ICMP (1), length 60)
        172.16.172.51 > 192.168.1.30: ICMP echo request, id 3, seq 52083, length 40
    06:16:14.885552 0c:c4:7a:7f:97:62 > e8:40:f2:73:28:a9, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 62, id 20997, offset 0, flags [none], proto ICMP (1), length 60)
        192.168.1.30 > 172.16.172.51: ICMP echo reply, id 3, seq 52083, length 40
    06:16:15.855463 e8:40:f2:73:28:a9 > 00:00:5e:00:01:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 128, id 17632, offset 0, flags [none], proto ICMP (1), length 60)
        172.16.172.51 > 192.168.1.30: ICMP echo request, id 3, seq 52084, length 40
    06:16:15.889537 0c:c4:7a:7f:97:62 > e8:40:f2:73:28:a9, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 62, id 21096, offset 0, flags [none], proto ICMP (1), length 60)
        192.168.1.30 > 172.16.172.51: ICMP echo reply, id 3, seq 52084, length 40
    06:16:16.865600 e8:40:f2:73:28:a9 > 00:00:5e:00:01:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 128, id 17637, offset 0, flags [none], proto ICMP (1), length 60)
        172.16.172.51 > 192.168.1.30: ICMP echo request, id 3, seq 52085, length 40
    06:16:16.900040 0c:c4:7a:7f:97:62 > e8:40:f2:73:28:a9, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 62, id 21205, offset 0, flags [none], proto ICMP (1), length 60)
        192.168.1.30 > 172.16.172.51: ICMP echo reply, id 3, seq 52085, length 40
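
Reading the MAC addresses in this capture: CARP uses the VRRP virtual-MAC range `00:00:5e:00:01:XX`, where the last byte is the VHID. The sketch below pulls the source and destination MACs from a header line and flags CARP virtual MACs; it assumes the `tcpdump -e` layout shown above, and the helper names are hypothetical. Here the echo request is addressed to the CARP MAC with VHID 2, while the reply comes from an address outside that range (presumably the physical interface of the current master).

```python
# CARP borrows the VRRP virtual-MAC range 00:00:5e:00:01:XX, where XX is the VHID.
CARP_PREFIX = "00:00:5e:00:01:"

def carp_vhid(mac):
    """Return the VHID if mac is a CARP/VRRP virtual MAC, else None."""
    mac = mac.lower()
    if mac.startswith(CARP_PREFIX):
        return int(mac[len(CARP_PREFIX):], 16)
    return None

def link_layer_pair(tcpdump_line):
    """Extract (src_mac, dst_mac) from a tcpdump -e header line."""
    fields = tcpdump_line.split()
    # Layout: <timestamp> <src_mac> > <dst_mac>, ethertype ...
    return fields[1], fields[3].rstrip(",")

request = "06:16:13.846635 e8:40:f2:73:28:a9 > 00:00:5e:00:01:02, ethertype IPv4 (0x0800), length 74:"
reply = "06:16:13.880564 0c:c4:7a:7f:97:62 > e8:40:f2:73:28:a9, ethertype IPv4 (0x0800), length 74:"

for line in (request, reply):
    src, dst = link_layer_pair(line)
    print(src, "->", dst, "| src VHID:", carp_vhid(src), "| dst VHID:", carp_vhid(dst))
```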

    And the ping from a workstation on Site A to Site B doesn't show up coming through the VPN tunnel at all when viewed from Site B. :( Sorry, I know it did at some point because I took packet captures earlier, but that doesn't appear to be the case now… What I can show, though, is that it does appear on the OpenVPN tunnel at Site A. (This is monitoring the OpenVPN virtual interface at Site A.)

    17:32:04.746976 AF IPv4 (2), length 88: (tos 0x0, ttl 63, id 45740, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 1, length 64
    17:32:05.755154 AF IPv4 (2), length 88: (tos 0x0, ttl 63, id 45837, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 2, length 64
    17:32:06.762968 AF IPv4 (2), length 88: (tos 0x0, ttl 63, id 45843, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 3, length 64
    17:32:07.770909 AF IPv4 (2), length 88: (tos 0x0, ttl 63, id 46000, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 4, length 64
    17:32:08.778926 AF IPv4 (2), length 88: (tos 0x0, ttl 63, id 46083, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 5, length 64
    17:32:09.786900 AF IPv4 (2), length 88: (tos 0x0, ttl 63, id 46201, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 6, length 64
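
Comparing captures like these by hand works, but the same check can be scripted: pair each ICMP echo request with the reply that carries the same `id` and `seq`, and list whatever is left unanswered. A minimal sketch, assuming the two-line `tcpdump -v` output format shown in this thread; the regular expression and the `unanswered` helper are hypothetical, not a pfSense feature.

```python
import re

# Matches the second line of each tcpdump -v entry, e.g.
# "192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 1, length 64"
ICMP_RE = re.compile(
    r"(?P<src>\S+) > (?P<dst>\S+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)

def unanswered(capture_text):
    """Return (id, seq) pairs of echo requests with no matching echo reply."""
    requests, replies = set(), set()
    for m in ICMP_RE.finditer(capture_text):
        key = (int(m["id"]), int(m["seq"]))  # a reply echoes the request's id/seq
        (requests if m["kind"] == "request" else replies).add(key)
    return sorted(requests - replies)

capture = """
192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 1, length 64
192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 2, length 64
192.168.1.30 > 172.16.172.52: ICMP echo request, id 19109, seq 3, length 64
"""
print(unanswered(capture))  # [(19109, 1), (19109, 2), (19109, 3)]
```

Run against the earlier Site B LAN capture, where every request has its reply, the same function returns an empty list.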

    So… Hmm....


  • LAYER 8 Netgate

    Check the local, "software" firewall on 172.16.172.52. If the echo requests are going out and nothing is coming back either the replies are not being sent, they are being filtered, or they are being sent someplace else.



    Ah, I actually had the firewall off for that test.

    I essentially hit the point where I said, "Everything is configured right. What else can I try?" and then FINALLY had the good sense to reboot all firewalls involved. That fixed everything. ^^;;;

    Incredibly frustrating, but I seem to have been neglecting the IT creed of "Have you tried turning it off and on again?" ^_^;;; Still not sure why it fixed anything… I had looked at the routing tables and all the routing seemed correct. But such is life I guess. lol


  • LAYER 8 Netgate

    The routing was correct. The packets were being sent out the correct interface.

    Rebooting other devices must have cleared something elsewhere.

    Glad you got it sorted out.

