Netgate Discussion Forum

    Multi-WAN setup with OpenVPNs flaky

    Routing and Multi WAN · 3 Posts · 1 Poster · 519 Views
      cmsdloma

      Hi,

      I'm running pfSense 2.4.5. I recently added a second WAN interface for 5G mobile data, set up a gateway group, and got everything working. Lately, though, it's been very unreliable, and I've spent day and night trying to fix it. I'm going mad now, so I've joined the forums for help, please.

      I have always had an OpenVPN client (now AirVPN via UDP), but I created a 2nd instance for the 2nd WAN. Each WAN has 1 OpenVPN client associated with it. The two OpenVPN client interfaces are members of a Gateway Group. The firewall rules send most LAN traffic out to the Gateway Group for combined bandwidth.

      Because the 5G reception is flaky (separate issue being sorted out), sometimes that OpenVPN client goes down. I've configured the Gateway to recognize this via the "High latency or packet loss" option, and it goes down for short periods a few times an hour.

      All good up until now, but here's what happens. When the 5G gateway goes down, all outbound traffic stops, even over my old broadband WAN. My SSH session to the firewall gets killed. The traffic flow graph on the dashboard shows traffic going outbound, but nothing coming back inbound. When I try to ping any IP from a LAN client, I get timeouts or destination host unreachable. On the pfSense shell, I get "no route to host". Everything completely packs in. If I wait a few minutes, it comes back up and eventually returns to normal. But all VoIP/video calls are killed, and sometimes it happens so frequently that it becomes impossible to make stable calls.

      When things are working and I print the routes, there is never a default route. Is this normal with a Multi-WAN setup? When I try to install an additional package from the front end, the list of available packages is empty. When I try pkg update from the shell, I get an error: no route to host (even with both WANs up). When I do a fresh install and restore my configuration, I always get an error that packages can't be (re)installed because there is no internet access.
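As I understand it, traffic originating on the firewall itself (pkg, DNS lookups, the package list) follows the routing table rather than the LAN policy-routing rules, which would explain why exactly these things break without a default route. Here's a sketch of how I've been checking from the shell (the gateway address in the last command is just a placeholder, not necessarily the right one for my setup):

```shell
# Show the default route entry, if any exists
netstat -rn | grep -w default

# Ask the kernel what "default" resolves to
route -n get default

# Example only: temporarily add a default route via a WAN gateway
# (192.168.21.1 is a placeholder; substitute your real gateway IP)
route add default 192.168.21.1
```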

      I'm convinced all of these problems point to routing being screwed up somehow. When I run netstat -rWh, I get this:

      [2.4.5-RELEASE][root@pfSense.int]/root: netstat -rWh
      Routing tables

      Internet:
      Destination        Gateway         Flags  Use        Mtu    Netif   Expire
      one.one.one.one    10.20.204.90    UGHS   174371     16384  lo0
      one.one.one.one    10.24.200.64    UGHS   163068     16384  lo0
      dns.google         192.168.8.1     UGHS   77886      1500   vtnet2
      10.20.204.0/24     10.20.204.1     UGS    0          1500   ovpnc7
      10.20.204.1        link#10         UH     0          1500   ovpnc7
      10.20.204.90       link#10         UHS    0          16384  lo0
      10.24.200.0/24     10.24.200.1     UGS    0          1500   ovpnc8
      10.24.200.1        link#11         UH     0          1500   ovpnc8
      10.24.200.64       link#11         UHS    19386      16384  lo0
      localhost          link#4          UH     2639       16384  lo0
      192.168.8.0/24     link#3          U      78         1500   vtnet2
      192.168.8.253      link#3          UHS    0          16384  lo0
      192.168.21.0/24    link#1          U      0          1500   vtnet0
      192.168.21.253     link#1          UHS    0          16384  lo0
      192.168.42.0/24    link#2          U      39938647   1500   vtnet1
      pfSense            link#2          UHS    0          16384  lo0

      Internet6:
      Destination                        Gateway  Flags  Use  Mtu    Netif   Expire
      localhost                          link#4   UH     0    16384  lo0
      fe80::%vtnet0/64                   link#1   U      0    1500   vtnet0
      fe80::20c:29ff:fe4f:8886%vtnet0    link#1   UHS    0    16384  lo0
      fe80::%vtnet1/64                   link#2   U      0    1500   vtnet1
      fe80::20c:29ff:fe4f:8890%vtnet1    link#2   UHS    0    16384  lo0
      fe80::%vtnet2/64                   link#3   U      0    1500   vtnet2
      fe80::24ae:e4ff:fed7:9170%vtnet2   link#3   UHS    0    16384  lo0
      fe80::%lo0/64                      link#4   U      0    16384  lo0
      fe80::1%lo0                        link#4   UHS    0    16384  lo0
      fe80::%ovpns1/64                   link#9   U      0    1500   ovpns1
      fe80::2bd:16ff:fe1b:ff01%ovpns1    link#9   UHS    0    16384  lo0
      fe80::20c:29ff:fe4f:8886%ovpnc7    link#10  UHS    0    16384  lo0
      fe80::20c:29ff:fe4f:8886%ovpnc8    link#11  UHS    0    16384  lo0

      (All IPv6 is turned off).

      The Interfaces are:

      DMZ (wan) -> vtnet0 -> v4: 192.168.21.253/24 (my telephone broadband WAN)
      LAN (lan) -> vtnet1 -> v4: 192.168.42.253/24 (my internal LAN)
      OPENVPNCLIENTDMZ (opt1) -> ovpnc7 -> v4: 10.20.204.90/24 (my broadband OpenVPN client)
      OPENVPNSERVER (opt2) -> ovpns1 -> A Server I run (not an issue)
      OPENVPNLANBRIDGEINTERFACE (opt3) -> bridge0 -> (Bridge for my server)
      HUAWEI (opt4) -> vtnet2 -> v4: 192.168.8.253/24 (my 5G mobile WAN)
      OPENVPNCLIENTHUAWEI (opt5) -> ovpnc8 -> v4: 10.24.200.64/24 (my 5G mobile OpenVPN client)

      I use 8.8.8.8 as a monitor IP on vtnet2. I have already disabled the Monitoring on wan - it's considered always up. The OpenVPN client interfaces use the P2P tunnel end as a monitor IP. opt4 goes down occasionally, but this is not in a GW group. opt5 goes down when opt4 goes down.
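If it helps, gateway monitoring on 2.4.x is handled by dpinger, with one process per monitored gateway; listing the processes shows the bind (source) address and monitor IP each gateway is pinged with. I assume that's also where those one.one.one.one host routes via lo0 come from. A sketch of how to inspect it:

```shell
# One dpinger process per monitored gateway; its command line shows
# the bind (source) address and the monitor IP being pinged
ps ax | grep '[d]pinger'

# Built-in shell playback script that prints a gateway status summary
/usr/local/sbin/pfSsh.php playback gatewaystatus
```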

      I guess my questions are:

      How can I approach fixing my routes? Why does all connectivity seize up every time one of the VPNs goes down? And how can I fix package installation, given that there's no default route?

      I'm happy to post config or anything - but the whole XML config backup is quite big and I'll have to strip out the certs etc. Let me know if it's needed.

      Thanks in advance.

      Dave

        cmsdloma @cmsdloma

        The main cause of everything hanging up was this option:

        System -> Advanced -> Miscellaneous
        "Flush all states when a gateway goes down"

        I had checked this option years ago when I only had one WAN interface.
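With that option checked, any gateway event wipes the entire state table at once, killing every established flow on every WAN - which matches the "everything packs in" symptom exactly. You can watch the state count to confirm, something like:

```shell
# Show state-table counters (current entries, searches, inserts, ...)
pfctl -si | grep -A 3 'State Table'

# Or just count the states; a sudden drop towards zero whenever the
# 5G gateway flaps would confirm the flush behaviour
pfctl -ss | wc -l
```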

        I still have doubts about the default route, though.

          cmsdloma

          I'm still having severe problems with routing.

          When I ping 1.1.1.1 or 1.0.0.1 from the pfSense shell, it goes into a routing loop and exhausts the TTL.

          When I ping 8.8.8.8 or 8.8.4.4, I often get "no route to host". Sometimes it works.
          But if I specify the source address, it works well:

          [2.4.5-RELEASE][root@pfSense.int]/root: ping -S 10.20.204.90 8.8.4.4
          PING 8.8.4.4 (8.8.4.4) from 10.20.204.90: 56 data bytes
          64 bytes from 8.8.4.4: icmp_seq=0 ttl=116 time=21.044 ms
          64 bytes from 8.8.4.4: icmp_seq=1 ttl=116 time=20.887 ms
          64 bytes from 8.8.4.4: icmp_seq=2 ttl=116 time=21.234 ms
          64 bytes from 8.8.4.4: icmp_seq=3 ttl=116 time=21.606 ms
          
          [2.4.5-RELEASE][root@pfSense.int]/root: ping -S 10.20.204.90 8.8.8.8
          PING 8.8.8.8 (8.8.8.8) from 10.20.204.90: 56 data bytes
          64 bytes from 8.8.8.8: icmp_seq=0 ttl=116 time=21.235 ms
          64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=20.973 ms
          64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=21.790 ms
          64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=21.884 ms
          
          round-trip min/avg/max/stddev = 20.973/21.486/22.240/0.308 ms
          [2.4.5-RELEASE][root@pfSense.int]/root: ping -S 10.20.204.90 1.1.1.1
          PING 1.1.1.1 (1.1.1.1) from 10.20.204.90: 56 data bytes
          64 bytes from 1.1.1.1: icmp_seq=0 ttl=58 time=15.984 ms
          64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=15.907 ms
          64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=15.715 ms
          64 bytes from 1.1.1.1: icmp_seq=3 ttl=58 time=15.637 ms
          
          [2.4.5-RELEASE][root@pfSense.int]/root: ping -S 10.20.204.90 1.0.0.1
          PING 1.0.0.1 (1.0.0.1) from 10.20.204.90: 56 data bytes
          64 bytes from 1.0.0.1: icmp_seq=0 ttl=58 time=15.852 ms
          64 bytes from 1.0.0.1: icmp_seq=1 ttl=58 time=16.028 ms
          64 bytes from 1.0.0.1: icmp_seq=2 ttl=58 time=16.030 ms
          64 bytes from 1.0.0.1: icmp_seq=3 ttl=58 time=15.974 ms
          
          

          Here's the end of the output from pinging without the source address:

          36 bytes from localhost (127.0.0.1): Redirect Host(New addr: 10.20.204.90)
          Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
           4  5  00 0054 77e2   0 0000  05  01 0000 127.0.0.1  1.1.1.1
          
          36 bytes from localhost (127.0.0.1): Redirect Host(New addr: 10.20.204.90)
          Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
           4  5  00 0054 77e2   0 0000  04  01 0000 127.0.0.1  1.1.1.1
          
          36 bytes from localhost (127.0.0.1): Redirect Host(New addr: 10.20.204.90)
          Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
           4  5  00 0054 77e2   0 0000  03  01 0000 127.0.0.1  1.1.1.1
          
          36 bytes from localhost (127.0.0.1): Redirect Host(New addr: 10.20.204.90)
          Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
           4  5  00 0054 77e2   0 0000  02  01 0000 127.0.0.1  1.1.1.1
          
          36 bytes from localhost (127.0.0.1): Time to live exceeded
          Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
           4  5  00 0054 77e2   0 0000  01  01 0000 127.0.0.1  1.1.1.1
          

          What's going on!?
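Those redirects from 127.0.0.1 make me suspect the kernel is matching 1.1.1.1 against one of the one.one.one.one host routes that point out lo0 (the monitor routes in the table above), so the packet just loops locally until the TTL runs out. A sketch of how to check which entry is actually matched, using FreeBSD's route syntax:

```shell
# Print the routing entry the kernel resolves 1.1.1.1 to:
# destination, gateway, flags and the interface it would leave on
route -n get 1.1.1.1

# Compare with 8.8.8.8, which has an explicit route via vtnet2
route -n get 8.8.8.8
```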

          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.