Netgate Discussion Forum

    Strange Gateway Issues with 2.7.0 development builds

    CE 2.7.0 Development Snapshots (Retired)
    22 Posts 3 Posters 2.3k Views
    • stephenw10 Netgate Administrator

      I assume the VPN shows as linked though? If you run a pcap can you see any gateway pings actually leaving across it?
      Can you see them leaving on any other interface?
      Are you using DCO?

      Steve
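
      A minimal sketch of that pcap check, assuming the tunnel interface is named ovpnc1 (substitute your own interface and repeat for the WAN to see where the pings actually leave); it is guarded so it only prints a note when run off the firewall:

```shell
# Capture outbound ICMP echo requests on the tunnel interface.
# ovpnc1 is an example name; the guard makes this a no-op elsewhere.
if ifconfig ovpnc1 >/dev/null 2>&1; then
    # show the first five echo requests leaving the tunnel
    tcpdump -ni ovpnc1 -c 5 'icmp[icmptype] == icmp-echo'
else
    echo "ovpnc1 not present; run this on the pfSense console"
fi
```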

      • townsenk64 @stephenw10

        @stephenw10
        Yes, the VPN(s) appear as linked.
        On the interface that shows offline at boot, only ICMP echo requests are recorded; no replies or any other activity. The other interface shows activity as expected. And no, I'm not using DCO. In my opinion it appears to be more of a gateway or interface issue than an OpenVPN one.

        • stephenw10 Netgate Administrator

          Hmm, that's odd.

          What happens if you just kill the state on VPN2 without restarting dpinger? Does it open a new state and start working?

          This feels like it's opening a state incorrectly while the VPN is not yet fully established. I sort of expected to see its gateway monitoring pings leaving across the wrong interface.
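
          Killing just that one state can be done from the shell. A minimal sketch; the tunnel address and monitor IP below are placeholders to substitute from your own `pfctl -vvss` output, and the guard makes it harmless off the firewall:

```shell
# Kill only the VPN2 gateway-monitor state instead of restarting dpinger.
# 10.10.0.3 (tunnel address) and 141.98.254.71 (monitor IP) are examples.
if command -v pfctl >/dev/null 2>&1; then
    # kill states from the tunnel address to the monitor IP
    pfctl -k 10.10.0.3 -k 141.98.254.71
else
    echo "pfctl is only available on the firewall itself"
fi
```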

          • townsenk64 @stephenw10

            @stephenw10 Yes, it did in fact come up, and without the latency strangeness I was experiencing before. I rebooted the system and verified these results. There was a bit of a delay (about 40 seconds) between killing the state and the gateway coming up, but I believe that's to be expected. So how do I translate this into a bug report so it can be fixed? I know the reporting process; I'm just not sure I can explain the issue effectively.

            • stephenw10 Netgate Administrator

              Hmm, well it sounds like it did create a bad state somehow, so what I'd do is grab the state data from the bad state and from the good state after it comes back up.
              At the CLI run: pfctl -vvvss
              That will dump all the states, including the gateway monitoring state. Compare the output from before and after killing it.

              Steve
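
              The full dump is noisy; a small awk sketch can reduce it to one line per ICMP state for easier before/after comparison. The sample state below is inlined for illustration (on the firewall you would pipe the real pfctl output through the same awk program):

```shell
# Sample of `pfctl -vvss` output saved inline; on pfSense you would run:
#   pfctl -vvss > /tmp/states.txt
cat > /tmp/states.txt <<'EOF'
all icmp 10.11.0.4:24803 (10.11.0.4:27133) -> 141.136.106.30:24803       0:0
   age 00:02:41, expires in 00:00:09, 313:313 pkts, 9077:9077 bytes, rule 47
   id: 86a05e6300000000 creatorid: a7a94773 gateway: 10.11.0.4
   origif: ovpnc1
EOF
# Summarize each ICMP state: source, destination, rule number, origif
awk '/^all icmp/ { src = $3; dst = ($4 == "->") ? $5 : $6 }
     /rule/     { rule = $NF }
     /origif:/  { print src, "->", dst, "rule=" rule, "origif=" $NF }' /tmp/states.txt
```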

              • townsenk64 @stephenw10

                @stephenw10
                Thank you! I've submitted a bug report.

                • townsenk64 @stephenw10

                  @stephenw10 FYI, I discovered a workaround for this issue. Enabling "Do not add Static Routes" in the Gateway Monitoring options in System/Advanced/Miscellaneous allows both gateways to come up properly.

                  • marcosm Netgate

                    Relevant routes:

                    Destination        Gateway            Flags     Netif Expire
                    default            70.188.246.1       UGS         em0
                    10.10.0.0/16       10.10.0.1          UGS      ovpnc2
                    10.10.0.1          link#11            UH       ovpnc2
                    10.10.0.3          link#11            UHS         lo0
                    10.11.0.0/16       10.11.0.1          UGS      ovpnc1
                    10.11.0.1          link#10            UH       ovpnc1
                    10.11.0.4          link#10            UHS         lo0
                    141.98.254.71      10.10.0.3          UHS      ovpnc2
                    141.136.106.30     10.11.0.4          UHS      ovpnc1
                    

                    The commands show the following states:

                    all icmp 10.11.0.4:24803 (10.11.0.4:27133) -> 141.136.106.30:24803       0:0
                       age 00:02:41, expires in 00:00:09, 313:313 pkts, 9077:9077 bytes, rule 47
                       id: 86a05e6300000000 creatorid: a7a94773 gateway: 10.11.0.4
                       origif: ovpnc1
                    all icmp 10.0.8.1:56301 (10.10.0.3:27587) -> 141.98.254.71:56301       0:0
                       age 00:02:41, expires in 00:00:09, 313:0 pkts, 9077:0 bytes, rule 45
                       id: 88a05e6300000000 creatorid: a7a94773 gateway: 0.0.0.0
                       origif: ovpnc2
                    

                    ... which are created from these rules:

                    @45 pass out inet all flags S/SA keep state allow-opts label "let out anything IPv4 from firewall host itself" ridentifier 1000009913
                      [ Evaluations: 11668     Packets: 13437     Bytes: 2665431     States: 146   ]
                      [ Inserted: uid 0 pid 9864 State Creations: 765   ]
                      [ Last Active Time: Sun Oct 30 10:51:37 2022 ]
                    @47 pass out route-to (ovpnc1 10.11.0.4) inet from 10.11.0.4 to ! 10.11.0.0/16 flags S/SA keep state allow-opts label "let out anything from firewall host itself" ridentifier 1000010012
                      [ Evaluations: 3401      Packets: 16        Bytes: 464         States: 0     ]
                      [ Inserted: uid 0 pid 9864 State Creations: 0     ]
                      [ Last Active Time: N/A ]
                    

                    The first rule, @45, shows the traffic passing via a rule without route-to set, which could mean the traffic actually went out a different interface even though origif shows ovpnc1 (due to some pf quirks I don't recall). The second rule, @47, is concerning, since I'd expect its gateway to be 10.11.0.1 rather than pfSense's own interface address. See the warning here: https://docs.netgate.com/pfsense/en/latest/vpn/openvpn/configure-server-tunnel.html#ipv4-ipv6-local-network-s

                    Try comparing the states between the broken and working conditions.
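
                    One quick way to eyeball that comparison is to pair each state's gateway: field with its origif; a state whose gateway is the firewall's own tunnel address (or 0.0.0.0 where route-to was expected) stands out immediately. A small sketch using the two states above, reduced to the fields of interest (on the firewall, feed real `pfctl -vvss` output through the same awk program):

```shell
# The gateway/origif lines from the two ICMP states above, inlined as sample data.
cat > /tmp/gwcheck.txt <<'EOF'
   id: 86a05e6300000000 creatorid: a7a94773 gateway: 10.11.0.4
   origif: ovpnc1
   id: 88a05e6300000000 creatorid: a7a94773 gateway: 0.0.0.0
   origif: ovpnc2
EOF
# Pair each state's gateway with the interface recorded for it
awk '/gateway:/ { gw = $NF }
     /origif:/  { print $NF ": gateway=" gw }' /tmp/gwcheck.txt
```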

                      • townsenk64 @marcosm

                      @marcosm Thank you for your detailed analysis, but I simply don't understand all of this at the same level you do. I suppose I'll wait and see whether a developer eventually hits the same issue so it gets recognized. Enabling the "Do not add Static Routes" option keeps things working, at least. I often hear that it's a "configuration issue" and not an issue with pfSense. How is that, when the configuration never changed? The pfSense code is what is being changed, and that is completely expected in a development environment.

                        • marcosm Netgate

                        OK. At least for now, it would be helpful if you could provide the same logs/info while it's working so the two can be compared.

                        FWIW this is what I have on my working setup after restarting the OpenVPN service:

                        all icmp 172.27.114.137:38623 -> 172.27.114.129:38623       0:0
                           age 00:51:56, expires in 00:00:10, 6012:6007 pkts, 174348:174203 bytes, rule 143
                           id: 28aa696300000000 creatorid: 4da82510 gateway: 0.0.0.0
                           origif: ovpnc2
                        all icmp 172.17.5.2:38750 -> 172.17.5.1:38750       0:0
                           age 00:51:56, expires in 00:00:10, 6013:6007 pkts, 174377:174203 bytes, rule 143
                           id: 29aa696300000000 creatorid: 4da82510 gateway: 0.0.0.0
                           origif: ovpnc3
                        
                        @143 pass out inet all flags S/SA keep state allow-opts label "let out anything IPv4 from firewall host itself" ridentifier 1000016215
                          [ Evaluations: 59832     Packets: 171879    Bytes: 57085442    States: 266   ]
                          [ Inserted: uid 0 pid 86586 State Creations: 967   ]
                          [ Last Active Time: Mon Nov  7 12:35:47 2022 ]
                        
                        

                        And sometime after reboot:

                        all icmp 172.27.114.137:61325 -> 172.27.114.129:61325       0:0
                           age 01:24:27, expires in 00:00:09, 9817:9811 pkts, 284693:284519 bytes, rule 140
                           id: 0e55696300000000 creatorid: 4da82510 gateway: 0.0.0.0
                           origif: ovpnc2
                        all icmp 172.17.5.2:61695 -> 172.17.5.1:61695       0:0
                           age 01:25:36, expires in 00:00:10, 9949:9940 pkts, 288521:288260 bytes, rule 140
                           id: 0f55696300000000 creatorid: 4da82510 gateway: 0.0.0.0
                           origif: ovpnc3
                        
                        @140 pass out on lo0 inet all flags S/SA keep state label "pass IPv4 loopback" ridentifier 1000016212
                          [ Evaluations: 22311     Packets: 0         Bytes: 0           States: 0     ]
                          [ Inserted: uid 0 pid 80759 State Creations: 0     ]
                          [ Last Active Time: N/A ]
                        

                        I'm not sure why it used @140 this time, but it worked regardless. The differences here are that I keep the monitoring IP as the tunnel gateway and bind the service to localhost. I did test with it bound to the WAN interface itself, and that was fine as well.

                          • townsenk64 @marcosm

                          @marcosm Here are the outputs of the previously requested commands while my system is in a working state (with the static route disable option applied):

                          pfctl_vvss.txt netstat_rn4.txt pfctl_vvsr.txt

                          Could this be some strangeness with my VPN provider?

                            • marcosm Netgate

                            The difference there is that the monitoring traffic is going out the WAN instead of over the tunnel, so it won't actually be useful for determining the status of the tunnel. I recommend keeping the monitoring IP as the tunnel gateway.

                              • townsenk64 @marcosm

                              @marcosm I agree. Using the gateway static route option doesn't effectively monitor the tunnel, but I have no choice if I want to continue testing; it doesn't work otherwise. Since this issue currently appears to affect only me in the entire pfSense testing community, it will probably be dismissed until the next release, when more people are exposed to the problem; hopefully it will be addressed at that point.

                                • townsenk64 @marcosm

                                 @marcosm It appears that this issue has been around to some degree for the better part of three years: https://www.reddit.com/r/PFSENSE/comments/eznfsa/psa_gateway_group_vpn_interfaces_fail_on_reboot/ It's possible dpinger is starting before the OpenVPN clients can initialize; that's my guess anyway. It's not worth my time to try to submit a bug report on this, so I will just post my workaround.
                                 I installed the cron package and created this long, sloppy entry to restart dpinger and the OpenVPN clients when the machine is rebooted. The delays increase my boot time significantly, but it appears to at least allow my VPN clients to connect. I still need to determine whether it's routing the client pings out the proper interfaces for gateway monitoring.

                                sleep 7 && /usr/local/sbin/pfSsh.php playback svc restart dpinger && sleep 7 && /usr/local/sbin/pfSsh.php playback svc restart openvpn client 1 && sleep 7 && /usr/local/sbin/pfSsh.php playback svc restart openvpn client 2
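
                                 If the fixed sleeps prove fragile, one possible refinement is to poll until the tunnel interfaces actually exist before restarting dpinger. This is a sketch only; wait_for_if is a helper name invented here, and the restart command is the same one used in the cron entry above:

```shell
#!/bin/sh
# Poll until an interface exists, up to a timeout, instead of sleeping blindly.
wait_for_if() {
    _if=$1
    _tries=${2:-30}
    while [ "$_tries" -gt 0 ]; do
        ifconfig "$_if" >/dev/null 2>&1 && return 0
        _tries=$((_tries - 1))
        sleep 1
    done
    return 1
}
# On pfSense this would replace the sleep chain, e.g.:
#   wait_for_if ovpnc1 && wait_for_if ovpnc2 \
#       && /usr/local/sbin/pfSsh.php playback svc restart dpinger
# Demo against the loopback interface (lo0 on FreeBSD, lo on Linux):
if wait_for_if lo0 2 || wait_for_if lo 2; then
    echo "loopback found"
else
    echo "ifconfig not available; run on the firewall"
fi
```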
                                
                                  • marcosm Netgate

                                   I'm glad you were able to find a workaround! As mentioned, it looks like the gateway assigned to the interface is the pfSense interface address itself; this is likely related to the issue, if not the cause, and is the result of an incorrect configuration.

                                    • townsenk64 @marcosm

                                     @marcosm Again, you're indicating that there is a configuration problem. This is the same configuration that worked for five years prior to version 2.7.0. Just because you look at logs and see packets going out the wrong gateway does not necessarily mean my configuration told them to do so.
