Netgate Discussion Forum

    All NAT routing stops until reboot

    General pfSense Questions
    4 Posts 2 Posters 756 Views
    • sotirone

      Hello guys, I have been having serious problems with all NAT routing since I added a second VPN gateway.

      I am running pfSense 2.4.4 (with ZFS) on an HP T620 Plus thin client with 4GB of RAM and an HP Intel PRO 1000 PT four-port card. I have a single WAN connection. I also run pfblocker and ntopng.

      I use manual outbound NAT and run two VPN gateways (from two different VPN providers) plus two site-to-site VPN tunnels to other locations.

      As soon as I added the second VPN gateway I noticed a problem. I have multiple VLANs, and each VLAN has either the VPN1 or the VPN2 gateway set in its firewall rules. Let's say VLAN1 uses VPN1 and VLAN2 uses VPN2. Occasionally, and for no apparent reason (other than perhaps the VPN1 tunnel reconnecting), VLAN1 clients would start using VPN2 to reach the internet, even though no failover or load balancing is configured. I removed the VPN2 NAT entries for VLAN1 and vice versa, and that problem went away.

      Now I sometimes get no NAT at all on any interface, again for no apparent reason. The firewall itself can ping e.g. 8.8.8.8 through all of its gateways, but none of the clients can ping or access the internet. Restarting the tunnels, resetting all firewall states and rebooting the clients makes no difference. The only remedy is a full firewall reboot. While running top over SSH, I saw this happen after a brief period (about 30 seconds) of 100% CPU usage by php, though that may be unrelated.

      Logs do not seem to provide anything useful. Any ideas?

      • HansSolo @sotirone

        Do you have the log settings configured so that both the default deny AND allow rules are logged?
        If so, I suspect you will have to post your configuration and rules to get help with this.

        • sotirone @HansSolo

          @HansSolo No, I only have block rules set to be logged.

          • sotirone

            It just happened again. Devices that use the native WAN interface as their gateway are unaffected.

            Logs (System --> General) show ntopng crashing:

            May 7 17:38:10 kernel pid 15404 (ntopng), uid 0: exited on signal 11 (core dumped)
            May 7 17:38:10 kernel igb2: promiscuous mode disabled
            May 7 17:38:10 kernel igb3: promiscuous mode disabled
            May 7 17:38:32 ntopng [HTTPserver.cpp:924] ERROR: [HTTP] set_ports_option: cannot bind to 3000s: Address already in use
            May 7 17:38:32 ntopng [mongoose.c:4584] ERROR: set_ports_option: cannot bind to 3000s: No error: 0
            May 7 17:38:32 ntopng [HTTPserver.cpp:1104] ERROR: Unable to start HTTP server (IPv4) on ports 3000s
            May 7 17:38:32 ntopng [HTTPserver.cpp:1110] ERROR: Either port in use or another ntopng instance is running (using the same port)
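
The "Address already in use" errors above suggest that something still held the ntopng web port when the new instance started. As a rough way to check this from a shell, here is a minimal Python sketch. It is not part of ntopng or pfSense; the function name `port_is_free` is made up for illustration, and port 3000 is simply the value from the log lines above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process still owns the port.
            return False
    return True


if __name__ == "__main__":
    # 3000 is the port named in the ntopng error messages.
    state = "free" if port_is_free(3000) else "in use"
    print(f"port 3000 is {state}")
```

If the port shows as in use right after the crash, a leftover ntopng process is the likely holder, which would match the last error line in the log.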

            Logs (System --> Gateways):

            May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "WAN "
            May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "VPN1 "
            May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "SITETOSITE1 "
            May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "SITETOSITE2 "
            May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "VPN2 "

            The REMOVED placeholders in the log lines above were edited in by me.
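
To compare the monitoring parameters across gateways without reading each long line by eye, the dpinger entries can be pulled apart with a small script. This is only a sketch that assumes the exact log layout shown above; `parse_dpinger_line` is a hypothetical helper, not something shipped with pfSense or dpinger.

```python
import re


def parse_dpinger_line(line: str) -> dict:
    """Extract numeric settings and the gateway identifier from one log line."""
    settings = {}
    # Settings look like "send_interval 500ms", "loss_alarm 20%" or "data_len 0".
    for name, value, _unit in re.findall(r"(\w+_\w+) (\d+)(ms|%)?", line):
        settings[name] = int(value)
    # The identifier is quoted and carries a trailing space in these logs.
    match = re.search(r'identifier "([^"]*)"', line)
    if match:
        settings["identifier"] = match.group(1).strip()
    return settings


if __name__ == "__main__":
    sample = (
        "May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms "
        "time_period 60000ms report_interval 0ms data_len 0 "
        "alert_interval 1000ms latency_alarm 500ms loss_alarm 20% "
        'dest_addr REMOVED bind_addr REMOVED identifier "VPN2 "'
    )
    print(parse_dpinger_line(sample))
```

Running this over all five lines shows that every gateway was started with the same settings at 17:37:55, so the entries record a simultaneous restart of all monitors rather than a difference in thresholds.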

            Edit: ntopng is the problem. Every time I restart a gateway tunnel, ntopng crashes and NAT stops working.

            Here is what the ntopng logs are filled with:

            [Mutex.cpp:46] WARNING: pthread_mutex_lock() returned 11 [Resource deadlock avoided][errno=0]
            

            RAM had ~1600M free, so the box was not running out of memory. As I said, CPU usage was at 100% on one of the four cores when this happened.

            I uninstalled ntopng for now as it was unusable.

            Edit 2: Definitely not fixed. It seems to happen when I restart VPN2, though I think not every time. The WAN and VPN1 gateways always register as Down in Status --> Gateways even when they are up. ntopng is not the problem!

            VPN2 has a NAT port forward rule with its corresponding firewall rule. I will try disabling that and see if anything changes. I will investigate further and report back.

            Edit 3: This seems to be fixed by enabling System --> Advanced --> Misc --> Reset states on Gateway down. I also had to explicitly set the VPN1 gateway in the LAN firewall rules, as traffic would still not pass with the gateway left at default. I would appreciate input from someone on whether this is the correct approach.

            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.