Netgate Discussion Forum

    [Resolved] pfSense hangs when WAN is unstable or lost

    General pfSense Questions
    28 Posts 10 Posters 9.5k Views
    • asv345h

      Sometimes when WAN connection is lost or becomes unstable, pfSense will become unresponsive.

      • Can't log in to the web interface

      • Can't SSH in

      • Must attach via the serial connection and reboot.

      It's the same each time. The WAN starts acting up, then the unbound process takes over the CPU, and I usually can't log in and must reboot via the serial port. After that it all goes back to normal.

      Occasionally I am able to get to the GUI or SSH, but pfSense is very slow. Last time it happened I was able to get some screenshots.

      Even after the WAN side starts working again, pfSense stays in this state until I can reboot.

      I can't figure out why this is happening. I tried a clean install and restore, but eventually the same thing happens.

      Dashboard:
      0_1547029417437_Screen Shot 2019-01-09 at 10.21.11 AM.png

      WAN becomes unstable:
      0_1547028627605_Screen Shot 2018-12-20 at 8.50.11 AM.png

      unbound high CPU:
      0_1547028703739_Screen Shot 2018-12-20 at 8.51.12 AM.png

      • stephenw10 Netgate Administrator

        Do you see anything logged after rebooting?

        A crash report?

        Steve

        • asv345h

          pfSense doesn't crash; it just becomes very slow or unresponsive. If I am able to get the GUI or SSH to respond and log in, it's always unbound that is the CPU hog. Honestly, I'm usually so busy that after rebooting I just go back to work. I've been putting off troubleshooting this for months now. Next time this happens I'll grab the system log and post it.

          Can you think of any reason why unbound would be chewing CPU cycles like this after losing WAN access?

          • Rico LAYER 8 Rebel Alliance

            You should first check with your ISP why your WAN is unstable and/or keeps going down. 😌
            A WAN without 99.998% uptime would drive me crazy. 😬

            -Rico

            • Grimson Banned @asv345h

              @asv345h said in pfSense hangs when WAN is unstable or lost:

              Can you think of any reason why unbound would be chewing cpu cycles like this after losing WAN access?

              Unbound gets restarted when the state of the WAN interface changes. So if your WAN interface is flapping, it will cause constant restarts, and if you use DNSBL with a lot of lists, each unbound restart costs additional CPU cycles.
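To get a feel for how much restart churn a flapping link could cause, here is a minimal Python sketch that counts how many state changes land inside a sliding time window. The function name and the timestamps are hypothetical, made up for illustration; they are not taken from any pfSense log, and how you collect the change times is left open.

```python
from collections import deque
from datetime import datetime, timedelta


def max_changes_in_window(change_times, window=timedelta(minutes=1)):
    """Return the largest number of state changes seen within any one window."""
    recent = deque()
    worst = 0
    for t in sorted(change_times):
        recent.append(t)
        # Drop changes that fell out of the window ending at t.
        while t - recent[0] > window:
            recent.popleft()
        worst = max(worst, len(recent))
    return worst


# Hypothetical WAN up/down timestamps, invented for this example.
changes = [
    datetime(2019, 1, 10, 8, 30, 0),
    datetime(2019, 1, 10, 8, 30, 20),
    datetime(2019, 1, 10, 8, 30, 45),
    datetime(2019, 1, 10, 8, 31, 5),
    datetime(2019, 1, 10, 8, 40, 0),
]
print(max_changes_in_window(changes))  # prints 3
```

If each state change triggers one unbound restart, a result of 3 would mean up to three restarts inside a single minute, each one reloading whatever DNSBL lists are configured.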

              • asv345h @Grimson

                @grimson said in pfSense hangs when WAN is unstable or lost:

                Unbound gets restarted when the state of the WAN interface changes. So if your WAN interface is flapping it will cause constant restarts

                Did not know that. Does gateway monitoring (dpinger) have any impact on the way unbound responds to WAN instability?

                and if you use DNSBL with a lot of lists the unbound restarts cost additional CPU cycles.

                I do use DNSBL and have 15 lists plus easylist - too many?

                • Raffi_

                  I also noticed a while back that my web gui felt very slow on the LAN when having issues with my modem on the WAN. I think in my case I had very bad latency on the WAN due to the modem. Restarting my modem solved my latency issues and the slow web GUI issue. I was a little puzzled about why my web gui was slow on the LAN, when my real issue was on the WAN. I never gave it much thought because the service provided by the ISP is usually solid and therefore not an issue.

                  • asv345h @stephenw10

                    @stephenw10

                    system.log

                    The same thing happened again this morning, except that unbound did not look to be a CPU hog. The GUI was almost totally unresponsive, but I was able to SSH into the system just fine.

                    • Restarting webConfigurator had no effect.
                    • This time I rebooted my Virgin modem and waited for it to come back online but pfSense remained in the same 'hung' state.
                    • I eventually rebooted pfSense and, as usual, all was good again.

                    I don't spend much time looking at the log so not sure what is normal. Can you see anything? Rebooted at 8:45:41.

                    fwiw
                    0_1547113492259_Screen Shot 2019-01-10 at 8.36.04 AM.png
                    0_1547113505412_Screen Shot 2019-01-10 at 8.37.44 AM.png

                    • Raffi_

                      Found many of these errors in your log. I'm not sure if any of this applies to you.
                      https://forum.netgate.com/topic/110858/unbound-error-bind-address-already-in-use-fatal-error-could-not-open-ports

                      • asv345h @Raffi_

                        @Raffi_

                        From that thread

                        Hint: Do NOT ever add Unbound to Service Watchdog. Especially not if using pfBlockerNG.

                        I did have unbound monitored by Service Watchdog and just removed it. I also use pfBlockerNG.

                        • Raffi_ @asv345h

                          @asv345h Hope it helps. Let us know how it goes.

                          • stephenw10 Netgate Administrator

                            Are you using the block rules auto-generated by pfBlocker as well as DNSBL?

                            pfctl at 98% implies something very odd is going on. I would expect the system log to be showing a load of entries there.

                            If you can, I would disable pfBlocker as a test. I imagine the problems will all go away...

                            Steve

                            • asv345h @stephenw10

                              @stephenw10
                              Will disable pfBlocker and see. It just happened again as I was reading your response!

                              I am using DNSBL, but not using auto-gen block rules. I just use the aliases directly like this:

                              0_1547143832628_Screen Shot 2019-01-10 at 6.10.10 PM.png

                              • Gertjan

                                Question: are you pushing DNS (unbound requests) over the WAN or a VPN tunnel?

                                No "help me" PM's please. Use the forum, the community will thank you.
                                Edit : and where are the logs ??

                                • asv345h @Gertjan

                                  @gertjan
                                  WAN using DNS over TLS to 1.1.1.1

                                  • Gertjan

                                    I advise you to use a remote syslog server: go to Status => System Logs => Settings and set "Remote log servers".
                                    This will help you see what happens in real time, without needing to log in via SSH or the GUI.
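For a quick test target before setting up a full log server, a minimal Python sketch of a UDP syslog listener might look like the following. It assumes the firewall sends plain syslog datagrams over UDP and that the remote log server entry points at this machine on port 5140; the port, the handler and function names, and the sample line in the usage note are assumptions for illustration, not pfSense output.

```python
import re
import socketserver

PRI_RE = re.compile(r"^<(\d{1,3})>")


def split_priority(line):
    """Split a syslog line into (facility, severity, rest).

    The leading <PRI> number encodes facility * 8 + severity.
    Lines without a PRI prefix come back as (None, None, line).
    """
    match = PRI_RE.match(line)
    if not match:
        return None, None, line
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, severity, line[match.end():]


class PrintHandler(socketserver.BaseRequestHandler):
    """Print each received datagram with its decoded facility and severity."""

    def handle(self):
        data, _sock = self.request  # for UDP servers: (datagram bytes, socket)
        text = data.decode("utf-8", errors="replace").rstrip()
        facility, severity, rest = split_priority(text)
        print(f"{self.client_address[0]} fac={facility} sev={severity} {rest}")


def serve(host="0.0.0.0", port=5140):
    """Listen for UDP syslog datagrams until interrupted (port is an assumption)."""
    with socketserver.UDPServer((host, port), PrintHandler) as server:
        server.serve_forever()
```

Calling `serve()` blocks and prints one line per datagram. As a check of the decoding, `split_priority("<30>unbound: example")` returns `(3, 6, "unbound: example")`, i.e. facility 3 (daemon) at severity 6 (informational).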

                                    Check out what happens with unbound when the WAN goes up and down, knowing that you use pfBlocker or DNSBL. You will be surprised.

                                    Be aware that when an interface such as a WAN goes down, all related connections like VPNs and attached services (unbound, pfBlocker DNSBL, etc.) will restart. This represents a boatload of code. It will take seconds, even on a fast system, to stabilize.

                                    I suggest that you run with just the WAN (no VPN, no pfBlocker or DNSBL) and see what happens.
                                    Then add one feature after another and analyze the timing.


                                    • asv345h

                                      So my WAN just suffered the same kind of transient disruptions I've been seeing. However, this time, no pfSense meltdown. The GUI was responsive and the unbound and pfctl processes were both well behaved. After the WAN went back to normal I didn't have to reboot pfSense to get it back.

                                      The difference seems to be that, following the advice in this thread, I disabled pfBlockerNG (both IP and DNSBL lists) and stopped monitoring unbound with Service Watchdog yesterday. I'll enable each one in turn and see what happens next time.

                                      @Gertjan
                                      Yesterday I set up a Splunk server and am sending all pfSense logs to it. Any advice on remote logging for pfSense? Splunk is working just fine, but the logs are not as well formatted, so they are harder to scan.

                                      • skullnobrains

                                        both ssh and the gui will perform a dns query to resolve the ip address the connection comes from.

                                        the timeout of the dns query is long, hence the sluggishness

                                        i kinda recollect that using the AddressFamily=inet flag allows connecting faster through ssh. this seems to work even when passing the flag to the client, though i'm unsure why.

                                        setting a reasonable timeout such as 5 seconds in resolv.conf should help. i kinda remember the default is 30
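One way to check whether reverse lookups are really what is slow is to time one directly from a machine behind the firewall. The Python sketch below is only an illustration: `timed_reverse_lookup` is a hypothetical helper, and the `resolver` parameter exists purely so the lookup can be swapped out; by default it uses the system resolver via `socket.gethostbyaddr`.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds spent) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return hostname, time.monotonic() - start
```

For example, `timed_reverse_lookup("192.168.1.10")` (a made-up LAN address) returns the resolved name, or `None` on failure, together with the elapsed seconds. An elapsed time close to the resolver timeout while the WAN is down would support the explanation above.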

                                        • Gertjan @skullnobrains

                                          @skullnobrains said in pfSense hangs when WAN is unstable or lost:

                                          both ssh and the gui will perform a dns query to resolve the ip address the connection comes from.

                                          I don't know whether OpenSSH or PHP (the GUI) actually does so, but it could well be true.
                                          On the other hand: a device on the LAN connecting to pfSense is probably DHCP registered. So, normally, unbound will know right away the IP of its own interface, and the IP of the connecting device.
                                          For example, the /etc/hosts file will contain this info.

                                          I guess that when SSH access is slow (from the LAN), it is because the entire system is hovering around 100% utilization.


                                          • asv345h

                                            I use DHCP static mappings for all my devices. My management workstation, the one I use to connect to pfSense, as well as all the others, has an entry in /etc/hosts.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.