Netgate Discussion Forum

    pfSense 2.7.2 in Hyper-V freezing with no crash report after reboot

    Virtualization
    • stephenw10 Netgate Administrator

      Hmm, what are those ssh connections from? Not that it should be a problem.

      • Techniker_ctr @stephenw10

        @stephenw10

        Hello,

        thanks for your reply.

        Those are connections from our monitoring server. No other users connected to the system before the freeze.

        • stephenw10 Netgate Administrator

          Anything shown at the console? Can you set up a serial console and log that output?

          • Techniker_ctr @stephenw10

            @stephenw10

            Yes, I checked the console connection, but all you see are syslogd messages about user logins, sometimes several days old.

            The affected systems are unfortunately all live systems, which have to be made functional again as soon as possible, so I don't have much time for longer tests.

            I have not yet been able to cause a freeze on my test systems.

            Normally a freeze occurs every 2 to 4 weeks. However, there is no guarantee of this.

            • Techniker_ctr @Techniker_ctr

              @Techniker_ctr

              We have now moved HAProxy from one of the affected systems to another system to see whether the freezes still occur.

              Perhaps this will help narrow down the potential sources of error.

              • stephenw10 Netgate Administrator @Techniker_ctr

                @Techniker_ctr said in pfSense 2.7.2 freezing with no crash report after reboot:

                The console is also no longer accessible in some cases.

                Hmm, so sometimes the console was still responsive? I assume you were not able to connect out at that point?

                • Techniker_ctr @stephenw10

                  @stephenw10

                  It seems so. In all the freezes I've seen, the console was no longer functional, but my colleague said he could access one device via the console.

                  At the next freeze I will actively check whether more info is available via the console.

                  • stephenw10 Netgate Administrator

                    Make sure you try Ctrl+T at the console. That can often show a stuck process when nothing else is responding.

                    • Techniker_ctr @stephenw10

                      @stephenw10

                      Hey,

                      We had another outage today; unfortunately, no further information could be retrieved via the console.

                      The system that failed is an updated system that runs on UFS and does not use HAProxy.

                      The failure pattern is the same again: normal load, then suddenly no response from the system and a total traffic failure. As always, no crash report.

                      The following are the system loads:
                      RAM: [image: Memory_2.PNG]
                      Processes: [image: proccesses_2.PNG]

                      Any other ideas on how we can track down the cause?

                      • stephenw10 Netgate Administrator

                        Do you see anything in Hyper-V? Usage spikes, etc.?

                        • Techniker_ctr @stephenw10

                          @stephenw10

                          I appreciate your help.

                          I checked our Grafana monitoring of the Hyper-V host again and can indeed see a CPU spike on the system just before the freeze:
                          [image: VCPU_Usage.PNG]

                          And a complete loss of traffic:
                          [image: Network_per_secound(MB).PNG]

                          • stephenw10 Netgate Administrator

                            Hmm, so like some process uses all the CPU cycles until it's rebooted. Yet nothing is logged.... 🤔

                            • Techniker_ctr @stephenw10

                              @stephenw10

                              Today we had another freeze. Same symptoms: no logs, no console. All we see is a CPU spike right before the freeze.

                              Any other ideas on how to continue? Are we the only ones with this issue?

                              We are currently running 71 pfSense 2.7.2 instances, and around 10 of them show the issue. We also reinstalled from fresh 2.7.2 images to rule out a problem with the upgrade from 2.6.

                              All our 2.6 and 2.5.2 systems run without these issues.

                              • Techniker_ctr @stephenw10

                                @stephenw10

                                We just had another freeze. This time I have a couple of new errors in the log before it stops completely.

                                Maybe this helps narrow it down.

                                Nov 22 16:55:35	kernel		Copyright (c) 1992-2023 The FreeBSD Project.
                                Nov 22 16:55:35	kernel		---<<BOOT>>---
                                Nov 22 16:55:35	syslogd		kernel boot file is /boot/kernel/kernel
                                Nov 22 16:52:00	sshd	11196	Accepted publickey for root from xxx.xxx.xxx.xxx port 6014 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxx
                                Nov 22 16:51:56	sshd	8691	Accepted publickey for root from xxx.xxx.xxx.xxx port 13587 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxx
                                Nov 22 16:51:19	pfSctl	2755	could not finish read in a reasonable time. Action of event might not be completed.
                                Nov 22 16:51:16	sshd	53470	Disconnected from user root xxx.xxx.xxx.xxx port 52790
                                Nov 22 16:51:16	sshd	53470	Received disconnect from xxx.xxx.xxx.xxx port 52790:11: disconnected by user
                                Nov 22 16:51:16	sshd	26642	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 27224
                                Nov 22 16:51:16	rc.gateway_alarm	22904	>>> Gateway alarm: WANGWv6 (Addr:xxx:xxx:xxx::xxx Alarm:0 RTT:50.416ms RTTsd:167.822ms Loss:0%)
                                Nov 22 16:50:46	php-fpm	60747	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use GW_WAN.
                                Nov 22 16:50:46	check_reload_status	416	Reloading filter
                                Nov 22 16:50:46	check_reload_status	416	Restarting OpenVPN tunnels/interfaces
                                Nov 22 16:50:46	check_reload_status	416	Restarting IPsec tunnels
                                Nov 22 16:50:46	check_reload_status	416	updating dyndns GW_WAN
                                Nov 22 16:50:46	check_reload_status	416	updating dyndns WANGWv6
                                Nov 22 16:50:46	sshd	50953	Disconnected from user root xxx.xxx.xxx.xxx port 60047
                                Nov 22 16:50:46	sshd	50953	Received disconnect from xxx.xxx.xxx.xxx port 60047:11: disconnected by user
                                Nov 22 16:50:46	rc.gateway_alarm	10435	>>> Gateway alarm: GW_WAN (Addr:xxx.xxx.xxx.xxx Alarm:0 RTT:52.696ms RTTsd:60.464ms Loss:0%)
                                Nov 22 16:50:38	sshd	17812	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 33797
                                Nov 22 16:50:33	sshd	18507	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 39653
                                Nov 22 16:50:28	sshd	18455	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 21224
                                Nov 22 16:50:28	sshd	17705	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 1337
                                Nov 22 16:50:23	sshd	25669	Accepted publickey for root from xxx.xxx.xxx.xxx port 11618 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxx
                                Nov 22 16:50:01	pfSctl	17188	could not finish read in a reasonable time. Action of event might not be completed.
                                Nov 22 16:49:58	sshd	99832	Disconnected from user root xxx.xxx.xxx.xxx port 60268
                                Nov 22 16:49:58	sshd	99832	Received disconnect from xxx.xxx.xxx.xxx port 60268:11: disconnected by user
                                Nov 22 16:49:58	sshd	18141	Accepted publickey for root from xxx.xxx.xxx.xxx port 15849 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxx
                                Nov 22 16:49:58	sshd	1365	Disconnected from user root xxx.xxx.xxx.xxx port 53098
                                Nov 22 16:49:58	sshd	1365	Received disconnect from xxx.xxx.xxx.xxx port 53098:11: disconnected by user
                                Nov 22 16:49:58	sshd	53096	Disconnected from user root xxx.xxx.xxx.xxx port 65117
                                Nov 22 16:49:58	sshd	53096	Received disconnect from xxx.xxx.xxx.xxx port 65117:11: disconnected by user
                                Nov 22 16:49:58	sshd	97774	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 9367
                                Nov 22 16:49:58	rc.gateway_alarm	69423	>>> Gateway alarm: WANGWv6 (Addr:xxx:xxx:xxx::xxx Alarm:1 RTT:26.765ms RTTsd:25.516ms Loss:21%)
                                Nov 22 16:48:51	check_reload_status	416	Reloading filter
                                Nov 22 16:48:51	check_reload_status	416	Restarting OpenVPN tunnels/interfaces
                                Nov 22 16:48:51	check_reload_status	416	Restarting IPsec tunnels
                                Nov 22 16:48:51	check_reload_status	416	updating dyndns GW_WAN
                                Nov 22 16:48:51	sshd	50248	Disconnected from user root xxx.xxx.xxx.xxx port 25550
                                Nov 22 16:48:51	sshd	50248	Received disconnect from xxx.xxx.xxx.xxx port 25550:11: disconnected by user
                                Nov 22 16:48:51	sshd	51781	Disconnected from user root xxx.xxx.xxx.xxx port 45454
                                Nov 22 16:48:51	rc.gateway_alarm	59700	>>> Gateway alarm: GW_WAN (Addr:xxx.xxx.xxx.xxx Alarm:1 RTT:509.029ms RTTsd:1028.814ms Loss:0%)
                                Nov 22 16:48:51	sshd	51781	Received disconnect from xxx.xxx.xxx.xxx port 45454:11: disconnected by user
                                Nov 22 16:48:41	sshd	1365	Accepted publickey for root from xxx.xxx.xxx.xxx port 53098 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxx
                                Nov 22 16:48:35	sshd	99832	Accepted publickey for root from xxx.xxx.xxx.xxx port 60268 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxx
                                Nov 22 16:48:29	sshd	97374	Accepted publickey for root from xxx.xxx.xxx.xxx port 55569 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxx
                                Nov 22 16:48:22	sshd	53683	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 4106
                                Nov 22 16:48:22	sshd	52565	fatal: Timeout before authentication for xxx.xxx.xxx.xxx port 34558
                                Nov 22 16:47:53	sshd	78140	Disconnected from user root xxx.xxx.xxx.xxx port 49005
                                Nov 22 16:47:53	sshd	78140	Received disconnect from xxx.xxx.xxx.xxx port 49005:11: disconnected by user
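The excerpt above is newest-first: the last entry before the 16:55:35 boot is at 16:52:00, so the box went silent about 3.5 minutes before it came back up, preceded by two filter reloads, four gateway alarms, and two pfSctl timeouts. A minimal sketch for pulling the same pattern out of other freezes, assuming lines copied from the GUI log view as in this post (tab-separated timestamp, process, PID, message; syslog omits the year, so it is passed in). The function names, keyword list, and 10-minute window are hypothetical choices, not a pfSense tool.

```python
from datetime import datetime, timedelta

# Hypothetical keyword list: events worth lining up against a freeze.
DEFAULT_KEYWORDS = ("Reloading filter", "Gateway alarm", "pfSctl")


def parse_entry(line, year):
    """Parse 'Mon DD HH:MM:SS<TAB>process<TAB>pid<TAB>message'; None if it doesn't fit."""
    parts = line.strip().split("\t", 3)
    if len(parts) != 4:
        return None
    stamp, proc, _pid, msg = parts
    try:
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None
    return ts, proc, msg


def events_before_boot(lines, year, window_s=600, keywords=DEFAULT_KEYWORDS):
    """Return (seconds of silence before the last boot, matching events oldest-first).

    Assumes the excerpt covers a single freeze/boot cycle.
    """
    entries = [e for e in (parse_entry(line, year) for line in lines) if e]
    boots = [ts for ts, _proc, msg in entries if "---<<BOOT>>---" in msg]
    if not boots:
        raise ValueError("no boot marker found")
    boot_ts = max(boots)
    before = [e for e in entries if e[0] < boot_ts]
    if not before:
        raise ValueError("no entries before the boot marker")
    last_seen = max(ts for ts, _proc, _msg in before)
    start = last_seen - timedelta(seconds=window_s)
    # Keep keyword hits (matched against process name and message) in the window.
    hits = sorted(
        (e for e in before
         if e[0] >= start and any(k in f"{e[1]} {e[2]}" for k in keywords)),
        key=lambda e: e[0],
    )
    return int((boot_ts - last_seen).total_seconds()), hits
```

Fed the lines quoted in this post (the year is arbitrary here), it should report 215 seconds of silence and eight matching events. Seeing the same reload-then-silence sequence on several of the affected firewalls would make the filter reloads a stronger suspect; finding nothing in common would argue against them.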
                                
                                • stephenw10 Netgate Administrator

                                  Hmm, unfortunately nothing there really looks like an issue. Certainly not something that would cause it to stop responding entirely.

                                  We could try running dtrace against it to see what's using the cycles but I think we'd need some way to trigger it before it stopped responding.
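The catch is that a DTrace profile has to start before the box stops responding. A minimal sketch of such a trigger, assuming a Python 3 interpreter on the firewall and a working dtrace(1): it polls FreeBSD's cumulative `kern.cp_time` counters and, the first time overall CPU usage crosses a threshold, runs a 30-second kernel stack sample in the style of the well-known FlameGraph one-liner. The function names, threshold, interval, and output path are hypothetical values to tune.

```python
import subprocess
import time

# Kernel stack sampler: fire at 997 Hz, keep only samples taken in kernel
# context (arg0 != 0), count identical stacks, exit after 30 seconds.
DTRACE_SCRIPT = "profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }"


def read_cp_time():
    """Cumulative CPU ticks from kern.cp_time: user, nice, sys, intr, idle."""
    out = subprocess.run(["sysctl", "-n", "kern.cp_time"],
                         capture_output=True, text=True, check=True).stdout
    return [int(x) for x in out.split()]


def busy_fraction(prev, cur):
    """Share of non-idle ticks between two samples (idle is the last field)."""
    deltas = [c - p for p, c in zip(prev, cur)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 1.0 - deltas[-1] / total


def dtrace_command(outfile):
    """Argument list for a one-shot kernel stack capture written to outfile."""
    return ["dtrace", "-x", "stackframes=100", "-n", DTRACE_SCRIPT, "-o", outfile]


def watch(threshold=0.9, interval_s=5.0, outfile="/root/cpu_spike_stacks.txt"):
    """Poll overall CPU usage; the first time it reaches threshold, capture stacks once."""
    prev = read_cp_time()
    while True:
        time.sleep(interval_s)
        cur = read_cp_time()
        if busy_fraction(prev, cur) >= threshold:
            subprocess.run(dtrace_command(outfile), check=False)
            return outfile
        prev = cur
```

`watch()` would be left running in the background before the next expected freeze. Because `kern.cp_time` is summed over all vCPUs, one thread pegging a single core on a 4-vCPU VM shows up as only about 25% extra load, so the threshold may need to be much lower than 0.9 (or the per-CPU `kern.cp_times` could be used instead). If the hang takes the whole kernel down within seconds, the capture may never reach disk.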

                                  If you create a test instance in the same hypervisor does that also stop responding?

                                  1 Reply Last reply Reply Quote 0
                                  • stephenw10 moved this topic from General pfSense Questions
                                  • arion_p

                                    I had the same issue when I first upgraded to 2.7.0 (at least I think it is the same issue). After 3 to 4 days pfSense would freeze completely: no GUI, no SSH, and no console.
                                    As I only have one pfSense installation and use it as a VPN (OpenVPN + IPsec) server to access the work network from home, whenever it froze I also had no access to the VM until the next day. In the few cases when I was on site during a freeze, I noticed the following:

                                    1. The GUI went first. It slowed to a crawl and soon stopped responding.
                                    2. By the time the GUI was unresponsive, SSH still worked but was excruciatingly slow. After a while SSH would no longer connect. However, OpenVPN/IPsec and routing in general still seemed to work (probably with limited bandwidth).
                                    3. I'm not sure of the timeframe, but once SSH failed to connect, the console was also dead. Hitting Ctrl-T at the console did nothing; it was completely frozen.

                                    It is curious that IP connectivity was the last to go, i.e. some packets were still passing through the router while the console was frozen. I know this because, while I was on site, internet access would sometimes slow down, and when I checked the console it was already frozen. After a few minutes IP connectivity would fail as well.

                                    I know all this doesn't help much, but the way I see it there must be something wrong at the kernel level (or in a kernel driver). A user-level process should not be able to bring the entire system down, console included (after all, that is the whole point of running the kernel at ring 0: isolating processes from one another and protecting the system from misbehaving processes).

                                    P.S.: I've since downgraded to 2.6 and have had no issues at all. Still, I would love to figure this out so I can upgrade to 2.7 again.

                                    • stephenw10 Netgate Administrator

                                      Is this an old VM? Was it created with an old Hyper-V version?

                                      Anything logged at all?

                                      • arion_p @stephenw10

                                        @stephenw10 it was quite a few months ago, so I'm not really sure.
                                        I know it was on Hyper-V Server 2016 (the free version). I think at some point I had a crash that totally messed up the VHD, so I reinstalled 2.7.0 from scratch on a new VM and imported the last saved configuration. It didn't make any difference: the same issue appeared within a few days. No logs whatsoever, I mean nothing out of the ordinary; it just started to slow down and finally froze.

                                        • stephenw10 Netgate Administrator

                                          I don't have any Hyper-V setups here, but someone else may have a suggestion.

                                          • Bismarck @stephenw10

                                            @stephenw10

                                            Monitor top -aSHm cpu via SSH and you'll see [kernel{hveventN}] consuming all your CPU until pfSense crashes or freezes. While it's difficult to reproduce, certain factors make it occur more often, such as large filter lists, extensive MAC filters for Captive Portal (CP), bogonv6, scheduled rule times (cron 00/15/30/45), and actions that trigger filter reloads.

                                            This issue also occurs on bare metal, but it's so rare that most users don't notice it; I have experienced it twice since January. However, since moving from bare metal to Hyper-V two weeks ago, it now happens 2 to 3 times a week.

                                            I suspect that under specific conditions, the filter reload causes a process to lose its thread connections, resulting in an event storm.

                                            No such problems with version 2.6.0 on the same hardware or hypervisor.

                                            I'm sure dtrace or maybe procstat can shed some light here but that's beyond my capabilities.
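The `top -aSHm cpu` check described above can be automated instead of watched by eye. A hedged sketch, assuming the FreeBSD thread listing where the WCPU percentage column is immediately followed by COMMAND: it scans a captured snapshot and reports threads whose command matches `hvevent` at or above a CPU threshold. The function name, pattern, and 50% threshold are illustrative assumptions.

```python
import re


def hot_threads(top_text, pattern="hvevent", threshold=50.0):
    """Return (pid, wcpu, command) for matching threads at or above threshold.

    Assumes rows like
    '0 root -64 - 0B 800K CPU1 1 5:23 99.85% [kernel{hvevent1}]',
    i.e. the first token is the PID and the first 'NN.NN%' token is WCPU.
    """
    hits = []
    for line in top_text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # skip summary lines ("CPU:", "Mem:") and the header row
        for i, tok in enumerate(tokens):
            if tok.endswith("%"):
                try:
                    wcpu = float(tok[:-1])
                except ValueError:
                    break
                # COMMAND may contain spaces, e.g. "[idle{idle: cpu0}]".
                command = " ".join(tokens[i + 1:])
                if re.search(pattern, command) and wcpu >= threshold:
                    hits.append((int(tokens[0]), wcpu, command))
                break
    return hits
```

Snapshots captured with top's batch mode (`-b`, see top(1)) over SSH, for example by the monitoring server that already logs in every few seconds, could be fed to `hot_threads()`; a non-empty result shortly before a freeze would support the hvevent theory, while an empty one would point elsewhere.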

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.