Netgate Discussion Forum

    Multiple issues, firewall freezes and whole network goes down.

    General pfSense Questions
      Laxarus @Laxarus

      A short time later, I got another crash, this time with a crash report.

      Dump header from device: /dev/nda0p2
        Architecture: amd64
        Architecture Version: 4
        Dump Length: 617472
        Blocksize: 512
        Compression: none
        Dumptime: 2024-09-08 20:08:22 +0300
        Hostname: FIREWALL.mydomain.org
        Magic: FreeBSD Text Dump
        Version String: FreeBSD 15.0-CURRENT #0 plus-RELENG_24_03-n256311-e71f834dd81: Fri Apr 19 00:28:14 UTC 2024
          root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/obj/amd64/Y4MAEJ2R/var/j
        Panic String: page fault
        Dump Parity: 44932402
        Bounds: 0
        Dump Status: good
      
      Fatal trap 12: page fault while in kernel mode
      cpuid = 6; apic id = 08
      fault virtual address	= 0x1c
      fault code		= supervisor read data, page not present
      instruction pointer	= 0x20:0xffffffff80f246e2
      stack pointer	        = 0x28:0xfffffe00e1f3bae0
      frame pointer	        = 0x28:0xfffffe00e1f3bb70
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 2 (clock (6))
      rdi: 0000000000000000 rsi: 0000000000000000 rdx: fffffe00e1f3bcf8
      rcx: 0000000000000000  r8: 0000000000000528  r9: 0000000000000000
      rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00e1f3bb70
      r10: 000000000000300f r11: 0000000000015069 r12: 0000000000000000
      r13: 0000000000000528 r14: fffff8027dfb5000 r15: 0000000000000034
      trap number		= 12
      panic: page fault
      cpuid = 6
      time = 1725815302
      KDB: enter: panic
      panic.txt:
      page fault

      version.txt:
      FreeBSD 15.0-CURRENT #0 plus-RELENG_24_03-n256311-e71f834dd81: Fri Apr 19 00:28:14 UTC 2024
          root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/obj/amd64/Y4MAEJ2R/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/sources/FreeBSD-src-plus-RELENG_24_03/amd64.amd64/sys/pfSense
      

      Full crash dump here: textdump.tar.0
      I see a lot of "Disabled multicast promiscuous mode" messages in it.

      Right now my ISP is working on the cables in the neighborhood, so I am having frequent WAN downtime, and for some reason this is crashing the firewall.
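
The `time = 1725815302` field in the crash report is a Unix timestamp, so it can be checked against the `Dumptime` in the dump header. A minimal sketch, assuming the +0300 offset shown in that header; the helper name `epoch_to_local` is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Value copied from the "time = ..." line of the crash report.
PANIC_EPOCH = 1725815302

# The dump header prints Dumptime with a +0300 offset, so use the same one here.
LOCAL_TZ = timezone(timedelta(hours=3))


def epoch_to_local(epoch: int, tz: timezone = LOCAL_TZ) -> str:
    """Format a Unix timestamp in the same layout the dump header uses."""
    return datetime.fromtimestamp(epoch, tz).strftime("%Y-%m-%d %H:%M:%S %z")


print(epoch_to_local(PANIC_EPOCH))  # 2024-09-08 20:08:22 +0300
```

The result matches the `Dumptime` line, which confirms the panic and the text dump belong to the same event.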

        stephenw10 Netgate Administrator

        Ok that crash is this: https://redmine.pfsense.org/issues/15684

        Try setting the workaround suggested there: https://redmine.pfsense.org/issues/15684#note-12

        The logs show all gateways going down including what looks like an internal gateway?

        Sep  8 17:37:07 FIREWALL rc.gateway_alarm[60969]: >>> Gateway alarm: MNG_DHCP (Addr:192.168.2.1 Alarm:1 RTT:24.549ms RTTsd:124.820ms Loss:21%)
        

        Are all those gateways using the same NIC(s)?
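
To compare gateways it helps to pull the fields out of each `rc.gateway_alarm` line. A rough sketch of one way to do that; the regular expression is inferred only from the log lines quoted in this thread, and `parse_gateway_alarm` is a hypothetical helper, not part of pfSense:

```python
import re
from typing import Optional

# Pattern inferred from the alarm lines quoted in this thread (an assumption,
# not an official format description).
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gateway>\S+) "
    r"\(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d+) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)


def parse_gateway_alarm(line: str) -> Optional[dict]:
    """Return the alarm fields as a dict, or None if the line is not an alarm."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m["gateway"],
        "addr": m["addr"],
        "alarm": int(m["alarm"]),
        "rtt_ms": float(m["rtt"]),
        "rttsd_ms": float(m["rttsd"]),
        "loss_pct": int(m["loss"]),
    }


sample = (
    "Sep  8 17:37:07 FIREWALL rc.gateway_alarm[60969]: >>> Gateway alarm: "
    "MNG_DHCP (Addr:192.168.2.1 Alarm:1 RTT:24.549ms RTTsd:124.820ms Loss:21%)"
)
print(parse_gateway_alarm(sample))
```

Running it over the full system log and grouping by `gateway` would show which gateways alarm together.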

          Laxarus @stephenw10

          @stephenw10 I have set the workaround, though I had to add it manually under System Tunables since it was not there by default.

          There are 5 gateways with corresponding Interfaces
          b06ca774-6073-4ed3-a3a7-5e212873d5e9-image.png
          and the interfaces below
          729498a0-92df-4618-b5b7-694481122cf4-image.png

            stephenw10 Netgate Administrator

            Ok so 4 of those gateways are all using igb1 but the MNG gateway uses igb0. So you would not expect to see all 5 throwing packet loss, unless they go through the same switch maybe?

              Laxarus @stephenw10

              @stephenw10 yep, igb1 goes to the modem port and igb0 goes to a different switch. MNG is a management network on a separate switch, with its own DHCP server and no connection to the internet. It carries all the IPMI and critical management connections. The purpose is to provide an environment where, even if pfSense crashes, the management interface stays up so I can reach the pfSense UI (if possible) and IPMI.

                stephenw10 Netgate Administrator

                Hmm, what hardware is this?

                Not much can cause two NICs to stop passing traffic like that. Especially igb NICs.

                  Laxarus @stephenw10

                  @stephenw10 it is a Supermicro SuperServer 5019D-4C-FN8TP with 32GB ECC RAM and an add-on AOC-S25G-I2S-O PCIe SFP28 25 Gbps card.

                    Laxarus @stephenw10

                    @stephenw10 to make it clear, the firewall just freezes. Even when connected directly to the console, no input is registered. Until it is rebooted, it is simply stuck on something.

                      stephenw10 Netgate Administrator

                      Hmm, so all 4 of those ports are on-board.

                      Does it not respond even to ctl+t?

                        Laxarus @stephenw10

                        @stephenw10 no, it does not respond to anything. I did not try ctrl + t but ctrl + c, ctrl + alt + del, enter, space, backspace, nothing works

                          slu @Laxarus

                          @Laxarus
                          We have the same hardware but not the 25 Gbps card.

                          Please check the IPMI interface for PCIe, ... errors; we had a faulty Broadcom card some months ago.

                          pfSense Gold subscription

                            stephenw10 Netgate Administrator

                            Sometimes ctl+t is the only thing that will produce a response.

                              Laxarus @stephenw10

                              @stephenw10 I will try ctrl + t if the same thing happens again (hoping not). I will try to troubleshoot the WAN when I go back (right now I only have remote access).
                              There is only one constant in all these situations: when WAN goes down, there is a big chance of the firewall crashing or freezing.
                              The two bugs that you mentioned are somehow contributing to this when WAN goes down. Hopefully, the next release of pfSense will take care of them.
                              Thanks for bearing with me so far; I really appreciate it.

                              @slu thanks for the suggestion. I have checked the maintenance and health logs on the IPMI but there is nothing noteworthy there. It all seems normal.

                                Laxarus

                                So, I had the same issue again this morning and I still have no idea why this is happening. @stephenw10 I have tried ctrl + t and there was no response to that either.

                                Any advice on debugging this is very much appreciated.

                                Full log here, the freeze happened around Sep 16 07:00
                                system.log.0

                                  stephenw10 Netgate Administrator

                                  You need to tune the OVPN_S2S_VPNV4 gateway. It's throwing alarms repeatedly. It's clearly a pretty bad route because the alarms are legitimate for the default settings. However, reloading the firewall each time it fires is not helping anything. You might just disable the monitoring or the monitoring action on that gateway.
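
To see why an alarm counts as "legitimate" under default settings, here is a small sketch that compares one logged sample against alarm thresholds. The default values (500 ms latency, 20% loss) and the strict greater-than comparison are assumptions to verify against the gateway's advanced settings; `Thresholds` and `alarm_reasons` are made-up names, and the sample values come from the VPNAC_WG alarm quoted in this thread:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    # Assumed default high-water marks; check the gateway's advanced
    # settings for the values actually in use.
    latency_ms: float = 500.0
    loss_pct: float = 20.0


def alarm_reasons(rtt_ms: float, loss_pct: float,
                  t: Thresholds = Thresholds()) -> List[str]:
    """List which thresholds a sample exceeds (strict comparison is an assumption)."""
    reasons = []
    if rtt_ms > t.latency_ms:
        reasons.append("latency")
    if loss_pct > t.loss_pct:
        reasons.append("loss")
    return reasons


# RTT and loss taken from the VPNAC_WG alarm line (RTT:91.226ms, Loss:21%).
print(alarm_reasons(91.226, 21))                             # ['loss']
print(alarm_reasons(91.226, 21, Thresholds(loss_pct=40.0)))  # []
```

With a loss of 21% the sample sits just over the assumed 20% mark, so raising the loss threshold for a known-lossy route would stop that alarm (and the reloads it triggers) without disabling monitoring entirely.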

                                  But that shouldn't cause it to stop responding. The actual failure appears to happen here:

                                  Sep 16 07:18:45 FIREWALL rc.gateway_alarm[63113]: >>> Gateway alarm: VPNAC_WG (Addr:10.11.0.1 Alarm:1 RTT:91.226ms RTTsd:79.944ms Loss:21%)
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: updating dyndns VPNAC_WG
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: Reloading filter
                                  Sep 16 07:18:45 FIREWALL rc.gateway_alarm[65772]: >>> Gateway alarm: WAN_PPPOE (Addr:10.98.238.224 Alarm:1 RTT:5.947ms RTTsd:11.776ms Loss:21%)
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: updating dyndns WAN_PPPOE
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
                                  Sep 16 07:18:45 FIREWALL check_reload_status[635]: Reloading filter
                                  Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was '' 
                                  Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use VPNAC_WG.
                                  Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was '' 
                                  Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_PPPOE.
                                  Sep 16 07:18:46 FIREWALL php-fpm[51827]: /rc.dyndns.update: phpDynDNS (@.mydomain.org): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                                  Sep 16 07:18:50 FIREWALL ppp[53627]: [wan_link0] LCP: no reply to 1 echo request(s)
                                  Sep 16 07:19:00 FIREWALL ppp[53627]: [wan_link0] LCP: no reply to 2 echo request(s)
                                  Sep 16 07:19:05 FIREWALL rc.gateway_alarm[23895]: >>> Gateway alarm: MNG_DHCP (Addr:192.168.2.1 Alarm:1 RTT:4.611ms RTTsd:15.937ms Loss:22%)
                                  

                                  That is where all gateways start to report failures and the PPPoE link goes down. Effectively no traffic is passing from that point.

                                  But there are no lower-level errors; the NICs do not show loss of link, for example.

                                  The firewall is still logging and running scripts, so it doesn't appear to be down, at least until the end of that log.
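
One way to find the point where every gateway starts failing is to collect the alarm lines and see which gateways raise one within a short window of each other. A minimal sketch using lines from the log excerpt above; the line pattern, the 30-second window, the year 2024 (syslog lines carry none) and both function names are assumptions for illustration:

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the rc.gateway_alarm lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ rc\.gateway_alarm\[\d+\]: "
    r">>> Gateway alarm: (?P<gateway>\S+) \(.*Alarm:(?P<alarm>\d)"
)

LOG = """\
Sep 16 07:18:45 FIREWALL rc.gateway_alarm[63113]: >>> Gateway alarm: VPNAC_WG (Addr:10.11.0.1 Alarm:1 RTT:91.226ms RTTsd:79.944ms Loss:21%)
Sep 16 07:18:45 FIREWALL rc.gateway_alarm[65772]: >>> Gateway alarm: WAN_PPPOE (Addr:10.98.238.224 Alarm:1 RTT:5.947ms RTTsd:11.776ms Loss:21%)
Sep 16 07:18:50 FIREWALL ppp[53627]: [wan_link0] LCP: no reply to 1 echo request(s)
Sep 16 07:19:05 FIREWALL rc.gateway_alarm[23895]: >>> Gateway alarm: MNG_DHCP (Addr:192.168.2.1 Alarm:1 RTT:4.611ms RTTsd:15.937ms Loss:22%)
"""


def alarm_events(lines, year=2024):
    """Return (timestamp, gateway) pairs for lines that raise an alarm (Alarm:1)."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m["alarm"] == "1":
            when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
            events.append((when, m["gateway"]))
    return events


def gateways_alarming_together(events, window=timedelta(seconds=30)):
    """Return the gateways that alarmed within `window` of the earliest event."""
    if not events:
        return set()
    events = sorted(events)
    start = events[0][0]
    return {gw for when, gw in events if when - start <= window}


events = alarm_events(LOG.splitlines())
print(sorted(gateways_alarming_together(events)))
# ['MNG_DHCP', 'VPNAC_WG', 'WAN_PPPOE']
```

Gateways on different NICs showing up in the same window, as MNG_DHCP does here, is the unusual part worth chasing.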

                                  When did you try to connect? How did you connect?

                                    Laxarus @stephenw10

                                    @stephenw10 I have further tweaked the ovpn gateway.

                                    Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was '' 
                                    Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use VPNAC_WG.
                                    Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was '' 
                                    Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_PPPOE.
                                    

                                    Here, why is the firewall trying to look up an IPv6 route? The tunnel is IPv4 only.
                                    77982d0c-6409-4cad-9156-985e1a2d294c-image.png
                                    I had to do a power reset around 10:46 just to get everything back.
                                    64a4a208-8a92-48ff-8f9e-736090de796f-system3.zip

                                      stephenw10 Netgate Administrator

                                      Just to be clear though, you are unable to see any response from the firewall even on the local physical console?

                                      It's extremely unusual to see it still logging and running scripts at that time but unresponsive at the console. Like I don't think I've ever seen that.

                                        Laxarus @stephenw10

                                        @stephenw10 said in Multiple issues, firewall freezes and whole network goes down.:

                                        Like I don't think I've ever seen that

                                        The attached capture shows the unresponsive console:
                                        9979fb96-2531-417e-a50a-dd6d321f8e90-pfsense freeze.zip

                                          stephenw10 Netgate Administrator

                                          Ok that looks like the IPMI console? And I assume that usually works as expected?

                                          Is it configured for video as the primary console?

                                          Are you able to test using a physical console?

                                            Laxarus @stephenw10

                                            @stephenw10 Yep, normally, it works without a problem. I am not sure how it is configured since there are no options to change the behavior.

                                            No, I cannot access the physical console since it is at a remote site.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.