Netgate Discussion Forum

Web GUI crashes after upgrade from 22.05 to 23.01

Plus 23.01 Development Snapshots (Retired)
  • S
    stephenw10 Netgate Administrator
    last edited by Jan 13, 2023, 1:11 AM

    If you install the debug kernel with:

    [23.01-RC][root@6100.stevew.lan]/root: pkg install pfSense-kernel-debug-pfSense
    Updating pfSense-core repository catalogue...
    pfSense-core repository is up to date.
    Updating pfSense repository catalogue...
    pfSense repository is up to date.
    All repositories are up to date.
    The following 1 package(s) will be affected (of 0 checked):
    
    New packages to be INSTALLED:
            pfSense-kernel-debug-pfSense: 23.01.b.20230106.0600 [pfSense-core]
    
    Number of packages to be installed: 1
    
    The process will require 709 MiB more space.
    145 MiB to be downloaded.
    
    Proceed with this action? [y/N]: y
    [1/1] Fetching pfSense-kernel-debug-pfSense-23.01.b.20230106.0600.pkg: 100%  145 MiB   5.2MB/s    00:29    
    Checking integrity... done (0 conflicting)
    [1/1] Installing pfSense-kernel-debug-pfSense-23.01.b.20230106.0600...
    [1/1] Extracting pfSense-kernel-debug-pfSense-23.01.b.20230106.0600: 100%
    

    Then, when you reboot, you can select it by choosing option 6 at the boot loader menu. However, if you only have remote access, that could be a problem.

    Steve

    • S
      stephenw10 Netgate Administrator
      last edited by Jan 13, 2023, 1:38 AM

      Still failing to replicate this.

      What IPSec config are you using?

      What firewall rules do you have?

      • J
        jjstecchino @stephenw10
        last edited by Jan 13, 2023, 12:25 PM

        @stephenw10 I will be at the other house this weekend and I am going to try to get core dumps with the debug kernel.

        As for the IPsec config, it is a tunnel between the two firewalls.
        Phase1: Key exchange is IKEv2, protocol is IPv4 only, auth is Mutual PSK, identifiers are the IP addresses. Encryption is AES 256, hash is SHA256, DH group is 14; lifetime is 31680 sec, rekey time at the default 90% of lifetime.
        Phase2: mode is Tunnel ipv4, Local network is my lan subnet, remote network is set to the remote network ip/24, no nat. Key exchange/SA mode is ESP, encryption is AES256-GCM 128bits PFS key group 14, life time 5400, rekey time 4860 sec.
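
As a quick sanity check on the lifetimes above: with the rekey time at 90% of the lifetime, the Phase 2 value of 4860 sec follows from the 5400 sec lifetime, and the Phase 1 lifetime of 31680 sec would rekey at 28512 sec. A minimal sketch in POSIX sh; the helper name rekey_time is made up for illustration and is not a pfSense or strongSwan command.

```shell
#!/bin/sh
# rekey_time LIFETIME [PERCENT]
# Hypothetical helper: prints LIFETIME * PERCENT / 100 in whole seconds.
# PERCENT defaults to 90, the default rekey ratio quoted in the post.
rekey_time() {
    lifetime=$1
    percent=${2:-90}
    echo $(( lifetime * percent / 100 ))
}

rekey_time 31680   # Phase 1 lifetime -> prints 28512
rekey_time 5400    # Phase 2 lifetime -> prints 4860
```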

        Once I get to the other house (the remote pfsense), the first thing I want to try is to access the firewall GUI of my primary pfsense through the ipsec vpn and see if it crashes or not. The two firewalls run on different hardware with different packages installed.

        I will then try to get you core dumps. I will try to see if going back to a default config with just the ipsec configuration makes any difference.

        Anything else I should try?

        • J
          jimp Rebel Alliance Developer Netgate
          last edited by Jan 13, 2023, 1:05 PM

          Do you have any special gateways/route table entries set up that refer to the IPsec network(s)? Or any other config in other areas of the firewall itself that relates to the tunnel outside of the IPsec config you described?

          Things like this sort of setup:
          https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/access-firewall-over-ipsec.html

          Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

          Need help fast? Netgate Global Support!

          Do not Chat/PM for help!

          • J
            jjstecchino @jimp
            last edited by Jan 13, 2023, 3:45 PM

            @jimp Yes, I had to add gateways and static routes on both endpoints.

            • S
              stephenw10 Netgate Administrator
              last edited by Jan 13, 2023, 6:59 PM

              Did it just fail to connect without those static routes? I assume those changes were added in an earlier version, before 22.05?

              • J
                jjstecchino @stephenw10
                last edited by Jan 13, 2023, 7:54 PM

                @stephenw10 As we discussed, it is not a failed connection. The firewall reboots with a kernel fault. Yes, those gateways and static routes were added before 22.05. Is that workaround not needed anymore?

                • S
                  stephenw10 Netgate Administrator @jjstecchino
                  last edited by Jan 13, 2023, 8:48 PM

                  @jjstecchino said in Web GUI crashes after upgrade from 22.05 to 23.01:

                  Is that workaround not needed anymore?

                  Maybe not. At least for incoming connections like this. In my testing here I found I did not need it to access the webgui on a remote firewall. I'm just trying to determine if that's specific to my setup. Though I doubt it is since my setup is very basic.
                  I haven't been able to replicate the crash with or without the static routes though.

                  • J
                    jjstecchino @stephenw10
                    last edited by Jan 13, 2023, 9:22 PM

                    @stephenw10

                    Ok, this is puzzling to me. I am now at the remote location; if I access the GUI of my primary home firewall through the IPsec tunnel, it works just fine. The two firewalls have different hardware and different update paths. I am going to try a clean full install.

                    • J
                      jjstecchino @stephenw10
                      last edited by jjstecchino Jan 15, 2023, 7:24 PM Jan 14, 2023, 3:23 PM

                      @stephenw10 @jimp

                      Here is what I did once I had physical access to my remote pfsense:

                      Reinstalled 2.6 CE from scratch, kept the default config but set LAN IPv4 to my network IP xxx.xxx.50.0/24. Let's call this my now-local LAN.
                      Updated to Plus 22.01, kept the default config.
                      Updated to Plus 22.05, default config.
                      Updated to 23.01, default config.
                      Created an IPsec VPN tunnel to my primary firewall. Let's call this the now-remote LAN, with IP xxx.xxx.100.0/24.
                      Created a domain override on unbound to forward DNS requests for the domain on the remote network.
                      I could ping remote hosts from my local clients but not from pfSense, so...
                      I needed to add a gateway to the remote network with a static route to xxx.xxx.100.0.
                      IPsec VPN working properly; I can access the remote network from any local client and from pfSense.
                      Access to the local pfSense GUI gives no problems.
                      Connected to a Windows PC on the xxx.xxx.100.0 network and tried to access the pfSense GUI on the xxx.xxx.50.0 network, and as before pfSense has a kernel fault.
                      Restored my configuration file, which adds a few DHCP static mappings.
                      Installed the debug kernel.
                      Rebooted to the debug kernel.
                      Retried connecting to the pfSense GUI from the remote PC.
                      Kernel fault; I see the kernel dump on the local console.

                      In /var/crash there is a textdump.tar.0 but no vmcore dump.
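
For anyone following along: a textdump is an ordinary tar archive, so it can be inspected with tar. Per textdump(4) it normally holds members such as panic.txt, msgbuf.txt, ddb.txt, config.txt and version.txt. A minimal sketch, assuming the archive is the textdump file found in /var/crash; the function names are made up for illustration.

```shell
#!/bin/sh
# list_textdump ARCHIVE: list the members of a textdump archive.
list_textdump() {
    tar -tf "$1"
}

# show_textdump_member ARCHIVE MEMBER: print one member to stdout
# without unpacking anything to disk (-O writes to stdout).
show_textdump_member() {
    tar -xOf "$1" "$2"
}
```

For example, `list_textdump /var/crash/textdump.tar.0` shows what was captured, and `show_textdump_member /var/crash/textdump.tar.0 msgbuf.txt` prints the kernel message buffer, which is where the trap output usually ends up.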

                      Anything else I can do to help with troubleshooting?

                      textdump.tar

                      • J
                        jjstecchino @jjstecchino
                        last edited by jjstecchino Jan 14, 2023, 6:02 PM Jan 14, 2023, 4:39 PM

                        @stephenw10 @jimp

                        Looking at the debug trace

                        Fatal trap 12: page fault while in kernel mode
                        cpuid = 2; apic id = 02
                        fault virtual address   = 0x0
                        fault code              = supervisor read data, page not present
                        instruction pointer     = 0x20:0xffffffff81334a0a
                        stack pointer           = 0x28:0xfffffe00d2378560
                        frame pointer           = 0x28:0xfffffe00d2378560
                        code segment            = base 0x0, limit 0xfffff, type 0x1b
                                                = DPL 0, pres 1, long 1, def32 0, gran 1
                        processor eflags        = interrupt enabled, resume, IOPL = 0
                        current process         = 36274 (nginx)
                        rdi: fffff80106c34113 rsi:                0 rdx:              42c
                        rcx:              42c  r8:                1  r9: fffff80106c61e00
                        rax: fffff80106c34113 rbx: fffff8005f313100 rbp: fffffe00d2378560
                        r10:                1 r11:                0 r12: fffff80106c61e00
                        r13:                0 r14: fffff8005f2a5e00 r15:                1
                        trap number             = 12
                        panic: page fault
                        cpuid = 2
                        time = 1673708059
                        KDB: enter: panic

                        0xffffffff80fbe3c4 at tcp_default_output+0x2094
                        #10 0xffffffff80fd406e at tcp_usr_ready+0x11e
                        #11 0xffffffff80d7e39e at sendfile_iodone+0x23e
                        #12 0xffffffff80d7db68 at vn_sendfile+0x1868
                        #13 0xffffffff80d7e877 at sys_sendfile+0xf7
                        #14 0xffffffff813393be at amd64_syscall+0x12e
                        #15 0xffffffff8130c72b at fast_sy
                        
                        

                        It seems sendfile may be causing the fault.

                        Just to try, I disabled sendfile in the nginx configuration by editing the function system_generate_nginx_config in /etc/inc/system.inc and setting sendfile to off.

                        This solves the issue.

                        Now I can access the GUI remotely without kernel faults
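
To confirm that a change like this actually reached the web server config, the generated nginx file can be checked for its sendfile directive. A minimal sketch; the path /var/etc/nginx-webConfigurator.conf used in the usage line is an assumption about where pfSense writes the generated config, and sendfile_setting is a made-up helper name.

```shell
#!/bin/sh
# sendfile_setting CONF: print the value ("on" or "off") of every
# "sendfile" directive found in the nginx config file CONF.
sendfile_setting() {
    awk '$1 == "sendfile" { sub(/;.*/, "", $2); print $2 }' "$1"
}
```

Something like `sendfile_setting /var/etc/nginx-webConfigurator.conf` should then print off once the config has been regenerated.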

                        Interestingly, the other firewall (Xeon D-1518, 32 GB RAM) doesn't have a problem with sendfile on.

                        Another workaround is to set the sysctl kern.ipc.mb_use_ext_pgs = 0 and leave sendfile on in nginx config.

                        This solves the issue as well.
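
For reference, the sysctl can be read with `sysctl -n kern.ipc.mb_use_ext_pgs` and changed at runtime with `sysctl kern.ipc.mb_use_ext_pgs=0`. A minimal sketch of a check around that; ext_pgs_state is a made-up helper name. A runtime change presumably does not survive a reboot unless it is also added as a system tunable.

```shell
#!/bin/sh
# ext_pgs_state: report whether unmapped (extended page) mbufs are in use.
# Prints "enabled", "disabled", or "unknown" if the sysctl cannot be read.
ext_pgs_state() {
    value=$(sysctl -n kern.ipc.mb_use_ext_pgs 2>/dev/null) || {
        echo "unknown"
        return 1
    }
    if [ "$value" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Runtime workaround from the post (run as root on the firewall):
#   sysctl kern.ipc.mb_use_ext_pgs=0
```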

                        It seems to be related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254419 which was marked as fixed

                        • S
                          stephenw10 Netgate Administrator
                          last edited by Jan 15, 2023, 1:13 PM

                          Ah, these are some nice findings, good job.
                          I'll get this to the developer who is looking at it.

                          • J
                            jjstecchino @stephenw10
                            last edited by jjstecchino Jan 16, 2023, 2:16 PM Jan 16, 2023, 2:01 PM

                            @stephenw10 @jimp

                            On further exploration, the sysctl kern.ipc.mb_use_ext_pgs enables or disables the use of unmapped mbufs by sendfile(2) and other kernel functions.
                            Unmapped mbufs can hold multiple pages of data in a single mbuf and help reduce CPU utilization (https://reviews.freebsd.org/D20616).

                            For a NIC to use unmapped mbufs, the driver has to support this capability.
                            ifconfig will show NOMAP in the options flags if the driver offers unmapped mbuf support.

                            Checking both my firewalls, the one that faults uses the igb driver for its NIC. This driver supports NOMAP.

                            The other firewall, which doesn't fault, uses the ix driver, which also supports NOMAP but perhaps with a different implementation.

                            For me, the ways to fix the kernel fault are to disable sendfile in nginx, to disable unmapped mbufs altogether, or possibly to disable unmapped mbufs on the NIC with the ifconfig -mextpg flag (I did not test this).

                            In the pfSense use case, unmapped mbufs may help reduce load on the CPU and possibly let the NIC driver move data faster, so I would be reluctant to disable unmapped mbufs altogether.

                            On the other hand, nginx in pfSense is not a high-traffic web server, as it just serves the GUI to a few users. In this scenario disabling the sendfile optimization would not make a difference and may avoid a kernel fault in a few edge cases like mine.

                            This may be the preferred route, imo, until the issue is sorted out on FreeBSD 14.

                            Interestingly, on pfSense 22.05 with FreeBSD 12, nginx has sendfile on; however, the igb driver doesn't show the NOMAP capability flag:

                            ifconfig igb1
                            igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                    description: LAN
                                    options=e120bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
                            

                            whereas in 23.01 the NOMAP capability is supported by the nic driver:

                            ifconfig igb1
                            igb1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                            	description: LAN
                            	options=4e120bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
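
The only difference between the two options lines is the NOMAP capability: 0x4e120bb versus 0xe120bb, i.e. one extra bit, 0x4000000. A minimal sketch for checking an interface from a script; has_nomap is a made-up helper name that just pattern-matches ifconfig output on stdin.

```shell
#!/bin/sh
# has_nomap: read ifconfig output on stdin and succeed (exit 0) if the
# options=...<...> capability list contains NOMAP.
has_nomap() {
    grep -Eq 'options=[0-9a-f]+<([^>]*,)?NOMAP(,[^>]*)?>'
}

# Bit that differs between the 22.05 and 23.01 options words shown above.
printf '0x%x\n' $(( 0x4e120bb ^ 0xe120bb ))   # prints 0x4000000
```

Usage would be along the lines of `ifconfig igb1 | has_nomap && echo "igb1 advertises NOMAP"`.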
                            

                            Hope this helps in reproducing the kernel fault. This is as far as my ability allows me to help. I wouldn't know how to submit bugs to the FreeBSD team.

                            I did not investigate the impact that IPsec may have, but surely it is part of the problem, as the fault happened only when accessing the pfSense GUI through the IPsec VPN and didn't occur when accessing the GUI locally.

                            This bug may be difficult to reproduce, as it may involve the interplay of a specific NIC driver, IPsec, and kernel optimizations that may be automatically enabled or disabled on specific hardware.

                            I wish I had been able to get a full kernel core dump, but even using the debug kernel package I get only the textdump in /var/crash.

                            • S
                              stephenw10 Netgate Administrator
                              last edited by Jan 16, 2023, 2:18 PM

                              Mmm, yeah, a number of moving parts there. Hard to see why ipsec would trigger it...

                              I'll wait to see what our developers think. At least there's a relatively simple workaround with that sysctl though.

                              Steve

                              • J
                                jjstecchino @stephenw10
                                last edited by Jan 16, 2023, 2:28 PM

                                @stephenw10
                                Yeah, but that disables a kernel-wide optimization that may be important for better handling of network traffic by the firewall. Turning off the sendfile optimization in nginx may be a better option, as all it does is allow file data to be moved directly to a TCP socket without first copying it to a memory buffer. This is important for a high-traffic web server but largely irrelevant for pfSense.

                                As more people start to use pfSense 23.xx with FreeBSD 14, this bug may start to affect others as well.

                                Setting sysctl kern.ipc.mb_use_ext_pgs=0 would allow seamless updates, since sendfile can remain on in the nginx config, but it would turn off an important kernel optimization.

                                I would respectfully suggest considering turning off sendfile in the nginx config instead.

                                • J
                                  jimp Rebel Alliance Developer Netgate
                                  last edited by Jan 16, 2023, 3:36 PM

                                  I do have one box here with igb and NOMAP showing (A Netgate 7551), but so far I haven't been able to make it crash.

                                  That said, the only IPsec tunnel I have on there that is testable without some work is VTI, not tunnel mode.

                                  I'll see if I can rig up a tunnel mode test on there.

                                  • J
                                    jimp Rebel Alliance Developer Netgate
                                    last edited by Jan 16, 2023, 3:49 PM

                                    Set up a tunnel and still no crash. I can reach the GUI LAN to LAN with a full browser and it appears to be working fine.

                                    Do you have something enabled on the dashboard that might be contributing? Maybe the picture widget with a large image?

                                    Usually the web server wouldn't be using sendfile for much on pfSense since it doesn't have many static things to serve and typically that gets kicked in for stuff like large pictures.

                                    • J
                                      jjstecchino @jimp
                                      last edited by jjstecchino Jan 16, 2023, 3:55 PM Jan 16, 2023, 3:51 PM

                                      @jimp No, this also happened with a bare-bones default config and no widgets: a clean install plus the IPsec tunnel VPN.

                                      • J
                                        jimp Rebel Alliance Developer Netgate
                                        last edited by Jan 16, 2023, 3:56 PM

                                        Curious. I even tried downloading a status output and some config backups with RRD (~4MB) but it keeps chugging along.

                                        I tried with no crypto acceleration and also with QAT enabled.

                                        There may be something specific to that exact igb card that is different from mine.

                                        • J
                                          jjstecchino @jimp
                                          last edited by Jan 16, 2023, 4:00 PM

                                          @jimp my nic is <Intel(R) I211 (Copper)> port 0xd000-0xd01f mem 0xf7200000-0xf721ffff,0xf7220000-0xf7223fff at device 0.0 on pci2

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.