Netgate Discussion Forum

    Wan periodic reset causes system reboot.

    Scheduled Pinned Locked Moved General pfSense Questions
    152 Posts 6 Posters 51.9k Views 5 Watching
      AlexanderK

      Anything new?
      I tried to replicate the issue in a lab but nothing happened.
      I am using my production network without IPv6.

        stephenw10 Netgate Administrator

        There have been some backend changes to our build system preventing new snaps for a few days. Let me check.....

          stephenw10 Netgate Administrator

          We are still digging into this. It looks like there may be several related issues here. The NDP issue being one of them.

            RobbieTT @stephenw10

            @stephenw10 said in Wan periodic reset causes system reboot.:

            We are still digging into this. It looks like there may be several related issues here. The NDP issue being one of them.

            If the coding is not too complicated the understanding of this will wipe at least 3 bugs away. A 4th could be the unexplained DNS Resolver cache wipe following a pfBlocker cron-job. Seems you have more in mind though!

            ☕️

              stephenw10 Netgate Administrator

              Well I hope there's not more!

                RobbieTT @stephenw10

                @stephenw10 said in Wan periodic reset causes system reboot.:

                Well I hope there's not more!

                Surely one fix that fixes many is better than chasing down all these individual bugs? Well, unless you are the one unpicking the code... 🤷

                Will this work fold into v23.09 or is it too late for that?

                ☕️

                  stephenw10 Netgate Administrator

                  I hope that it will be 23.09. The ndp fix certainly will be. 🤞

                    RobbieTT @stephenw10

                    @stephenw10

                    Ok, sounds hopeful but I appreciate this discovery came very late in the .09 workflow.

                    ☕️

                      AlexanderK

                      anything new on this issue?

                        stephenw10 Netgate Administrator

                        Not yet. At least not as far as I know since we still have yet to replicate it locally. There are fixes for other things that could be interacting to cause this on some systems. If you're able to test a 23.09 snapshot and can repeatedly trigger this issue please do so.

                          RobbieTT @AlexanderK

                          @AlexanderK

                          Still covered by this on redmine:

                          Regression #14431

                          No improvement yet on 23.09 dev and the issue is (probably) being pushed to 24.03, so another 6 months+ away.

                          It's not ideal, I know. I'm looking for a non-pfSense option in the interim to cover the periods when I may not be around to resolve these crashes & reboots.

                          In the meantime I've been pushing data at the Netgate team and running stuff whenever needed and trying every development load.

                          ☕️

                            AlexanderK

                            anything on beta 23.09?

                              RobbieTT @AlexanderK

                              @AlexanderK @stephenw10

                              No, nothing substantive. Netgate did ask me to produce results with a modified kernel but the moving set of instructions left me in a bit of a hole with a router that should have been in production.

                              I am building-up a new server with pfSense+ so I can test with more freedom but my testing hours with a live WAN are limited. I could do with more help really.

                              It would really help if Netgate provided a complete version to test with, rather than being left to modify stuff myself in order to provide data for them. They expect quite a bit from a paying customer (although I acknowledge this would be different if they could replicate the issue on their own dev systems).

                              Anyway, I remain committed and will still invest the time needed where I can.

                              ☕️

                                stephenw10 Netgate Administrator

                                Yes, I'm sorry about that. Not being able to replicate it ourselves makes everything much more difficult. It's especially annoying here because I have essentially an identical setup to you. The only significant difference is the connection speed.

                                We are working to add a coredump implementation in the gui to make this process much easier.

                                In the meantime I'd be happy to work with you here if you're able to.

                                If you can, I would simply reinstall with a much larger swap size to avoid the need for an external drive with swap.

                                Also, one thing I didn't realise until I tested it is that changes to the pfSense-ddb.conf file are read at boot, so the system needs to be rebooted normally to apply them before the panic is triggered.

                                Steve

                                  RobbieTT @stephenw10

                                  @stephenw10

                                  Clearly I am happy to work with you Steve. 👍

                                  I just looked at the swap in the GUI dashboard, only to find there isn't one showing on my Supermicro system. Did I miss a step or something?

                                  [23.05.1-RELEASE][admin@Router-7.redacted.me]/root: swapinfo -h
                                  Device              Size     Used    Avail Capacity
                                  
                                  /root: top
                                  
                                  last pid: 76190;  load averages:  0.14,  0.12,  0.09                      up 0+02:00:50  18:24:55
                                  67 processes:  1 running, 66 sleeping
                                  CPU:  0.2% user,  0.0% nice,  0.1% system,  0.1% interrupt, 99.6% idle
                                  Mem: 139M Active, 391M Inact, 870M Wired, 56K Buf, 29G Free
                                  ARC: 223M Total, 39M MFU, 177M MRU, 622K Anon, 1091K Header, 5669K Other
                                       164M Compressed, 419M Uncompressed, 2.56:1 Ratio
                                  
                                    PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
                                  34097 unbound      16  20    0   425M   316M kqread   3   0:51   3.03% unbound
                                  36766 root          1  20    0    14M  3856K CPU6     6   0:00   0.21% top
                                  

                                  I don't recall setting a swap on my Netgate 6100 or on this machine. The 6100 shows a swap of 1024 MiB - never seen it used though.
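For anyone wanting to script the same check: the `swapinfo -h` output above contains only the header row, which is what "no active swap" looks like. A minimal Python sketch, assuming the usual column layout shown above; the function name `active_swap_devices` is made up for illustration and the second sample row is hypothetical, not taken from this system.

```python
def active_swap_devices(swapinfo_output: str) -> list[str]:
    """Return device names listed by swapinfo, skipping the header and any Total row."""
    devices = []
    for line in swapinfo_output.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Device", "Total"):
            continue
        devices.append(fields[0])
    return devices


# Output as posted above: header only, so the list of active devices is empty.
header_only = "Device              Size     Used    Avail Capacity\n"
print(active_swap_devices(header_only))

# Hypothetical output from a box with working swap (values invented for illustration).
with_swap = header_only + "/dev/nvd0p3         1.0G       0B     1.0G     0%\n"
print(active_swap_devices(with_swap))
```

An empty list here, combined with a `freebsd-swap` partition existing on disk, points at the swap entry in /etc/fstab rather than at the partitioning.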

                                  ☕️

                                    stephenw10 Netgate Administrator

                                    Mmm, yes, by default the Plus installer sets 1G for swap. The CE installer at one time used half the RAM size by default if there was sufficient drive space. I'm not sure why you have none.

                                    A 6100 here dumps ~750MB from the kernel when configured to do so but that's after running only a short time. If you can trigger this quickly on the 6100 it would be worth trying since the worst case is that it just fails to hold the dump and reboots. If you're not using the 6100 you can reinstall it and add more SWAP, I would expect 2GB to be more than enough.

                                    Steve

                                      RobbieTT @stephenw10

                                      @stephenw10
                                      Under gpart I can see 1.0G of freebsd-swap - is the value used by pfSense?

                                      [23.05.1-RELEASE][admin@Router-7.redacted.me]/root: gpart show
                                      =>       40  231270320  nvd0  GPT  (110G)
                                               40     532480     1  efi  (260M)
                                           532520       1024     2  freebsd-boot  (512K)
                                           533544        984        - free -  (492K)
                                           534528    2097152     3  freebsd-swap  (1.0G)
                                          2631680  228636672     4  freebsd-zfs  (109G)
                                        231268352       2008        - free -  (1.0M)
                                      
                                      [23.05.1-RELEASE][admin@Router-7.redacted.me]/root: 
                                      

                                      It's still a bit odd for it to be missing from the GUI though.

                                      I'm happy to use either the Netgate 6100 or the Supermicro (SYS-510D-8C-FN6P) for testing; so your choice with repeatability vs flexibility.

                                      ☕️

                                        Gertjan @RobbieTT

                                        @RobbieTT said in Wan periodic reset causes system reboot.:

                                        I can see 1.0G of freebsd-swap - is the value used by pfSense?

                                        See /etc/fstab

                                        [23.05.1-RELEASE][root@pfSense.bhf.net]/root: cat /etc/fstab
                                        # Device                Mountpoint      FStype  Options         Dump    Pass#
                                        /dev/gpt/efiboot0               /boot/efi       msdosfs rw              0       0
                                        /dev/nvd0p3             none    swap    sw       0       0
                                        

                                        When I got my 4100 (with pfSense 22.05 from back then?), the swap line wasn't using "/dev/nvd0p3" but something else. The result: there was a swap partition but pfSense wasn't using it.
                                        "nvd0p3" is my swap partition, so I had to fix the entry by editing /etc/fstab.
                                        AFAIK it was referencing some disk ID, not the partition device name.
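The mismatch described here can be checked mechanically by comparing the swap entries in /etc/fstab against the `freebsd-swap` partitions that `gpart show` reports. A rough Python sketch under a few assumptions: the disk uses GPT (so partitions are named `<disk>p<index>`), the inputs are pasted as text, and all function names are invented for this example. The `gpart` sample is the one posted earlier in the thread; the fstab line is reconstructed from the /dev/nvd1p3 entry mentioned in the thread, not copied from a real file. An fstab entry that uses a label or disk ID instead of a device name would also be flagged, so treat a hit as "look closer" rather than proof.

```python
def swap_devices_in_fstab(fstab_text: str) -> list[str]:
    """Return the device column of every fstab line whose FStype is 'swap'."""
    devices = []
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split columns
        if len(fields) >= 3 and fields[2] == "swap":
            devices.append(fields[0])
    return devices


def swap_partitions_in_gpart(gpart_text: str) -> list[str]:
    """Return /dev names of freebsd-swap partitions in `gpart show` output (GPT naming assumed)."""
    devices = []
    disk = None
    for line in gpart_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "=>":
            disk = fields[3]  # e.g. "nvd0"
        elif disk and len(fields) >= 4 and fields[3] == "freebsd-swap":
            devices.append(f"/dev/{disk}p{fields[2]}")
    return devices


def stale_swap_entries(fstab_text: str, gpart_text: str) -> list[str]:
    """Return fstab swap devices that do not match any swap partition gpart knows about."""
    present = set(swap_partitions_in_gpart(gpart_text))
    return [dev for dev in swap_devices_in_fstab(fstab_text) if dev not in present]


GPART = """\
=>       40  231270320  nvd0  GPT  (110G)
         40     532480     1  efi  (260M)
     532520       1024     2  freebsd-boot  (512K)
     533544        984        - free -  (492K)
     534528    2097152     3  freebsd-swap  (1.0G)
    2631680  228636672     4  freebsd-zfs  (109G)
"""

# Reconstructed for illustration: a swap line pointing at a disk that is not there.
FSTAB = "/dev/nvd1p3             none    swap    sw       0       0\n"

print(stale_swap_entries(FSTAB, GPART))
```

With the sample data this reports /dev/nvd1p3 as stale, while an entry for /dev/nvd0p3 passes.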

                                        If your system starts to use the swap, consider removing stuff. Typically, when you use pfBlockerNG, use fewer big DNS files ;)

                                        No "help me" PM's please. Use the forum, the community will thank you.
                                        Edit : and where are the logs ??

                                          RobbieTT @Gertjan

                                          @Gertjan
                                          I think that is my question, given the gpart above, should /etc/fstab be changed from /dev/nvd1p3 to /dev/nvd0p3?

                                          My presumption is that pfSense uses the freebsd-swap shown in gpart, but I am not certain, nor do I know why the install points to the wrong location.

                                          I don't think I am close to needing swap due to lack of RAM though:

                                          [Image: 2023-10-16 at 09.17.00.png]

                                          This swap discussion is for kernel dumps, not day-to-day use.

                                          [edit:] Changing to nvd0p3 and rebooting did indeed bring the GUI swap graph back:

                                          [Image: 2023-10-16 at 12.51.47.png]

                                          ☕️

                                            stephenw10 Netgate Administrator

                                            Nice. Also interesting.

                                            Ok so set the line in /etc/pfSense-ddb.conf. I used:

                                            script kdb.enter.default=capture on; bt; show registers; show pcpu; capture off; dump; reset
                                            

                                            Then reboot to apply that.
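Since a typo in that line only shows up at the next panic, it may be worth sanity-checking the edit before rebooting. A small Python sketch that pulls the command list out of a `script <name>=...` line and confirms that `dump` runs before `reset` (a reset that comes first would reboot without writing a core). The function names are hypothetical and the check is only a text comparison; it does not ask ddb whether the script was actually loaded.

```python
def ddb_script_commands(conf_text: str, script_name: str) -> list[str]:
    """Return the semicolon-separated commands of the named ddb script, or [] if absent."""
    prefix = f"script {script_name}="
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return [cmd.strip() for cmd in line[len(prefix):].split(";") if cmd.strip()]
    return []


def dumps_before_reset(commands: list[str]) -> bool:
    """True if the script issues 'dump', and does so before any 'reset'."""
    if "dump" not in commands:
        return False
    if "reset" not in commands:
        return True
    return commands.index("dump") < commands.index("reset")


# The line quoted in the post above.
CONF = (
    "script kdb.enter.default=capture on; bt; show registers; "
    "show pcpu; capture off; dump; reset\n"
)

commands = ddb_script_commands(CONF, "kdb.enter.default")
print(commands)
print(dumps_before_reset(commands))
```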

                                            I then tested it by running sysctl debug.kdb.panic=1 which immediately panics the box and runs the script. At the console you see:

                                            panic: kdb_sysctl_panic
                                            cpuid = 3
                                            time = 1697460855
                                            KDB: enter: panic
                                            [ thread pid 1455 tid 100508 ]
                                            Stopped at      kdb_enter+0x32: movq    $0,0x2344f43(%rip)
                                            db:0:kdb.enter.default> capture on
                                            db:0:kdb.enter.default>  bt
                                            Tracing pid 1455 tid 100508 td 0xfffffe00b7ceaac0
                                            kdb_enter() at kdb_enter+0x32/frame 0xfffffe00b13afa10
                                            vpanic() at vpanic+0x163/frame 0xfffffe00b13afb40
                                            panic() at panic+0x43/frame 0xfffffe00b13afba0
                                            kdb_sysctl_panic() at kdb_sysctl_panic+0x61/frame 0xfffffe00b13afbd0
                                            sysctl_root_handler_locked() at sysctl_root_handler_locked+0x90/frame 0xfffffe00b13afc20
                                            sysctl_root() at sysctl_root+0x216/frame 0xfffffe00b13afca0
                                            userland_sysctl() at userland_sysctl+0x176/frame 0xfffffe00b13afd50
                                            sys___sysctl() at sys___sysctl+0x5c/frame 0xfffffe00b13afe00
                                            amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00b13aff30
                                            fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00b13aff30
                                            --- syscall (202, FreeBSD ELF64, __sysctl), rip = 0xb03aaf1e18a, rsp = 0xb03a86d9c88, rbp = 0xb03a86d9cc0 ---
                                            db:0:kdb.enter.default>  show registers
                                            cs                        0x20
                                            ds                        0x3b
                                            es                        0x3b
                                            fs                        0x13
                                            gs                        0x1b
                                            ss                        0x28
                                            rax                       0x12
                                            rcx         0xffffffff814589e2
                                            rdx                      0x3f8
                                            rbx                      0x100
                                            rsp         0xfffffe00b13afa10
                                            rbp         0xfffffe00b13afa10
                                            rsi                 0xc3b4cdc4
                                            rdi                        0x4
                                            r8                0x7ac3b4cdc4
                                            r9          0xfffffe00b7ceaac0
                                            r10         0xfffffe00b13af8f0
                                            r11         0xcedfc2df9afff59c
                                            r12                          0
                                            r13                          0
                                            r14         0xffffffff814b6685
                                            r15         0xfffffe00b7ceaac0
                                            rip         0xffffffff80d388c2  kdb_enter+0x32
                                            rflags                    0x86
                                            kdb_enter+0x32: movq    $0,0x2344f43(%rip)
                                            db:0:kdb.enter.default>  show pcpu
                                            cpuid        = 3
                                            dynamic pcpu = 0xfffffe008efd7f00
                                            curthread    = 0xfffffe00b7ceaac0: pid 1455 tid 100508 critnest 1 "sysctl"
                                            curpcb       = 0xfffffe00b7ceafe0
                                            fpcurthread  = 0xfffffe00b7ceaac0: pid 1455 "sysctl"
                                            idlethread   = 0xfffffe0011fbde40: tid 100006 "idle: cpu3"
                                            self         = 0xffffffff84013000
                                            curpmap      = 0xfffff8012468fd38
                                            tssp         = 0xffffffff84013384
                                            rsp0         = 0xfffffe00b13b0000
                                            kcr3         = 0xffffffffffffffff
                                            ucr3         = 0xffffffffffffffff
                                            scr3         = 0x0
                                            gs32p        = 0xffffffff84013404
                                            ldt          = 0xffffffff84013444
                                            tss          = 0xffffffff84013434
                                            curvnet      = 0xfffff80001203980
                                            db:0:kdb.enter.default>  capture off
                                            db:0:kdb.enter.default>  dump
                                            Dumping 702 out of 8050 MB:..3%..12%..21%..32%..42%..51%..62%..71%..83%..92%
                                            Dump complete
                                            db:0:kdb.enter.default>  reset
                                            Uptime: 3m30s
                                            

                                            After rebooting you should see the crash report in the gui with the vmcore offered to download.

                                            If that's all working then delete that core and try to panic it by removing the interface again. Hopefully the core is not bigger than 1G if you can trigger it soon enough after boot.
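As a rough worked check of the sizes involved: the swap partition in the earlier `gpart show` output is 2097152 sectors, and the test dump above reported "Dumping 702 out of 8050 MB". The Python sketch below compares the two, assuming 512-byte sectors (consistent with gpart labelling that partition 1.0G) and mixing figures from two different machines purely for illustration. The helper names are invented, and a real dump needs a little headroom beyond the reported size.

```python
import re

SECTOR_BYTES = 512  # assumption: gpart sizes above are in 512-byte sectors


def swap_size_mib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a partition size in sectors to MiB."""
    return sectors * sector_bytes / 2**20


def parse_dump_line(line: str) -> tuple[int, int]:
    """Extract (dump size, RAM size) in MB from a 'Dumping X out of Y MB' console line."""
    match = re.search(r"Dumping (\d+) out of (\d+) MB", line)
    if match is None:
        raise ValueError("not a kernel dump progress line")
    return int(match.group(1)), int(match.group(2))


dump_mb, ram_mb = parse_dump_line("Dumping 702 out of 8050 MB:..3%..12%..21%")
swap_mib = swap_size_mib(2097152)

print(dump_mb, swap_mib, dump_mb < swap_mib)  # prints: 702 1024.0 True
```

So a dump of that size fits in a 1G swap partition, but the margin shrinks as the kernel's memory use grows with uptime, which is why triggering the panic soon after boot matters.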

                                            Steve

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.