Netgate Discussion Forum

    Adventures upgrading 2 SG-5100 to 21.02

    Official Netgate® Hardware
    11 Posts 2 Posters 920 Views
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by stephenw10

      Hmmm, I have WG running on an SG-5100 and have not hit that or anything close to it.

      Do you have any logs showing the actual output you saw?

      Both ends were running 21.02(p1)?

      Steve

      • G
        gabacho4 Rebel Alliance @stephenw10
        last edited by

        @stephenw10 both were running 21.02 p1. I don’t have any logs as my priority was recovery. It might have been possible for me to get something off the devices after I CTRL-C’ed in console to stop the never-ending garbled output, but I had no GUI or SSH access to either device. My Linux/BSD kung fu is not that strong. In retrospect I should have taken pictures at the least, but the instinct to recover quickly lest I incur the wrath of those who need the internet was my driving force. I was willing to chalk the first device issue up as a fluke. But 2/2 of my devices barfing, and both in the course of making WireGuard configuration changes? A coincidence seems highly unlikely to me. I’ve done all sorts of madness with OpenVPN and IPsec and never had an issue. Do you have a list of commands I should/could run if something similar occurs again? My appetite to continue to experiment, especially with the remote device being so far away, is essentially nil. The remote device went offline at 0100 for me and I didn’t end up getting back into bed with a working config until 0600.

        • G
          gabacho4 Rebel Alliance @stephenw10
          last edited by

          @stephenw10 if I might ask, what does the red LED mean? If you are looking at the 5100 it was the one in the middle of the three. Also, I’ve been running mullvad wireguard client with no issues. My issues both occurred while making changes to a site to site type config.

          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            It's the status LED. It starts red during boot and goes green at the end of boot.
            https://docs.netgate.com/pfsense/en/latest/solutions/sg-5100/io-ports.html#other-ports-and-indicators

            The fact it was red indicates it halted or rebooted. If it had crashed out completely it would have remained in its current state, green.

            Just copy/pasting some of the console output might have been useful. Or it could have been meaningless garbage.

            Steve

            • G
              gabacho4 Rebel Alliance @stephenw10
              last edited by

              @stephenw10 very interesting. Despite being red it was certainly working. Once I got the garble stopped I had an ash prompt which allowed me to issue commands, such as the reboot one that I issued. I was able to watch the device boot to a certain point before the terminal would go berserk. Should it happen again I'll capture some of the output. I know I wasn't much help on the troubleshooting front. Unfortunately my need to ensure internet service was available forced me to take the most direct action to meet that demand.

              • G
                gabacho4 Rebel Alliance @stephenw10
                last edited by gabacho4

                @stephenw10 I'm back! I had yet another crash tonight. Router became unresponsive, no internet, no dhcp, dead. I was able to console in again and had the never ending line of ........................... running across the screen for eternity. I CTRL-C again and, for reasons I can't explain, while sitting there feeling dread about yet again having to install and restore, the device kicked and I got to the main console screen with interfaces and ip addresses etc. Oddly, the router had reverted to a state where my WAN was on PPPoE - I haven't had that for 6+ months now. Regardless, I logged in via the gui and had a crash notice on the dashboard. I was able to download two files. Can be found at files.

                Again, I'm no guru, but I did see "panic" in there, which never seems like a good thing with *nix. I would greatly appreciate any information you all at Netgate can provide about what you see and whether it is something already known. I was yet again making some changes to WireGuard (MTU and MSS) when this issue occurred. I'm getting very paranoid about making any changes right now lest I press my luck and crash this sucker again. Cannot wait for 21.05 or 21.02 pX, whichever comes first and corrects some of the weirdness. Thanks in advance!

                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  Mmm, that does look like something in WireGuard from the backtrace:

                  db:0:kdb.enter.default>  bt
                  Tracing pid 75993 tid 100526 td 0xfffff801708b0740
                  kdb_enter() at kdb_enter+0x37/frame 0xfffffe002cdb4c10
                  vpanic() at vpanic+0x197/frame 0xfffffe002cdb4c60
                  panic() at panic+0x43/frame 0xfffffe002cdb4cc0
                  trap_fatal() at trap_fatal+0x391/frame 0xfffffe002cdb4d20
                  trap_pfault() at trap_pfault+0x4f/frame 0xfffffe002cdb4d70
                  trap() at trap+0x286/frame 0xfffffe002cdb4e80
                  calltrap() at calltrap+0x8/frame 0xfffffe002cdb4e80
                  --- trap 0xc, rip = 0xffffffff80d84c37, rsp = 0xfffffe002cdb4f50, rbp = 0xfffffe002cdb4fd0 ---
                  __mtx_lock_sleep() at __mtx_lock_sleep+0xd7/frame 0xfffffe002cdb4fd0
                  wg_queue_out() at wg_queue_out+0x21b/frame 0xfffffe002cdb5010
                  wg_transmit() at wg_transmit+0xda/frame 0xfffffe002cdb5070
                  pf_test() at pf_test+0x22f0/frame 0xfffffe002cdb52b0
                  pf_test() at pf_test+0x20f6/frame 0xfffffe002cdb54f0
                  pf_check_out() at pf_check_out+0x1d/frame 0xfffffe002cdb5510
                  pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe002cdb55b0
                  ip_output() at ip_output+0xb4f/frame 0xfffffe002cdb56f0
                  udp_send() at udp_send+0xbbe/frame 0xfffffe002cdb57f0
                  sosend_dgram() at sosend_dgram+0x348/frame 0xfffffe002cdb5850
                  sosend() at sosend+0x50/frame 0xfffffe002cdb5880
                  kern_sendit() at kern_sendit+0x19d/frame 0xfffffe002cdb5920
                  sendit() at sendit+0x19c/frame 0xfffffe002cdb5970
                  sys_sendto() at sys_sendto+0x4d/frame 0xfffffe002cdb59c0
                  amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe002cdb5af0
                  fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe002cdb5af0
                  --- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x800c421ca, rsp = 0x7fffdfbfb3f8, rbp = 0x7fffdfbfb440 ---
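
For anyone reading a trace like this one, a rough way to see which subsystem was running when the fault hit is to list the frames that sit below the trap handler. The following is only a sketch in Python: the regular expression is an assumption based on the frame lines shown above, the helper names are made up for illustration, and the sample is a shortened copy of the backtrace in this post.

```python
import re

# Frame lines in the listing above look like:
#   wg_transmit() at wg_transmit+0xda/frame 0xfffffe002cdb5070
# This pattern is an assumption derived from those lines only.
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")


def frame_functions(backtrace: str) -> list[str]:
    """Return function names in listing order (innermost frame first)."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def frames_below_trap(names: list[str]) -> list[str]:
    """Drop the panic/trap handling frames, keeping what ran before the fault."""
    if "calltrap" in names:
        return names[names.index("calltrap") + 1:]
    return names


# Shortened copy of the backtrace quoted in this post.
SAMPLE = """\
trap() at trap+0x286/frame 0xfffffe002cdb4e80
calltrap() at calltrap+0x8/frame 0xfffffe002cdb4e80
--- trap 0xc, rip = 0xffffffff80d84c37, rsp = 0xfffffe002cdb4f50, rbp = 0xfffffe002cdb4fd0 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xd7/frame 0xfffffe002cdb4fd0
wg_queue_out() at wg_queue_out+0x21b/frame 0xfffffe002cdb5010
wg_transmit() at wg_transmit+0xda/frame 0xfffffe002cdb5070
pf_test() at pf_test+0x22f0/frame 0xfffffe002cdb52b0
"""

faulting_path = frames_below_trap(frame_functions(SAMPLE))
wireguard_frames = [name for name in faulting_path if name.startswith("wg_")]

print(faulting_path)
print(wireguard_frames)
```

In this trace the first frames below the trap are `__mtx_lock_sleep`, called from `wg_queue_out` and `wg_transmit`, which is why the fault points at the WireGuard transmit path.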
                  

                  Though the panic appears to be in unbound:

                  <4>matchaddr failed
                  <4>matchaddr failed
                  <4>matchaddr failed
                  <4>matchaddr failed
                  <4>matchaddr failed
                  <4>matchaddr failed
                  <6>wg0: link state changed to DOWN
                  <6>wg0: sc=0xfffff8000e00dc00
                  <6>wg0: link state changed to UP
                  <6>wg0: link state changed to DOWN
                  <6>wg0: sc=0xfffff8000e00dc00
                  <6>wg0: link state changed to UP
                  <6>wg1: link state changed to DOWN
                  
                  
                  Fatal trap 12: page fault while in kernel mode
                  cpuid = 3; apic id = 18
                  fault virtual address	= 0x410
                  fault code		= supervisor read data, page not present
                  instruction pointer	= 0x20:0xffffffff80d84c37
                  stack pointer	        = 0x28:0xfffffe002cdb4f50
                  frame pointer	        = 0x28:0xfffffe002cdb4fd0
                  code segment		= base 0x0, limit 0xfffff, type 0x1b
                  			= DPL 0, pres 1, long 1, def32 0, gran 1
                  processor eflags	= interrupt enabled, resume, IOPL = 0
                  current process		= 75993 (unbound)
                  trap number		= 12
                  panic: page fault
                  cpuid = 3
                  time = 1615314653
                  KDB: enter: panic
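
As a side note, the fields in a message like this can be pulled out mechanically when comparing several crash reports. The sketch below is an assumption-laden example in Python: the patterns are based only on the text quoted above, `parse_panic` is a made-up helper name, and the sample is trimmed to the lines it needs.

```python
import re
from datetime import datetime, timezone

# Trimmed copy of the panic message quoted in this post.
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 18
fault virtual address   = 0x410
current process         = 75993 (unbound)
trap number             = 12
panic: page fault
cpuid = 3
time = 1615314653
"""


def parse_panic(text: str) -> dict:
    """Pull a few fields out of a fatal trap message (hypothetical helper)."""
    fields = {}
    proc = re.search(r"^current process\s*=\s*(\d+) \((.+)\)\s*$", text, re.M)
    if proc:
        fields["pid"] = int(proc.group(1))
        fields["process"] = proc.group(2)
    addr = re.search(r"^fault virtual address\s*=\s*(0x[0-9a-f]+)", text, re.M)
    if addr:
        fields["fault_address"] = int(addr.group(1), 16)
    stamp = re.search(r"^time\s*=\s*(\d+)", text, re.M)
    if stamp:
        # "time" is seconds since the Unix epoch.
        fields["time_utc"] = datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc)
    return fields


info = parse_panic(PANIC_TEXT)
print(info["process"], info["pid"])
print(hex(info["fault_address"]))
print(info["time_utc"].isoformat())
```

The pid here (75993) is the same one the backtrace reports, so "current process" names whichever process was on the CPU when the fault occurred. That fits with unbound issuing the `sendto` call while the fault itself happened further down in the WireGuard code.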
                  

                  The console buffer shows quite a few WireGuard interface state changes. Were you making those or was it losing connection?

                  Steve

                  • G
                    gabacho4 Rebel Alliance @stephenw10
                    last edited by gabacho4

                    @stephenw10 it was probably me. I was trying various mtu/mss combinations to see if they offered any discernible difference. Have had issues with the wireguard gateways reporting between 6 and 15 percent packet loss, as well as very slow page loading with various websites. Figured mtu/mss was a good place to start. It really seems like things just aren't 100% with wireguard. I have no issues with OpenVPN configurations, packet loss, page loading.
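
For reference when picking MTU/MSS values, the usual arithmetic can be sketched as below. This is a rough Python example, not a recommendation for this setup: the overhead figures are the standard header sizes without IP options or TCP options, the 1500 and 1492 link MTUs are example inputs (plain Ethernet and PPPoE), and the function names are made up for illustration.

```python
# Per-packet overhead WireGuard adds to tunnelled data.
WG_HEADER = 16      # message type + reserved (4), receiver index (4), counter (8)
WG_AUTH_TAG = 16    # Poly1305 authentication tag
UDP_HEADER = 8
IP_HEADER = {4: 20, 6: 40}   # no IPv4 options / IPv6 extension headers
TCP_HEADER = 20              # no TCP options


def tunnel_mtu(link_mtu: int, outer_ip_version: int) -> int:
    """Largest inner packet that fits in one outer packet on the given link."""
    overhead = IP_HEADER[outer_ip_version] + UDP_HEADER + WG_HEADER + WG_AUTH_TAG
    return link_mtu - overhead


def tcp_mss(mtu: int, inner_ip_version: int) -> int:
    """TCP payload size that fits in one packet of the given MTU."""
    return mtu - IP_HEADER[inner_ip_version] - TCP_HEADER


# Example link MTUs: 1500 (Ethernet) and 1492 (PPPoE).
for link_mtu in (1500, 1492):
    for outer in (4, 6):
        mtu = tunnel_mtu(link_mtu, outer)
        print(f"link {link_mtu}, outer IPv{outer}: tunnel MTU {mtu}, IPv4 TCP MSS {tcp_mss(mtu, 4)}")
```

With a 1500-byte link and an IPv6 outer header this gives the commonly seen tunnel MTU of 1420, and an MSS of 1380 for IPv4 TCP inside the tunnel.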

                    It really seems like the router struggles to figure out how to route requests when there are multiple WireGuard gateways. I've noticed weirdness even with OpenVPN where, despite having PBR, my LAN devices will have a local WAN IP. If I restart the OpenVPN service, those devices then show a VPN IP instead. 21.02 seems to have some pretty rough edges compared to 2.4.5p1. I'm not angry or anything but just hope the next update is pushed out soon to resolve the ones you all know about. I don't really want to have to rely on applying patches one at a time. Do you know if another 21.02 release is expected before the 21.05 release? I'm trying to be strong but man, I've had some issues I've never experienced before and have really fought the urge to roll back.

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Ah, looks like you're hitting this: https://redmine.pfsense.org/issues/11585

                      • G
                        gabacho4 Rebel Alliance @stephenw10
                        last edited by

                        @stephenw10 I’d seen that one, along with another that causes a panic if there are multiple changes/saves in a short timeframe. Glad I’m not experiencing anything new. Here’s to hoping a fix comes very soon.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.