Netgate Discussion Forum

    Epyc 3251 and Wireguard

    General pfSense Questions
    • Jarhead

      @stephenw10
      Can you tell me what's going on here?

      Aug 28 00:12:54 pfsense kernel: 
      Aug 28 00:12:54 pfsense kernel: 
      Aug 28 00:12:54 pfsense kernel: Fatal trap 9: general protection fault while in kernel mode
      Aug 28 00:12:54 pfsense kernel: 
      Aug 28 00:12:54 pfsense kernel: cpuid = 0; 
      Aug 28 00:12:54 pfsense kernel: Fatal trap 9: general protection fault while in kernel mode
      Aug 28 00:12:54 pfsense kernel: cpuid = 2; apic id = 00
      Aug 28 00:12:54 pfsense kernel: apic id = 02
      Aug 28 00:12:54 pfsense kernel: instruction pointer	= 0x20:0xffffffff8065f3d9
      Aug 28 00:12:54 pfsense kernel: instruction pointer	= 0x20:0xffffffff8065f3d9
      Aug 28 00:12:54 pfsense kernel: stack pointer	        = 0x28:0xfffffe009a0dd620
      Aug 28 00:12:54 pfsense kernel: stack pointer	        = 0x0:0xfffffe000055c660
      Aug 28 00:12:54 pfsense kernel: frame pointer	        = 0x28:0xfffffe009a0dd650
      Aug 28 00:12:54 pfsense kernel: frame pointer	        = 0x0:0xfffffe000055c690
      Aug 28 00:12:54 pfsense kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
      Aug 28 00:12:54 pfsense kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
      Aug 28 00:12:54 pfsense kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
      Aug 28 00:12:54 pfsense kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
      Aug 28 00:12:54 pfsense kernel: processor eflags	= processor eflags	= interrupt enabled, interrupt enabled, resume, resume, IOPL = 0
      Aug 28 00:12:54 pfsense kernel: IOPL = 0
      Aug 28 00:12:54 pfsense kernel: current process		= 0 (if_io_tqg_2)
      Aug 28 00:12:54 pfsense kernel: current process		= 12 (irq317: t5nex0:2a0)
      
      Aug 28 08:09:41 pfsense kernel: 
      Aug 28 08:09:41 pfsense kernel: 
      Aug 28 08:09:41 pfsense kernel: Fatal trap 9: general protection fault while in kernel mode
      Aug 28 08:09:41 pfsense kernel: cpuid = 10; apic id = 0a
      Aug 28 08:09:41 pfsense kernel: instruction pointer	= 0x20:0xffffffff8065f3d9
      Aug 28 08:09:41 pfsense kernel: stack pointer	        = 0x28:0xfffffe009a11e540
      Aug 28 08:09:41 pfsense kernel: frame pointer	        = 0x28:0xfffffe009a11e570
      Aug 28 08:09:41 pfsense kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
      Aug 28 08:09:41 pfsense kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
      Aug 28 08:09:41 pfsense kernel: processor eflags	= interrupt enabled, resume, 
      Aug 28 08:09:41 pfsense kernel: IOPL = 0
      Aug 28 08:09:41 pfsense kernel: current process		= 12 (irq330: t5nex0:3a3)
      Aug 28 08:09:41 pfsense kernel: trap number		= 9
      
      Aug 28 08:17:47 pfsense kernel: 
      Aug 28 08:17:47 pfsense kernel: 
      Aug 28 08:17:47 pfsense kernel: 
      Aug 28 08:17:47 pfsense kernel: Fatal trap 9: general protection fault while in kernel mode
      Aug 28 08:17:47 pfsense kernel: 
      Aug 28 08:17:47 pfsense kernel: cpuid = 10; 
      Aug 28 08:17:47 pfsense kernel: 
      Aug 28 08:17:47 pfsense kernel: Fatal trap 9: general protection fault while in kernel mode
      Aug 28 08:17:47 pfsense kernel: cpuid = 4; Fatal trap 9: general protection fault while in kernel mode
      Aug 28 08:17:47 pfsense kernel: apic id = 0a
      Aug 28 08:17:47 pfsense kernel: cpuid = 0; 
      Aug 28 08:17:47 pfsense kernel: instruction pointer	= 0x20:0xffffffff8065f3d9
      Aug 28 08:17:47 pfsense kernel: apic id = 00
      Aug 28 08:17:47 pfsense kernel: apic id = 04
      Aug 28 08:17:47 pfsense kernel: instruction pointer	= 0x20:0xffffffff8065f3d9
      Aug 28 08:17:47 pfsense kernel: instruction pointer	= 0x20:0xffffffff8065f3d9
      Aug 28 08:17:47 pfsense kernel: stack pointer	        = 0x28:0xfffffe009a12d540
      Aug 28 08:17:47 pfsense kernel: stack pointer	        = 0x28:0xfffffe009a11e540
      Aug 28 08:17:47 pfsense kernel: stack pointer	        = 0x28:0xfffffe009a10f540
      Aug 28 08:17:47 pfsense kernel: frame pointer	        = 0x28:0xfffffe009a12d570
      Aug 28 08:17:47 pfsense kernel: frame pointer	        = 0x28:0xfffffe009a10f570
      Aug 28 08:17:47 pfsense kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
      Aug 28 08:17:47 pfsense kernel: frame pointer	        = 0x28:0xfffffe009a11e570
      Aug 28 08:17:47 pfsense kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
      Aug 28 08:17:47 pfsense kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
      Aug 28 08:17:47 pfsense kernel: processor eflags	= 
      Aug 28 08:17:47 pfsense kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
      Aug 28 08:17:47 pfsense kernel: interrupt enabled, processor eflags	= code segment		= base 0x0, limit 0xfffff, type 0x1b
      Aug 28 08:17:47 pfsense kernel: interrupt enabled, resume, 
      Aug 28 08:17:47 pfsense kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
      Aug 28 08:17:48 pfsense kernel: IOPL = 0
      Aug 28 08:17:48 pfsense kernel: resume, processor eflags	= IOPL = 0
      Aug 28 08:17:48 pfsense kernel: current process		= 12 (irq333: t5nex0:3a6)
      Aug 28 08:17:48 pfsense kernel: current process		= 12 (irq327: t5nex0:3a0)
      Aug 28 08:17:48 pfsense kernel: trap number		= 9
      

      Just got a Supermicro AS-5019D-FTN4 and it's been crashing constantly.
      I think I've got it narrowed down to WireGuard; at least it looks that way, since it's been running fine with WG disabled.
      The weird thing is that it runs fine with WG enabled until something tries to access something across the tunnel.

      Any ideas?

      • stephenw10 Netgate Administrator @Jarhead

        @jarhead said in Epyc 3251 and Wireguard:

        Aug 28 08:17:48 pfsense kernel: current process = 12 (irq333: t5nex0:3a6)
        Aug 28 08:17:48 pfsense kernel: current process = 12 (irq327: t5nex0:3a0)

        Is that a Chelsio NIC? t5nex0?

        That's where it appears to be failing, which is odd.
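For anyone following along, here is a minimal sketch of how the `current process` lines can be pulled out of interleaved panic output and grouped by device. The sample lines are copied from the log in the first post (whitespace normalized); the function names `faulting_threads` and `device_of` are made up for illustration and are not part of any pfSense tool.

```python
import re
from collections import Counter

# "current process" lines copied from the kernel log in the first post.
LOG = """\
Aug 28 00:12:54 pfsense kernel: current process = 0 (if_io_tqg_2)
Aug 28 00:12:54 pfsense kernel: current process = 12 (irq317: t5nex0:2a0)
Aug 28 08:09:41 pfsense kernel: current process = 12 (irq330: t5nex0:3a3)
Aug 28 08:17:48 pfsense kernel: current process = 12 (irq333: t5nex0:3a6)
Aug 28 08:17:48 pfsense kernel: current process = 12 (irq327: t5nex0:3a0)
"""

# Matches "current process = <pid> (<thread name>)" with any whitespace
# (the raw log uses tabs around the "=").
PROC_RE = re.compile(r"current process\s*=\s*(\d+) \((.+)\)\s*$")


def faulting_threads(log_text):
    """Return a list of (pid, thread_name) for every 'current process' line."""
    found = []
    for line in log_text.splitlines():
        m = PROC_RE.search(line)
        if m:
            found.append((int(m.group(1)), m.group(2)))
    return found


def device_of(thread_name):
    """Device name for an interrupt thread like 'irq317: t5nex0:2a0', else None."""
    m = re.match(r"irq\d+: ([^:]+):", thread_name)
    return m.group(1) if m else None


threads = faulting_threads(LOG)
by_device = Counter(device_of(name) for _, name in threads)
print(by_device)
```

Four of the five threads are `t5nex0` interrupt threads; the remaining one (`if_io_tqg_2`) is not an interrupt thread, so the sketch files it under `None`.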

        Do you have a full crash report with the backtrace?

        Steve

        • Jarhead @stephenw10

          @stephenw10 I have a Chelsio T540-CR installed.
          /var/crash has nothing. Where else would I look?

          • stephenw10 Netgate Administrator

            Hmm, is it actually panicking and rebooting when that happens?

            Do you have any of the Chelsio hardware offloading enabled?

            • Jarhead @stephenw10

              @stephenw10 It doesn't reboot, it just keeps scrolling lines of errors (I assume). I let it go for 10 minutes once, then rebooted it.
              Is there anywhere I can find those lines, or are they not saved?

              Both LAN and WAN are on the Chelsio card.

              ifconfig cxl3
              cxl3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                      description: WAN
                      options=3e800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,TXRTLMT,HWRXTSTMP>
                      ether 00:07:43:2c:e5:38
                      inet6 fe80::207:43ff:fe2c:e538%cxl3 prefixlen 64 scopeid 0x8
                      inet 32.219.x.x netmask 0xfffff800 broadcast 32.219.239.255
                      media: Ethernet 10Gbase-LR <full-duplex,rxpause,txpause>
                      status: active
                      nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
              
              ifconfig cxl2
              cxl2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                      description: LAN
                      options=3e800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,TXRTLMT,HWRXTSTMP>
                      ether 00:07:43:2c:e5:30
                      inet6 fe80::207:43ff:fe2c:e530%cxl2 prefixlen 64 scopeid 0x7
                      inet 10.12.8.1 netmask 0xffffffc0 broadcast 10.12.8.63
                      inet 10.255.255.1 netmask 0xffffffff broadcast 10.255.255.1
                      media: Ethernet 10Gbase-LRM <full-duplex,rxpause,txpause>
                      status: active
                      nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
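
On the offloading question, a small sketch that reads the `options=` line from the `ifconfig` output above and sorts the capabilities into checksum offload versus TSO/LRO. The capability names are the ones `ifconfig` prints on FreeBSD; the grouping into `CSUM` and `SEGMENTATION` is my own, and whether any of these matter for this crash is an assumption to test, not a finding.

```python
import re

# The options line is identical for cxl2 and cxl3 in the output above.
OPTIONS_LINE = (
    "options=3e800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
    "VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,TXRTLMT,HWRXTSTMP>"
)

# Hypothetical grouping of capability names, for illustration only.
CSUM = {"RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6", "VLAN_HWCSUM"}
SEGMENTATION = {"TSO4", "TSO6", "VLAN_HWTSO", "LRO"}


def parse_options(line):
    """Return the set of capability names inside options=<...>."""
    m = re.search(r"options=[0-9a-f]+<([^>]*)>", line)
    return set(m.group(1).split(",")) if m else set()


enabled = parse_options(OPTIONS_LINE)
print("checksum offload:", sorted(enabled & CSUM))
print("TSO/LRO:", sorted(enabled & SEGMENTATION))
```

Going by that line, checksum offload (including `VLAN_HWCSUM`) is enabled on both ports, while none of the TSO or LRO capabilities are listed.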
              

              Also, it's been running fine for a few hours with WireGuard disabled. I enabled WG, and as soon as I tried to connect to the pfSense WebGUI on the other side, it went down again.

              • stephenw10 Netgate Administrator

                I would expect to see something in the system log when that happens.

                That combination of things is not something I've seen before though. I'll run it past the devs tomorrow and see if any of them have.

                Steve

                • Jarhead @stephenw10

                  @stephenw10 Will have an update in a few minutes.

                  Disconnected the Chelsio card and put WAN and LAN on the gigabit ports. It did the same thing.
                  Took a look at the WireGuard config and found the gateways were reversed. That got me wondering: if that got scrambled in the config restore, what else did?
                  So I completely removed WG and all of its config.
                  Rebooted, did a backup, removed all traces of WG from it and restored.
                  It just came back up and I'm waiting for the package reinstall.
                  Once that's done, I'll reinstall WG, recreate all tunnels and see what happens.

                  I did let it go through the whole process on the last crash and got the dump files, if needed.
                  Will let you know how it goes.

                  • Jarhead @stephenw10

                    @stephenw10
                    Still no good.
                    Just created 1 tunnel. It comes up fine but as soon as I try to use it, gone.

                    textdump.tar.0
                    info.0

                    • stephenw10 Netgate Administrator

                      Hmm, still showing issues in the Chelsio driver.

                      Panic:

                      Fatal trap 9: general protection fault while in kernel mode
                      cpuid = 12; apic id = 0c
                      instruction pointer	= 0x20:0xffffffff8065f3d9
                      stack pointer	        = 0x28:0xfffffe009a0fb540
                      frame pointer	        = 0x28:0xfffffe009a0fb570
                      code segment		= base 0x0, limit 0xfffff, type 0x1b
                      			= DPL 0, pres 1, long 1, def32 0, gran 1
                      processor eflags	= interrupt enabled, resume, IOPL = 0
                      current process		= 12 (irq323: t5nex0:3a2)
                      trap number		= 9
                      panic: general protection fault
                      cpuid = 12
                      time = 1661734572
                      KDB: enter: panic
                      

                      Backtrace:

                      db:0:kdb.enter.default>  bt
                      Tracing pid 12 tid 100213 td 0xfffff80005df0000
                      kdb_enter() at kdb_enter+0x37/frame 0xfffffe009a0fb250
                      vpanic() at vpanic+0x197/frame 0xfffffe009a0fb2a0
                      panic() at panic+0x43/frame 0xfffffe009a0fb300
                      trap_fatal() at trap_fatal+0x391/frame 0xfffffe009a0fb360
                      trap() at trap+0x67/frame 0xfffffe009a0fb470
                      calltrap() at calltrap+0x8/frame 0xfffffe009a0fb470
                      --- trap 0x9, rip = 0xffffffff8065f3d9, rsp = 0xfffffe009a0fb540, rbp = 0xfffffe009a0fb570 ---
                      cxgbe_transmit() at cxgbe_transmit+0x19/frame 0xfffffe009a0fb570
                      ether_output_frame() at ether_output_frame+0xb4/frame 0xfffffe009a0fb5a0
                      ether_output() at ether_output+0x676/frame 0xfffffe009a0fb620
                      ip_output() at ip_output+0x136c/frame 0xfffffe009a0fb770
                      ip_forward() at ip_forward+0x39e/frame 0xfffffe009a0fb840
                      ip_input() at ip_input+0x850/frame 0xfffffe009a0fb8f0
                      netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe009a0fb940
                      ether_demux() at ether_demux+0x16a/frame 0xfffffe009a0fb970
                      ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe009a0fb9d0
                      netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe009a0fba20
                      ether_input() at ether_input+0x89/frame 0xfffffe009a0fba80
                      service_iq_fl() at service_iq_fl+0x5d2/frame 0xfffffe009a0fbb30
                      t4_intr() at t4_intr+0x2d/frame 0xfffffe009a0fbb50
                      ithread_loop() at ithread_loop+0x23c/frame 0xfffffe009a0fbbb0
                      fork_exit() at fork_exit+0x7e/frame 0xfffffe009a0fbbf0
                      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009a0fbbf0
                      --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
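
To make the trace easier to read, here is a rough sketch that finds the first `--- trap` marker and lists the frames of the interrupted stack below it. The frames are copied from the backtrace above; `interrupted_stack` and the two regexes are hypothetical helpers, not part of ddb or any pfSense tooling.

```python
import re

# Frames copied from the backtrace above, from calltrap() down.
BACKTRACE = """\
calltrap() at calltrap+0x8/frame 0xfffffe009a0fb470
--- trap 0x9, rip = 0xffffffff8065f3d9, rsp = 0xfffffe009a0fb540, rbp = 0xfffffe009a0fb570 ---
cxgbe_transmit() at cxgbe_transmit+0x19/frame 0xfffffe009a0fb570
ether_output_frame() at ether_output_frame+0xb4/frame 0xfffffe009a0fb5a0
ether_output() at ether_output+0x676/frame 0xfffffe009a0fb620
ip_output() at ip_output+0x136c/frame 0xfffffe009a0fb770
ip_forward() at ip_forward+0x39e/frame 0xfffffe009a0fb840
ip_input() at ip_input+0x850/frame 0xfffffe009a0fb8f0
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe009a0fb940
ether_demux() at ether_demux+0x16a/frame 0xfffffe009a0fb970
ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe009a0fb9d0
netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe009a0fba20
ether_input() at ether_input+0x89/frame 0xfffffe009a0fba80
service_iq_fl() at service_iq_fl+0x5d2/frame 0xfffffe009a0fbb30
t4_intr() at t4_intr+0x2d/frame 0xfffffe009a0fbb50
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe009a0fbbb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe009a0fbbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009a0fbbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""

FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+0x[0-9a-f]+/frame")
TRAP_RE = re.compile(r"^--- trap (\w+), rip = (\w+),")


def interrupted_stack(text):
    """Return ((trap_no, rip), [function names]) for the first trap marker."""
    trap = None
    funcs = []
    for line in text.splitlines():
        line = line.strip()
        t = TRAP_RE.match(line)
        if t:
            if trap is not None:
                break  # second marker ends the interrupted stack
            trap = (int(t.group(1), 0), int(t.group(2), 0))
            continue
        f = FRAME_RE.match(line)
        if f and trap is not None:
            funcs.append(f.group(1))
    return trap, funcs


trap, funcs = interrupted_stack(BACKTRACE)
print("trap", trap[0], "at", hex(trap[1]), "in", funcs[0])
print(" -> ".join(reversed(funcs)))
```

Read bottom-up, the path is a packet received in the Chelsio interrupt handler (`t4_intr`), passed through `ip_input` and `ip_forward`, and then faulting at `cxgbe_transmit+0x19` on the way back out.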
                      

                      But WireGuard was no longer running on it when that happened?

                      • stephenw10 Netgate Administrator

                        What's the WG tunnel connected to there? Another pfSense install?

                        • Jarhead @stephenw10

                          @stephenw10 said in Epyc 3251 and Wireguard:

                          Hmm, still showing issues in the Chelsio driver.

                          I assume that's only because my WAN was on the Chelsio at the time. I didn't check when I disconnected the Chelsio card, but I assume it would have shown igb0 then.

                          But WireGuard was no longer running on it when that happened?

                          Probably the cause right there. WG shutting down when I try to use it?

                          • Jarhead @stephenw10

                            @stephenw10 said in Epyc 3251 and Wireguard:

                            What's the WG tunnel connected to there? Another pfSense install?

                            Unfortunately not.
                            That tunnel goes to an OPNsense box. At least until the vlan0 is fixed. 😉

                            • stephenw10 Netgate Administrator

                              Hmm, so the encrypted WG traffic still runs over the Chelsio NIC, the WAN?

                              • Jarhead @stephenw10

                                @stephenw10 Not really sure what you're asking there.
                                My WAN is on the chelsio card (cxl3), the WG tunnel comes up with handshakes, but as soon as I try to access the other side it crashes.

                                • stephenw10 Netgate Administrator

                                  Mmm, I'm unsure what you moved to igb0. I would have expected that to have to be the WAN for the WG interface to be running on it.

                                  • Jarhead @stephenw10

                                    @stephenw10 I moved the WAN to igb0 and disconnected the Chelsio card from the motherboard as a test.
                                    The trouble still happened.
                                    So I don't think focusing on the Chelsio is the way to go.
                                    It happens with the onboard NICs also.
                                    Because it still happened with the onboard NICs, I reinserted the Chelsio and moved WAN back to it.

                                    • stephenw10 Netgate Administrator

                                      Right, I would agree except that it appeared the error was still on the Chelsio NIC even when it was not carrying WG traffic as I understand it.

                                      It would be good to get a crash report from the igb0-as-WAN setup if that's possible. It would be very surprising to see the same error on igb, since many people are running WG with an igb parent.

                                      • Jarhead @stephenw10

                                        @stephenw10 said in Epyc 3251 and Wireguard:

                                        Right, I would agree except that it appeared the error was still on the Chelsio NIC even when it was not carrying WG traffic as I understand it.

                                        How are you coming up with that?

                                        • stephenw10 Netgate Administrator @Jarhead

                                          @jarhead said in Epyc 3251 and Wireguard:

                                          But WireGuard was no longer running on it when that happened?

                                          Probably the cause right there. WG shutting down when I try to use it?

                                          I may have read that wrong. What I meant to ask was: was WG running on the Chelsio NIC when that crash report was generated?

                                          • Jarhead @stephenw10

                                            @stephenw10 I'll go through the whole thing again, trying to be more clear.

                                            New router. Backed up the old one and restored onto the new one, changing interfaces as needed.
                                            WireGuard would crash.
                                            Moved WAN and LAN to the onboard igb NICs.
                                            WireGuard would crash.
                                            Since this proves it's not related to the Chelsio card, which wasn't even plugged into the motherboard, I reinstalled the Chelsio and moved WAN and LAN back to it.
                                            WireGuard would crash.
                                            I found some weird errors in my gateways: network 1 was using gateway 2 and network 2 was using gateway 1, when they should be 1 to 1 and 2 to 2. So I uninstalled WireGuard, reinstalled it and recreated one tunnel.
                                            WireGuard crashed, and that's the dump I posted here.

                                            So focusing on the Chelsio card doesn't seem to be the way to go.

                                            Have you guys used an Epyc 3251 in the office for testing at all?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.