Netgate Discussion Forum

    Epyc 3251 and Wireguard

    General pfSense Questions
    50 Posts 3 Posters 8.1k Views
    • J
      Jarhead @stephenw10
      last edited by Jarhead

      @stephenw10 Doesn't reboot, just keeps scrolling lines of errors (I assume). Let it go for 10 minutes once, then I rebooted it.
      Anywhere I can find those lines or are they not saved?

      Both LAN and WAN are on the chelsio card.

      ifconfig cxl3
      cxl3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
              description: WAN
              options=3e800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,TXRTLMT,HWRXTSTMP>
              ether 00:07:43:2c:e5:38
              inet6 fe80::207:43ff:fe2c:e538%cxl3 prefixlen 64 scopeid 0x8
              inet 32.219.x.x netmask 0xfffff800 broadcast 32.219.239.255
              media: Ethernet 10Gbase-LR <full-duplex,rxpause,txpause>
              status: active
              nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
      
      ifconfig cxl2
      cxl2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
              description: LAN
              options=3e800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,TXRTLMT,HWRXTSTMP>
              ether 00:07:43:2c:e5:30
              inet6 fe80::207:43ff:fe2c:e530%cxl2 prefixlen 64 scopeid 0x7
              inet 10.12.8.1 netmask 0xffffffc0 broadcast 10.12.8.63
              inet 10.255.255.1 netmask 0xffffffff broadcast 10.255.255.1
              media: Ethernet 10Gbase-LRM <full-duplex,rxpause,txpause>
              status: active
              nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
      

      Also, it's been running fine for a few hours with Wireguard disabled. I enabled WG, and as soon as I tried to connect to the pfSense WebGUI on the other side it went down again.

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        I would expect to see something in the system log when that happens.

        That combination of things is not something I've seen before though. I'll run it past the devs tomorrow and see if any of them have.

        Steve

        • J
          Jarhead @stephenw10
          last edited by

          @stephenw10 Will have an update in a few minutes.

          Disconnected the Chelsio card, put WAN and LAN on the gig ports. It did the same thing.
          Took a look at the Wireguard config and found the gateways were reversed. Started thinking: if that got screwy in the config restore, what else did?
          So I completely removed WG and all config from it.
          Rebooted, did a backup, removed all traces of WG from it and restored.
          Just came back up now and waiting for the package reinstall.
          Once done, I'll reinstall WG, recreate all tunnels and see what happens.

          I did let it go through the whole process last crash and got the dump files if needed.
          Will let you know how it goes.
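For anyone following along, here is a minimal sketch of how the dump files mentioned above could be listed from a shell with Python available. It assumes the dumps sit in `/var/crash` (the usual FreeBSD location, not confirmed in this thread) and that they follow the `textdump.tar.N` / `info.N` naming seen in the attachments; `crash_files` is a hypothetical helper name.

```python
from pathlib import Path


def crash_files(crash_dir="/var/crash"):
    """Return the sorted names of textdump.tar.N and info.N files in crash_dir.

    The default directory is an assumption; pass another path if the
    dumps are stored elsewhere on your system.
    """
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.name.startswith(("textdump.tar.", "info."))
    )


if __name__ == "__main__":
    for name in crash_files():
        print(name)
```

The function returns an empty list rather than raising when the directory is missing, so it can be run safely on a box that has never crashed.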

          • J
            Jarhead @stephenw10
            last edited by

            @stephenw10
            Still no good.
            Just created 1 tunnel. It comes up fine but as soon as I try to use it, gone.

            textdump.tar.0
            info.0

            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              Hmm, still showing issues in the Chelsio driver.

              Panic:

              Fatal trap 9: general protection fault while in kernel mode
              cpuid = 12; apic id = 0c
              instruction pointer	= 0x20:0xffffffff8065f3d9
              stack pointer	        = 0x28:0xfffffe009a0fb540
              frame pointer	        = 0x28:0xfffffe009a0fb570
              code segment		= base 0x0, limit 0xfffff, type 0x1b
              			= DPL 0, pres 1, long 1, def32 0, gran 1
              processor eflags	= interrupt enabled, resume, IOPL = 0
              current process		= 12 (irq323: t5nex0:3a2)
              trap number		= 9
              panic: general protection fault
              cpuid = 12
              time = 1661734572
              KDB: enter: panic
              

              Backtrace:

              db:0:kdb.enter.default>  bt
              Tracing pid 12 tid 100213 td 0xfffff80005df0000
              kdb_enter() at kdb_enter+0x37/frame 0xfffffe009a0fb250
              vpanic() at vpanic+0x197/frame 0xfffffe009a0fb2a0
              panic() at panic+0x43/frame 0xfffffe009a0fb300
              trap_fatal() at trap_fatal+0x391/frame 0xfffffe009a0fb360
              trap() at trap+0x67/frame 0xfffffe009a0fb470
              calltrap() at calltrap+0x8/frame 0xfffffe009a0fb470
              --- trap 0x9, rip = 0xffffffff8065f3d9, rsp = 0xfffffe009a0fb540, rbp = 0xfffffe009a0fb570 ---
              cxgbe_transmit() at cxgbe_transmit+0x19/frame 0xfffffe009a0fb570
              ether_output_frame() at ether_output_frame+0xb4/frame 0xfffffe009a0fb5a0
              ether_output() at ether_output+0x676/frame 0xfffffe009a0fb620
              ip_output() at ip_output+0x136c/frame 0xfffffe009a0fb770
              ip_forward() at ip_forward+0x39e/frame 0xfffffe009a0fb840
              ip_input() at ip_input+0x850/frame 0xfffffe009a0fb8f0
              netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe009a0fb940
              ether_demux() at ether_demux+0x16a/frame 0xfffffe009a0fb970
              ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe009a0fb9d0
              netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe009a0fba20
              ether_input() at ether_input+0x89/frame 0xfffffe009a0fba80
              service_iq_fl() at service_iq_fl+0x5d2/frame 0xfffffe009a0fbb30
              t4_intr() at t4_intr+0x2d/frame 0xfffffe009a0fbb50
              ithread_loop() at ithread_loop+0x23c/frame 0xfffffe009a0fbbb0
              fork_exit() at fork_exit+0x7e/frame 0xfffffe009a0fbbf0
              fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009a0fbbf0
              --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
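
As a rough illustration of how a backtrace like the one above can be read, the sketch below picks out the first frame listed after the first non-zero trap marker, which is the function that was executing when the fault hit. The regular expressions are written only against the line shapes shown in this dump and may need adjusting for other output; `faulting_frame` is a hypothetical helper, not part of any pfSense or FreeBSD tool.

```python
import re

# Matches lines such as:
#   cxgbe_transmit() at cxgbe_transmit+0x19/frame 0xfffffe009a0fb570
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at (\w+)\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)")

# Matches lines such as:
#   --- trap 0x9, rip = 0xffffffff8065f3d9, rsp = ..., rbp = ... ---
TRAP_RE = re.compile(r"^\s*--- trap (0x[0-9a-f]+|\d+), rip = (0x[0-9a-f]+|0)")


def faulting_frame(backtrace):
    """Return (function, offset) for the first frame after a non-zero trap marker."""
    seen_trap = False
    for line in backtrace.splitlines():
        if not seen_trap:
            match = TRAP_RE.match(line)
            # Skip the terminating "--- trap 0, rip = 0 ..." marker.
            if match and match.group(1) not in ("0", "0x0"):
                seen_trap = True
            continue
        match = FRAME_RE.match(line)
        if match:
            return match.group(1), match.group(3)
    return None
```

Run against the backtrace in this post, it points at `cxgbe_transmit`, reached through `ip_forward` and `ether_output`. That means the fault hit while a forwarded packet was being handed to the Chelsio driver for transmit, which says where the crash surfaced and not necessarily what caused it.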
              

              But Wireguard was no longer running on it when that happened?

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                What's the WG tunnel connected to there? Another pfSense install?

                • J
                  Jarhead @stephenw10
                  last edited by

                  @stephenw10 said in Epyc 3251 and Wireguard:

                  Hmm, still showing issues in the Chelsio driver.

                  I assume that's only because my WAN is on the chelsio at the time. I didn't check when I disconnected the chelsio card but I would also assume it would've shown as the igb0 at that time.

                  But Wireguard was no longer running on it when that happened?

                  Probably the cause right there. WG shutting down when I try to use it?

                  • J
                    Jarhead @stephenw10
                    last edited by

                    @stephenw10 said in Epyc 3251 and Wireguard:

                    What's the WG tunnel connected to there? Another pfSense install?

                    Unfortunately not.
                    That tunnel goes to an opnsense box. At least until the vlan0 is fixed. 😉

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Hmm, so the encrypted WG traffic still runs over the Chelsio NIC, the WAN?

                      • J
                        Jarhead @stephenw10
                        last edited by

                        @stephenw10 Not really sure what you're asking there.
                        My WAN is on the chelsio card (cxl3), the WG tunnel comes up with handshakes, but as soon as I try to access the other side it crashes.

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Mmm, I'm unsure what you moved to igb0. I would have expected that to have to be the WAN for the WG interface to be running on it.

                          • J
                            Jarhead @stephenw10
                            last edited by Jarhead

                            @stephenw10 I moved the WAN to igb0 and disconnected the chelsio card from the motherboard as a test.
                            The trouble still happened.
                            So I don't think focusing on the chelsio is the way to go.
                            It happens with the onboard nics also.
                            Because it still happened with the onboard nics, I reinserted the chelsio and moved WAN back to it.

                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              Right, I would agree except that it appeared the error was still on the Chelsio NIC even when it was not carrying WG traffic as I understand it.

                              It would be good to get a crash report from the igb0-as-WAN setup if that's possible. It would be very surprising to see the same error on igb, since many people are running WG with an igb parent.

                              • J
                                Jarhead @stephenw10
                                last edited by

                                @stephenw10 said in Epyc 3251 and Wireguard:

                                Right, I would agree except that it appeared the error was still on the Chelsio NIC even when it was not carrying WG traffic as I understand it.

                                How are you coming up with that?

                                • stephenw10S
                                  stephenw10 Netgate Administrator @Jarhead
                                  last edited by

                                  @jarhead said in Epyc 3251 and Wireguard:

                                  But Wireguard was no longer running on it when that happened?

                                  Probably the cause right there. WG shutting down when I try to use it?

                                  I may have read that wrong. What I meant to ask there was: was WG running on the Chelsio NIC when that crash report was generated?

                                  • J
                                    Jarhead @stephenw10
                                    last edited by

                                    @stephenw10 I'll go through the whole thing again, trying to be more clear.

                                    New router. Backed up old, restored on new changing interfaces as needed.
                                    Wireguard would crash.
                                    Moved WAN and LAN to the onboard igb NICs.
                                    Wireguard would crash.
                                    Since this proves it's not related to the chelsio card, as it wasn't even plugged in to the motherboard, I reinstalled the chelsio and moved WAN and LAN back to it.
                                    Wireguard would crash.
                                    I found some weird errors in my gateways, as in network 1 was using gateway 2, and network 2 using gateway 1 when they should be 1 to 1 and 2 to 2, so I uninstalled wireguard then reinstalled it and recreated one tunnel.
                                    Wireguard crashed and that's the dump I posted here.

                                    So focusing on the chelsio card seems to be not the way to go.
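
The swapped-gateway symptom described above (network 1 using gateway 2 and the reverse) is the kind of thing that can be sanity-checked mechanically. A minimal sketch, assuming each gateway is expected to sit inside its own interface's subnet: the interface names and addresses below are hypothetical placeholders, not values from this config, and this does not read pfSense's config.xml.

```python
import ipaddress


def mismatched_gateways(assignments):
    """Return the names whose gateway IP is outside the interface's subnet.

    assignments maps a name to (interface address in CIDR form, gateway IP).
    """
    bad = []
    for name, (iface_cidr, gateway) in assignments.items():
        network = ipaddress.ip_interface(iface_cidr).network
        if ipaddress.ip_address(gateway) not in network:
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Hypothetical tunnel addressing with the two gateways swapped.
    swapped = {
        "tun_wg0": ("10.100.1.1/30", "10.100.2.2"),
        "tun_wg1": ("10.100.2.1/30", "10.100.1.2"),
    }
    print(mismatched_gateways(swapped))
```

A check like this only catches gateways that land in the wrong subnet; it would not notice two gateways swapped within the same network.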

                                    Have you guys used an Epyc 3251 in the office for testing at all?

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Ah, OK. Sorry I misinterpreted the responses there then.

                                      Is it possible to switch back to igb and try to generate a crash report?
                                      The crash you saw there in cxgbe looks very similar to some we have seen in other drivers, but I would expect those to be fixed in igb.

                                      I'm not aware of any testing that has been done on an Epyc platform device.

                                      Steve

                                      • J
                                        Jarhead @stephenw10
                                        last edited by

                                        @stephenw10 Definitely possible, just don't know when I can get to it. Got a lot of painting to do tonight so maybe I can play around as I watch the paint dry. 😀

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          Ha, sounds like a good option!

                                          I'll keep digging here, see if anyone has any suggestions.

                                          • J
                                            Jarhead @stephenw10
                                            last edited by

                                            @stephenw10 Boy that paint took a long time to dry! 😀

                                            Gave me a lot of time to try this out.
                                            Kept going back to that config being messed up so I started from scratch.
                                            New install, chelsio card not connected, just changed my network address with WAN on igb0 and LAN on igb1.
                                            Installed Wireguard. It ran fine.
                                            Installed Chelsio. Still ran fine on igb interfaces.
                                            Moved WAN and LAN to chelsio. Still ran fine!
                                            So the Chelsio is not causing the issue.
                                            Spent some time (a lot! long night) setting the config back to my usual. Not importing anything, had my old router up and used it as a reference.
                                            Wireguard crashed.
                                            In the process of setting up my old router as the new endpoint so I can rule out opnsense being the cause. (secretly hoping it is so I can get rid of it!!)

                                            Did get a new crash dump that might show something. This is still on the chelsio though.

                                            textdump.tar.0
                                            info.0

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.