Netgate Discussion Forum

    Epyc 3251 and Wireguard

    General pfSense Questions
    50 Posts 3 Posters 8.2k Views
    • stephenw10 Netgate Administrator

      Hmm, no crash report from when it was running over igb though?

      Just very odd that it shows as a crash in the driver not in Wireguard.
      It must be some very unusual traffic that WG is creating and the driver is trying to do something with. Hard to imagine what that could be though.

      • Jarhead @stephenw10

        @stephenw10
        No crash report from igb. As I mentioned, when it crashed with the igb too, I went back to the Chelsio.

        Got it working though.
        Never would have guessed you can't "reuse" a tunnel but creating a new one fixed it.
        Do the WireGuard tunnels 'key' to the actual hardware somehow?

        • stephenw10 Netgate Administrator

          No way that I can think of. Hmm. I've imported configs with WG tunnels defined before and never seen an issue. 🤔

          • Jarhead @stephenw10

            @stephenw10
            I didn't import them, I recreated them with the same values. Shouldn't make a difference and I'm now guessing it didn't.
            Tunnels have been up for about an hour or so and it just went down.
            Same thing, tried to access the opnsense box.

            • Jarhead @Jarhead

              @stephenw10 Gonna try to switch LAN and WAN back to the igb at some point.
              If this turns out to be the chelsio card, any ideas why it would have a problem?
              Any suggested 'tuning' to it maybe?

              • stephenw10 Netgate Administrator

                You might check the mbuf usage history in the monitoring graphs. We have seen odd traffic create an mbuf leak before in cxgbe. The Wireguard traffic just isn't that odd though. Makes me wonder if it fails in the driver but is actually triggered somewhere else....

                • Jarhead @stephenw10

                  @stephenw10 So this is getting more weird by the minute!

                  Last night I switched the WAN to igb0. I had also planned on switching the LAN to igb1 but didn't, although I did free igb1 up and unassigned it. Might be relevant.

                  All was working great but I didn't have high hopes since it was all working fine with cxl3 for an hour yesterday.
                  Went to sleep, this morning I checked on it and it was still running. One tunnel has my camera network, another has backups.
                  I was confused since it crashed twice on igb0.
                  Trying to find out what the difference was and I remembered igb1.
                  I went back and reassigned igb1 to my guest wifi network.
                  Tried to access the opnsense box on the other end and the page wouldn't load. Went straight to the monitor connected to my pfSense expecting to see the screen scrolling but it wasn't.
                  Went back to my pfSense webgui and saw the tunnel went down and was coming back up in the time it took me to get back to the interface.
                  That was at 7am, and it still is not completely back. Right now the tunnel gateway shows 1,235.1ms RTT, 174.9ms RTTsd, 0.0% loss. Earlier it was at over 5000ms RTT and over 500ms RTTsd.

                  So with 5 physical interfaces up, it went down but didn't crash.

                  Could this be a wireguard bug?
                  Gonna disable the 5th interface and see if it changes anything when I can.

                  For reference:

                  My side.
                  LAN = 10.12.8.0/26
                  camera = 10.12.8.64/27
                  Guest wifi = 10.12.8.96/27
                  IoT = 10.12.8.128/25

                  1st tunnel:
                  LAN = 10.8.19.0/26
                  camera = 10.8.19.248/29

                  2nd tunnel:
                  LAN = 192.168.1.0/24

                  Nothing overlapping.
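
The "nothing overlapping" claim can be double-checked with a few lines of Python. This is only a minimal sketch using the standard `ipaddress` module; the subnets are copied from the list above, while the dictionary labels and the `find_overlaps` helper name are illustrative and not from the thread.

```python
import ipaddress
from itertools import combinations

# Subnets as listed in the post; the keys are illustrative labels only.
SUBNETS = {
    "local LAN": "10.12.8.0/26",
    "local camera": "10.12.8.64/27",
    "local guest wifi": "10.12.8.96/27",
    "local IoT": "10.12.8.128/25",
    "tunnel 1 LAN": "10.8.19.0/26",
    "tunnel 1 camera": "10.8.19.248/29",
    "tunnel 2 LAN": "192.168.1.0/24",
}


def find_overlaps(subnets):
    """Return (name_a, name_b) pairs whose networks share any address."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


overlaps = find_overlaps(SUBNETS)
print(overlaps if overlaps else "no overlapping subnets")
```

`ip_network` also rejects a CIDR whose host bits are set, so a typo such as `10.12.8.65/27` would raise an error rather than pass silently.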

                  • Bob.DigB
                    Bob.Dig LAYER 8 @Jarhead
                    last edited by

                    @jarhead I ran WG to Android, Windows and a Privacy-VPN, so far no crashes.

                    • Jarhead @Bob.Dig

                      @bob-dig said in Epyc 3251 and Wireguard:

                      @jarhead I ran WG to Android, Windows and a Privacy-VPN, so far no crashes.

                      But did you have 5 physical interfaces up?

                      • Bob.Dig LAYER 8 @Jarhead

                        @jarhead I had 4 WG tunnels to the privacy VPN and 2 other tunnels. When it comes to physical interfaces, most are VLANs here, so no. Also I just wanted to mention it, don't really think that I could help here anyway.

                        • stephenw10 Netgate Administrator

                          Hmm, hard to see how the number of interfaces would affect Wireguard. It would increase the mbuf allocation though.
                          Do you see any errors on the console or logged even if it doesn't panic/reboot?

                          Not sure if it would be better to get a crash report from igb at this point. 😉

                          I could imagine the cxgbe driver is crashing completely but igb, when faced with the same traffic, just starts dropping it, causing the massive loss/latency you are seeing.

                          Steve

                          • Jarhead @stephenw10

                            @stephenw10
                            No errors anywhere I can see but now the other WG tunnel is going out too.

                            pfsense.png

                            • stephenw10 Netgate Administrator

                              Your WAN is running on the cxl interface directly here right? Not on a VLAN or virtual interface?

                              • Bob.Dig LAYER 8 @Jarhead

                                @jarhead I had latency problems with the WG-tunnels to the privacy VPN too but no crashes. I thought it is their fault, usually it is.

                                • stephenw10 Netgate Administrator

                                  Also just to confirm is your WG tunnel running on the WAN directly? It's not forwarding to localhost for example?

                                  The crash looks like cxgbe crashing whilst trying to forward traffic.

                                  Steve

                                  • Jarhead @stephenw10

                                    @stephenw10 said in Epyc 3251 and Wireguard:

                                    Your WAN is running on the cxl interface directly here right? Not on a VLAN or virtual interface?

                                    No. WAN has been on igb0 since last night.
                                    Just noticed in that picture how high the latency is on my WAN, haven't seen that before.
                                    I wonder if that's what the whole problem is.
                                    Again, not sure if it's been that high the whole time or not but I have to start there. It's usually around 3 to 4 ms.
                                    Just turned off both WG tunnels and it's still around 150ms.

                                    Time to call the ISP. Oh joy.

                                    • Jarhead @Jarhead

                                      No ISP call needed luckily.

                                      Stupid mistake on my part.
                                      When I put the guest wifi back this morning I moved the WAN port on my switch since I used that port for the WAN to igb0 yesterday.
                                      So I plugged the new WAN port into a switchport that was set to 10/full, which obviously caused the high latency.
                                      Changed that port to auto/auto and it's back up and running good.
                                      Both WG tunnels up and seeing normal latency on both.

                                      Will leave it up this way for the day but I'm starting to think it has to be the chelsio card now. It just doesn't like wireguard for some reason. Can't explain why it did crash on the igb before but it's not now. Yet.

                                      @stephenw10 I'm gonna want/need to move WAN back to cxl3 at some point. Other than mbuf, any other ideas of how to troubleshoot this?

                                      • stephenw10 Netgate Administrator

                                        Yeah, it looks like the cxgbe driver to me too. And to the developers that looked at this.

                                        Previously when it was crashing on igb it never created a crash report right? But it was panicking and rebooting? Or just the stream of errors you originally posted?

                                        It must be something about the WG traffic that is triggering an issue.

                                        Try to grab the output of netstat -m.
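
If the `netstat -m` output is saved a few times while the tunnels are up, the counters can be compared between snapshots to spot a steadily growing "current" value or non-zero denied requests. The following is a rough sketch only: the `parse_netstat_m` helper is hypothetical, it handles just the lines shaped like `a/b/c label (x/y/z)`, and the numbers in `SAMPLE` are made up for illustration, not taken from this system.

```python
import re

# Matches lines such as "1000/500/1500 mbufs in use (current/cache/total)".
LINE_RE = re.compile(r"^\s*(\d+(?:/\d+)*)\s+(.+?)\s+\(([^)]+)\)\s*$")


def parse_netstat_m(text):
    """Map each counter label to a {field: value} dict; skip other lines."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        values = [int(v) for v in match.group(1).split("/")]
        fields = match.group(3).split("/")
        # Only keep lines where every value has a matching field name.
        if len(values) == len(fields):
            stats[match.group(2)] = dict(zip(fields, values))
    return stats


# Illustrative sample with invented numbers, NOT real output from this box.
SAMPLE = """\
1000/500/1500 mbufs in use (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

stats = parse_netstat_m(SAMPLE)
print(stats["mbufs in use"]["current"])
print(stats["requests for mbufs denied"])
```

Lines that do not fit that shape (for example byte totals with a `K` suffix) are simply skipped by this sketch.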

                                        Otherwise I'd be looking at the sysctl mac stats for the NIC in use.

                                        Beyond that we might need a debug kernel or similar.

                                        If you're able to leave it running on igb0 for a while that might help. You might be logging errors there that could point to a cause. Or if it does crash and generate a report that would be useful.

                                        Steve

                                        • Jarhead @stephenw10

                                          @stephenw10 said in Epyc 3251 and Wireguard:

                                          Yeah, it looks like the cxgbe driver to me too. And to the developers that looked at this.

                                          Previously when it was crashing on igb it never created a crash report right? But it was panicking and rebooting? Or just the stream of errors you originally posted?

                                          Steve

                                          Unfortunately I never let it complete the dump when it was on the igb interface. I've been thinking about it and I think that was caused by the bad config import. If you remember I found the gateways were all screwed up and who knows what else was. That's the only thing I can attribute to it crashing on the igb.
                                          It's been running for a few hours now and it's perfect. This is with a new config I did from scratch and all new tunnels instead of trying to recreate the old ones.
                                          It still crashes if I use the cxgbe and it doesn't with the igb.
                                          So the earlier crashes on the igb must've been caused by the config.

                                          When I can I'll put it back on the cxgbe and monitor what I can. Maybe do a sniff on it too.

                                          • stephenw10 Netgate Administrator

                                            Yeah a pcap might show something if it's some unusual traffic triggering it.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.