Epyc 3251 and Wireguard
-
Ah, OK. Sorry, I misinterpreted the responses there then.
Is it possible to switch back to igb and try to generate a crash report?
The crash you saw there in cxgbe looks very similar to some we have seen in other drivers, but that I would expect to be fixed in igb.
I'm not aware of any testing that has been done on an Epyc platform device.
Steve
-
@stephenw10 Definitely possible, just don't know when I can get to it. Got a lot of painting to do tonight so maybe I can play around as I watch the paint dry.
-
Ha, sounds like a good option!
I'll keep digging here, see if anyone has any suggestions.
-
@stephenw10 Boy that paint took a long time to dry!
Gave me a lot of time to try this out.
I kept coming back to the idea that the config was messed up, so I started from scratch.
New install with the Chelsio card not connected; I just changed my network address, with WAN on igb0 and LAN on igb1.
Installed Wireguard. It ran fine.
Installed the Chelsio. Still ran fine on the igb interfaces.
Moved WAN and LAN to chelsio. Still ran fine!
So the Chelsio is not causing the issue.
Spent some time (a lot! long night) setting the config back to my usual one. I didn't import anything; I had my old router up and used it as a reference.
Wireguard crashed.
I'm in the process of setting up my old router as the new endpoint so I can rule out OPNsense as the cause (secretly hoping it is, so I can get rid of it!!).
I did get a new crash dump that might show something. This is still on the Chelsio, though.
-
Mmm, pretty much identical crash and it's in the cxgbe driver.
I have no idea how the WG encrypted traffic could be triggering it, though.
I'd be willing to bet it would not crash with igb as WAN. Though you said you were still seeing the errors logged with igb?
After reconfiguring it with your previous settings did it start crashing immediately?
Was it actually panicking and rebooting, or just Wireguard erroring out?
Steve
-
@stephenw10
But it did crash with the igb driver.
When I have a clean config (meaning no other interfaces assigned) it runs fine on both igb and cxgbe.
When I put my config back on it, it crashes on igb and cxgbe.
Gonna try disabling all other interfaces when I get a chance later.
-
The firewall itself crashed and rebooted with the igb NIC as WAN?
-
@stephenw10
Yes, I pointed that out many posts ago. That's why I keep saying to stop focusing on the Chelsio.
But I did make some progress.
I have WG working from pfSense to pfSense.
I don't think that matters, because I did have it running to this same OPNsense box.
What I'm thinking, and about to try, is that the problem is that I've been recreating the tunnel I used with the old router on this OPNsense junk.
I've now created a whole new tunnel as a test between pfSenseseses.
Gonna try creating a new tunnel to the OPNsense box next.
-
Hmm, no crash report from when it was running over igb though?
Just very odd that it shows as a crash in the driver, not in Wireguard.
It must be some very unusual traffic that WG is creating and the driver is trying to do something with. Hard to imagine what that could be, though.
-
@stephenw10
No crash report from igb. As I mentioned, when it crashed with the igb too, I went back to the Chelsio.
Got it working, though.
Never would have guessed you can't "reuse" a tunnel, but creating a new one fixed it.
Do the Wireguard tunnels 'key' to the actual hardware somehow?
-
No way that I can think of. Hmm. I've imported configs with WG tunnels defined before and never seen an issue.
-
@stephenw10
I didn't import them; I recreated them with the same values. Shouldn't make a difference, and I'm now guessing it didn't.
The tunnels had been up for about an hour or so, and it just went down.
Same thing: I tried to access the OPNsense box.
-
@stephenw10 Gonna try to switch LAN and WAN back to the igb at some point.
If this turns out to be the Chelsio card, any ideas why it would have a problem?
Any suggested 'tuning' for it, maybe?
-
You might check the mbuf usage history in the monitoring graphs. We have seen odd traffic create an mbuf leak in cxgbe before. The Wireguard traffic just isn't that odd, though. It makes me wonder if it fails in the driver but is actually triggered somewhere else...
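The same mbuf numbers the graphs show can also be pulled from a shell snapshot. A minimal sketch in Python, assuming the line format of FreeBSD's `netstat -m` output as shown in the embedded sample (those numbers are made up for illustration, not taken from this firewall). `mbuf_summary` and the optional file-path argument are hypothetical helpers, not part of pfSense.

```python
import re
import sys

# Hypothetical sample in the shape of FreeBSD `netstat -m` output.
# The numbers are illustrative only, not taken from the firewall in this thread.
SAMPLE = """\
2048/3022/5070 mbufs in use (current/cache/total)
2046/1524/3570/1000000 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

# "current/cache/total/max" counters for mbuf clusters (assumed format).
CLUSTERS_RE = re.compile(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.M)
# "mbufs/clusters/mbuf+clusters" denied-request counters (assumed format).
DENIED_RE = re.compile(r"^(\d+)/(\d+)/(\d+) requests for mbufs denied", re.M)


def mbuf_summary(text):
    """Pull cluster usage and denied-request counts out of netstat -m text."""
    m = CLUSTERS_RE.search(text)
    if m is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    current, _cache, total, limit = (int(g) for g in m.groups())
    d = DENIED_RE.search(text)
    # Sum all three denied counters; any non-zero value means allocations failed.
    denied = sum(int(g) for g in d.groups()) if d else 0
    return {
        "clusters_current": current,
        "clusters_total": total,
        "clusters_max": limit,
        # Share of the cluster limit that is allocated (in use + cached).
        "percent_of_max": 100.0 * total / limit,
        "denied": denied,
    }


if __name__ == "__main__":
    # Optional argument: path to a saved `netstat -m` dump; otherwise use SAMPLE.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    s = mbuf_summary(text)
    print(f"clusters: {s['clusters_current']} in use, {s['clusters_total']} allocated, "
          f"{s['percent_of_max']:.2f}% of max {s['clusters_max']}; denied: {s['denied']}")
```

Saving `netstat -m` output every so often and watching whether the cluster count keeps climbing, or the denied count goes non-zero, is one way to spot the kind of leak described above.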
-
@stephenw10 So this is getting more weird by the minute!
Last night I switched the WAN to igb0. I had also planned on switching the LAN to igb1 but didn't, although I did free up igb1 and unassign it. Might be relevant.
All was working great, but I didn't have high hopes since it was all working fine with cxl3 for an hour yesterday.
Went to sleep, this morning I checked on it and it was still running. One tunnel has my camera network, another has backups.
I was confused, since it had crashed twice on igb0.
While trying to work out what the difference was, I remembered igb1.
I went back and reassigned igb1 to my guest wifi network.
Tried to access the OPNsense box on the other end and the page wouldn't load. Went straight to the monitor connected to my pfSense, expecting to see the screen scrolling, but it wasn't.
Went back to my pfSense webgui and saw the tunnel went down and was coming back up in the time it took me to get back to the interface.
That was at 7am, and it still isn't completely back. Right now the tunnel gateway shows 1,235.1ms RTT, 174.9ms RTTsd and 0.0% loss. It was at over 5000 RTT and over 500 RTTsd.
So with 5 physical interfaces up, it went down but didn't crash.
Could this be a wireguard bug?
Gonna disable the 5th interface and see if it changes anything when I can.
For reference:
My side:
LAN = 10.12.8.0/26
camera = 10.12.8.64/27
Guest wifi = 10.12.8.96/27
IoT = 10.12.8.128/25
1st tunnel:
LAN = 10.8.19.0/26
camera = 10.8.19.248/29
2nd tunnel:
LAN = 192.168.1.0/24
Nothing overlapping.
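The "nothing overlapping" check on the subnets listed in this post can also be done mechanically. A minimal sketch using Python's standard `ipaddress` module that compares every pair of networks; the labels and the `find_overlaps` helper are made up for illustration.

```python
import ipaddress
from itertools import combinations

# Subnets as listed in the post; the labels are illustrative shorthand.
SUBNETS = {
    "my LAN": "10.12.8.0/26",
    "my camera": "10.12.8.64/27",
    "my guest wifi": "10.12.8.96/27",
    "my IoT": "10.12.8.128/25",
    "tunnel 1 LAN": "10.8.19.0/26",
    "tunnel 1 camera": "10.8.19.248/29",
    "tunnel 2 LAN": "192.168.1.0/24",
}


def find_overlaps(subnets):
    """Return (name_a, name_b) for every pair of networks whose ranges overlap."""
    # ip_network() is strict by default and raises ValueError for entries with
    # host bits set, which also catches typos such as 10.12.8.130/25.
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    clashes = find_overlaps(SUBNETS)
    if clashes:
        for a, b in clashes:
            print(f"overlap: {a} ({SUBNETS[a]}) and {b} ({SUBNETS[b]})")
    else:
        print("no overlapping subnets")
```

With the values posted here it prints `no overlapping subnets`, which agrees with the manual check.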
-
@jarhead I ran WG to Android, Windows and a Privacy-VPN, so far no crashes.
-
@bob-dig said in Epyc 3251 and Wireguard:
@jarhead I ran WG to Android, Windows and a Privacy-VPN, so far no crashes.
But did you have 5 physical interfaces up?
-
@jarhead I had 4 WG tunnels to the privacy VPN and 2 other tunnels. When it comes to physical interfaces, most are VLANs here, so no. Also, I just wanted to mention it; I don't really think I could help here anyway.
-
Hmm, hard to see how the number of interfaces would affect Wireguard. It would increase the mbuf allocation.
Do you see any errors on the console, or logged, even if it doesn't panic/reboot?
Not sure if it would be better to get a crash report from igb at this point.
I could imagine the cxgbe driver is crashing completely but igb, when faced with the same traffic, just starts dropping it, causing the massive loss/latency you are seeing.
Steve
-
@stephenw10
No errors anywhere I can see, but now the other WG tunnel is going out too.