Epyc 3251 and Wireguard
-
@stephenw10
But it did crash with the igb driver.
When I have a clean config (meaning no other interfaces assigned) it runs fine on both igb and cxgbe.
When I put my config back on it, it crashes on igb and cxgbe.
Gonna try disabling all other interfaces when I get a chance later.
-
The firewall itself crashed and rebooted with the igb NIC as WAN?
-
@stephenw10
Yes, pointed that out many posts ago. That's why I keep saying stop focusing on the chelsio.
But I did make some progress.
I have WG working from pfSense to pfSense.
Don't think that matters because I did have it running to this same opnsense box.
What I'm thinking, and about to try, is that I've been recreating the tunnel I used with the old router to this opnsense junk.
I now created a whole new tunnel as a test between pfSenseseses.
Gonna now try to create a new tunnel to the opnsense.
-
Hmm, no crash report from when it was running over igb though?
Just very odd that it shows as a crash in the driver not in Wireguard.
It must be some very unusual traffic that WG is creating and the driver is trying to do something with it. Hard to imagine what that could be though.
-
@stephenw10
No crash report from igb. I had mentioned that when it crashed with the igb too, I went back to the chelsio.
Got it working though.
Never would have guessed you can't "reuse" a tunnel but creating a new one fixed it.
Do the wireguard tunnels 'key' to the actual hardware somehow?
-
No way that I can think of. Hmm. I've imported configs with WG tunnels defined before and never seen an issue.
-
@stephenw10
I didn't import them, I recreated them with the same values. Shouldn't make a difference and I'm now guessing it didn't.
Tunnels have been up for about an hour or so and it just went down.
Same thing, tried to access the opnsense box.
-
@stephenw10 Gonna try to switch LAN and WAN back to the igb at some point.
If this turns out to be the chelsio card, any ideas why it would have a problem?
Any suggested 'tuning' for it maybe?
-
You might check the mbuf usage history in the monitoring graphs. We have seen odd traffic create an mbuf leak before in cxgbe. The Wireguard traffic just isn't that odd though. Makes me wonder if it fails in the driver but is actually triggered somewhere else....
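If you want to watch mbuf usage outside the monitoring graphs, here is a rough sketch of pulling the cluster counts out of `netstat -m` output. The sample text and its numbers are made up, and the line layout is an assumption based on typical FreeBSD output, so check it against what your box actually prints. `summarize_mbufs` is just a hypothetical helper name.

```python
import re

# Assumed sample of `netstat -m` output; the numbers are invented for illustration.
SAMPLE_NETSTAT_M = """\
16639/3731/20370 mbufs in use (current/cache/total)
16384/1790/18174/1000000 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

# Assumed line formats: "cur/cache/total/max mbuf clusters in use" and
# "a/b/c requests for mbufs denied".
CLUSTER_RE = re.compile(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE)
DENIED_RE = re.compile(r"^(\d+)/(\d+)/(\d+) requests for mbufs denied", re.MULTILINE)


def summarize_mbufs(text):
    """Return cluster usage and denied-request counts, or None if the lines are missing."""
    clusters = CLUSTER_RE.search(text)
    denied = DENIED_RE.search(text)
    if clusters is None or denied is None:
        return None
    current, cache, total, maximum = map(int, clusters.groups())
    return {
        "clusters_total": total,
        "clusters_max": maximum,
        "used_fraction": total / maximum,
        "denied_requests": sum(int(n) for n in denied.groups()),
    }


print(summarize_mbufs(SAMPLE_NETSTAT_M))
```

Run against real output every few minutes, a `used_fraction` that keeps climbing while traffic stays flat, or a non-zero `denied_requests`, would point towards a leak.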
-
@stephenw10 So this is getting more weird by the minute!
Last night I switched the WAN to igb0. I had also planned on switching the LAN to igb1 but didn't, although I did free igb1 up and unassigned it. Might be relevant.
All was working great but I didn't have high hopes since it was all working fine with cxl3 for an hour yesterday.
Went to sleep, this morning I checked on it and it was still running. One tunnel has my camera network, another has backups.
I was confused since it crashed twice on igb0.
Trying to find out what the difference was and I remembered igb1.
I went back and reassigned igb1 to my guest wifi network.
Tried to access the opnsense box on the other end and the page wouldn't load. Went straight to the monitor connected to my pfSense expecting to see the screen scrolling but it wasn't.
Went back to my pfSense webgui and saw the tunnel went down and was coming back up in the time it took me to get back to the interface.
That was at 7am, and it still is not completely back. Right now the tunnel gateway shows 1,235.1ms RTT, 174.9ms RTTsd, 0.0% loss. It was at over 5000 rtt and over 500 rttsd.
So with 5 physical interfaces up, it went down but didn't crash.
Could this be a wireguard bug?
Gonna disable the 5th interface and see if it changes anything when I can.
For reference:
My side.
LAN = 10.12.8.0/26
camera = 10.12.8.64/27
Guest wifi = 10.12.8.96/27
IoT = 10.12.8.128/25
1st tunnel:
LAN = 10.8.19.0/26
camera = 10.8.19.248/29
2nd tunnel:
LAN = 192.168.1.0/24
Nothing overlapping.
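For anyone who wants to double-check the "nothing overlapping" part, here is a small sketch using Python's standard `ipaddress` module. The subnet values are the ones listed above; the labels and the `find_overlaps` helper are just names picked for this example.

```python
import ipaddress
from itertools import combinations

# Subnets as listed in the post; the labels are illustrative only.
SUBNETS = {
    "local LAN": "10.12.8.0/26",
    "local camera": "10.12.8.64/27",
    "local guest wifi": "10.12.8.96/27",
    "local IoT": "10.12.8.128/25",
    "tunnel 1 LAN": "10.8.19.0/26",
    "tunnel 1 camera": "10.8.19.248/29",
    "tunnel 2 LAN": "192.168.1.0/24",
}


def find_overlaps(subnets):
    """Return a list of (name_a, name_b) pairs whose networks overlap."""
    # ip_network() raises ValueError if a prefix has host bits set,
    # which also catches typos in the CIDR notation.
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


overlaps = find_overlaps(SUBNETS)
print(overlaps if overlaps else "no overlapping subnets")
```

This only confirms the addressing plan is clean. It says nothing about what is actually configured in the tunnels' allowed IPs on either end.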
-
@jarhead I ran WG to Android, Windows and a Privacy-VPN, so far no crashes.
-
@bob-dig said in Epyc 3251 and Wireguard:
@jarhead I ran WG to Android, Windows and a Privacy-VPN, so far no crashes.
But did you have 5 physical interfaces up?
-
@jarhead I had 4 WG tunnels to the privacy VPN and 2 other tunnels. When it comes to physical interfaces, most are VLANs here, so no. Also I just wanted to mention it, don't really think that I could help here anyway.
-
Hmm, hard to see how the number of interfaces would affect Wireguard. It would increase the mbuf allocation.
Do you see any errors on the console or logged even if it doesn't panic/reboot?
Not sure if it would be better to get a crash report from igb at this point.
I could imagine the cxgbe driver is crashing completely but igb, when faced with the same traffic, just starts dropping it, causing the massive loss/latency you are seeing.
Steve
-
@stephenw10
No errors anywhere I can see but now the other WG tunnel is going out too.
-
Your WAN is running on the cxl interface directly here right? Not on a VLAN or virtual interface?
-
@jarhead I had latency problems with the WG tunnels to the privacy VPN too but no crashes. I thought it was their fault, it usually is.
-
Also just to confirm is your WG tunnel running on the WAN directly? It's not forwarding to localhost for example?
The crash looks like cxgbe crashing whilst trying to forward traffic.
Steve
-
@stephenw10 said in Epyc 3251 and Wireguard:
Your WAN is running on the cxl interface directly here right? Not on a VLAN or virtual interface?
No. WAN has been on igb0 since last night.
Just noticed in that picture how high the latency is on my WAN, haven't seen that before.
I wonder if that's what the whole problem is.
Again, not sure if it's been that high the whole time or not but I have to start there. It's usually around 3 to 4 ms.
Just turned off both WG tunnels and it's still around 150ms.
Time to call the ISP. Oh joy.
-
No ISP call needed luckily.
Stupid mistake on my part.
When I put the guest wifi back this morning I moved the WAN port on my switch since I used that port for the WAN to igb0 yesterday.
So I plugged the new WAN port into a switchport that was set to 10/full. Causing the high latency obviously.
Changed that port to auto/auto and it's back up and running good.
Both WG tunnels up and seeing normal latency on both.
Will leave it up this way for the day but I'm starting to think it has to be the chelsio card now. It just doesn't like wireguard for some reason. Can't explain why it did crash on the igb before but it's not now. Yet.
@stephenw10 I'm gonna want/need to move WAN back to cxl3 at some point. Other than mbuf, any other ideas of how to troubleshoot this?