tun_wg0 reports (through SNMP) some Ierrs and Oerrs (mostly Oerrs) and triggers Nagios-like warnings
-
On an existing setup of two remote pfSense+ boxes with WireGuard tunnelling between sites, over fibre, which runs flawlessly (no issues ever suspected), we changed the monitoring of our resources to "checkmk" (distantly related to Nagios, a fork as far as I understand).
The new monitoring has started to trigger warnings and sometimes critical alerts, as well as a hint that the interface might be "flapping", only on the tun_wg0 interface.
On the human visible side of things, everything is still running fine.
In fact, SNMP polling from pfSense reports some error counts on tun_wg0 that are slightly above the warning and sometimes critical limits. I've checked everything I can think of and can't find the cause of these Oerrs and the occasional Ierrs on the tun_wg0 software interface. They certainly don't match any similar errors on the underlying physical interfaces.
How do I go about debugging this? Does anyone recognise this as a known problem? Are these really packet issues or false error counts?
I will probably set things up on the "checkmk" side to record the error channels, but not to trigger alerts about them (on that interface). If there is something "real" I could tweak to "fix" these errors, assuming they are real, that would probably be the better option.
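One way to check whether the counters really move, independently of the monitoring tool, would be to sample the raw interface counters twice and look at errors and discards separately. The sketch below is only an illustration: the sample values are made up, and it assumes the counters are 32-bit (Counter32) and may wrap between two polls. The OIDs are the standard IF-MIB ifTable columns; the interface's ifIndex has to be appended to each.

```python
# Standard IF-MIB ifTable columns (append ".<ifIndex>" of tun_wg0 to each).
IF_MIB_OIDS = {
    "ifInDiscards": "1.3.6.1.2.1.2.2.1.13",
    "ifInErrors": "1.3.6.1.2.1.2.2.1.14",
    "ifOutUcastPkts": "1.3.6.1.2.1.2.2.1.17",
    "ifOutDiscards": "1.3.6.1.2.1.2.2.1.19",
    "ifOutErrors": "1.3.6.1.2.1.2.2.1.20",
}

COUNTER32_MODULUS = 2**32  # assumption: 32-bit counters


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Increase between two polls of a counter that may have wrapped once."""
    return (current - previous) % modulus


def per_second_rates(first, second, interval_s):
    """Per-second rate of every counter present in the first sample."""
    return {
        name: counter_delta(first[name], second[name]) / interval_s
        for name in first
    }


# Hypothetical samples taken 60 seconds apart (not real data from this setup).
sample_a = {"ifOutUcastPkts": 4294967000, "ifOutErrors": 120, "ifOutDiscards": 0}
sample_b = {"ifOutUcastPkts": 4554, "ifOutErrors": 130, "ifOutDiscards": 0}

rates = per_second_rates(sample_a, sample_b, 60)
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}/s")
```

If ifOutErrors climbs while ifOutDiscards stays at zero, the monitoring is at least reporting what the SNMP agent hands out, and the question moves to how the agent or the kernel fills that counter for tun_wg0.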
-
Could it be that normal discards on the synthetic Wireguard interfaces such as tun_wg0 are incorrectly counted in the error counts?
[tun_wg0]
Operational state: up
Speed: unknown
In: 78.3 kB/s
Out: 12.9 kB/s
Errors in: 0%
Discards in: 0 packets/s
Multicast in: 0 packets/s
Broadcast in: 0 packets/s
Unicast in: 130.38 packets/s
Non-unicast in: 0 packets/s
Errors out: 0.202% (warn/crit at 0.01%/0.1%) CRIT
Discards out: 0 packets/s
Multicast out: 0 packets/s
Broadcast out: 0 packets/s
Unicast out: 80.84 packets/s
Non-unicast out: 0 packets/s
-
Here are two graphs of the SNMP-reported data.
First, the WAN interface that carries the WireGuard tunnel, among other traffic. No errors at all.
Next is the tun_wg0 interface (wg), which shows those outgoing errors, with some pattern to them, by the way.
I cannot make sense of it.
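To put the CRIT in perspective, the posted figures can be turned into an absolute rate. The sketch below uses only the numbers from the status output (0.202% errors out, 80.84 unicast packets/s out, warn/crit at 0.01%/0.1%). It assumes the percentage is simply errors relative to the outgoing packet rate; how checkmk actually derives it is an assumption on my part.

```python
def errors_per_second(error_percent, packets_per_second):
    """Absolute error rate implied by a percentage of the packet rate."""
    return packets_per_second * error_percent / 100.0


def classify(error_percent, warn=0.01, crit=0.1):
    """Apply the warn/crit thresholds shown in the status output."""
    if error_percent >= crit:
        return "CRIT"
    if error_percent >= warn:
        return "WARN"
    return "OK"


out_errors = errors_per_second(0.202, 80.84)
print(f"{out_errors:.3f} errored packets/s, about {out_errors * 60:.1f} per minute")
print(classify(0.202))
```

Under that assumption the alert corresponds to roughly ten errored packets per minute, so the thresholds are being tripped by a very small absolute number of packets.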
-
What's the theory here? Say a packet enters pfSense through a LAN interface with an MTU of 1500 and ends up being routed through a WireGuard interface like tun_wg0 (MTU 1432, for example) to reach the other side of the tunnel. Are the oversized packets properly fragmented, or are they considered errors at this point, possibly with an unreachable/oversized ICMP returned to the originating host on the LAN? I mean, what if the packets counted as errors on the tun_wg0 interface are not actually errors (and should not be counted as such)? Any PMTUD attempt from the LAN to the remote destination through WireGuard would then accumulate "errors" in those counters when it shouldn't.
Pure conjecture. I'm just trying to make sense of it.
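The conjecture can at least be made concrete. The sketch below models the generic IPv4 forwarding rule, not pfSense's actual code path: a packet larger than the outgoing MTU is fragmented if the DF bit is clear, and dropped with an ICMP "fragmentation needed" reply if DF is set. Whether such a drop increments Oerrs on tun_wg0 is exactly the open question, and the sketch does not answer it. The 1432 MTU is the example value from above; the overhead figures are WireGuard's per-packet cost (outer IP header + 8 bytes UDP + 32 bytes WireGuard). All function names are made up for illustration.

```python
# Per-packet WireGuard overhead: outer IP header + UDP header + WireGuard data header/tag.
WG_OVERHEAD = {"ipv4": 20 + 8 + 32, "ipv6": 40 + 8 + 32}


def max_tunnel_mtu(underlay_mtu, outer="ipv4"):
    """Largest tunnel MTU that avoids fragmenting the encrypted outer packet."""
    return underlay_mtu - WG_OVERHEAD[outer]


def forwarding_outcome(packet_len, egress_mtu, dont_fragment):
    """Generic IPv4 router behaviour for a packet leaving via egress_mtu."""
    if packet_len <= egress_mtu:
        return "forwarded"
    if dont_fragment:
        return "dropped, ICMP fragmentation needed sent to source"
    return "fragmented"


def largest_ping_payload(mtu):
    """ICMP echo payload that exactly fills an IPv4 packet of size mtu (20 IP + 8 ICMP)."""
    return mtu - 28


print(max_tunnel_mtu(1500))                    # ceiling for IPv4 outer transport
print(forwarding_outcome(1500, 1432, True))    # full-size LAN packet with DF set
print(forwarding_outcome(1500, 1432, False))   # same packet with DF clear
print(largest_ping_payload(1432))              # payload size for a DF ping test
```

A possible test, if this is on the right track: from a LAN host, send DF pings across the tunnel with a payload just at and just above the value computed for the tunnel MTU (FreeBSD's ping sets DF with -D) and watch whether the Oerrs counter on tun_wg0 moves only for the oversized ones.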