High interrupts on WAN/LAN interfaces?
-
Hey all, I've been trying to chase down a packet loss issue with pfSense on my Topton (AliExpress) N5105 router with i226 interfaces. I get random bouts of enormous packet loss that don't affect other devices plugged into my provider modem, so I'm thinking it has to be my pfSense/Topton router. I've disabled all hardware offloading and made sure I'm on the latest software version, but no matter what I do I still see random, sporadic runs of 60-80% packet loss. The other 20-40% of packets get through mostly fine, with no significant delay; it's just a ton of dropped packets.
I've got a few system tunables set (dev.cpu.0.cx_lowest = C3, dev.hwpstate_intel.0.epp = 90) and had flow control turned off on the Intel cards, but have since deleted those. Here are my interrupt numbers:
WAN:
In/out packets: 3736652/3186398 (3.89 GiB/1.99 GiB)
In/out packets (pass): 3736652/3186398 (3.89 GiB/1.99 GiB)
In/out packets (block): 15919/235 (1.88 MiB/57 KiB)
In/out errors: 0/0
Collisions: 0
Interrupts: 4583207 (1126/s)
LAN:
In/out packets: 3216598/3722933 (1.98 GiB/3.86 GiB)
In/out packets (pass): 3216598/3722933 (1.98 GiB/3.86 GiB)
In/out packets (block): 76029/598 (4.45 MiB/255 KiB)
In/out errors: 0/0
Collisions: 0
Interrupts: 4441446 (1090/s)
Something feels hokey here, right? What would be causing interrupt numbers like this and might it be the source of my sporadic high packet loss?
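One way to sanity-check whether these counts are out of proportion to the traffic is to relate them to the packet counters. A minimal Python sketch using only the WAN/LAN numbers quoted above; the function name is made up for illustration, and it assumes the per-second figure is an average over the time the counter has been running:

```python
def interrupt_summary(interrupts, rate_per_s, pkts_in, pkts_out):
    """Return (implied counting time in minutes, packets handled per interrupt)."""
    minutes = interrupts / rate_per_s / 60
    pkts_per_irq = (pkts_in + pkts_out) / interrupts
    return minutes, pkts_per_irq


# Figures copied from the WAN and LAN status output in this post.
wan = interrupt_summary(4583207, 1126, 3736652, 3186398)
lan = interrupt_summary(4441446, 1090, 3216598, 3722933)

for name, (minutes, per_irq) in (("WAN", wan), ("LAN", lan)):
    print(f"{name}: ~{minutes:.0f} min of counting, ~{per_irq:.2f} packets per interrupt")
```

Under that assumption both interfaces work out to a little over an hour of counting and roughly 1.5 packets per interrupt, so the interrupt totals track the packet totals quite closely.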
-
This is to my (ISP) gateway router, adjacent to me. If I plug another router in there it works fine with seemingly no loss. And this isn't even entirely consistent: it might happen for 10m or 10h but then stop for a day or a week or even multiple months. The packet loss kicked up again for seemingly no reason.
-
The only thing I see in the logs that might be a bit suspicious is an awful lot of messages like these:
Sep 16 16:57:35 dpinger 35604 exiting on signal 15
Sep 16 16:57:35 dpinger 34661 exiting on signal 15
Sep 16 16:57:35 dpinger 36758 exiting on signal 15
I have 3 WireGuard tunnels running that are bound to the public interface. I wonder if these, in some way, are interfering with the WAN connection...
-
One last insight here... When I make any change to the WAN interface (change speed/duplex, disable bogon networks/etc) I see 15-30s of loss-free traffic. Every time. But then it comes and goes after that. I have absolutely no idea what that means, but it's at least another data point...
-
@rmeskill
Go to the gateway settings and set an alternative monitoring IP to make sure it's not just the gateway dropping ICMP packets.
-
@viragomann Yeah, it's a good idea, but I've also set Disable Gateway Monitoring Action on the main WAN gateway, so I wouldn't expect it to have any effect...
-
Interestingly I added a USB ethernet adapter and I'm not seeing any loss...
So, yeah, I'm almost positive it's hardware related, probably with the i226-V NICs...
-
dpinger restarting like that is almost certainly a symptom of the packet loss alarms, not the cause.
@rmeskill said in High interrupts on WAN/LAN interfaces?:
I've got a few system tunables set (dev.cpu.0.cx_lowest = C3, dev.hwpstate_intel.0.epp = 90)
I'd be suspicious of those. Try running the CPU at a higher target speed. Do you have those settings for all CPU cores?
Some similar boxes like that have some terrible default BIOS values to reduce CPU heat that cause them to run at reduced speeds.
Do you have EEE (Energy Efficient Ethernet) enabled for the i226 NICs?
When you are seeing those interrupt rates, what traffic is passing?
-
I went to try and turn eee on for the NICs and it completely hosed my system. I had to go back and turn it off (which appears to be the default) via a direct console connection to get things working again. I think the interrupt rates themselves are actually irrelevant; rather, it's the packet loss that's the symptom and the issue. I have absolutely no idea what is causing or correlated with the loss. I'm currently trying an external USB NIC to see if that has the same issues or not, set up in a gateway group configuration so it should automatically fail over if the main i226-V NIC starts seeing loss...
-
@stephenw10 I did also change the tunables (dev.cpu.0.cx_lowest = C3, dev.hwpstate_intel.0.epp = 90), moving epp to 100. But in the 5m since I added the USB NIC it seems that's also seeing loss. I can confirm I have another (GLiNet) router plugged into this modem, running a similar setup, and it works fine with no loss, so the only consistent factor here is the Topton pfSense box...
-
Actually, at this point it just looks like the pfSense system itself is completely hosed. I've no idea why, but now even the GUI loads awfully slowly from the local network and sometimes gives me a 503 error on reboot. I'm probably just going to blow it away and start fresh.
-
Yup, eee should be disabled.
The values for hwpstate_intel should be set lower for higher performance. Setting it to 100 means the most power saving / lowest performance. Try setting it to 50 or just disable that. The default value depends on what the BIOS passes through.
-
@stephenw10 As it is, I just deleted it entirely, so it shouldn't be doing any limiting at all. I couldn't find any specific features in the BIOS pointing at this, though, so I'm just running on defaults now. I fully rebuilt pfSense and am still having the same loss issues, but as I've now confirmed loss on a second router as well, I'm leaning back towards an issue with the provider instead...
-
The hwpstate_intel driver is enabled by default and should select 50 by default. That should be more than sufficient but as I said some of those devices have some very odd choices set in the BIOS by default. You should try setting a high performance value there like 30 and see if it makes any difference.
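Since the earlier question was whether the setting is applied to all CPU cores, here is a small Python sketch that lists the tunable for every core. It is only an illustration: the helper name is hypothetical, the core count of 4 is an assumption based on the N5105 being a quad-core part, and it assumes each core exposes its own dev.hwpstate_intel.N.epp entry in the same way core 0 does in this thread.

```python
def epp_tunables(num_cores, epp):
    """Build (name, value) pairs for the per-core EPP tunable.

    Lower values favour performance and 100 is maximum power saving,
    as described earlier in the thread.
    """
    if not 0 <= epp <= 100:
        raise ValueError("epp is expected to be between 0 and 100")
    return [(f"dev.hwpstate_intel.{core}.epp", epp) for core in range(num_cores)]


# Assumed: 4 cores, and the value of 30 suggested above.
tunables = epp_tunables(4, 30)
for name, value in tunables:
    print(f"{name} = {value}")
```

Each printed pair can then be entered as a system tunable, the same way the original epp = 90 entry was.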
-
@stephenw10 I'll give 30 a go. The seeming irony of my situation is that my RTTs to my gateway monitor IP aren't even bad; it's just that I'm seeing enormous loss (on ICMP).
If I run a throughput test I sometimes see 200-400Mbps down, which is in line with what I'd expect, but the issue is that tunnels are dropping and apps are disconnecting.
-
Check the mac stats shown in:
sysctl dev.igc.0
or whichever NICs you are using.
-
@stephenw10 What should I be looking for?
dev.igc.0.mac_stats.tso_txd: 0
dev.igc.0.mac_stats.tx_frames_1024_1522: 1344934
dev.igc.0.mac_stats.tx_frames_512_1023: 15901
dev.igc.0.mac_stats.tx_frames_256_511: 13890
dev.igc.0.mac_stats.tx_frames_128_255: 21666
dev.igc.0.mac_stats.tx_frames_65_127: 1302018
dev.igc.0.mac_stats.tx_frames_64: 680328
dev.igc.0.mac_stats.mcast_pkts_txd: 0
dev.igc.0.mac_stats.bcast_pkts_txd: 3
dev.igc.0.mac_stats.good_pkts_txd: 3378737
dev.igc.0.mac_stats.total_pkts_txd: 3378737
dev.igc.0.mac_stats.good_octets_txd: 2039654276
dev.igc.0.mac_stats.good_octets_recvd: 4277785341
dev.igc.0.mac_stats.rx_frames_1024_1522: 2895839
dev.igc.0.mac_stats.rx_frames_512_1023: 83944
dev.igc.0.mac_stats.rx_frames_256_511: 18340
dev.igc.0.mac_stats.rx_frames_128_255: 38581
dev.igc.0.mac_stats.rx_frames_65_127: 613592
dev.igc.0.mac_stats.rx_frames_64: 1816390
dev.igc.0.mac_stats.mcast_pkts_recvd: 0
dev.igc.0.mac_stats.bcast_pkts_recvd: 1759297
dev.igc.0.mac_stats.good_pkts_recvd: 5466687
dev.igc.0.mac_stats.total_pkts_recvd: 5467091
dev.igc.0.mac_stats.xoff_txd: 0
dev.igc.0.mac_stats.xoff_recvd: 0
dev.igc.0.mac_stats.xon_txd: 0
dev.igc.0.mac_stats.xon_recvd: 0
dev.igc.0.mac_stats.alignment_errs: 0
dev.igc.0.mac_stats.crc_errs: 0
dev.igc.0.mac_stats.recv_errs: 0
dev.igc.0.mac_stats.recv_jabber: 0
dev.igc.0.mac_stats.recv_oversize: 0
dev.igc.0.mac_stats.recv_fragmented: 0
dev.igc.0.mac_stats.recv_undersize: 0
dev.igc.0.mac_stats.recv_no_buff: 0
dev.igc.0.mac_stats.missed_packets: 0
dev.igc.0.mac_stats.defer_count: 0
dev.igc.0.mac_stats.sequence_errors: 0
dev.igc.0.mac_stats.symbol_errors: 0
dev.igc.0.mac_stats.collision_count: 0
dev.igc.0.mac_stats.late_coll: 0
dev.igc.0.mac_stats.multiple_coll: 0
dev.igc.0.mac_stats.single_coll: 0
dev.igc.0.mac_stats.excess_coll: 0
igc.0 is my WAN, fwiw
-
Missed packets or errors. Check the other NICs in use. Also check the other sysctl values for errors.
Also check dev.igc.X.iflib.override_nrxqs. We had to set that to 1 on the 4200 to prevent context switching issues.
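Scanning that output by eye for missed packets or errors is tedious, so here is a rough Python sketch that parses pasted `sysctl dev.igc.N` text and reports only the nonzero error-like counters. The function names and the list of name fragments are assumptions made for illustration, not anything defined by the driver; adjust the fragments to whatever counters the NIC actually exposes.

```python
# Name fragments treated as "error-like" counters (an assumption, not a driver list).
ERROR_MARKERS = ("err", "missed", "no_buff", "coll", "jabber", "oversize",
                 "undersize", "fragmented", "defer", "drop", "overrun")


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping integer values only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # skip descriptions and other non-numeric values
    return stats


def nonzero_error_counters(stats):
    """Return counters whose last name component looks error-like and is nonzero."""
    return {name: count for name, count in stats.items()
            if count != 0
            and any(marker in name.rsplit(".", 1)[-1] for marker in ERROR_MARKERS)}


# A few lines copied from the WAN output posted above.
SAMPLE = """\
dev.igc.0.mac_stats.good_pkts_recvd: 5466687
dev.igc.0.mac_stats.total_pkts_recvd: 5467091
dev.igc.0.mac_stats.crc_errs: 0
dev.igc.0.mac_stats.recv_no_buff: 0
dev.igc.0.mac_stats.missed_packets: 0
"""

stats = parse_sysctl(SAMPLE)
flagged = nonzero_error_counters(stats)
not_good = (stats["dev.igc.0.mac_stats.total_pkts_recvd"]
            - stats["dev.igc.0.mac_stats.good_pkts_recvd"])
print(flagged, not_good)
```

On the WAN lines used here nothing is flagged, and total_pkts_recvd exceeds good_pkts_recvd by only 404. The same check would need to be run against the other NICs in use.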
-
[24.03-RELEASE][admin@pfSense.home.arpa]/root: sysctl -a | grep iflib.override_nrxqs
dev.igc.3.iflib.override_nrxqs: 0
dev.igc.2.iflib.override_nrxqs: 0
dev.igc.1.iflib.override_nrxqs: 0
dev.igc.0.iflib.override_nrxqs: 0
-
Mmm, try setting those to 1. You may need to add them as loader values to /boot/loader.conf.local.
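For reference, a short Python sketch that prints the lines one might add to /boot/loader.conf.local for the four igc NICs listed above. The helper name is hypothetical, and the quoted name="value" form is the usual loader.conf syntax; whether these need to be loader values at all on this box is exactly the open question in the post above.

```python
def nrxqs_loader_lines(num_nics, queues=1):
    """Build loader.conf-style lines forcing the RX queue count on each igc NIC."""
    return [f'dev.igc.{unit}.iflib.override_nrxqs="{queues}"'
            for unit in range(num_nics)]


# igc0 through igc3, matching the sysctl output earlier in the thread.
lines = nrxqs_loader_lines(4)
print("\n".join(lines))
```

Loader values only take effect after a reboot, after which the earlier `sysctl -a | grep iflib.override_nrxqs` check should show 1 for each NIC.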