Is anyone else seeing lots of Oerrs on a PPPoE ISP connection running over VLAN 911?
-
Hmm, in that case it could be real errors. The fact they appear on the pppoe, vlan and vlan-parent interfaces implies that.
Check the mac stats for igc0 in the output of
sysctl dev.igc.0
for what type of errors those might be.
-
@stephenw10 Note that they do NOT appear on the VLAN parent interface, only the VLAN interface and the pppoe interface.
Nothing showing in the Mac_stats:
dev.igc.0.mac_stats.tso_txd: 0
dev.igc.0.mac_stats.tx_frames_1024_1522: 549766654
dev.igc.0.mac_stats.tx_frames_512_1023: 2940126
dev.igc.0.mac_stats.tx_frames_256_511: 3424072
dev.igc.0.mac_stats.tx_frames_128_255: 4512551
dev.igc.0.mac_stats.tx_frames_65_127: 143405142
dev.igc.0.mac_stats.tx_frames_64: 687663
dev.igc.0.mac_stats.mcast_pkts_txd: 0
dev.igc.0.mac_stats.bcast_pkts_txd: 2
dev.igc.0.mac_stats.good_pkts_txd: 704736205
dev.igc.0.mac_stats.total_pkts_txd: 704736208
dev.igc.0.mac_stats.good_octets_txd: 849936678074
dev.igc.0.mac_stats.good_octets_recvd: 1048569211119
dev.igc.0.mac_stats.rx_frames_1024_1522: 689280502
dev.igc.0.mac_stats.rx_frames_512_1023: 4436609
dev.igc.0.mac_stats.rx_frames_256_511: 2158430
dev.igc.0.mac_stats.rx_frames_128_255: 5224710
dev.igc.0.mac_stats.rx_frames_65_127: 55784079
dev.igc.0.mac_stats.rx_frames_64: 655772
dev.igc.0.mac_stats.mcast_pkts_recvd: 0
dev.igc.0.mac_stats.bcast_pkts_recvd: 0
dev.igc.0.mac_stats.good_pkts_recvd: 757540102
dev.igc.0.mac_stats.total_pkts_recvd: 757540102
dev.igc.0.mac_stats.mgmt_pkts_txd: 0
dev.igc.0.mac_stats.mgmt_pkts_drop: 0
dev.igc.0.mac_stats.mgmt_pkts_recvd: 0
dev.igc.0.mac_stats.unsupported_fc_recvd: 0
dev.igc.0.mac_stats.xoff_txd: 0
dev.igc.0.mac_stats.xoff_recvd: 0
dev.igc.0.mac_stats.xon_txd: 0
dev.igc.0.mac_stats.xon_recvd: 0
dev.igc.0.mac_stats.alignment_errs: 0
dev.igc.0.mac_stats.crc_errs: 0
dev.igc.0.mac_stats.recv_errs: 0
dev.igc.0.mac_stats.recv_jabber: 0
dev.igc.0.mac_stats.recv_oversize: 0
dev.igc.0.mac_stats.recv_fragmented: 0
dev.igc.0.mac_stats.recv_undersize: 0
dev.igc.0.mac_stats.recv_no_buff: 0
dev.igc.0.mac_stats.recv_length_errors: 0
dev.igc.0.mac_stats.missed_packets: 0
dev.igc.0.mac_stats.defer_count: 0
dev.igc.0.mac_stats.sequence_errors: 0
dev.igc.0.mac_stats.symbol_errors: 0
dev.igc.0.mac_stats.collision_count: 0
dev.igc.0.mac_stats.late_coll: 0
dev.igc.0.mac_stats.multiple_coll: 0
dev.igc.0.mac_stats.single_coll: 0
dev.igc.0.mac_stats.excess_coll: 0
I've switched back to the old PPPoE driver for now. No Oerrs so far so let's see how it goes.
-
Ah, yes I misread that. Hmm. You might try disabling any hardware off-loading still enabled.
Is that connected to a switch or to an ISP modem/ONT directly? Can you see any errors there?
-
@stephenw10 It's connected directly to the ONT so sadly I have no visibility. However...
Having switched back to the old driver and rebooted, so far no errors at all. Normally by now there would be several thousand at least.
So the good news is that the old driver doesn't report any errors and so likely these are not real errors but some artefact of the new driver (hopefully that will get fixed at some point).
The bad news is that the old driver seems to struggle to keep up with a 2.5 Gbit/s connection. My iperf3 throughput is down substantially compared to the new driver.
I guess I'll switch back to the new driver and just ignore the errors for now.
With regard to hardware offloading, the only thing I have enabled (and it has been like this since forever) is checksum offloading. Should I disable that too? Will doing so impact throughput/latency?
-
Ah, that's good to hear. I guess!
Curious that the VLAN shows errors.
-
@stephenw10 So I switched back to the new driver and also disabled checksum offloading. Guess what - no errors reported now. Not sure what that means exactly but an interesting result I think.
-
Ah, interesting. Unexpected for output errors but that's the sort of thing I might expect to cause errors.
-
@stephenw10 Sadly, I spoke too soon. This morning when I checked there were a load of Oerrs against the VLAN and pppoe interfaces (but NOT the base interface). Again the error count for the VLAN was a few more than the pppoe count and overall it represents an error rate of ~0.02%. No Ierrs.
Mac stats show one CRC error and one collision (though given this is just a cable between the NetGate port and the ONT it's hard to see how there could be a collision):
dev.igc.0.mac_stats.tso_txd: 0
dev.igc.0.mac_stats.tx_frames_1024_1522: 106987943
dev.igc.0.mac_stats.tx_frames_512_1023: 575465
dev.igc.0.mac_stats.tx_frames_256_511: 549671
dev.igc.0.mac_stats.tx_frames_128_255: 729276
dev.igc.0.mac_stats.tx_frames_65_127: 25797022
dev.igc.0.mac_stats.tx_frames_64: 131923
dev.igc.0.mac_stats.mcast_pkts_txd: 0
dev.igc.0.mac_stats.bcast_pkts_txd: 3
dev.igc.0.mac_stats.good_pkts_txd: 134771300
dev.igc.0.mac_stats.total_pkts_txd: 134771300
dev.igc.0.mac_stats.good_octets_txd: 165755531650
dev.igc.0.mac_stats.good_octets_recvd: 131017743687
dev.igc.0.mac_stats.rx_frames_1024_1522: 86115978
dev.igc.0.mac_stats.rx_frames_512_1023: 425579
dev.igc.0.mac_stats.rx_frames_256_511: 211253
dev.igc.0.mac_stats.rx_frames_128_255: 806539
dev.igc.0.mac_stats.rx_frames_65_127: 7578356
dev.igc.0.mac_stats.rx_frames_64: 102537
dev.igc.0.mac_stats.mcast_pkts_recvd: 0
dev.igc.0.mac_stats.bcast_pkts_recvd: 0
dev.igc.0.mac_stats.good_pkts_recvd: 95240242
dev.igc.0.mac_stats.total_pkts_recvd: 95240243
dev.igc.0.mac_stats.mgmt_pkts_txd: 0
dev.igc.0.mac_stats.mgmt_pkts_drop: 0
dev.igc.0.mac_stats.mgmt_pkts_recvd: 0
dev.igc.0.mac_stats.unsupported_fc_recvd: 0
dev.igc.0.mac_stats.xoff_txd: 0
dev.igc.0.mac_stats.xoff_recvd: 0
dev.igc.0.mac_stats.xon_txd: 0
dev.igc.0.mac_stats.xon_recvd: 0
dev.igc.0.mac_stats.alignment_errs: 0
dev.igc.0.mac_stats.crc_errs: 1
dev.igc.0.mac_stats.recv_errs: 0
dev.igc.0.mac_stats.recv_jabber: 0
dev.igc.0.mac_stats.recv_oversize: 0
dev.igc.0.mac_stats.recv_fragmented: 0
dev.igc.0.mac_stats.recv_undersize: 0
dev.igc.0.mac_stats.recv_no_buff: 0
dev.igc.0.mac_stats.recv_length_errors: 0
dev.igc.0.mac_stats.missed_packets: 0
dev.igc.0.mac_stats.defer_count: 0
dev.igc.0.mac_stats.sequence_errors: 0
dev.igc.0.mac_stats.symbol_errors: 0
dev.igc.0.mac_stats.collision_count: 1
dev.igc.0.mac_stats.late_coll: 0
dev.igc.0.mac_stats.multiple_coll: 0
dev.igc.0.mac_stats.single_coll: 0
dev.igc.0.mac_stats.excess_coll: 0
Not sure where this leaves me. The errors don't seem to be 'real' but if so why is the device counting so many?
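With this many counters it is easy to miss the one or two that are non-zero. As a rough sketch only (the function name `nonzero_errs` and the keyword list are assumptions made up here from the counter names in the output above, not anything defined by the igc driver), the sysctl output could be filtered down to error-type counters with a non-zero value:

```shell
# Hypothetical helper: keep only error-like mac_stats counters that are non-zero.
# The keyword list is an assumption based on the counter names shown above.
nonzero_errs() {
  awk -F': ' '
    $1 ~ /mac_stats\..*(err|coll|missed|drop|jabber|size|fragmented|no_buff)/ && $2 + 0 != 0 {
      print $1 ": " $2
    }'
}

# Usage on the firewall itself:
#   sysctl dev.igc.0.mac_stats | nonzero_errs
```

Run against the output above, this should leave just the crc_errs and collision_count lines.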
-
A single collision or other error like that can be from when the link came up, for example, or if it renegotiated. Not anything I'd worry about.
You could try switching to a different igc NIC in case it's somehow just not reporting errors on the parent that are in fact there.
If it really was seeing errors though I'd expect to see your upload speed impacted but you're seeing good throughput?
-
@stephenw10 I'm seeing good throughput for both upload and download. In fact upload is generally better than download (but that is likely due to Internet factors more than anything). Given that the 'old' PPPoE driver does not report any errors, it seems to me that this is somehow related to the new driver. Switching to a different igc NIC is a bit of a faff (and disruptive), so I think I won't try that for now. It certainly looks (to me at least) like there is at least one additional bug beyond the one you identified (unless that is somehow also responsible for this).
-
The interesting part to me is that it's shown on the VLAN. We know the old driver doesn't log some errors like the dropped packets from traffic shaping. We only found that when using the new driver.
But the VLAN should only be passing the PPPoE packets which should be the same for both drivers. I assume you don't see errors on the VLAN with the old driver?
-
@stephenw10 Nope, no errors at all (on pppoe0 or igc0.911) with the old driver. And of course nothing in mac_stats for igc.0 with old or new drivers.
-
Hmm, could be a clue there.
-
@stephenw10 I've been doing more experiments and have some more info on all of this.
- With the new driver it seems that heavy traffic triggers an increase in the number of reported errors, for example an HTTP or iperf3 speed test that pushes the link close to the limit (though the speeds are still very good).
- I'm now running with the old driver (as an experiment for comparison purposes). I saw a small number of errors on the VLAN only just after the router had restarted and some small increase in those over time, and no errors at all on the pppoe0 interface, even after several hours of running including many speed tests.
After boot
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
igc0 1500 <Link#1> XX:XX:77:7f:c9:d6 4804079 0 0 5717254 0 0
...
igc0.911 1500 <Link#15> XX:XX:77:7f:c9:d6 4804079 0 0 5717254 820 0
igc0.911 - fe80::%igc0.911/64 fe80::XXXX:77ff:fe7f:c9d6%igc0.911 0 - - 0 - -
...
pppoe0 1492 <Link#17> pppoe0 4803848 0 0 5717807 0 0
pppoe0 - XXX.69.48.XXX/32 abcdef.com 15982 - - 7 - -
pppoe0 - fe80::%pppoe0/64 fe80::XXXX:77ff:fe7f:c9d6%pppoe0 1386 - - 1391 - -
pppoe0 - abcdef.com abcdef.com 1232 - - 3075 - -
pppoe0 - 2XX2:XXX:62fb::123/128 2XX2:XXX:62fb::123 501 - - 1 - -
pppoe0 - fe80::%pppoe0/64 fe80::XXXX:77ff:fe7f:c9d9%pppoe0 0 - - 0 - -
pppoe0 - 2XX2:XXX:feed:62fb::/64 2XX2:XXX:feed:62fb:92ec:77ff:fe7f:c9d6 1848 - - 0 - -
...
An hour, and several speed tests, later
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
igc0 1500 <Link#1> XX:XX:77:7f:c9:d6 15059231 0 0 18533104 0 0
...
igc0.911 1500 <Link#15> XX:XX:77:7f:c9:d6 15059231 0 0 18533104 820 0
igc0.911 - fe80::%igc0.911/64 fe80::XXXX:77ff:fe7f:c9d6%igc0.911 0 - - 0 - -
...
pppoe0 1492 <Link#17> pppoe0 15058839 0 0 18533476 0 0
pppoe0 - XXX.69.48.XXX/32 abcdef.com 111491 - - 28 - -
pppoe0 - fe80::%pppoe0/64 fe80::XXXX:77ff:fe7f:c9d6%pppoe0 8527 - - 8535 - -
pppoe0 - abcdef.com abcdef.com 4958 - - 12415 - -
pppoe0 - 2XX2:XXX:62fb::123/128 2XX2:XXX:62fb::123 3912 - - 1 - -
pppoe0 - fe80::%pppoe0/64 fe80::XXXX:77ff:fe7f:c9d9%pppoe0 0 - - 0 - -
pppoe0 - 2XX2:XXX:feed:62fb::/64 2XX2:XXX:feed:62fb:92ec:77ff:fe7f:c9d6 8376 - - 0 - -
...
Several hours later
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
igc0 1500 <Link#1> XX:XX:77:7f:c9:d6 57841186 0 0 68454730 0 0
igc0 - fe80::%igc0/64 fe80::XXXX:77ff:fe7f:c9d6%igc0 0 - - 1 - -
...
igc0.911 1500 <Link#15> XX:XX:77:7f:c9:d6 57841186 0 0 68454730 1590 0
igc0.911 - fe80::%igc0.911/64 fe80::XXXX:77ff:fe7f:c9d6%igc0.911 0 - - 0 - -
...
pppoe0 1492 <Link#17> pppoe0 57840234 0 0 68454319 0 0
pppoe0 - XXX.69.48.XXX/32 abcdef.com 513438 - - 250 - -
pppoe0 - fe80::%pppoe0/64 fe80::XXXX:77ff:fe7f:c9d6%pppoe0 39368 - - 39387 - -
pppoe0 - abcdef.com abcdef.com 21290 - - 62223 - -
pppoe0 - 2XX2:XXX:62fb::123/128 2XX2:XXX:62fb::123 18804 - - 1 - -
pppoe0 - fe80::%pppoe0/64 fe80::XXXX:77ff:fe7f:c9d9%pppoe0 0 - - 0 - -
pppoe0 - 2XX2:XXX:feed:62fb::/64 2XX2:XXX:feed:62fb:92ec:77ff:fe7f:c9d6 40841 - - 0 - -
...
mac_stats for igc.0 show no errors of any kind. I'd be interested to know where the (very few) errors counted against the VLAN are coming from.
It seems to me that maybe the new driver is not quite ready for prime time.
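To put those counts in proportion, Oerrs can be divided by Opkts for each link-level row. This is only a sketch under some assumptions: that the tables above come from `netstat -i`, that Opkts and Oerrs are the third-from-last and second-from-last columns, and that rows of interest have `<Link#N>` in the third column. The name `oerr_rate` is made up here.

```shell
# Hypothetical helper: per-interface output error rate from `netstat -i` style rows.
# Assumes link-level rows have <Link#N> in column 3 and end with: Opkts Oerrs Coll.
oerr_rate() {
  awk '$3 ~ /^<Link#/ && $(NF-2) > 0 {
    printf "%s %d %d %.4f%%\n", $1, $(NF-2), $(NF-1), 100 * $(NF-1) / $(NF-2)
  }'
}

# Usage on the firewall itself:
#   netstat -i | oerr_rate
```

Fed the "several hours later" rows, this works out to about 0.0023% for igc0.911 (1590 errors in 68454730 packets) and zero for igc0 and pppoe0.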
-
Hmm, seeing a few errors when the interface comes up is not that unusual. Errors on the VLAN only is odd though. Especially as they are increasing after boot.
You have hardware VLAN tagging enabled on igc0?
Shown in options out of capabilities like:
[25.07.1-RELEASE][admin@6100.stevew.lan]/root: ifconfig -vm igc0
igc0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1300
        options=48020b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,HWSTATS,MEXTPG>
        capabilities=4f43fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
Try disabling that and see if the errors stop:
ifconfig igc0 -vlanhwtag
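One thing to watch when checking the result: on a NIC that supports it, VLAN_HWTAGGING stays listed under capabilities regardless, so only the options line says whether it is actually active. A small sketch of that check (the helper name `hwtag_enabled` is hypothetical, and it assumes the usual layout where the options flags sit on their own line):

```shell
# Hypothetical helper: exit 0 only if VLAN_HWTAGGING is in the active options= line.
# The capabilities= and "nd6 options=" lines are deliberately ignored.
hwtag_enabled() {
  awk 'BEGIN { rc = 1 }
       $1 ~ /^options=.*[<,]VLAN_HWTAGGING[,>]/ { rc = 0 }
       END { exit rc }'
}

# Usage on the firewall itself:
#   ifconfig -vm igc0 | hwtag_enabled && echo "hardware VLAN tagging is on"
```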
-
@stephenw10 Yes it is enabled:
igc0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4e427bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        capabilities=4f43fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether XX:XX:77:7f:c9:d6
        inet6 fe80::XXXX:77ff:fe7f:c9d6%igc0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect (2500Base-T <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 2500Base-T
                media 1000baseT
                media 1000baseT mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        drivername: igc0
I've turned it off now:
igc0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4e427ab<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        capabilities=4f43fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether XX:XX:77:7f:c9:d6
        inet6 fe80::XXXX:77ff:fe7f:c9d6%igc0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect (2500Base-T <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 2500Base-T
                media 1000baseT
                media 1000baseT mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        drivername: igc0
Is that likely to have any detrimental effect?
-
Potentially it might make the connection fractionally slower but I'd be surprised if you're able to detect it!
-
@stephenw10 I'll post an update after it has been running like this for several hours (this is still with the old PPPoE driver). If this eliminates the errors then I could also try it with the new driver to see if it has any effect on that.
-
@stephenw10 Looking good so far; since turning off hardware VLAN tagging no further errors have accrued on the vlan interface, and still zero on the base interface and the pppoe interface, this being with the old PPPoE driver.
If the situation remains the same by the morning (UK time), is it worth me trying with the new PPPoE driver to see if this also eliminates the errors that it was reporting? I've set up a 'scriptcmd' to disable the hardware tagging on every reboot in case I forget.
-
Yes, definitely try the if_pppoe driver if you can. That would be odd if the pppoe packets are somehow triggering some issue there but at least consistent. And it does nicely tie in with the 'vlan not parent' behaviour. You might not be seeing it as much in the old driver simply because it's not pushing the NIC as hard.