Interface errors, missed packets and rec overruns
-
Looks like slightly different cause though. recv_no_buff vs defer_count
The first thing I'd try is reassigning the NICs to use a different one as WAN and see if the issue follows it.
Next I'd try a different flow-control setting and check the current negotiated value. That should prevent the other side overloading the receive buffers if both ends support it.
Steve
-
@stephenw10 said in Interface errors, missed packets and rec overruns:
Looks like slightly different cause though. recv_no_buff vs defer_count
I noticed that too, but I'm not sure what "defer" means here. I mean I'm also not sure what "recv_no_buff" means, but my educated guess is it received a packet but had no buffer space available to place it in.
@stephenw10 said in Interface errors, missed packets and rec overruns:
The first thing I'd try is reassigning the NICs to use a different one as WAN and see if the issue follows it.
I tried that already, the 82583V NICs all do the same thing.
@stephenw10 said in Interface errors, missed packets and rec overruns:
Next I'd try a different flow-control setting and check the current negotiated value.
So it's currently set to 3 for flow control. How do I check the negotiated value?
-
AFAIK recv_no_buff implies there are no available receive buffers. So potentially you could increase the buffers. I will say that I see some no_buff failures on a box here and don't see any connection issues:
[2.7.2-RELEASE][admin@xtm5.stevew.lan]/root: sysctl dev.em.0 | grep buf
dev.em.0.mac_stats.recv_no_buff: 36
dev.em.0.iflib.rxq1.rxq_fl0.buf_size: 2048
dev.em.0.iflib.rxq0.rxq_fl0.buf_size: 2048
dev.em.0.iflib.txq1.mbuf_defrag_failed: 0
dev.em.0.iflib.txq1.mbuf_defrag: 0
dev.em.0.iflib.txq0.mbuf_defrag_failed: 0
dev.em.0.iflib.txq0.mbuf_defrag: 0
Good question about seeing how it's linked though! em doesn't appear to report that. I'd try setting fc to 0 and see if that changes anything.
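If anyone wants to read these values from a script rather than by eye, here is a rough Python sketch. It assumes the fc sysctl is named dev.em.0.fc and that its numbers follow the e1000 flow-control modes (0 none, 1 rx pause, 2 tx pause, 3 full). Check both on your own box. The sample text is made up for illustration, not captured from a real system.

```python
# Flow-control modes as the e1000 code numbers them.
# Assumption: dev.em.N.fc uses the same numbering.
FC_MODES = {0: "none", 1: "rx pause", 2: "tx pause", 3: "full"}


def parse_sysctl(text):
    """Turn 'name: value' lines from sysctl output into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that are not in 'name: value' form
            values[name.strip()] = value.strip()
    return values


# Illustrative sample only; on a real box you would paste or pipe in
# the output of `sysctl dev.em.0`.
sample = """dev.em.0.fc: 3
dev.em.0.mac_stats.recv_no_buff: 36
"""

stats = parse_sysctl(sample)
print("configured flow control:", FC_MODES[int(stats["dev.em.0.fc"])])
print("recv_no_buff:", stats["dev.em.0.mac_stats.recv_no_buff"])
```

Note this only shows the configured setting, not what was negotiated with the link partner.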
-
OK, I found some interesting things. Turns out old Intel datasheets have really thorough descriptions of what all these counters mean. Intel 82583V datasheet
Defer Count: This register counts defer events. A defer event occurs when the transmitter cannot immediately send a packet due to the medium being busy either because:
• Another device is transmitting
• The IPG timer has not expired
• Half-duplex deferral events
• Reception of XOFF frames
• The link is not up
This register only increments if transmits are enabled. The behavior of this counter is slightly different in the 82583V relative to previous devices. For the 82583V, this counter does not increment for streaming transmits that are deferred due to TX IPG.
Receive No Buffers Count: This register counts the number of times that frames were received when there were no available buffers in host memory to store those frames (receive descriptor head and tail pointers were equal). The packet is still received if there is space in the FIFO. This register only increments if receives are enabled. This register does not increment when flow control packets are received.
Missed Packets Count: Counts the number of missed packets. Packets are missed when the receive FIFO has insufficient space to store the incoming packet. This could be caused because of too few buffers allocated, or because there is insufficient bandwidth on the IO bus. Events setting this counter cause RXO, the receiver overrun interrupt, to be set. This register does not increment if receives are not enabled. Note that these packets are also counted in the Total Packets Received register as well as in the Total Octets Received register.
-
So Defers are specifically a transmit-side thing and won't ever be Missed packets. Recv_No_Buff means the NIC has a frame, but the host has no free receive buffer for it. The frame sits in the NIC's FIFO in the meantime, so recv_no_buff only turns into missed_packets if the situation persists long enough for the FIFO to fill up.
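To make that reading of the datasheet concrete, here is a small Python sketch that compares two counter snapshots and says which case you are in. The counter names (recv_no_buff, missed_packets, defer_count) are just the ones used in this thread, the snapshot numbers are invented, and the wording of each note is my own interpretation, so treat it as a sketch rather than anything official.

```python
def deltas(before, after):
    """How much each counter grew between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}


def describe(d):
    """Interpret counter growth per the datasheet descriptions quoted above."""
    notes = []
    if d.get("recv_no_buff", 0) > 0 and d.get("missed_packets", 0) == 0:
        notes.append("host ran out of receive buffers, but the FIFO absorbed it")
    if d.get("missed_packets", 0) > 0:
        notes.append("receive FIFO overflowed, frames were dropped")
    if d.get("defer_count", 0) > 0:
        notes.append("transmit deferrals only, not a receive-side loss")
    return notes


# Invented numbers for illustration: two snapshots taken some time apart.
before = {"recv_no_buff": 10, "missed_packets": 0, "defer_count": 5}
after = {"recv_no_buff": 25, "missed_packets": 0, "defer_count": 5}

for note in describe(deltas(before, after)):
    print(note)
```

Taking the snapshots a fixed interval apart (say a minute) makes it easier to tell a slow trickle from a burst.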
I reconfigured things. Instead of shoving everything into em.0 in a router-on-a-stick config, I changed em.0 to be the only WAN and put all the LAN VLANs on em.1. This was weird, as I could literally just watch recv_no_buff incrementing on em.0, while em.1, which was seeing the same number of packets, had no trouble.
Then I reconfigured things again, this time putting the WAN on em.5 and the LAN VLANs on em.4. This setup has no immediate issues, but in the past it slowly accumulated errors just the same.
My brother's stats confuse me. Defers seem like they should mostly not happen on a full-duplex link, but maybe this is older hardware that increments this when XOFF is received. His defers count is very similar to the XOFF received count. More confusing is that he's getting missed packets, but without recv_no_buff. That doesn't seem like it should be possible.
-
Hmm, interesting. Are those all the exact same NIC chip? Different PCIe bus maybe?
-
@stephenw10 em.0 is a 82574L, but em.1-5 are 82583V's. According to pciconf I think they all have a direct x1 lane to the CPU.
My brother is using an old HP 4 port nic with 82571 chips. I'm starting to think the older 82571 just doesn't have the recv_no_buff register.
-
Actually I just found a bit in the 2014 errata update for the 82571EB that might explain my brother's missed packets. It might just be old crap.
- Missed RX Packets
Problem:
When the device operates with multiple-requests or Large Send enabled, there could be receive packet loss. When the Tx FIFO is full, the Tx flow may block the host DMA interface of the device. When the transmission of packets is prevented for a long time, due to capture effect or very long backoff in half-duplex, the transmit FIFO is filled and the fetch of Rx descriptors is prevented also. This will prevent the release of the packets from the Rx FIFO to the host, causing the Rx buffer to overflow and the loss of incoming packets. This is a temporary state that will be released once the transmit side is able to empty the Tx packet buffer.
Implication:
There could be some packet loss in the Rx path if the transmission of packets is prevented for a long time. Normally, if this occurs, these packets will be re-transmitted by upper-layer protocols.
Workaround:
None
-
Hmm, the 82574 is extremely common. I would have expected that to work more reliably if anything. But there is a difference there, so whatever is causing a problem on em.0, the 82583V apparently doesn't suffer from it.
-
Of the two NIC options I have on this box, the 82574L is supposed to be the "better" one. It has MSI-X and dual tx/rx queues, versus the 82583V's MSI and single tx/rx queue.
Also, I definitely had the em.5/WAN em.4/LAN setup in the past and it would miss packets over time, but this time it's all good.
Only I've reconfigured this so many times and it's never worked as well as it finally is. Computers man, what the hell.