Troubleshoot packet loss



  • Hi All,
    Maybe someone has an idea here.
    I'm seeing packet loss when sending UDP traffic at about 200K pps.
    Testing with iperf -u -c (IP) -i1 -l 64 -b 100M -t 10, I see roughly 1% to 30% packet loss.

    The lost packets do not even show up in tcpdump on the pfSense input interface.

    I have another pfSense box that has net.isr.direct=1 and net.isr.direct_force=1, and it has no loss at all.

    Both setups are the same Dell R-610 with a quad-port igb card; LAN and WAN are LACP laggs, and both are clusters.

    I tried disabling the LAGG: no change.
    Setting net.isr.direct and net.isr.direct_force both to 0 changes a lot:
    netstat -in then shows no Ierrs.
    With them set to 1 there are Ierrs.

    I can also see drops in net.inet.ip.intr_queue_drops.

    So I think it is better with net.isr.direct set to 0, but net.isr is running only 1 thread and I'm unable to change net.isr.numthreads or net.isr.maxthreads.

    Any ideas are welcome.

    Thanks
    David,
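
As a rough sanity check on the numbers in this post, the packet rate implied by the iperf options can be worked out directly. This is only a sketch: it assumes iperf treats 100M as 10^6-based bits per second and counts just the 64-byte UDP payload set by -l; the function name is made up for illustration.

```python
def udp_pps(bandwidth_bps: float, payload_bytes: int) -> float:
    """Packets per second needed to hit a target UDP payload bandwidth."""
    return bandwidth_bps / (payload_bytes * 8)


# -b 100M with -l 64 (assumption: M means 10**6 bits per second)
pps = udp_pps(100e6, 64)
print(f"offered rate: {pps:.1f} pps")

# Packets lost per second at the two loss levels reported in the post
for loss in (0.01, 0.30):
    print(f"{loss:.0%} loss is about {pps * loss:.0f} packets/s dropped")
```

Under those assumptions the offered rate comes to a little over 195K pps, which agrees with the "200K pps" figure quoted above.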



  • @David:

    So I think it is better with net.isr.direct set to 0, but net.isr is running only 1 thread and I'm unable to change net.isr.numthreads or net.isr.maxthreads.

    Perhaps net.isr.numthreads and net.isr.maxthreads can be changed only at boot time. (There are a number of such sysctls.)

    @David:

    The lost packets do not even show up in tcpdump on the pfSense input interface.

    Perhaps they are dropped before they make it to the output queue of the sending device. Or they could be discarded because the input receive rings fill up and consequently the receive FIFO fills up. There should be counters for both cases.
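
One way to watch such a counter while a test is running is to sample it twice and look at the rate of change. The sketch below does that for net.inet.ip.intr_queue_drops, the counter named earlier in the thread. It assumes Python 3 is available on the firewall (a stock pfSense install may not have it), and the helper names are hypothetical.

```python
import subprocess
import time


def read_sysctl_int(name):
    """Read one numeric sysctl value using `sysctl -n NAME`."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def drops_per_second(name, interval=1.0, read=read_sysctl_int, sleep=time.sleep):
    """Sample a counter twice, `interval` seconds apart, and return its rate.

    `read` and `sleep` are injectable so the function can be tried out
    without a real sysctl tree.
    """
    before = read(name)
    sleep(interval)
    after = read(name)
    return (after - before) / interval
```

On the firewall this could be called as drops_per_second("net.inet.ip.intr_queue_drops", interval=5.0) while iperf is running. A rate that stays at zero during a lossy run would suggest the packets are being discarded earlier, for example in the NIC receive rings.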



  • Thanks wallabybob,
    I added the net.isr settings via /boot/loader.conf.local, but the value did not change.

    I remembered that the pfSense box that is not losing packets was installed on the same day, but the other firewall is an upgrade.

    The kernel version is 8.1-RELEASE-P4#0 on the working one; the one losing packets is 8.1-RELEASE-P4#1. Both were built on Sat Jul 16.

    I'm reinstalling from scratch and will update shortly.

    Thanks
    David,



  • It didn't make any difference, so it's not the install.



  • I did a bit of research and it looks as if net.isr.numthreads reports the number of threads in use (and hence is read only) and net.isr.maxthreads can be set at boot time and is limited to the number of CPUs.



  • Thanks wallabybob,
    I'm trying it now….



  • No, it didn't work.



  • I have added net.inet.ip.fastforwarding=1.
    Now when sending 250 kpps I lose only 3%.
    I still don't know why the other machine can handle 350k+ pps.

    Do you think I should try amd64? Would it change anything?



  • If they are identical, you can compare their sysctl -a outputs and see the differences there.
    That is a starting point for finding out what really differs between them.

    Maybe different hw?
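
The comparison suggested here can be sketched as a small script. It assumes the output of sysctl -a from each box has been saved as text and that each setting is a single "name: value" line; multi-line values are not handled properly. The sample values below are hypothetical, apart from the net.isr settings mentioned earlier in the thread.

```python
def parse_sysctl(text):
    """Turn `sysctl -a` style 'name: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. continuation lines) are skipped
            values[name.strip()] = value.strip()
    return values


def diff_sysctl(a_text, b_text):
    """Return (name, value_a, value_b) for every setting that differs.

    A setting missing on one side is reported with None for that side.
    """
    a, b = parse_sysctl(a_text), parse_sysctl(b_text)
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    ]


# Hypothetical excerpts; in practice these would be read from saved files.
working = "net.isr.direct: 1\nnet.isr.direct_force: 1\nkern.hz: 1000\n"
lossy = "net.isr.direct: 0\nnet.isr.direct_force: 0\nkern.hz: 1000\n"

for name, w, l in diff_sysctl(working, lossy):
    print(f"{name}: working={w} lossy={l}")
```

Many entries in a full sysctl -a dump are counters that will always differ between two running machines, so the result still needs to be read by hand with tunables in mind.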



  • Thanks Ermal.
    I didn't find anything special; I changed the settings to match the working one.
    One difference in the hardware is the disk controller.
    Besides that I don't see any difference.

    I will try the bce NIC.

    Thanks.



  • Well, if you have IRQs shared with the disk controller, that might give you issues.
    One last thing to try is whether increasing kern.hz to about 4000 helps.

    You also need to check how many queues are configured on the igb cards.

